Class AutomatonURLFilter
- java.lang.Object
  - org.apache.nutch.urlfilter.api.RegexURLFilterBase
    - org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
- All Implemented Interfaces:
- Configurable, URLFilter, Pluggable
public class AutomatonURLFilter extends RegexURLFilterBase
RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™.
- Author:
- Jérôme Charron
- See Also:
- dk.brics.automaton
-
Field Summary
Fields
Modifier and Type | Field | Description
static String | URLFILTER_AUTOMATON_FILE |
static String | URLFILTER_AUTOMATON_RULES |
-
Fields inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
hasHostDomainRules
-
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
-
-
Constructor Summary
Constructors
Constructor | Description
AutomatonURLFilter() |
AutomatonURLFilter(String filename) |
-
Method Summary
Modifier and Type | Method | Description
protected RegexRule | createRule(boolean sign, String regex) | Creates a new RegexRule.
protected RegexRule | createRule(boolean sign, String regex, String hostOrDomain) | Creates a new RegexRule.
protected Reader | getRulesReader(Configuration conf) | Rules specified as a config property will override rules specified as a config file.
static void | main(String[] args) |
-
Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter, getConf, main, setConf
-
Field Detail
-
URLFILTER_AUTOMATON_FILE
public static final String URLFILTER_AUTOMATON_FILE
- See Also:
- Constant Field Values
-
URLFILTER_AUTOMATON_RULES
public static final String URLFILTER_AUTOMATON_RULES
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
AutomatonURLFilter
public AutomatonURLFilter()
-
AutomatonURLFilter
public AutomatonURLFilter(String filename) throws IOException, PatternSyntaxException
- Throws:
IOException
PatternSyntaxException
-
-
Method Detail
-
getRulesReader
protected Reader getRulesReader(Configuration conf) throws IOException
Rules specified as a config property will override rules specified as a config file.
- Specified by:
- getRulesReader in class RegexURLFilterBase
- Parameters:
- conf - is the current configuration.
- Returns:
- a Reader over the resource containing the rules to use.
- Throws:
- IOException - if there is a fatal error obtaining the Reader
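The precedence described above can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation: a plain `Map` stands in for the `Configuration` object, a second `Map` stands in for rule files on the classpath, and the keys `example.automaton.rules` and `example.automaton.file` are hypothetical placeholders (the actual values of URLFILTER_AUTOMATON_RULES and URLFILTER_AUTOMATON_FILE are not listed on this page).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Sketch of "rules property overrides rules file"; a stand-in, not Nutch code. */
final class RulesPrecedenceSketch {

    // Hypothetical property keys, chosen for illustration only.
    static final String RULES_KEY = "example.automaton.rules";
    static final String FILE_KEY = "example.automaton.file";

    /**
     * Returns a Reader over the inline rules if the rules property is set;
     * otherwise a Reader over the contents of the named resource.
     * The 'resources' map is an assumed stand-in for files the configuration can load.
     */
    static Reader rulesReader(Map<String, String> conf, Map<String, String> resources) {
        String inlineRules = conf.get(RULES_KEY);
        if (inlineRules != null) {
            return new StringReader(inlineRules); // the property takes precedence
        }
        String fileName = conf.get(FILE_KEY);
        String contents = (fileName == null) ? null : resources.get(fileName);
        if (contents == null) {
            throw new IllegalStateException("No rules found for resource: " + fileName);
        }
        return new StringReader(contents);
    }

    /** Drains a Reader into a String (helper for the demonstration). */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = reader) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> resources = new HashMap<>();
        resources.put("rules.txt", "-.*\n");

        Map<String, String> conf = new HashMap<>();
        conf.put(FILE_KEY, "rules.txt");
        System.out.print(readAll(rulesReader(conf, resources))); // rules come from the file

        conf.put(RULES_KEY, "+.*\n");
        System.out.print(readAll(rulesReader(conf, resources))); // rules come from the property
    }
}
```

With both keys set, the file entry is never consulted, which is the behavior the method description states.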
-
createRule
protected RegexRule createRule(boolean sign, String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.
- Specified by:
- createRule in class RegexURLFilterBase
- Parameters:
- sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
- regex - is the regular expression associated with this rule.
- Returns:
- RegexRule
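The effect of the sign parameter can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: `java.util.regex` stands in for dk.brics.automaton, matching is against the whole URL, the first matching rule decides, and a URL that matches no rule is rejected. The `Rule` and `filter` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of signed include/exclude rules; a stand-in, not Nutch code. */
final class SignedRuleSketch {

    /** One rule: a sign plus a compiled pattern (java.util.regex as a stand-in). */
    static final class Rule {
        final boolean sign;
        final Pattern pattern;

        Rule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }

        boolean matches(String url) {
            return pattern.matcher(url).matches(); // whole-string match (assumption)
        }
    }

    /**
     * Returns the URL if the first matching rule has sign == true,
     * or null if that rule has sign == false or no rule matches (assumption).
     */
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.matches(url)) {
                return rule.sign ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule(false, ".*\\.(gif|jpg|png)"));        // exclude images
        rules.add(new Rule(true, "https?://example\\.org/.*"));  // include the rest of the site

        System.out.println(filter(rules, "http://example.org/index.html"));
        System.out.println(filter(rules, "http://example.org/logo.png"));
        System.out.println(filter(rules, "http://other.test/"));
    }
}
```

Because the first match decides in this sketch, rule order matters: placing the exclusion first is what keeps image URLs out even though the inclusion rule would also match them.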
-
createRule
protected RegexRule createRule(boolean sign, String regex, String hostOrDomain)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.
- Specified by:
- createRule in class RegexURLFilterBase
- Parameters:
- sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
- regex - is the regular expression associated with this rule.
- hostOrDomain - the host or domain to which this regex belongs
- Returns:
- RegexRule
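One way to picture a rule that belongs to a host or domain is sketched below. The scoping logic is an assumption made for illustration: a rule with a non-null hostOrDomain is consulted only for URLs whose host equals that value or is a subdomain of it, and a rule with a null value applies to every URL. The names `Rule`, `appliesTo`, and `filter` are hypothetical, and `java.util.regex` again stands in for the automaton library.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of host- or domain-scoped rules; a stand-in, not Nutch code. */
final class HostScopedRuleSketch {

    static final class Rule {
        final boolean sign;
        final Pattern pattern;
        final String hostOrDomain; // null means the rule applies to all URLs

        Rule(boolean sign, String regex, String hostOrDomain) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
            this.hostOrDomain = hostOrDomain;
        }

        /** Assumed scoping: exact host match or subdomain of hostOrDomain. */
        boolean appliesTo(String host) {
            if (hostOrDomain == null) {
                return true;
            }
            return host != null
                && (host.equals(hostOrDomain) || host.endsWith("." + hostOrDomain));
        }
    }

    /** First applicable, matching rule decides; no match rejects (assumption). */
    static String filter(List<Rule> rules, String url) {
        String host = URI.create(url).getHost();
        for (Rule rule : rules) {
            if (!rule.appliesTo(host)) {
                continue; // rule belongs to a different host or domain
            }
            if (rule.pattern.matcher(url).matches()) {
                return rule.sign ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule(false, ".*", "example.org")); // exclude everything on example.org
        rules.add(new Rule(true, ".*", null));           // include everything else

        System.out.println(filter(rules, "http://www.example.org/a"));
        System.out.println(filter(rules, "http://other.test/a"));
    }
}
```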
-
main
public static void main(String[] args) throws IOException
- Throws:
IOException
-
-