Package org.apache.nutch.indexer
Class IndexingFiltersChecker
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.util.AbstractChecker
      org.apache.nutch.indexer.IndexingFiltersChecker
All Implemented Interfaces: Configurable, Tool
public class IndexingFiltersChecker extends AbstractChecker
Reads and parses a URL and runs the indexers on it. Displays the fields obtained and the first 100 characters of each value. Tested with, e.g.:
echo "http://www.lemonde.fr" | $NUTCH_HOME/bin/nutch indexchecker -stdin
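The "first 100 characters" display described above can be illustrated with a minimal, JDK-only sketch. It is not the Nutch implementation: the class FieldPreview, its methods, the showFullValue flag, and the "name :<tab>value" line layout are all hypothetical and chosen only for illustration. Whether a field such as dumpText lifts the limit in the real class is an assumption here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: shows one way to print each
// field with at most the first 100 characters of its value.
public class FieldPreview {
    public static final int MAX_CHARS = 100;

    // Returns the whole value when showFullValue is true (assumed behaviour),
    // otherwise at most the first MAX_CHARS characters.
    public static String preview(String value, boolean showFullValue) {
        int end = showFullValue ? value.length() : Math.min(MAX_CHARS, value.length());
        return value.substring(0, end);
    }

    // Appends one line per field value to a StringBuilder, similar in spirit
    // to a checker that collects its output before printing it.
    public static String render(Map<String, List<String>> fields, boolean showFullValue) {
        StringBuilder output = new StringBuilder();
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            for (String value : field.getValue()) {
                output.append(field.getKey())
                      .append(" :\t")
                      .append(preview(value, showFullValue))
                      .append('\n');
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // Build a 250-character value to demonstrate truncation.
        StringBuilder longText = new StringBuilder();
        for (int i = 0; i < 250; i++) {
            longText.append('x');
        }
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("title", Collections.singletonList("Example title"));
        fields.put("content", Collections.singletonList(longText.toString()));
        System.out.print(render(fields, false));
    }
}
```

Run as written, the sketch prints the title unchanged and cuts the content value down to 100 characters.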
-
-
Field Summary
Modifier and Type                    Field
protected boolean                    checkRobotsTxt
protected boolean                    doIndex
protected boolean                    dumpText
protected boolean                    followRedirects
protected HashMap<String,String>     metadata
protected URLNormalizers             normalizers
-
Fields inherited from class org.apache.nutch.util.AbstractChecker
keepClientCnxOpen, stdin, tcpPort, usage
-
-
Constructor Summary
Constructor
IndexingFiltersChecker()
-
Method Summary
Modifier and Type    Method
static void          main(String[] args)
protected int        process(String url, StringBuilder output)
int                  run(String[] args)
-
Methods inherited from class org.apache.nutch.util.AbstractChecker
getProtocolOutput, parseArgs, processSingle, processStdin, processTCP, run
-
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Field Detail
-
normalizers
protected URLNormalizers normalizers
-
dumpText
protected boolean dumpText
-
followRedirects
protected boolean followRedirects
-
checkRobotsTxt
protected boolean checkRobotsTxt
-
doIndex
protected boolean doIndex
-
-
Method Detail
-
process
protected int process(String url, StringBuilder output) throws Exception
- Specified by: process in class AbstractChecker
- Throws: Exception
-
-