Package org.apache.nutch.indexer.anchor
Class AnchorIndexingFilter
java.lang.Object
    org.apache.nutch.indexer.anchor.AnchorIndexingFilter
All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable
public class AnchorIndexingFilter extends Object implements IndexingFilter
Indexing filter that can either index all inbound anchor text for a document or deduplicate anchors. Deduplication does have its drawbacks; see the anchorIndexingFilter.deduplicate property in nutch-default.xml.
Field Summary
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter:
X_POINT_ID
Constructor Summary
Constructor: AnchorIndexingFilter()
Method Summary
Modifier and Type: NutchDocument
Method: filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Description: The AnchorIndexingFilter filter method, which supports a boolean configuration setting for the deduplication of anchors.

Modifier and Type: Configuration
Method: getConf()
Description: Get the Configuration object.

Modifier and Type: void
Method: setConf(Configuration conf)
Description: Set the Configuration object.
Method Detail
setConf
public void setConf(Configuration conf)
Set the Configuration object.
Specified by:
setConf in interface Configurable
getConf
public Configuration getConf()
Get the Configuration object.
Specified by:
getConf in interface Configurable
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The AnchorIndexingFilter filter method, which supports a boolean configuration setting for the deduplication of anchors. See the anchorIndexingFilter.deduplicate property in nutch-default.xml.
Specified by:
filter in interface IndexingFilter
Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text
Returns:
filtered NutchDocument
Throws:
IndexingException - if an error occurs during filtering
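
The effect of the deduplication setting described above can be illustrated with a small standalone sketch. This is not the Nutch implementation: the class AnchorDedupSketch and the method collectAnchors are hypothetical names, the sketch uses only the Java standard library rather than NutchDocument or Inlinks, and it assumes that duplicates are detected by a case-insensitive comparison, which this page does not confirm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of org.apache.nutch.
class AnchorDedupSketch {

    /**
     * Returns the anchors that would be indexed.
     * With deduplicate == false every inbound anchor is kept.
     * With deduplicate == true only the first occurrence of each anchor is kept,
     * comparing anchors case-insensitively (an assumption of this sketch).
     */
    static List<String> collectAnchors(String[] anchors, boolean deduplicate) {
        List<String> indexed = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String anchor : anchors) {
            if (deduplicate) {
                String key = anchor.toLowerCase(Locale.ROOT);
                // Set.add returns false when the key was already present.
                if (!seen.add(key)) {
                    continue;
                }
            }
            indexed.add(anchor);
        }
        return indexed;
    }

    public static void main(String[] args) {
        String[] anchors = {"Apache Nutch", "apache nutch", "Nutch home", "Apache Nutch"};
        // prints [Apache Nutch, apache nutch, Nutch home, Apache Nutch]
        System.out.println(collectAnchors(anchors, false));
        // prints [Apache Nutch, Nutch home]
        System.out.println(collectAnchors(anchors, true));
    }
}
```

The sketch also hints at one trade-off of deduplication: repeated anchor text is indexed once, so the number of inbound links that share the same wording is no longer visible in the indexed values.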