Package org.apache.nutch.any23
Class Any23IndexingFilter
- java.lang.Object
  - org.apache.nutch.any23.Any23IndexingFilter
All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable
public class Any23IndexingFilter extends Object implements IndexingFilter
This implementation of IndexingFilter adds a triple(s) field to the NutchDocument. Triples are extracted via Apache Any23.
- See Also: Any23ParseFilter
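The class description can be illustrated with a small, self-contained sketch that does not depend on Nutch or Any23. Everything in it is hypothetical: SketchDocument stands in for NutchDocument, "structured_data" is an assumed value for STRUCTURED_DATA, the triples are assumed to arrive as N-Triples lines, and parseTriple is a simplified splitter (one well-formed triple per line, no whitespace in subject or predicate), not the plugin's actual parsing logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: each field name maps to a list of values.
class SketchDocument {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    void add(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<Object> getValues(String name) {
        return fields.getOrDefault(name, List.of());
    }
}

class TripleIndexingSketch {
    // Assumed field name, standing in for Any23IndexingFilter.STRUCTURED_DATA.
    static final String STRUCTURED_DATA = "structured_data";

    // Splits one simplified N-Triples line "<s> <p> object ." into its three terms.
    // Assumes subject and predicate contain no whitespace; the object keeps any spaces.
    static Map<String, String> parseTriple(String line) {
        String body = line.trim();
        if (body.endsWith(".")) {
            body = body.substring(0, body.length() - 1).trim();
        }
        String[] parts = body.split("\\s+", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Not a triple: " + line);
        }
        Map<String, String> triple = new LinkedHashMap<>();
        triple.put("subject", parts[0]);
        triple.put("predicate", parts[1]);
        triple.put("object", parts[2]);
        return triple;
    }

    // Adds one map-valued entry per triple to the multi-valued structured-data field.
    static SketchDocument addTriples(SketchDocument doc, List<String> triples) {
        for (String line : triples) {
            doc.add(STRUCTURED_DATA, parseTriple(line));
        }
        return doc;
    }
}
```

Storing each triple as a small map keeps subject, predicate and object separately addressable; whether the real plugin stores maps or plain strings is not stated on this page.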
Field Summary
- static String STRUCTURED_DATA
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
Constructor Summary
- Any23IndexingFilter()
Method Summary
- NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks): Adds fields or otherwise modifies the document that will be indexed for a parse.
- Configuration getConf(): Get the Configuration object.
- void setConf(Configuration conf): Set the Configuration object.
Field Detail
STRUCTURED_DATA
public static final String STRUCTURED_DATA
- See Also: Constant Field Values
Method Detail
getConf
public Configuration getConf()
Get the Configuration object.
- Specified by: getConf in interface Configurable
- See Also: Configurable.getConf()
setConf
public void setConf(Configuration conf)
Set the Configuration object.
- Specified by: setConf in interface Configurable
- See Also: Configurable.setConf(org.apache.hadoop.conf.Configuration)
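getConf and setConf follow the Configurable contract, and the public no-argument constructor lets a plugin loader create the filter reflectively before handing it a configuration. The sketch below illustrates that general pattern under stated assumptions; it is not Nutch's plugin-loading code. SketchConfiguration (for org.apache.hadoop.conf.Configuration), SketchConfigurable, SketchFilter and PluginLoaderSketch are all hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
class SketchConfiguration {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

// Mirrors the shape of the Configurable interface (setConf/getConf).
interface SketchConfigurable {
    void setConf(SketchConfiguration conf);

    SketchConfiguration getConf();
}

// Hypothetical plugin class: stores the configuration it is given and returns it unchanged.
class SketchFilter implements SketchConfigurable {
    private SketchConfiguration conf;

    // Public no-argument constructor, so a loader can create the class reflectively.
    public SketchFilter() {
    }

    @Override
    public void setConf(SketchConfiguration conf) {
        this.conf = conf;
    }

    @Override
    public SketchConfiguration getConf() {
        return conf;
    }
}

class PluginLoaderSketch {
    // Creates an instance via its no-arg constructor, then injects the configuration.
    static <T extends SketchConfigurable> T create(Class<T> type, SketchConfiguration conf) {
        try {
            T instance = type.getDeclaredConstructor().newInstance();
            instance.setConf(conf);
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}
```

Injecting the configuration after construction is what makes a no-argument constructor sufficient: the loader needs no knowledge of the plugin's constructor parameters.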
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.
- Specified by: filter in interface IndexingFilter
- Parameters:
  - doc: document instance for collecting fields
  - parse: parse data instance
  - url: page URL
  - datum: crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
  - inlinks: page inlinks
- Returns: filtered NutchDocument
- Throws: IndexingException if there is a fatal error whilst indexing
- See Also: IndexingFilter.filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks)
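The null-return convention means an indexing filter can either enrich a document or drop it. A minimal sketch of how a chain of such filters might be applied, assuming filters run in order and processing stops at the first null; SketchIndexingFilter and FilterChainSketch are hypothetical names, and documents are simplified to string maps rather than NutchDocument.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IndexingFilter: returns the (possibly modified) document,
// or null to remove it from indexing. Documents are simplified to string maps.
interface SketchIndexingFilter {
    Map<String, String> filter(Map<String, String> doc);
}

class FilterChainSketch {
    // Applies the filters in order and returns null as soon as one of them drops the document.
    static Map<String, String> run(List<SketchIndexingFilter> filters, Map<String, String> doc) {
        Map<String, String> current = doc;
        for (SketchIndexingFilter f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```

Because this chain stops at the first null, a later filter never sees a document that an earlier filter rejected.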