Package org.apache.nutch.segment
Class SegmentMergeFilters
java.lang.Object
    org.apache.nutch.segment.SegmentMergeFilters
public class SegmentMergeFilters extends Object
This class wraps all SegmentMergeFilter extensions in a single object so it is easier to operate on them. If any of the extensions returns false, this one will return false as well.
Constructor Summary
Constructor: SegmentMergeFilters(Configuration conf)
Method Summary
Modifier and Type: boolean
Method: filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
Description: Iterates over all SegmentMergeFilter extensions and if any of them returns false, it will return false as well.
Constructor Detail
SegmentMergeFilters
public SegmentMergeFilters(Configuration conf)
Method Detail
filter
public boolean filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
Iterates over all SegmentMergeFilter extensions and if any of them returns false, it will return false as well.
Parameters:
key - the segment record key
generateData - directory and data produced by the generation phase
fetchData - directory and data produced by the fetch phase
sigData - directory and data produced by the parse phase
content - directory and data produced by the parse phase
parseData - directory and data produced by the parse phase
parseText - directory and data produced by the parse phase
linked - all LINKED values from the latest segment
Returns:
true if the values for this key (URL) should be merged into the new segment
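The "any false means false" behaviour described for filter can be illustrated with a minimal, self-contained sketch. It does not use the Nutch API: KeyFilter, filterAll and MergeFilterChainSketch are hypothetical names introduced only for this illustration, and each stand-in filter sees just the record key rather than the full set of filter arguments. Stopping at the first rejection and returning true for an empty filter list are assumptions of the sketch, not statements about the Nutch implementation.

```java
import java.util.List;

public class MergeFilterChainSketch {

    /** Hypothetical stand-in for a SegmentMergeFilter extension; it only sees the record key. */
    public interface KeyFilter {
        boolean filter(String key);
    }

    /**
     * Returns false as soon as any filter rejects the key; returns true if
     * every filter accepts it (or if the list is empty, by assumption).
     */
    public static boolean filterAll(List<KeyFilter> filters, String key) {
        for (KeyFilter f : filters) {
            if (!f.filter(key)) {
                return false; // one rejection is enough to drop the record
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two illustrative filters: keep http(s) URLs, drop PDF documents.
        List<KeyFilter> filters = List.of(
            key -> key.startsWith("http"),
            key -> !key.endsWith(".pdf"));

        System.out.println(filterAll(filters, "http://example.org/index.html")); // prints "true"
        System.out.println(filterAll(filters, "http://example.org/paper.pdf"));  // prints "false"
    }
}
```

In this sketch a record is kept only when every filter accepts it, which mirrors the documented contract that a single false from any extension makes the combined result false.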