Class SegmentMergeFilters


  • public class SegmentMergeFilters
    extends Object
    This class wraps all SegmentMergeFilter extensions in a single object so that they are easier to operate on. If any of the extensions returns false, this one returns false as well.
    • Constructor Detail

      • SegmentMergeFilters

        public SegmentMergeFilters(Configuration conf)
    • Method Detail

      • filter

        public boolean filter(Text key,
                              CrawlDatum generateData,
                              CrawlDatum fetchData,
                              CrawlDatum sigData,
                              Content content,
                              ParseData parseData,
                              ParseText parseText,
                              Collection<CrawlDatum> linked)
        Iterates over all SegmentMergeFilter extensions. If any of them returns false, this method returns false as well.
        Parameters:
        key - the segment record key
        generateData - the CrawlDatum produced by the generation phase
        fetchData - the CrawlDatum produced by the fetch phase
        sigData - the signature CrawlDatum produced by the parse phase
        content - the fetched content stored for this key
        parseData - the parse data produced by the parse phase
        parseText - the parse text produced by the parse phase
        linked - all LINKED values from the latest segment
        Returns:
        true if the values for this key (URL) should be merged into the new segment, false otherwise.
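
The "any false means false" behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the Nutch API: RecordFilter and RecordFilters are hypothetical stand-ins for SegmentMergeFilter and SegmentMergeFilters, and the two-argument filter method is a simplification of the eight-argument signature documented here. Stopping at the first rejection and accepting when no filters are registered are assumptions of this sketch, not behavior confirmed by this page.

```java
import java.util.List;

public class MergeFilterChainSketch {

    /** Hypothetical stand-in for a single SegmentMergeFilter extension. */
    interface RecordFilter {
        boolean filter(String key, String content);
    }

    /** Hypothetical stand-in for SegmentMergeFilters: accepts only if every filter accepts. */
    static final class RecordFilters {
        private final List<RecordFilter> filters;

        RecordFilters(List<RecordFilter> filters) {
            this.filters = filters;
        }

        boolean filter(String key, String content) {
            for (RecordFilter f : filters) {
                if (!f.filter(key, content)) {
                    // One rejection is enough to keep the record out of the merge.
                    return false;
                }
            }
            // No filter objected (in this sketch, also the result for an empty list).
            return true;
        }
    }

    public static void main(String[] args) {
        // Two illustrative filters: one checks the key, the other checks the content.
        RecordFilter httpsOnly = (key, content) -> key.startsWith("https://");
        RecordFilter nonEmptyContent = (key, content) -> content != null && !content.isEmpty();

        RecordFilters chain = new RecordFilters(List.of(httpsOnly, nonEmptyContent));

        System.out.println(chain.filter("https://example.org/", "hello")); // prints "true"
        System.out.println(chain.filter("http://example.org/", "hello"));  // prints "false"
        System.out.println(chain.filter("https://example.org/", ""));      // prints "false"
    }
}
```

Because the wrapper exposes the same kind of filter method as the individual extensions, a caller in this sketch needs only a single call per key and does not have to know how many filters are active.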