Class TLDScoringFilter

    • Constructor Detail

      • TLDScoringFilter

        public TLDScoringFilter()
    • Method Detail

      • indexerScore

        public float indexerScore(Text url,
                                  NutchDocument doc,
                                  CrawlDatum dbDatum,
                                  CrawlDatum fetchDatum,
                                  Parse parse,
                                  Inlinks inlinks,
                                  float initScore)
                           throws ScoringFilterException
        Description copied from interface: ScoringFilter
        This method calculates an indexed document score/boost.
        Specified by:
        indexerScore in interface ScoringFilter
        Overrides:
        indexerScore in class AbstractScoringFilter
        Parameters:
        url - URL of the page
        doc - indexed document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance to store or remove information.
        dbDatum - current page from CrawlDb. NOTE:
        • changes made to this instance are not persisted
        • may be null if indexing is done without a CrawlDb, or if the segment was generated via FreeGenerator rather than from the CrawlDb.
        fetchDatum - datum from FetcherOutput (containing, among other things, the fetch status)
        parse - parsing result. NOTE: changes made to this instance are not persisted.
        inlinks - current inlinks from LinkDb. NOTE: changes made to this instance are not persisted.
        initScore - initial boost value for the indexed document.
        Returns:
        boost value for the indexed document. This value is passed as the initScore argument to the next scoring filter in the chain. NOTE: implementations may also express other scoring strategies by modifying the indexed document directly.
        Throws:
        ScoringFilterException - if there is a fatal error whilst calculating the indexed document score/boost
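
The chaining described under Returns can be illustrated with a minimal sketch. It does not use the real Nutch types: BoostStep and BoostChainSketch are hypothetical stand-ins that keep only the url and initScore parameters and omit ScoringFilterException, so the example compiles on its own. It is an assumption about how a caller might thread the boost through a chain, not the actual Nutch implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for ScoringFilter. Only the url and
// initScore parameters of indexerScore are modelled; the real method also
// takes the document, datums, parse and inlinks, and may throw
// ScoringFilterException.
interface BoostStep {
    float indexerScore(String url, float initScore);
}

final class BoostChainSketch {
    // Passes each step's return value to the next step as its initScore.
    static float runChain(List<BoostStep> steps, String url, float initScore) {
        float score = initScore;
        for (BoostStep step : steps) {
            score = step.indexerScore(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        // Two illustrative steps: one scales the boost, one offsets it.
        BoostStep doubleIt = (url, init) -> init * 2.0f;
        BoostStep addOne = (url, init) -> init + 1.0f;

        // (1.0 * 2.0) + 1.0
        float boost = runChain(Arrays.asList(doubleIt, addOne), "http://example.org/", 1.0f);
        System.out.println(boost); // prints 3.0
    }
}
```

Because each step receives the previous step's result, the order of filters in the chain matters: running addOne before doubleIt in this sketch yields 4.0 instead of 3.0.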