Package org.apache.nutch.scoring.tld
Class TLDScoringFilter
java.lang.Object
  org.apache.nutch.scoring.AbstractScoringFilter
    org.apache.nutch.scoring.tld.TLDScoringFilter
- All Implemented Interfaces:
Configurable, Pluggable, ScoringFilter
public class TLDScoringFilter extends AbstractScoringFilter
Scoring filter to boost top-level domains (TLDs).
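As a rough illustration of what boosting by TLD can look like, the sketch below multiplies an incoming score by a per-TLD factor. It is a self-contained stand-in, not the Nutch implementation: the class `TldBoostSketch`, the `TLD_BOOSTS` table and its values, and the assumption that the boost is applied multiplicatively are all hypothetical.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for illustration; not the Nutch TLDScoringFilter. */
public class TldBoostSketch {

    // Assumed boost table; the TLDs and values are illustrative only.
    static final Map<String, Float> TLD_BOOSTS = Map.of("edu", 1.5f, "org", 1.25f);

    /** Returns the last label of a host name, lower-cased. */
    static String topLevelDomain(String host) {
        int dot = host.lastIndexOf('.');
        String tld = dot < 0 ? host : host.substring(dot + 1);
        return tld.toLowerCase(Locale.ROOT);
    }

    /** Multiplies the incoming score by the boost for the host's TLD (1.0 if unknown). */
    static float boostedScore(String host, float initScore) {
        float boost = TLD_BOOSTS.getOrDefault(topLevelDomain(host), 1.0f);
        return initScore * boost;
    }

    public static void main(String[] args) {
        System.out.println(boostedScore("www.example.edu", 2.0f)); // prints 3.0
        System.out.println(boostedScore("www.example.com", 2.0f)); // prints 2.0
    }
}
```

A TLD that is missing from the table leaves the score unchanged, so under these assumptions the filter is neutral for hosts it knows nothing about.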
Field Summary
-
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
-
-
Constructor Summary
Constructors
Constructor           Description
TLDScoringFilter()
-
Method Summary
Modifier and Type: float
Method: indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
Description: This method calculates an indexed document score/boost.
Methods inherited from class org.apache.nutch.scoring.AbstractScoringFilter
distributeScoreToOutlinks, generatorSortValue, getConf, initialScore, injectedScore, passScoreAfterParsing, passScoreBeforeParsing, setConf, updateDbScore
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.nutch.scoring.ScoringFilter
orphanedScore
-
Method Detail
-
indexerScore
public float indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) throws ScoringFilterException
Description copied from interface: ScoringFilter
This method calculates an indexed document score/boost.
- Specified by:
indexerScore in interface ScoringFilter
- Overrides:
indexerScore in class AbstractScoringFilter
- Parameters:
url - URL of the page
doc - indexed document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance in order to store or remove information.
dbDatum - current page from CrawlDb. NOTE:
  - changes made to this instance are not persisted
  - may be null if indexing is done without CrawlDb, or if the segment was not generated from the CrawlDb (via FreeGenerator).
fetchDatum - datum from FetcherOutput (containing, among other things, the fetching status)
parse - parsing result. NOTE: changes made to this instance are not persisted.
inlinks - current inlinks from LinkDb. NOTE: changes made to this instance are not persisted.
initScore - initial boost value for the indexed document.
- Returns:
boost value for the indexed document. This value is passed as an argument to the next scoring filter in the chain. NOTE: implementations may also express other scoring strategies by modifying the indexed document directly.
- Throws:
ScoringFilterException - if there is a fatal error whilst calculating the indexed document score/boost
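
The chaining described under Returns (each filter's result becomes the next filter's initScore) can be sketched with a reduced, hypothetical contract. `SimpleScoringFilter` and `runChain` are illustrative names only and are not part of Nutch; the real method takes the full parameter list shown above and may throw ScoringFilterException.

```java
import java.util.List;

/** Hypothetical sketch of passing a boost value through a chain of filters. */
public class ScoreChainSketch {

    /** Reduced stand-in contract: incoming boost in, outgoing boost out. */
    interface SimpleScoringFilter {
        float indexerScore(String url, float initScore);
    }

    /** Feeds each filter's return value to the next filter as its initScore. */
    static float runChain(List<SimpleScoringFilter> filters, String url, float initScore) {
        float score = initScore;
        for (SimpleScoringFilter filter : filters) {
            score = filter.indexerScore(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        // Two illustrative filters: one doubles the boost, one adds a constant.
        SimpleScoringFilter twice = (url, s) -> s * 2.0f;
        SimpleScoringFilter plusOne = (url, s) -> s + 1.0f;
        System.out.println(runChain(List.of(twice, plusOne), "https://example.org/", 1.0f)); // prints 3.0
    }
}
```

Because each filter sees only the value returned by its predecessor, the order of the chain matters in this sketch: running `plusOne` before `twice` on the same input would yield 4.0 instead of 3.0.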