Package org.apache.nutch.scoring.orphan
Class OrphanScoringFilter
java.lang.Object
  org.apache.nutch.scoring.AbstractScoringFilter
    org.apache.nutch.scoring.orphan.OrphanScoringFilter
All Implemented Interfaces: Configurable, Pluggable, ScoringFilter
public class OrphanScoringFilter extends AbstractScoringFilter
Scoring filter that determines whether a page has become orphaned, i.e. no other pages link to it any more. If a page has not been linked to for markGoneAfter seconds, it is marked as gone and is then removed by an indexer. If a page has not been linked to for markOrphanAfter seconds, it is removed from the CrawlDb.
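The two thresholds described above can be illustrated with a small, self-contained sketch. It is not the filter's actual source: the class name OrphanDecisionSketch, the Outcome enum, and the 30-day and 40-day example values are all assumptions made for illustration, and the sketch further assumes that markOrphanAfter is at least as large as markGoneAfter.

```java
import java.time.Duration;

/** Hypothetical model of the two-threshold orphan check; not the Nutch implementation. */
final class OrphanDecisionSketch {

    /** Illustrative outcomes; these are not Nutch status codes. */
    enum Outcome { KEEP, MARK_GONE, REMOVE_FROM_CRAWLDB }

    /**
     * Classifies a page by how long it has gone without an inlink.
     * All arguments are in seconds. Assumes markOrphanAfter >= markGoneAfter.
     */
    static Outcome classify(long nowSeconds, long lastInlinkSeconds,
                            long markGoneAfter, long markOrphanAfter) {
        long unlinkedFor = nowSeconds - lastInlinkSeconds;
        if (unlinkedFor > markOrphanAfter) {
            return Outcome.REMOVE_FROM_CRAWLDB; // past the larger threshold
        }
        if (unlinkedFor > markGoneAfter) {
            return Outcome.MARK_GONE;           // past the smaller threshold only
        }
        return Outcome.KEEP;                    // linked to recently enough
    }

    public static void main(String[] args) {
        long gone = Duration.ofDays(30).getSeconds();   // assumed example value
        long orphan = Duration.ofDays(40).getSeconds(); // assumed example value
        long now = Duration.ofDays(100).getSeconds();

        // Last inlink seen 10, 35 and 45 days ago respectively.
        System.out.println(classify(now, now - Duration.ofDays(10).getSeconds(), gone, orphan)); // KEEP
        System.out.println(classify(now, now - Duration.ofDays(35).getSeconds(), gone, orphan)); // MARK_GONE
        System.out.println(classify(now, now - Duration.ofDays(45).getSeconds(), gone, orphan)); // REMOVE_FROM_CRAWLDB
    }
}
```

Checking the larger threshold first keeps the two branches mutually exclusive, so a page that has passed both limits is reported only as removable.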
Field Summary
Modifier and Type    Field
static Text          ORPHAN_KEY_WRITABLE
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
Constructor Summary
Constructor
OrphanScoringFilter()
Method Summary
Modifier and Type    Method and Description
void                 orphanedScore(Text url, CrawlDatum datum)
                     This method may change the score or status of CrawlDatum during CrawlDb update, when the URL is neither fetched nor has any inlinks.
void                 setConf(Configuration conf)
void                 updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinks)
                     Used for orphan control.
Methods inherited from class org.apache.nutch.scoring.AbstractScoringFilter
distributeScoreToOutlinks, generatorSortValue, getConf, indexerScore, initialScore, injectedScore, passScoreAfterParsing, passScoreBeforeParsing
Field Detail
ORPHAN_KEY_WRITABLE
public static Text ORPHAN_KEY_WRITABLE
Method Detail
setConf
public void setConf(Configuration conf)
Specified by: setConf in interface Configurable
Overrides: setConf in class AbstractScoringFilter
updateDbScore
public void updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinks) throws ScoringFilterException
Used for orphan control.
Specified by: updateDbScore in interface ScoringFilter
Overrides: updateDbScore in class AbstractScoringFilter
Parameters:
  url - URL of the record
  old - old CrawlDatum
  datum - new CrawlDatum
  inlinks - list of inlinked CrawlDatums
Throws:
  ScoringFilterException - if there is a fatal error calculating a new score of the CrawlDatum during CrawlDb update
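This page says only that updateDbScore is "used for orphan control". One plausible reading, stated here purely as an assumption, is that the method refreshes a "last linked to" timestamp in the record's metadata whenever inlinks are present, and otherwise leaves the record to the orphan check. The sketch below models that idea with plain Java collections; LastInlinkTrackerSketch, recordInlinks, and the key "lastInlinkTime" are hypothetical names, and a Map<String, Long> stands in for the CrawlDatum metadata.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of timestamp tracking for orphan control; not the Nutch implementation. */
final class LastInlinkTrackerSketch {

    /** Hypothetical metadata key, standing in for a key such as ORPHAN_KEY_WRITABLE. */
    static final String LAST_INLINK_KEY = "lastInlinkTime";

    /**
     * Refreshes the last-inlink timestamp when the record has inlinks.
     *
     * @return true if the timestamp was refreshed, false if the record had no inlinks
     */
    static boolean recordInlinks(Map<String, Long> metadata, List<String> inlinks, long nowSeconds) {
        if (!inlinks.isEmpty()) {
            metadata.put(LAST_INLINK_KEY, nowSeconds);
            return true;
        }
        // No inlinks: leave the stored timestamp untouched so its age can be
        // compared against the thresholds later.
        return false;
    }

    public static void main(String[] args) {
        Map<String, Long> metadata = new HashMap<>();

        // A record with one inlink gets its timestamp set.
        recordInlinks(metadata, List.of("http://example.org/linking-page"), 1_000L);

        // A later update without inlinks does not overwrite it.
        recordInlinks(metadata, List.of(), 5_000L);

        System.out.println(metadata.get(LAST_INLINK_KEY)); // prints 1000
    }
}
```

Leaving the timestamp untouched when there are no inlinks is what lets its age grow across successive CrawlDb updates until a threshold is crossed.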
orphanedScore
public void orphanedScore(Text url, CrawlDatum datum)
Description copied from interface: ScoringFilter
This method may change the score or status of CrawlDatum during CrawlDb update, when the URL is neither fetched nor has any inlinks.
Parameters:
  url - URL of the page
  datum - CrawlDatum for page