Uses of Class
org.apache.nutch.scoring.ScoringFilterException
-
Packages that use ScoringFilterException
Package: Description
org.apache.nutch.scoring: The ScoringFilter interface.
org.apache.nutch.scoring.depth: Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link: Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.metadata: Metadata Scoring Plugin
org.apache.nutch.scoring.opic: Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.orphan: Scoring filter to modify score or status of orphaned pages (no inlinks found for a configurable amount of time).
org.apache.nutch.scoring.similarity
org.apache.nutch.scoring.tld: Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta: URL Meta Tag Scoring Plugin
-
Uses of ScoringFilterException in org.apache.nutch.scoring
Methods in org.apache.nutch.scoring that throw ScoringFilterException
Modifier and Type Method Description
CrawlDatum
AbstractScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
CrawlDatum
ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
Distribute score value from the current page to all its outlinked pages.
CrawlDatum
ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
float
AbstractScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
float
ScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.
float
ScoringFilters.generatorSortValue(Text url, CrawlDatum datum, float initSort)
Calculate a sort value for Generate.
float
AbstractScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
float
ScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
This method calculates an indexed document score/boost.
float
ScoringFilters.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
void
AbstractScoringFilter.initialScore(Text url, CrawlDatum datum)
void
ScoringFilter.initialScore(Text url, CrawlDatum datum)
Set an initial score for newly discovered pages.
void
ScoringFilters.initialScore(Text url, CrawlDatum datum)
Calculate a new initial score, used when adding newly discovered pages.
void
AbstractScoringFilter.injectedScore(Text url, CrawlDatum datum)
void
ScoringFilter.injectedScore(Text url, CrawlDatum datum)
Set an initial score for newly injected pages.
void
ScoringFilters.injectedScore(Text url, CrawlDatum datum)
Calculate a new initial score, used when injecting new pages.
default void
ScoringFilter.orphanedScore(Text url, CrawlDatum datum)
This method may change the score or status of CrawlDatum during CrawlDb update, when the URL is neither fetched nor has any inlinks.
void
ScoringFilters.orphanedScore(Text url, CrawlDatum datum)
Calculate orphaned page score during CrawlDb.update().
void
AbstractScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
void
ScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
Currently a part of score distribution is performed using only data coming from the parsing process.
void
ScoringFilters.passScoreAfterParsing(Text url, Content content, Parse parse)
void
AbstractScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
void
ScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
This method takes all relevant score information from the current datum (coming from a generated fetchlist) and stores it into Content metadata.
void
ScoringFilters.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
void
AbstractScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
void
ScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
This method calculates a new score of CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum, and also score values contributed by inlinked pages.
void
ScoringFilters.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
Calculate updated page score during CrawlDb.update().
-
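Every method listed for org.apache.nutch.scoring declares ScoringFilterException, so callers must either handle it or propagate it. The following is a minimal, self-contained sketch of that pattern, not the Nutch implementation: SketchScoringException, SketchFilter and ScoringChainSketch are hypothetical stand-ins, plain String and float replace Text and CrawlDatum, and falling back to the initial sort value is only one possible handling policy.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a checked scoring exception (not the Nutch class).
class SketchScoringException extends Exception {
    SketchScoringException(String message) {
        super(message);
    }
}

// Hypothetical one-method filter shaped like generatorSortValue.
interface SketchFilter {
    float generatorSortValue(String url, float score, float initSort)
            throws SketchScoringException;
}

class ScoringChainSketch {
    // Feed each filter's result into the next one. A checked exception from
    // any filter aborts the chain and propagates to the caller.
    static float generatorSortValue(List<SketchFilter> filters, String url,
                                    float score, float initSort)
            throws SketchScoringException {
        float sort = initSort;
        for (SketchFilter filter : filters) {
            sort = filter.generatorSortValue(url, score, sort);
        }
        return sort;
    }

    // Caller-side handling (an assumed policy): keep the unmodified sort
    // value when any filter in the chain fails.
    static float sortValueOrDefault(List<SketchFilter> filters, String url,
                                    float score, float initSort) {
        try {
            return generatorSortValue(filters, url, score, initSort);
        } catch (SketchScoringException e) {
            return initSort;
        }
    }

    public static void main(String[] args) {
        SketchFilter byScore = (url, score, sort) -> sort * score;
        SketchFilter rejectEmpty = (url, score, sort) -> {
            if (url.isEmpty()) {
                throw new SketchScoringException("empty URL");
            }
            return sort;
        };
        List<SketchFilter> chain = Arrays.asList(byScore, rejectEmpty);
        System.out.println(sortValueOrDefault(chain, "http://example.org/", 2.0f, 1.5f));
        System.out.println(sortValueOrDefault(chain, "", 2.0f, 1.5f));
    }
}
```

Whether a failed filter should abort the job, skip the record, or fall back to a default is a decision for the calling code; the sketch only shows where that decision sits.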
Uses of ScoringFilterException in org.apache.nutch.scoring.depth
Methods in org.apache.nutch.scoring.depth that throw ScoringFilterException
Modifier and Type Method Description
CrawlDatum
DepthScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
float
DepthScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
float
DepthScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
void
DepthScoringFilter.initialScore(Text url, CrawlDatum datum)
void
DepthScoringFilter.injectedScore(Text url, CrawlDatum datum)
void
DepthScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
void
DepthScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
void
DepthScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
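The depth package is described as stopping the crawl at a configurable number of "hops" from the seed URLs. One way to picture that idea is the sketch below. It is an illustration under stated assumptions, not the plugin's code: DepthLimitSketch, Outlink and distribute are hypothetical names, seeds are assumed to sit at depth 0, and the plugin's own counting convention may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DepthLimitSketch {
    // Hypothetical outlink record: a URL plus its hop count from the seed.
    static final class Outlink {
        final String url;
        final int depth;

        Outlink(String url, int depth) {
            this.url = url;
            this.depth = depth;
        }
    }

    // Give every outlink the depth parentDepth + 1 and keep the outlinks only
    // if that depth does not exceed maxDepth (assumption: seeds are depth 0).
    static List<Outlink> distribute(int parentDepth, List<String> outlinkUrls, int maxDepth) {
        List<Outlink> kept = new ArrayList<>();
        int childDepth = parentDepth + 1;
        if (childDepth > maxDepth) {
            return kept; // links below this page are not followed
        }
        for (String url : outlinkUrls) {
            kept.add(new Outlink(url, childDepth));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList("http://example.org/a", "http://example.org/b");
        for (Outlink outlink : distribute(0, links, 2)) {
            System.out.println(outlink.depth + " " + outlink.url);
        }
        System.out.println(distribute(2, links, 2).size());
    }
}
```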
-
Uses of ScoringFilterException in org.apache.nutch.scoring.link
Methods in org.apache.nutch.scoring.link that throw ScoringFilterException
Modifier and Type Method Description
float
LinkAnalysisScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
float
LinkAnalysisScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
void
LinkAnalysisScoringFilter.initialScore(Text url, CrawlDatum datum)
void
LinkAnalysisScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
void
LinkAnalysisScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
-
Uses of ScoringFilterException in org.apache.nutch.scoring.metadata
Methods in org.apache.nutch.scoring.metadata that throw ScoringFilterException
Modifier and Type Method Description
CrawlDatum
MetadataScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
Takes the metadata listed in the "scoring.parse.md" property and looks for it inside the parseData object.
-
Uses of ScoringFilterException in org.apache.nutch.scoring.opic
Methods in org.apache.nutch.scoring.opic that throw ScoringFilterException
Modifier and Type Method Description
CrawlDatum
OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
float
OPICScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
float
OPICScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
Dampen the boost value by scorePower.
void
OPICScoringFilter.initialScore(Text url, CrawlDatum datum)
Set to 0.0f (unknown value) - inlink contributions will bring it to a correct level.
void
OPICScoringFilter.injectedScore(Text url, CrawlDatum datum)
void
OPICScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
Increase the score by a sum of inlinked scores.
-
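The three OPIC descriptions (divide the page score by the number of outlinks, dampen the boost by scorePower, add the sum of inlinked scores) reduce to simple arithmetic. The sketch below works through that arithmetic on plain floats. It is not the OPICScoringFilter source: OpicArithmeticSketch and its method names are hypothetical, and reading "dampen by scorePower" as raising the score to that power before multiplying by the initial boost is an assumption.

```java
import java.util.Arrays;
import java.util.List;

class OpicArithmeticSketch {
    // Share of a page's score passed to each outlink: score / outlinkCount.
    // Returning 0 for a page without outlinks is an assumption of the sketch.
    static float perOutlinkShare(float pageScore, int outlinkCount) {
        if (outlinkCount <= 0) {
            return 0.0f;
        }
        return pageScore / outlinkCount;
    }

    // Dampened index boost, assumed here to be dbScore^scorePower * initScore.
    static float dampenedBoost(float dbScore, float scorePower, float initScore) {
        return (float) Math.pow(dbScore, scorePower) * initScore;
    }

    // Updated score: the old score plus the sum of inlinked contributions.
    static float updatedScore(float oldScore, List<Float> inlinkedScores) {
        float sum = 0.0f;
        for (float contribution : inlinkedScores) {
            sum += contribution;
        }
        return oldScore + sum;
    }

    public static void main(String[] args) {
        System.out.println(perOutlinkShare(1.0f, 4));
        System.out.println(dampenedBoost(4.0f, 0.5f, 2.0f));
        System.out.println(updatedScore(1.0f, Arrays.asList(0.25f, 0.5f)));
    }
}
```

With a scorePower below 1, large scores are compressed more than small ones, which is the usual motivation for dampening a boost this way.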
Uses of ScoringFilterException in org.apache.nutch.scoring.orphan
Methods in org.apache.nutch.scoring.orphan that throw ScoringFilterException
Modifier and Type Method Description
void
OrphanScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinks)
Used for orphan control.
-
Uses of ScoringFilterException in org.apache.nutch.scoring.similarity
Methods in org.apache.nutch.scoring.similarity that throw ScoringFilterException
Modifier and Type Method Description
CrawlDatum
SimilarityScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
void
SimilarityScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
-
Uses of ScoringFilterException in org.apache.nutch.scoring.tld
Methods in org.apache.nutch.scoring.tld that throw ScoringFilterException
Modifier and Type Method Description
float
TLDScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
-
Uses of ScoringFilterException in org.apache.nutch.scoring.urlmeta
Methods in org.apache.nutch.scoring.urlmeta that throw ScoringFilterException
Modifier and Type Method Description
CrawlDatum
URLMetaScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
Takes the metatags listed in the "urlmeta.tags" property and looks for them inside the parseData object.
-