Package org.apache.nutch.parse.metatags
Class MetaTagsParser
java.lang.Object
    org.apache.nutch.parse.metatags.MetaTagsParser

All Implemented Interfaces:
Configurable, HtmlParseFilter, Pluggable

public class MetaTagsParser extends Object implements HtmlParseFilter
Parses HTML meta tags (for example, keywords and description) and stores them in the parse metadata under the prefix 'metatag.', so that they can be indexed with the index-metadata plugin. Meta tag names are matched ignoring case.
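
The naming convention described above can be illustrated with a small standalone sketch. It does not use the Nutch API: the class MetaTagPrefixSketch, the method toParseMetadata, and the sample tag values are hypothetical, and the choice to normalize stored names to lowercase is an assumption made here to model case-insensitive matching.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of Nutch: it only illustrates the
// 'metatag.' prefix and the case-insensitive matching of tag names.
class MetaTagPrefixSketch {
    static final String PREFIX = "metatag.";

    // Returns the metadata entries that would be stored for the wanted tags.
    // wantedLowercase must contain lowercase names (an assumption of this sketch).
    static Map<String, String> toParseMetadata(Map<String, String> metaTags,
                                               Set<String> wantedLowercase) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metaTags.entrySet()) {
            // Normalize the tag name so "Keywords" and "KEYWORDS" both match.
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (wantedLowercase.contains(name)) {
                out.put(PREFIX + name, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("Keywords", "nutch, crawler");
        tags.put("DESCRIPTION", "An example page");
        tags.put("viewport", "width=device-width");
        // Only the two wanted tags are kept, each under the 'metatag.' prefix.
        System.out.println(toParseMetadata(tags, Set.of("keywords", "description")));
    }
}
```

In this sketch the resulting keys (metatag.keywords, metatag.description) are the names an indexing step would then refer to; the unwanted viewport tag is dropped.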

Field Summary
Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter
X_POINT_ID

Constructor Summary
Constructor
MetaTagsParser()

Method Summary
Modifier and Type, Method, Description
ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
    Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
Configuration getConf()
void setConf(Configuration conf)

Method Detail

setConf
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf
public Configuration getConf()
Specified by:
getConf in interface Configurable

filter
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Description copied from interface: HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
Specified by:
filter in interface HtmlParseFilter
Parameters:
content - the Content for a given response
parseResult - the result of running one or more Parsers on the content
metaTags - a populated HTMLMetaTags object
doc - a DocumentFragment (DOM) which can be processed in the filtering process
Returns:
a filtered ParseResult
See Also:
Parser.getParse(Content)