Package org.apache.nutch.any23
This package uses the Apache Any23 library
to parse and extract structured data in RDF format from a
variety of Web documents. The supported formats are listed
on the Apache Any23 site.
Class Summary

Class                Description
Any23IndexingFilter  This implementation of IndexingFilter adds a triple(s) field to the NutchDocument.
Any23ParseFilter     This implementation of HtmlParseFilter uses the Apache Any23 library for parsing and extracting structured data in RDF format from a variety of Web documents.
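
The idea of adding a triple(s) field to a document can be illustrated with a minimal, self-contained sketch. It does not use the Nutch or Any23 APIs: the class TripleFieldSketch, the nested Triple type, the addTriples helper, and the field name "triples" are all hypothetical names chosen for illustration. The document is modeled as a plain map from field name to values, and every object is treated as an unescaped plain literal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TripleFieldSketch {

    // Hypothetical field name; the real filter may use a different one.
    static final String FIELD = "triples";

    /** One RDF statement, with all three parts kept as plain strings. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        /** Renders the statement as one N-Triples-style line (no literal escaping). */
        String toNTriples() {
            return "<" + subject + "> <" + predicate + "> \"" + object + "\" .";
        }
    }

    /** Appends one field value per triple to the map standing in for a document. */
    static Map<String, List<String>> addTriples(Map<String, List<String>> doc,
                                                List<Triple> triples) {
        List<String> values = doc.computeIfAbsent(FIELD, k -> new ArrayList<>());
        for (Triple t : triples) {
            values.add(t.toNTriples());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        List<Triple> extracted = new ArrayList<>();
        extracted.add(new Triple("http://example.org/page",
                                 "http://purl.org/dc/terms/title",
                                 "Example page"));
        addTriples(doc, extracted);
        // Each extracted statement ends up as one value of the "triples" field.
        doc.get(FIELD).forEach(System.out::println);
    }
}
```

In this sketch each triple becomes a separate field value, which keeps the statements individually searchable; whether the actual filter stores them this way is an assumption.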