Class Any23ParseFilter

  • All Implemented Interfaces:
    Configurable, HtmlParseFilter, Pluggable

    public class Any23ParseFilter
    extends Object
    implements HtmlParseFilter

    This implementation of HtmlParseFilter uses the Apache Any23 library to parse and extract structured data in RDF format from a variety of Web documents. The supported formats are listed in the Apache Any23 documentation.

    In this implementation, triples are written as Notation3, and individual triples are identified within the output triple stream by the presence of '\n'. This '\n' delimiter is a characteristic specific to the N3 serialization in Any23. Using other writers that implement the TripleHandler interface will most likely require identifying an alternative data characteristic on which to split triple streams.
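
    The newline-based splitting described above can be illustrated with a minimal sketch. It does not reproduce the filter's actual code and does not call Any23; the class name TripleSplitSketch and the method splitTriples are hypothetical, and the sketch assumes the serialized output places exactly one triple on each line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch or Any23: splits a serialized
// triple stream into individual triples using '\n' as the delimiter.
public class TripleSplitSketch {

    // Assumption: each triple occupies exactly one line of the input.
    static List<String> splitTriples(String serialized) {
        List<String> triples = new ArrayList<>();
        if (serialized == null) {
            return triples;
        }
        for (String line : serialized.split("\n")) {
            String triple = line.trim();
            // Skip blank lines so only non-empty statements are kept.
            if (!triple.isEmpty()) {
                triples.add(triple);
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        // Illustrative input only; the IRIs are made up for this example.
        String output = "<http://example.org/s> <http://example.org/p> \"v1\" .\n"
                + "<http://example.org/s> <http://example.org/q> \"v2\" .\n";
        for (String triple : splitTriples(output)) {
            System.out.println(triple);
        }
    }
}
```

    A writer that spreads one statement over several lines, or that groups statements under a shared subject, would break the one-triple-per-line assumption, which is why a different splitting characteristic would be needed for such writers.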