@ComponentSpecification(plugin=true) public interface ContentParser
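A ContentParser extracts (meta-)data from the content of an InputStream. Regarding the amount of data buffered during parsing, see also ContentParserOptions.getMaximumBufferSize().

| Modifier and Type | Field | Description |
|---|---|---|
| static String | VARIABLE_NAME_CREATOR | This is the name of the variable with the creator (also called author, artist, composer, etc.) of the content from the parsed GenericContext. |
| static String | VARIABLE_NAME_KEYWORDS | This is the name of the variable with the keywords (also called tags) of the content from the parsed GenericContext. |
| static String | VARIABLE_NAME_LANGUAGE | This is the name of the variable with the language of the content from the parsed GenericContext. |
| static String | VARIABLE_NAME_TEXT | This is the name of the variable with the plain text of the content from the parsed GenericContext. |
| static String | VARIABLE_NAME_TITLE | This is the name of the variable with the title of the content from the parsed GenericContext. |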
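A caller typically reads the parse result through these variable names. The sketch below is a minimal illustration under stated assumptions, not the mmm API: a plain java.util.Map stands in for GenericContext, and the constant values ("text", "title", ...) are hypothetical placeholders because the real values are not documented here. Following the field details, it treats the plain text as mandatory and every other variable as optional.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Sketch of a consumer reading the default variables from a parse result.
 * ASSUMPTION: a Map stands in for GenericContext and the constant values
 * below are hypothetical placeholders, not the real ContentParser values.
 */
class ParseResultReader {

  // Hypothetical placeholders; real code should reference ContentParser.VARIABLE_NAME_*.
  static final String VARIABLE_NAME_TEXT = "text";
  static final String VARIABLE_NAME_TITLE = "title";
  static final String VARIABLE_NAME_CREATOR = "creator";
  static final String VARIABLE_NAME_LANGUAGE = "language";

  /** Builds a one-line summary: the text is required, the other variables are optional. */
  static String summarize(Map<String, Object> context) {
    // The text should always be set, so a missing value signals a broken parser.
    String text = (String) Objects.requireNonNull(context.get(VARIABLE_NAME_TEXT),
        "parser did not set the text variable");
    // Optional variables may be null and get a fallback value.
    String title = stringOrDefault(context, VARIABLE_NAME_TITLE, "(untitled)");
    String creator = stringOrDefault(context, VARIABLE_NAME_CREATOR, "(unknown creator)");
    String language = stringOrDefault(context, VARIABLE_NAME_LANGUAGE, "?");
    return title + " by " + creator + " [" + language + "]: " + text.length() + " chars";
  }

  private static String stringOrDefault(Map<String, Object> context, String name, String fallback) {
    Object value = context.get(name);
    return (value == null) ? fallback : (String) value;
  }

  public static void main(String[] args) {
    Map<String, Object> context = new HashMap<>();
    context.put(VARIABLE_NAME_TEXT, "Hello world");
    context.put(VARIABLE_NAME_TITLE, "Greeting");
    // prints "Greeting by (unknown creator) [?]: 11 chars"
    System.out.println(summarize(context));
  }
}
```

Referencing the constants instead of literal strings keeps a caller independent of the actual key values; an implementation may still add further variables under its own names.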
| Modifier and Type | Method | Description |
|---|---|---|
| String | getExtension() | This method gets the default filename extension of this ContentParser, excluding the dot. |
| String | getMimetype() | This method gets the default mimetype of this ContentParser. |
| Set<String> | getPrimaryKeys() | This method gets the primary keys used to register this ContentParser. |
| Set<String> | getSecondaryKeys() | This method gets the secondary keys used to register this ContentParser. |
| GenericContext | parse(InputStream inputStream, long filesize) | This method parses the document given as inputStream and extracts text and metadata returned as GenericContext. |
| GenericContext | parse(InputStream inputStream, long filesize, ContentParserOptions options) | This method parses the document given as inputStream and extracts text and metadata returned as GenericContext. |
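The following sketch illustrates the contract these methods describe: the stream is closed on success and on failure, filesize is only an allocation hint (0 meaning unknown), buffering is capped at a maximum size, and a key lookup prefers primary over secondary registrations. All names here (SketchOptions, PlainTextParserSketch, KeyRegistrySketch) are hypothetical: a Map stands in for GenericContext, the key "text" for VARIABLE_NAME_TEXT, SketchOptions for ContentParserOptions, and KeyRegistrySketch only models the described precedence, not the actual ContentParserService. Having the two-argument parse delegate with default options is an assumption of this sketch. It needs Java 16 or later for the record.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for ContentParserOptions, reduced to a maximum buffer size. */
record SketchOptions(int maximumBufferSize) {}

/**
 * Plain-text parser sketch following the documented parse contract.
 * ASSUMPTIONS: a Map stands in for GenericContext and "text" stands in for
 * VARIABLE_NAME_TEXT; none of these names belong to the real mmm API.
 */
class PlainTextParserSketch {

  static final SketchOptions DEFAULT_OPTIONS = new SketchOptions(1024 * 1024);

  String getExtension() { return "txt"; }

  String getMimetype() { return "text/plain"; }

  Set<String> getPrimaryKeys() { return Set.of("txt", "text/plain"); }

  Set<String> getSecondaryKeys() { return Set.of("java", "c"); }

  /** The two-argument variant delegates with default options (an assumption). */
  Map<String, Object> parse(InputStream inputStream, long filesize) throws Exception {
    return parse(inputStream, filesize, DEFAULT_OPTIONS);
  }

  Map<String, Object> parse(InputStream inputStream, long filesize, SketchOptions options)
      throws Exception {
    // try-with-resources closes the stream on success and on failure.
    try (InputStream in = inputStream) {
      int max = options.maximumBufferSize();
      // filesize is only a hint; 0 means unknown, so fall back to a small buffer.
      int initial = (filesize > 0) ? (int) Math.min(filesize, max) : 4096;
      ByteArrayOutputStream out = new ByteArrayOutputStream(initial);
      byte[] chunk = new byte[4096];
      int total = 0;
      int n;
      // Never buffer more than the configured maximum number of bytes.
      while (total < max && (n = in.read(chunk, 0, Math.min(chunk.length, max - total))) != -1) {
        out.write(chunk, 0, n);
        total += n;
      }
      Map<String, Object> context = new HashMap<>();
      context.put("text", out.toString(StandardCharsets.UTF_8));
      return context;
    }
  }
}

/** Key lookup sketch: a primary registration wins over a secondary one. */
class KeyRegistrySketch<P> {

  private final Map<String, P> primary = new HashMap<>();
  private final Map<String, P> secondary = new HashMap<>();

  void register(P parser, Set<String> primaryKeys, Set<String> secondaryKeys) {
    primaryKeys.forEach(key -> primary.put(key, parser));
    secondaryKeys.forEach(key -> secondary.putIfAbsent(key, parser));
  }

  /** Returns the parser for key, preferring primary keys, or null if none is registered. */
  P getParser(String key) {
    P parser = primary.get(key);
    return (parser != null) ? parser : secondary.get(key);
  }
}
```

For example, registering an HTML parser with the primary key "html" and a text parser with the secondary key "html" makes getParser("html") return the HTML parser regardless of registration order. Note that truncating at a byte limit may split a multi-byte UTF-8 character, which the decoder then replaces.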
static final String VARIABLE_NAME_TEXT
This is the name of the variable with the plain text of the content from the parsed GenericContext. Its value is a String and should always be set (not null).

static final String VARIABLE_NAME_TITLE
This is the name of the variable with the title of the content from the parsed GenericContext. Its value is a String and is optional (may be null).

static final String VARIABLE_NAME_KEYWORDS
This is the name of the variable with the keywords (also called tags) of the content from the parsed GenericContext. Its value is a String and is optional (may be null).

static final String VARIABLE_NAME_CREATOR
This is the name of the variable with the creator (also called author, artist, composer, etc.) of the content from the parsed GenericContext. Its value is a String and is optional (may be null).

static final String VARIABLE_NAME_LANGUAGE
This is the name of the variable with the language of the content from the parsed GenericContext. Its value is a String and is optional (may be null).

GenericContext parse(InputStream inputStream, long filesize) throws Exception
This method parses the document given as inputStream and extracts text and metadata returned as GenericContext.
Parameters:
inputStream - is the fresh input stream of the content to parse. It will be closed by this method (on success and in exceptional state).
filesize - is the size (content-length) of the content to parse in bytes, or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
Returns:
the GenericContext containing the extracted metadata from the parsed inputStream. See the VARIABLE_NAME_* constants (e.g. VARIABLE_NAME_TEXT) for the default keys. Please note that an implementation may use individual names for additional variables.
Throws:
Exception - if the parsing failed for a technical reason. Arbitrary implementations of this interface may throw any Exception from this method. Declaring a specific ParseException here would add the overhead of wrapping exceptions without any advantage. The user of this interface therefore has to catch Exception, which includes RuntimeExceptions and excludes Errors. The caller has to handle the problem anyway (also for RuntimeExceptions) and has all the contextual information required to enhance the exception message. This is NOT a matter of bad design.

GenericContext parse(InputStream inputStream, long filesize, ContentParserOptions options) throws Exception
This method parses the document given as inputStream and extracts text and metadata returned as GenericContext.
Parameters:
inputStream - is the fresh input stream of the content to parse. It will be closed by this method (on success and in exceptional state).
filesize - is the size (content-length) of the content to parse in bytes, or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.
Returns:
the GenericContext containing the extracted metadata from the parsed inputStream. See the VARIABLE_NAME_* constants (e.g. VARIABLE_NAME_TEXT) for the default keys. Please note that an implementation may use individual names for additional variables.
Throws:
Exception - if the parsing failed for a technical reason. Arbitrary implementations of this interface may throw any Exception from this method. Declaring a specific ParseException here would add the overhead of wrapping exceptions without any advantage. The user of this interface therefore has to catch Exception, which includes RuntimeExceptions and excludes Errors. The caller has to handle the problem anyway (also for RuntimeExceptions) and has all the contextual information required to enhance the exception message. This is NOT a matter of bad design.

String getExtension()
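This method gets the default filename extension of this ContentParser, excluding the dot.
Returns:
the default extension, or null if this is the generic parser.

String getMimetype()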
This method gets the default mimetype of this ContentParser.
Returns:
the default mimetype, or null if this is the generic parser.

Set<String> getPrimaryKeys()
This method gets the primary keys used to register this ContentParser. This set contains the extension and the mimetype, and maybe other alternatives. Examples:
- A ContentParser for HTML returns "html" from getExtension() but can also include "htm" as primary key.
- A ContentParser for the extension "tar.gz" may also include "tgz" as primary key.
- A ContentParser for the mimetype "text/xml" may also include "application/xml" as primary key.
Returns:
the primary keys of this ContentParser.
See Also:
ContentParserService.getParser(String), getSecondaryKeys(), Collections.emptySet()

Set<String> getSecondaryKeys()
This method gets the secondary keys used to register this ContentParser. If another (more specific) ContentParser defines such a key as primary key, that ContentParser is chosen first. Otherwise, this implementation is used. Examples:
- A ContentParser for HTML returns "html" from getExtension() but can define "xhtml" and "application/xhtml+xml" as secondary keys.
- A ContentParser for the extension "txt" may return "java", "php", "c", "cpp", etc. as secondary keys.
Returns:
the secondary keys of this ContentParser.
See Also:
ContentParserService.getParser(String), getPrimaryKeys(), Collections.emptySet()

Copyright © 2001–2016 mmm-Team. All rights reserved.