public abstract class AbstractContentParser extends AbstractLoggableComponent implements ContentParser
ContentParser. It handles the delegation methods, the creation of the MutableGenericContext and the safe closing of the InputStream.

| Modifier and Type | Field and Description |
|---|---|
| private GenericContextFactory | genericContextFactory |
| private Set<String> | primaryKeys |
| private Set<String> | secondaryKeys |

Fields inherited from interface ContentParser: VARIABLE_NAME_CREATOR, VARIABLE_NAME_KEYWORDS, VARIABLE_NAME_LANGUAGE, VARIABLE_NAME_TEXT, VARIABLE_NAME_TITLE

| Constructor and Description |
|---|
| AbstractContentParser() The constructor. |
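
The private final key sets and the array-returning convenience methods suggest that the keys are collected once at construction time. The following is a minimal sketch of that idea, not the library's source: `KeySetSketch`, `SketchParser` and `HtmlSketchParser` are hypothetical stand-ins, the example keys "html" and "text/html" are chosen for illustration, and the assumption that the constructor merges extension, mimetype and alternative keys into an unmodifiable set is not confirmed by this page.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class KeySetSketch {

  /** Hypothetical stand-in for AbstractContentParser; only the key handling is modelled. */
  public abstract static class SketchParser {

    private final Set<String> primaryKeys;
    private final Set<String> secondaryKeys;

    protected SketchParser() {
      // Assumption: the keys are collected once and then frozen.
      Set<String> primary = new LinkedHashSet<>();
      addKey(primary, getExtension());
      addKey(primary, getMimetype());
      for (String key : getAlternativeKeyArray()) {
        addKey(primary, key);
      }
      this.primaryKeys = Collections.unmodifiableSet(primary);

      Set<String> secondary = new LinkedHashSet<>();
      for (String key : getSecondaryKeyArray()) {
        addKey(secondary, key);
      }
      this.secondaryKeys = Collections.unmodifiableSet(secondary);
    }

    /** Adds the key unless it is null. */
    private static void addKey(Set<String> keys, String key) {
      if (key != null) {
        keys.add(key);
      }
    }

    public abstract String getExtension();

    public abstract String getMimetype();

    /** Override to contribute additional primary keys; empty by default. */
    public String[] getAlternativeKeyArray() {
      return new String[0];
    }

    /** Override to contribute secondary keys; empty by default. */
    public String[] getSecondaryKeyArray() {
      return new String[0];
    }

    public final Set<String> getPrimaryKeys() {
      return this.primaryKeys;
    }

    public final Set<String> getSecondaryKeys() {
      return this.secondaryKeys;
    }
  }

  /** Hypothetical HTML parser that only declares its keys as plain arrays. */
  public static class HtmlSketchParser extends SketchParser {
    @Override public String getExtension() { return "html"; }
    @Override public String getMimetype() { return "text/html"; }
    @Override public String[] getAlternativeKeyArray() { return new String[] {"htm"}; }
    @Override public String[] getSecondaryKeyArray() {
      return new String[] {"xhtml", "application/xhtml+xml"};
    }
  }

  public static void main(String[] args) {
    SketchParser parser = new HtmlSketchParser();
    System.out.println("primary:   " + parser.getPrimaryKeys());
    System.out.println("secondary: " + parser.getSecondaryKeys());
  }
}
```

Because the sketch calls overridable methods from the constructor, the overrides must not depend on subclass fields; returning constant arrays, as above, keeps that safe.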
| Modifier and Type | Method and Description |
|---|---|
| protected void | doInitialize() This method performs the actual initialization. |
| String[] | getAlternativeKeyArray() |
| Set<String> | getPrimaryKeys() This method gets the primary keys used to register this ContentParser. |
| String[] | getSecondaryKeyArray() This method gets the secondary keys as array. |
| Set<String> | getSecondaryKeys() This method gets the secondary keys used to register this ContentParser. |
| GenericContext | parse(InputStream inputStream, long filesize) This method parses the document given as inputStream and extracts text and metadata returned as GenericContext. |
| GenericContext | parse(InputStream inputStream, long filesize, ContentParserOptions options) This method parses the document given as inputStream and extracts text and metadata returned as GenericContext. |
| protected abstract void | parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) |
| void | setGenericContextFactory(GenericContextFactory genericContextFactory) |
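
The three parse overloads above follow a template-method shape: the public methods delegate to one another, create the context, and close the stream, while subclasses only fill the context. The following is a minimal sketch of that shape under stated assumptions, not the library's implementation: a plain `Map` stands in for MutableGenericContext and GenericContext, `SketchOptions` is a hypothetical empty stand-in for ContentParserOptions, the key "text" is an illustrative placeholder for ContentParser.VARIABLE_NAME_TEXT, and no GenericContextFactory is modelled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ParseTemplateSketch {

  /** Hypothetical, empty stand-in for ContentParserOptions. */
  public static class SketchOptions { }

  /** Hypothetical stand-in for AbstractContentParser; only the parse flow is modelled. */
  public abstract static class SketchParser {

    /** Delegates to the three-argument variant with default options. */
    public final Map<String, Object> parse(InputStream inputStream, long filesize) throws Exception {
      return parse(inputStream, filesize, new SketchOptions());
    }

    /** Creates the context, lets the subclass fill it, and always closes the stream. */
    public Map<String, Object> parse(InputStream inputStream, long filesize, SketchOptions options)
        throws Exception {
      Map<String, Object> context = new HashMap<>();
      // try-with-resources closes the stream on success and on failure.
      try (InputStream in = inputStream) {
        parse(in, filesize, options, context);
      }
      return Collections.unmodifiableMap(context);
    }

    /** The only method a concrete parser has to implement. */
    protected abstract void parse(InputStream inputStream, long filesize, SketchOptions options,
        Map<String, Object> context) throws Exception;
  }

  /** Hypothetical parser that stores the whole stream as UTF-8 text. */
  public static class PlainTextSketchParser extends SketchParser {
    @Override
    protected void parse(InputStream inputStream, long filesize, SketchOptions options,
        Map<String, Object> context) throws Exception {
      // Use the file size as an allocation hint when it is known (0 means unknown).
      int capacity = filesize > 0 ? (int) Math.min(filesize, 65536) : 1024;
      ByteArrayOutputStream buffer = new ByteArrayOutputStream(capacity);
      byte[] chunk = new byte[4096];
      int read;
      while ((read = inputStream.read(chunk)) != -1) {
        buffer.write(chunk, 0, read);
      }
      context.put("text", new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
    Map<String, Object> result =
        new PlainTextSketchParser().parse(new ByteArrayInputStream(data), data.length);
    System.out.println(result);
  }
}
```

Keeping the stream handling in the base class means a concrete parser cannot forget to close the stream, which matches the contract that the stream is closed on success and in exceptional state.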

Inherited methods:
createLogger, getLogger
doInitialized, getInitializationState, initialize
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getExtension, getMimetype

private GenericContextFactory genericContextFactory

private final Set<String> primaryKeys
See Also: getPrimaryKeys()

private final Set<String> secondaryKeys
See Also: getSecondaryKeys()

@Inject public void setGenericContextFactory(GenericContextFactory genericContextFactory)
Parameters: genericContextFactory - is the genericContextFactory to set

protected void doInitialize()
This method performs the actual initialization. It is called when AbstractComponent.initialize() is invoked for the first time.
super.AbstractComponent.doInitialize().
Overrides: doInitialize in class AbstractLoggableComponent

public final Set<String> getPrimaryKeys()
This method gets the primary keys used to register this ContentParser. This set contains extension and mimetype and maybe other alternatives.
A parser returns its extension from ContentParser.getExtension() but can also include "htm" as primary key.
A parser with extension "tar.gz" may also include "tgz" as primary key.
A parser with mimetype "text/xml" may also include "application/xml" as primary key.
Specified by: getPrimaryKeys in interface ContentParser
See Also: ContentParserService.getParser(String), ContentParser.getSecondaryKeys(), Collections.emptySet()

public String[] getAlternativeKeyArray()
See Also: getPrimaryKeys()

public String[] getSecondaryKeyArray()
This method gets the secondary keys as array. This is just a convenience that spares the implementors of individual parsers from creating a Set and making it unmodifiable.
See Also: getPrimaryKeys()

public final Set<String> getSecondaryKeys()
This method gets the secondary keys used to register this ContentParser. If another (more specific) ContentParser defines such a key as primary key, that ContentParser is chosen first. Otherwise this implementation will be used.
A parser returns its extension from ContentParser.getExtension() but can define "xhtml" and "application/xhtml+xml" as secondary keys.
A parser with extension "txt" may return "java", "php", "c", "cpp", etc. as secondary keys.
Specified by: getSecondaryKeys in interface ContentParser
See Also: ContentParserService.getParser(String), ContentParser.getPrimaryKeys(), Collections.emptySet()

public final GenericContext parse(InputStream inputStream, long filesize) throws Exception
This method parses the document given as inputStream and extracts text and metadata returned as GenericContext.
Specified by: parse in interface ContentParser
Parameters:
inputStream - is the fresh input stream of the content to parse. It will be closed by this method (on success and in exceptional state).
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
Returns: the GenericContext containing the extracted metadata from the parsed inputStream. See the VARIABLE_NAME_* constants (e.g. ContentParser.VARIABLE_NAME_TEXT) for the default keys. Please note that an implementation may use individual names for additional variables.
Throws: Exception - if the parsing failed for a technical reason. Arbitrary implementations of this interface may throw any Exception from this method. Declaring a specific ParseException here would add the overhead of wrapping exceptions without any advantage. The user of this interface has to catch Exception, which includes RuntimeExceptions and excludes Errors. The caller has to handle the problem anyway (also for RuntimeExceptions) and has all contextual information required to enhance the exception message. This is NOT a matter of bad design.

public GenericContext parse(InputStream inputStream, long filesize, ContentParserOptions options) throws Exception
This method parses the document given as inputStream and extracts text and metadata returned as GenericContext.
Specified by: parse in interface ContentParser
Parameters:
inputStream - is the fresh input stream of the content to parse. It will be closed by this method (on success and in exceptional state).
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.
Returns: the GenericContext containing the extracted metadata from the parsed inputStream. See the VARIABLE_NAME_* constants (e.g. ContentParser.VARIABLE_NAME_TEXT) for the default keys. Please note that an implementation may use individual names for additional variables.
Throws: Exception - if the parsing failed for a technical reason. Arbitrary implementations of this interface may throw any Exception from this method. Declaring a specific ParseException here would add the overhead of wrapping exceptions without any advantage. The user of this interface has to catch Exception, which includes RuntimeExceptions and excludes Errors. The caller has to handle the problem anyway (also for RuntimeExceptions) and has all contextual information required to enhance the exception message. This is NOT a matter of bad design.

protected abstract void parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) throws Exception
Parameters:
inputStream - is the fresh input stream of the content to parse.
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.
context - is the MutableGenericContext to which the extracted metadata from the parsed inputStream will be added.
Throws: Exception - if the operation fails for arbitrary reasons.
See Also: ContentParser.parse(InputStream, long)

Copyright © 2001–2016 mmm-Team. All rights reserved.