public abstract class AbstractContentParser extends AbstractLoggableComponent implements ContentParser
Abstract base implementation of ContentParser. It handles the delegation methods, the creation of the MutableGenericContext and the safe closing of the InputStream.
Modifier and Type | Field and Description |
---|---|
private GenericContextFactory | genericContextFactory |
private Set<String> | primaryKeys |
private Set<String> | secondaryKeys |
Fields inherited from interface ContentParser: VARIABLE_NAME_CREATOR, VARIABLE_NAME_KEYWORDS, VARIABLE_NAME_LANGUAGE, VARIABLE_NAME_TEXT, VARIABLE_NAME_TITLE
Constructor and Description |
---|
AbstractContentParser() The constructor. |
Modifier and Type | Method and Description |
---|---|
protected void | doInitialize() This method performs the actual initialization. |
String[] | getAlternativeKeyArray() |
Set<String> | getPrimaryKeys() This method gets the primary keys used to register this ContentParser. |
String[] | getSecondaryKeyArray() This method gets the secondary keys as array. |
Set<String> | getSecondaryKeys() This method gets the secondary keys used to register this ContentParser. |
GenericContext | parse(InputStream inputStream, long filesize) This method parses the document given as inputStream and extracts text and metadata returned as GenericContext. |
GenericContext | parse(InputStream inputStream, long filesize, ContentParserOptions options) This method parses the document given as inputStream and extracts text and metadata returned as GenericContext. |
protected abstract void | parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) |
void | setGenericContextFactory(GenericContextFactory genericContextFactory) |
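Methods inherited from class AbstractLoggableComponent: createLogger, getLogger
Methods inherited from class AbstractComponent: doInitialized, getInitializationState, initialize
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ContentParser: getExtension, getMimetype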
private GenericContextFactory genericContextFactory
private final Set<String> primaryKeys
See Also: getPrimaryKeys()
private final Set<String> secondaryKeys
See Also: getSecondaryKeys()
@Inject public void setGenericContextFactory(GenericContextFactory genericContextFactory)
Parameters: genericContextFactory - is the genericContextFactory to set
protected void doInitialize()
This method performs the actual initialization. It is called when AbstractComponent.initialize() is invoked for the first time. A subclass that overrides this method needs to call super.doInitialize() (see AbstractComponent.doInitialize()).
Overrides: doInitialize in class AbstractLoggableComponent
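The initialize-once contract described for doInitialize() can be illustrated with a minimal sketch. The classes SketchComponent and SketchParser are hypothetical stand-ins, not the real AbstractComponent or AbstractContentParser, and the use of an AtomicBoolean is an assumption about how "invoked for the first time" might be tracked.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class InitializeSketch {

  /** Hypothetical stand-in for AbstractComponent: runs doInitialize() only once. */
  static class SketchComponent {
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    int baseInitCount = 0;

    public final void initialize() {
      // Only the first call triggers the actual initialization.
      if (this.initialized.compareAndSet(false, true)) {
        doInitialize();
      }
    }

    protected void doInitialize() {
      this.baseInitCount++;
    }
  }

  /** Hypothetical subclass that adds its own setup and delegates to super. */
  static class SketchParser extends SketchComponent {
    int parserInitCount = 0;

    @Override
    protected void doInitialize() {
      super.doInitialize(); // keep the initialization of the parent class
      this.parserInitCount++;
    }
  }

  public static void main(String[] args) {
    SketchParser parser = new SketchParser();
    parser.initialize();
    parser.initialize(); // second call does nothing
    System.out.println(parser.baseInitCount + " " + parser.parserInitCount); // prints "1 1"
  }
}
```

Omitting the super.doInitialize() call in this sketch would leave baseInitCount at 0, which is the kind of skipped parent setup the note about overriding warns against.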
public final Set<String> getPrimaryKeys()
This method gets the primary keys used to register this ContentParser. This set contains the extension and mimetype and maybe other alternatives. For example, a parser reports a single extension via ContentParser.getExtension() but can also include "htm" as primary key. A parser for the extension "tar.gz" may also include "tgz" as primary key. A parser for the mimetype "text/xml" may also include "application/xml" as primary key.
Specified by: getPrimaryKeys in interface ContentParser
Returns: the primary keys of this ContentParser.
See Also: ContentParserService.getParser(String), ContentParser.getSecondaryKeys(), Collections.emptySet()
public String[] getAlternativeKeyArray()
See Also: getPrimaryKeys()
public String[] getSecondaryKeyArray()
This method gets the secondary keys as array. This is just a convenience so that implementors of individual parsers do not have to create a Set and make it unmodifiable.
See Also: getPrimaryKeys()
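The convenience described above, where an implementor supplies a plain array and the base class exposes an unmodifiable Set, might look like the following sketch. The helper toUnmodifiableSet is hypothetical; how AbstractContentParser actually converts the array is not shown in this documentation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class KeyArraySketch {

  /** Hypothetical helper: wraps the given keys in an unmodifiable set. */
  static Set<String> toUnmodifiableSet(String... keys) {
    // LinkedHashSet drops duplicates while keeping the declared order.
    return Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(keys)));
  }

  public static void main(String[] args) {
    Set<String> secondaryKeys = toUnmodifiableSet("java", "php", "c", "cpp");
    System.out.println(secondaryKeys);
    try {
      secondaryKeys.add("txt");
    } catch (UnsupportedOperationException e) {
      // The wrapper rejects every modification attempt.
      System.out.println("set is read-only");
    }
  }
}
```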
public final Set<String> getSecondaryKeys()
This method gets the secondary keys used to register this ContentParser. If another (more specific) ContentParser defines such a key as primary key, that ContentParser is chosen first. Otherwise this implementation will be used. For example, a parser reports a single extension via ContentParser.getExtension() but can define "xhtml" and "application/xhtml+xml" as secondary keys. A parser for the extension "txt" may return "java", "php", "c", "cpp", etc. as secondary keys.
Specified by: getSecondaryKeys in interface ContentParser
Returns: the secondary keys of this ContentParser.
See Also: ContentParserService.getParser(String), ContentParser.getPrimaryKeys(), Collections.emptySet()
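The precedence rule above (a primary registration wins over a secondary one for the same key) can be sketched as a two-map lookup. SketchParser and SketchRegistry are hypothetical stand-ins; the real ContentParserService may resolve keys differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class KeyLookupSketch {

  /** Hypothetical stand-in for a parser: a name plus its two key sets. */
  static final class SketchParser {
    final String name;
    final Set<String> primaryKeys;
    final Set<String> secondaryKeys;

    SketchParser(String name, Set<String> primaryKeys, Set<String> secondaryKeys) {
      this.name = name;
      this.primaryKeys = primaryKeys;
      this.secondaryKeys = secondaryKeys;
    }
  }

  /** Hypothetical registry that prefers primary registrations over secondary ones. */
  static final class SketchRegistry {
    private final Map<String, SketchParser> primary = new HashMap<>();
    private final Map<String, SketchParser> secondary = new HashMap<>();

    void register(SketchParser parser) {
      for (String key : parser.primaryKeys) {
        this.primary.put(key, parser);
      }
      for (String key : parser.secondaryKeys) {
        this.secondary.putIfAbsent(key, parser);
      }
    }

    /** Returns the parser for the key, or null if no parser is registered for it. */
    SketchParser getParser(String key) {
      SketchParser parser = this.primary.get(key);
      return (parser != null) ? parser : this.secondary.get(key);
    }
  }

  static Set<String> keys(String... values) {
    return new HashSet<>(Arrays.asList(values));
  }

  public static void main(String[] args) {
    SketchRegistry registry = new SketchRegistry();
    registry.register(new SketchParser("text-parser", keys("txt", "text/plain"), keys("java", "php")));
    registry.register(new SketchParser("java-parser", keys("java"), keys()));
    System.out.println(registry.getParser("java").name); // prints "java-parser"
    System.out.println(registry.getParser("php").name);  // prints "text-parser"
  }
}
```

In this sketch the generic text parser still handles "php" because no more specific parser claims that key as primary.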
public final GenericContext parse(InputStream inputStream, long filesize) throws Exception
This method parses the document given as inputStream and extracts text and metadata returned as GenericContext.
Specified by: parse in interface ContentParser
Parameters:
inputStream - is the fresh input stream of the content to parse. It will be closed by this method (on success and in exceptional state).
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
Returns: the GenericContext containing the extracted metadata from the parsed inputStream. See the VARIABLE_NAME_* constants (e.g. ContentParser.VARIABLE_NAME_TEXT) for the default keys. Please note that an implementation may use individual names for additional variables.
Throws: Exception - if the parsing failed for a technical reason. There can be arbitrary implementations of this interface that can throw any Exception from this method. Declaring a specific ParseException here would cause the overhead of additional encapsulation of exceptions without any advantage. The user of this interface has to catch Exception, which includes RuntimeExceptions and excludes Errors. The caller has to handle the problem anyway (also for RuntimeExceptions) and has all contextual information required to enhance the exception message. This is NOT a matter of bad design.
public GenericContext parse(InputStream inputStream, long filesize, ContentParserOptions options) throws Exception
This method parses the document given as inputStream and extracts text and metadata returned as GenericContext.
Specified by: parse in interface ContentParser
Parameters:
inputStream - is the fresh input stream of the content to parse. It will be closed by this method (on success and in exceptional state).
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.
Returns: the GenericContext containing the extracted metadata from the parsed inputStream. See the VARIABLE_NAME_* constants (e.g. ContentParser.VARIABLE_NAME_TEXT) for the default keys. Please note that an implementation may use individual names for additional variables.
Throws: Exception - if the parsing failed for a technical reason. There can be arbitrary implementations of this interface that can throw any Exception from this method. Declaring a specific ParseException here would cause the overhead of additional encapsulation of exceptions without any advantage. The user of this interface has to catch Exception, which includes RuntimeExceptions and excludes Errors. The caller has to handle the problem anyway (also for RuntimeExceptions) and has all contextual information required to enhance the exception message. This is NOT a matter of bad design.
protected abstract void parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) throws Exception
Parameters:
inputStream - is the fresh input stream of the content to parse.
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.
context - is the MutableGenericContext to which the extracted metadata from the parsed inputStream will be added.
Throws: Exception - if the operation fails for arbitrary reasons.
See Also: ContentParser.parse(InputStream, long)
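The division of labour between the public parse methods and the protected abstract parse method can be sketched as a template method. This is a simplified illustration under stated assumptions, not the actual implementation: SketchParser, PlainTextParser, FailingParser, TrackingStream and extractText are hypothetical, a plain Map stands in for MutableGenericContext, the key "text" is a placeholder for whatever ContentParser.VARIABLE_NAME_TEXT holds, and the options parameter is left out. The sketch also shows a caller catching Exception and adding context to the message, as the Throws description suggests.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ParseTemplateSketch {

  /** Hypothetical base class; a plain Map stands in for MutableGenericContext. */
  abstract static class SketchParser {

    public final Map<String, Object> parse(InputStream inputStream, long filesize) throws Exception {
      Map<String, Object> context = new HashMap<>();
      // try-with-resources closes the stream on success and on failure.
      try (InputStream in = inputStream) {
        parse(in, filesize, context);
      }
      return Collections.unmodifiableMap(context);
    }

    protected abstract void parse(InputStream inputStream, long filesize, Map<String, Object> context)
        throws Exception;
  }

  /** Hypothetical parser that stores the whole content under the placeholder key "text". */
  static final class PlainTextParser extends SketchParser {
    @Override
    protected void parse(InputStream inputStream, long filesize, Map<String, Object> context) throws Exception {
      // Use the size hint for the initial buffer if it is known (0 means unknown).
      int capacity = (filesize > 0) ? (int) Math.min(filesize, 1 << 20) : 1024;
      ByteArrayOutputStream buffer = new ByteArrayOutputStream(capacity);
      byte[] chunk = new byte[4096];
      int count;
      while ((count = inputStream.read(chunk)) != -1) {
        buffer.write(chunk, 0, count);
      }
      context.put("text", new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
  }

  /** Hypothetical parser that always fails with a RuntimeException. */
  static final class FailingParser extends SketchParser {
    @Override
    protected void parse(InputStream inputStream, long filesize, Map<String, Object> context) {
      throw new IllegalArgumentException("unsupported content");
    }
  }

  /** Test stream that records whether close() was called. */
  static final class TrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackingStream(byte[] data) {
      super(data);
    }

    @Override
    public void close() {
      this.closed = true;
    }
  }

  /** Caller side: catch Exception (not Error) and enrich the message with context. */
  static String extractText(SketchParser parser, InputStream inputStream, long filesize) {
    try {
      return (String) parser.parse(inputStream, filesize).get("text");
    } catch (Exception e) {
      throw new IllegalStateException("Failed to parse content of " + filesize + " bytes", e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
    TrackingStream in = new TrackingStream(data);
    System.out.println(extractText(new PlainTextParser(), in, data.length));
    System.out.println("closed: " + in.closed);
  }
}
```

Keeping the public method final in the sketch means a subclass cannot bypass the stream handling, which matches the final modifier on parse(InputStream, long) and on the key getters in this class.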
Copyright © 2001–2016 mmm-Team. All rights reserved.