public abstract class AbstractContentParserOoxml extends AbstractContentParser
Abstract base implementation of the ContentParser interface for parsing Microsoft Office documents in the OOXML format using Apache POI.
Modifier and Type | Field and Description |
---|---|
static String | KEY_MIMETYPE_GENERIC: The generic mimetype for OOXML. |
VARIABLE_NAME_CREATOR, VARIABLE_NAME_KEYWORDS, VARIABLE_NAME_LANGUAGE, VARIABLE_NAME_TEXT, VARIABLE_NAME_TITLE
Constructor and Description |
---|
AbstractContentParserOoxml(): The constructor. |
Modifier and Type | Method and Description |
---|---|
protected org.apache.poi.POIXMLTextExtractor | createExtractor(org.apache.poi.openxml4j.opc.OPCPackage opcPackage): This method creates the POIXMLTextExtractor for the given opcPackage. |
protected String | extractText(org.apache.poi.POIXMLTextExtractor extractor, long filesize): This method extracts the text from the office document given by extractor. |
void | parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) |
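The three methods in the table suggest a template-method arrangement: the public parse method drives the work, while the protected createExtractor and extractText methods act as hooks for subclasses. The following is a minimal sketch of that pattern only, under stated assumptions. The types SketchExtractor, SketchParser and PlainTextSketchParser, the byte[] stand-in for an OPCPackage, the Map stand-in for MutableGenericContext, and the "text" key are all hypothetical and are not taken from this API. Whether the real parse method calls the hooks in exactly this order is also an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical stand-in for org.apache.poi.POIXMLTextExtractor. */
interface SketchExtractor {
  String getText();
}

/** Hypothetical sketch of the assumed flow; NOT the real AbstractContentParserOoxml. */
abstract class SketchParser {

  /** Hook: the subclass picks the extractor for its document type. */
  protected abstract SketchExtractor createExtractor(byte[] packageBytes) throws Exception;

  /** Hook with a default: reads the text out of the extractor. */
  protected String extractText(SketchExtractor extractor, long filesize) throws Exception {
    return extractor.getText();
  }

  /** Template method: runs the fixed sequence and calls both hooks. */
  public void parse(InputStream inputStream, long filesize, Map<String, Object> context) throws Exception {
    byte[] packageBytes = readAll(inputStream); // stands in for opening an OPCPackage
    SketchExtractor extractor = createExtractor(packageBytes);
    String text = extractText(extractor, filesize);
    if (text != null) {
      context.put("text", text); // "text" is an assumed key, not taken from the API
    }
  }

  /** Reads the stream to its end into a byte array. */
  private static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
    }
    return out.toByteArray();
  }
}

/** Hypothetical subclass that treats the raw bytes as UTF-8 text. */
class PlainTextSketchParser extends SketchParser {
  @Override
  protected SketchExtractor createExtractor(byte[] packageBytes) {
    return () -> new String(packageBytes, StandardCharsets.UTF_8);
  }
}
```

In this sketch a concrete parser only has to supply createExtractor; extractText can be overridden when a document type needs special handling, which mirrors why both methods are declared protected.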
doInitialize, getAlternativeKeyArray, getPrimaryKeys, getSecondaryKeyArray, getSecondaryKeys, parse, parse, setGenericContextFactory
createLogger, getLogger
doInitialized, getInitializationState, initialize
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getExtension, getMimetype
public static final String KEY_MIMETYPE_GENERIC
protected org.apache.poi.POIXMLTextExtractor createExtractor(org.apache.poi.openxml4j.opc.OPCPackage opcPackage) throws Exception
This method creates the POIXMLTextExtractor for the given opcPackage.
Parameters:
opcPackage - is the OPCPackage.
Returns:
the POIXMLTextExtractor.
Throws:
Exception - if something goes wrong.
public void parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) throws Exception
parse in class AbstractContentParser
Parameters:
inputStream - is the fresh input stream of the content to parse.
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.
context - is the MutableGenericContext where the extracted metadata from the parsed inputStream will be added to.
Throws:
Exception - if the operation fails for arbitrary reasons.
See Also:
ContentParser.parse(InputStream, long)
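The filesize contract (0 means unknown, any other value is a hint for allocations) can be illustrated with a small sketch. The class FilesizeHint, its methods, and the constants DEFAULT_CAPACITY and MAX_CAPACITY are hypothetical names chosen for this example and are not part of the documented API. How the actual parser sizes its buffers is not stated in this documentation.

```java
/** Hypothetical helper, not part of the documented API. */
class FilesizeHint {

  /** Assumed fallback capacity when the size is unknown (filesize is 0). */
  static final int DEFAULT_CAPACITY = 1024;

  /** Assumed upper bound so that a huge or bogus hint cannot exhaust memory. */
  static final int MAX_CAPACITY = 1 << 20;

  /** Maps the filesize hint to an initial buffer capacity. */
  static int initialCapacity(long filesize) {
    if (filesize <= 0) {
      // 0 signals "unknown"; negative values are treated the same way here
      return DEFAULT_CAPACITY;
    }
    return (int) Math.min(filesize, MAX_CAPACITY);
  }

  /** Creates a text buffer that is pre-sized from the hint. */
  static StringBuilder newTextBuffer(long filesize) {
    return new StringBuilder(initialCapacity(filesize));
  }
}
```

Capping the hint is a design choice of this sketch: the value comes from the caller and is only advisory, so treating it as an exact allocation size would be risky.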
protected String extractText(org.apache.poi.POIXMLTextExtractor extractor, long filesize) throws Exception
This method extracts the text from the office document given by extractor.
Parameters:
extractor - is the POIXMLTextExtractor.
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
Throws:
Exception - if something goes wrong.
Copyright © 2001–2016 mmm-Team. All rights reserved.