public abstract class AbstractContentParserPoi extends AbstractContentParser

Abstract base implementation of the ContentParser interface for parsing binary Microsoft Office documents using Apache POI.

| Modifier and Type | Field and Description |
|---|---|
| static String | POIFS_EXCEL_DOC: name of the entry for an Excel document in the POI filesystem |
| static String | POIFS_POWERPOINT_DOC: name of the entry for a PowerPoint document in the POI filesystem |
| static String | POIFS_WORD_DOC: name of the entry for a Word document in the POI filesystem |

Inherited fields: VARIABLE_NAME_CREATOR, VARIABLE_NAME_KEYWORDS, VARIABLE_NAME_LANGUAGE, VARIABLE_NAME_TEXT, VARIABLE_NAME_TITLE

| Constructor and Description |
|---|
| AbstractContentParserPoi(): The constructor. |

| Modifier and Type | Method and Description |
|---|---|
| protected abstract String | extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFs, long filesize, ContentParserOptions options): This method extracts the text from the office document given by poiFs. |
| void | parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) |

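The two methods in the summary above suggest a template-method split: the concrete parse opens the stream as a POI filesystem and the abstract extractText does the format-specific work. The source does not show the body of parse, so the following is only a minimal sketch of that presumed flow. All types in it (FakePoiFs, SketchParserPoi, SketchWordParser, the KEY_TEXT key, and the Map used as context) are hypothetical stand-ins so that the example runs without Apache POI; ContentParserOptions is omitted for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.poi.poifs.filesystem.POIFSFileSystem.
// It only buffers the stream (InputStream.readAllBytes needs Java 9+).
final class FakePoiFs {
  final byte[] data;

  FakePoiFs(InputStream in) throws IOException {
    this.data = in.readAllBytes();
  }
}

// Hypothetical stand-in for AbstractContentParserPoi.
abstract class SketchParserPoi {
  // Assumed context key; the real class presumably uses VARIABLE_NAME_TEXT.
  static final String KEY_TEXT = "text";

  // Format-specific step, implemented by each subclass.
  protected abstract String extractText(FakePoiFs poiFs, long filesize) throws Exception;

  // Assumed template method: open the filesystem, delegate, store the result.
  public void parse(InputStream inputStream, long filesize, Map<String, Object> context)
      throws Exception {
    FakePoiFs poiFs = new FakePoiFs(inputStream);
    String text = extractText(poiFs, filesize);
    if (text != null) {
      context.put(KEY_TEXT, text);
    }
  }
}

// Hypothetical subclass; a real one would use a POI text extractor here.
final class SketchWordParser extends SketchParserPoi {
  @Override
  protected String extractText(FakePoiFs poiFs, long filesize) {
    // filesize is 0 when unknown; otherwise use it (capped) as a size hint.
    int capacity = (filesize > 0) ? (int) Math.min(filesize, 1 << 20) : 16;
    StringBuilder buffer = new StringBuilder(capacity);
    buffer.append(new String(poiFs.data, StandardCharsets.UTF_8));
    return buffer.toString();
  }
}

class PoiParserSketchDemo {
  public static void main(String[] args) throws Exception {
    byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
    Map<String, Object> context = new HashMap<>();
    new SketchWordParser().parse(new ByteArrayInputStream(bytes), bytes.length, context);
    System.out.println(context.get(SketchParserPoi.KEY_TEXT));
  }
}
```

Keeping the stream handling in one place means a subclass only has to supply the extraction step for its document type.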
Inherited methods:
doInitialize, getAlternativeKeyArray, getPrimaryKeys, getSecondaryKeyArray, getSecondaryKeys, parse, parse, setGenericContextFactory
createLogger, getLogger
doInitialized, getInitializationState, initialize
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getExtension, getMimetype

public static final String POIFS_WORD_DOC
public static final String POIFS_POWERPOINT_DOC
public static final String POIFS_EXCEL_DOC
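
The three constants name the entry that identifies each document type inside the POI filesystem. Their values are not given here, so the sketch below assumes the conventional OLE2 stream names ("WordDocument", "PowerPoint Document", "Workbook"); that is an assumption, not something this page confirms. The PoiEntryNames class and its detectKind helper are hypothetical and take a plain set of entry names, which with Apache POI would come from the root directory of the POIFSFileSystem.

```java
import java.util.Set;

// Hypothetical helper; not part of the documented API.
final class PoiEntryNames {
  // Assumed values (conventional OLE2 stream names), not confirmed by the source.
  static final String POIFS_WORD_DOC = "WordDocument";
  static final String POIFS_POWERPOINT_DOC = "PowerPoint Document";
  static final String POIFS_EXCEL_DOC = "Workbook";

  private PoiEntryNames() {
  }

  // Returns a coarse document kind based on which known entry is present.
  static String detectKind(Set<String> rootEntryNames) {
    if (rootEntryNames.contains(POIFS_WORD_DOC)) {
      return "word";
    }
    if (rootEntryNames.contains(POIFS_POWERPOINT_DOC)) {
      return "powerpoint";
    }
    if (rootEntryNames.contains(POIFS_EXCEL_DOC)) {
      return "excel";
    }
    return "unknown";
  }
}
```

A call such as PoiEntryNames.detectKind(Set.of("Workbook")) would then classify the document before a format-specific extractor is chosen.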
public void parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) throws Exception

parse in class AbstractContentParser

Parameters:
inputStream - is the fresh input stream of the content to parse.
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.
context - is the MutableGenericContext where the extracted metadata from the parsed inputStream will be added to.

Throws:
Exception - if the operation fails for arbitrary reasons.

See Also:
ContentParser.parse(InputStream, long)

protected abstract String extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFs, long filesize, ContentParserOptions options) throws Exception

This method extracts the text from the office document given by poiFs.

Parameters:
poiFs - is the POI filesystem of the office document.
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.

Throws:
Exception - if something goes wrong.

Copyright © 2001–2016 mmm-Team. All rights reserved.