public abstract class AbstractContentParserPoi extends AbstractContentParser

Abstract base implementation of the ContentParser interface for parsing binary Microsoft Office documents using Apache POI.

Modifier and Type | Field and Description |
---|---|
static String | POIFS_EXCEL_DOC: name of the entry for an Excel document in the POI filesystem |
static String | POIFS_POWERPOINT_DOC: name of the entry for a PowerPoint document in the POI filesystem |
static String | POIFS_WORD_DOC: name of the entry for a Word document in the POI filesystem |
Inherited fields: VARIABLE_NAME_CREATOR, VARIABLE_NAME_KEYWORDS, VARIABLE_NAME_LANGUAGE, VARIABLE_NAME_TEXT, VARIABLE_NAME_TITLE
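The POIFS_* constants name the top-level entry that identifies each document type inside the POI filesystem. The following is a minimal sketch of how such names could be used to tell the three formats apart. The string values are an assumption based on the conventional OLE2 stream names, since this page does not show the actual constant values. The class `PoiEntryNameSketch`, the method `detectKind`, and the plain `Set<String>` standing in for the filesystem's root directory listing are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: guesses the kind of office document from its top-level entry names. */
final class PoiEntryNameSketch {

  // ASSUMPTION: conventional OLE2 stream names. The real values of the
  // POIFS_* constants are not shown in this documentation.
  static final String POIFS_WORD_DOC = "WordDocument";
  static final String POIFS_EXCEL_DOC = "Workbook";
  static final String POIFS_POWERPOINT_DOC = "PowerPoint Document";

  /** Returns "word", "excel", "powerpoint" or "unknown" for the given entry names. */
  static String detectKind(Set<String> entryNames) {
    if (entryNames.contains(POIFS_WORD_DOC)) {
      return "word";
    }
    if (entryNames.contains(POIFS_EXCEL_DOC)) {
      return "excel";
    }
    if (entryNames.contains(POIFS_POWERPOINT_DOC)) {
      return "powerpoint";
    }
    return "unknown";
  }

  public static void main(String[] args) {
    // "OtherEntry" is a made-up name for an additional, irrelevant entry.
    Set<String> entries = new HashSet<>(Arrays.asList("Workbook", "OtherEntry"));
    System.out.println(detectKind(entries));
  }
}
```

With the real library, the entry names would come from the root directory of the POIFSFileSystem rather than from a hand-built set.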
Constructor and Description |
---|
AbstractContentParserPoi(): The constructor. |
Modifier and Type | Method and Description |
---|---|
protected abstract String | extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFs, long filesize, ContentParserOptions options): This method extracts the text from the office document given by poiFs. |
void | parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) |
Inherited methods:
doInitialize, getAlternativeKeyArray, getPrimaryKeys, getSecondaryKeyArray, getSecondaryKeys, parse, parse, setGenericContextFactory
createLogger, getLogger
doInitialized, getInitializationState, initialize
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait (from java.lang.Object)
getExtension, getMimetype
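The method summary suggests a template-method arrangement: the concrete parse method handles the input stream, and subclasses only supply extractText. The sketch below illustrates that pattern with stand-in types and is not the library's implementation. `PoiParserSketch`, `Utf8ParserSketch`, and `KEY_TEXT` are hypothetical. A `byte[]` stands in for the POIFSFileSystem, a `Map` stands in for MutableGenericContext, and the ContentParserOptions parameter is omitted. Storing the result under a text key is also an assumption, suggested by the inherited VARIABLE_NAME_TEXT field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the abstract parser; NOT the library's implementation. */
abstract class PoiParserSketch {

  // ASSUMPTION: key under which the text is stored; stands in for VARIABLE_NAME_TEXT.
  static final String KEY_TEXT = "text";

  /** Template method: reads the stream, delegates to extractText, stores the result. */
  public void parse(InputStream in, long filesize, Map<String, Object> context) throws Exception {
    byte[] container = readAll(in); // stand-in for opening a POIFSFileSystem
    String text = extractText(container, filesize);
    if (text != null) {
      context.put(KEY_TEXT, text);
    }
  }

  /** Format-specific step that each subclass has to provide. */
  protected abstract String extractText(byte[] container, long filesize) throws Exception;

  private static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    int n;
    while ((n = in.read(chunk)) != -1) {
      buffer.write(chunk, 0, n);
    }
    return buffer.toByteArray();
  }
}

/** Toy subclass that treats the whole container as UTF-8 text. */
final class Utf8ParserSketch extends PoiParserSketch {

  @Override
  protected String extractText(byte[] container, long filesize) {
    return new String(container, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    Map<String, Object> context = new HashMap<>();
    byte[] data = "sample".getBytes(StandardCharsets.UTF_8);
    new Utf8ParserSketch().parse(new ByteArrayInputStream(data), data.length, context);
    System.out.println(context.get(KEY_TEXT));
  }
}
```

A real subclass would read the relevant entry from the POI filesystem instead of decoding raw bytes.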
public static final String POIFS_WORD_DOC
public static final String POIFS_POWERPOINT_DOC
public static final String POIFS_EXCEL_DOC
public void parse(InputStream inputStream, long filesize, ContentParserOptions options, MutableGenericContext context) throws Exception

parse in class AbstractContentParser

Parameters:
inputStream - is the fresh input stream of the content to parse.
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.
context - is the MutableGenericContext where the extracted metadata from the parsed inputStream will be added to.

Throws:
Exception - if the operation fails for arbitrary reasons.

See Also:
ContentParser.parse(InputStream, long)
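The filesize parameter is described as a hint that may be used for optimized allocations, with 0 meaning unknown. A minimal sketch of how an implementation might honor that contract is shown below. The class `FilesizeHintSketch`, both constants, and the capping policy are assumptions made for illustration and are not taken from the library.

```java
/** Illustrative only: turns the filesize hint into an initial buffer capacity. */
final class FilesizeHintSketch {

  // ASSUMPTION: fallback capacity used when the size is unknown (filesize == 0).
  static final int DEFAULT_CAPACITY = 1024;

  // ASSUMPTION: upper bound so that a huge or bogus hint cannot exhaust memory.
  static final int MAX_CAPACITY = 1 << 20;

  /** Maps the hint to a safe initial capacity. */
  static int initialCapacity(long filesize) {
    if (filesize <= 0) {
      return DEFAULT_CAPACITY; // 0 means "not available" according to the contract
    }
    return (int) Math.min(filesize, MAX_CAPACITY);
  }

  /** Creates a text buffer that is pre-sized from the hint. */
  static StringBuilder newTextBuffer(long filesize) {
    return new StringBuilder(initialCapacity(filesize));
  }
}
```

The cap keeps the hint advisory: the buffer can still grow when the extracted text turns out larger than expected.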
protected abstract String extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFs, long filesize, ContentParserOptions options) throws Exception

This method extracts the text from the office document given by poiFs.

Parameters:
poiFs - is the POI filesystem of the office document.
filesize - is the size (content-length) of the content to parse in bytes or 0 if NOT available (unknown). If available, the parser may use this value for optimized allocations.
options - are the ContentParserOptions.

Throws:
Exception - if something goes wrong.

Copyright © 2001–2016 mmm-Team. All rights reserved.