@ComponentSpecification public interface EncodingUtil
Characters of a Charset to Bytes and vice versa.
EncodingUtilImpl
Modifier and Type | Field and Description |
---|---|
static String | ENCODING_CP_437: The encoding CP437, also called DOS-US. |
static String | ENCODING_CP_737: The encoding CP737. |
static String | ENCODING_CP_850: The encoding CP850. |
static String | ENCODING_CP_852: The encoding CP852. |
static String | ENCODING_CP_855: The encoding CP855. |
static String | ENCODING_CP_857: The encoding CP857. |
static String | ENCODING_CP_858: The encoding CP858. |
static String | ENCODING_CP_860: The encoding CP860. |
static String | ENCODING_CP_861: The encoding CP861. |
static String | ENCODING_CP_863: The encoding CP863. |
static String | ENCODING_CP_865: The encoding CP865. |
static String | ENCODING_CP_866: The encoding CP866. |
static String | ENCODING_CP_869: The encoding CP869. |
static String | ENCODING_ISO_8859_1: The encoding ISO-8859-1, also called Latin-1. |
static String | ENCODING_ISO_8859_10: The encoding ISO-8859-10, also called Latin-6. |
static String | ENCODING_ISO_8859_11: The encoding ISO-8859-11. |
static String | ENCODING_ISO_8859_12: Deprecated. |
static String | ENCODING_ISO_8859_13: The encoding ISO-8859-13, also called Latin-7. |
static String | ENCODING_ISO_8859_14: The encoding ISO-8859-14, also called Latin-8. |
static String | ENCODING_ISO_8859_15: The encoding ISO-8859-15, also called Latin-9. |
static String | ENCODING_ISO_8859_16: The encoding ISO-8859-16, also called Latin-10. |
static String | ENCODING_ISO_8859_2: The encoding ISO-8859-2, also called Latin-2. |
static String | ENCODING_ISO_8859_3: The encoding ISO-8859-3, also called Latin-3. |
static String | ENCODING_ISO_8859_4: The encoding ISO-8859-4, also called Latin-4. |
static String | ENCODING_ISO_8859_5: The encoding ISO-8859-5. |
static String | ENCODING_ISO_8859_6: The encoding ISO-8859-6. |
static String | ENCODING_ISO_8859_7: The encoding ISO-8859-7. |
static String | ENCODING_ISO_8859_8: The encoding ISO-8859-8. |
static String | ENCODING_ISO_8859_9: The encoding ISO-8859-9, also called Latin-5. |
static String | ENCODING_KOI8_R: The encoding KOI8-R. |
static String | ENCODING_KOI8_U: The encoding KOI8-U. |
static String | ENCODING_US_ASCII: The encoding US-ASCII (American Standard Code for Information Interchange), also just called ASCII. |
static String | ENCODING_UTF_16: The encoding UTF-16. |
static String | ENCODING_UTF_16_BE: The encoding UTF-16, big-endian. |
static String | ENCODING_UTF_16_LE: The encoding UTF-16, little-endian. |
static String | ENCODING_UTF_32: The encoding UTF-32. |
static String | ENCODING_UTF_32_BE: The encoding UTF-32, big-endian. |
static String | ENCODING_UTF_32_LE: The encoding UTF-32, little-endian. |
static String | ENCODING_UTF_8: The encoding UTF-8. |
static String | ENCODING_WINDOWS_1250: The encoding CP1250, also called Windows-1250. |
static String | ENCODING_WINDOWS_1251: The encoding CP1251, also called Windows-1251. |
static String | ENCODING_WINDOWS_1252: The encoding CP1252, also called Windows-1252. |
static String | ENCODING_WINDOWS_1253: The encoding CP1253, also called Windows-1253. |
static String | ENCODING_WINDOWS_1254: The encoding CP1254, also called Windows-1254. |
static String | ENCODING_WINDOWS_1255: The encoding CP1255, also called Windows-1255. |
static String | ENCODING_WINDOWS_1256: The encoding CP1256, also called Windows-1256. |
static String | ENCODING_WINDOWS_1257: The encoding CP1257, also called Windows-1257. |
static String | ENCODING_WINDOWS_1258: The encoding CP1258, also called Windows-1258. |
static String | SYSTEM_DEFAULT_ENCODING: The default encoding used by this JVM as fallback if no explicit encoding is specified. |
Modifier and Type | Method and Description |
---|---|
EncodingDetectionReader | createUtfDetectionReader(InputStream inputStream, String nonUtfEncoding): This method creates a new Reader for the given inputStream. |
static final String SYSTEM_DEFAULT_ENCODING
static final String ENCODING_US_ASCII
The encoding US-ASCII (American Standard Code for Information Interchange), also just called ASCII.
lib/rt.jar
static final String ENCODING_UTF_8
The encoding UTF-8. It is an 8-bit Unicode Transformation Format.
lib/rt.jar
static final String ENCODING_UTF_16
The encoding UTF-16. It is a 16-bit Unicode Transformation Format. The byte order is determined by an optional ByteOrderMark.
lib/rt.jar
static final String ENCODING_UTF_16_LE
The encoding UTF-16, little-endian. It is a 16-bit Unicode Transformation Format.
lib/rt.jar
static final String ENCODING_UTF_16_BE
The encoding UTF-16, big-endian. It is a 16-bit Unicode Transformation Format.
lib/rt.jar
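The effect of the byte order on the three UTF-16 variants can be illustrated with plain JDK classes. This is a minimal sketch: the class `Utf16ByteOrderDemo` and its method are hypothetical, and the literal names `UTF-16`, `UTF-16LE` and `UTF-16BE` are the standard JDK charset names, which are only assumed to match the values of the constants documented here.

```java
import java.nio.charset.Charset;

final class Utf16ByteOrderDemo {

  /** Encodes the text with the named charset and renders the bytes as hex. */
  static String toHex(String text, String encoding) {
    byte[] bytes = text.getBytes(Charset.forName(encoding));
    StringBuilder hex = new StringBuilder();
    for (byte b : bytes) {
      if (hex.length() > 0) {
        hex.append(' ');
      }
      hex.append(String.format("%02X", b & 0xFF));
    }
    return hex.toString();
  }

  public static void main(String[] args) {
    // Assumption: these literals correspond to ENCODING_UTF_16_BE, ENCODING_UTF_16_LE, ENCODING_UTF_16.
    System.out.println(toHex("A", "UTF-16BE")); // 00 41
    System.out.println(toHex("A", "UTF-16LE")); // 41 00
    // The JDK encoder for plain UTF-16 writes a big-endian ByteOrderMark first.
    System.out.println(toHex("A", "UTF-16"));   // FE FF 00 41
  }
}
```

When decoding, the plain `UTF-16` charset uses the ByteOrderMark (if present) to pick the byte order, whereas the LE and BE variants fix it up front.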
static final String ENCODING_UTF_32
The encoding UTF-32. It is a 32-bit Unicode Transformation Format. The byte order is determined by an optional ByteOrderMark.
static final String ENCODING_UTF_32_LE
The encoding UTF-32, little-endian. It is a 32-bit Unicode Transformation Format.
static final String ENCODING_UTF_32_BE
The encoding UTF-32, big-endian. It is a 32-bit Unicode Transformation Format.
static final String ENCODING_ISO_8859_1
The encoding ISO-8859-1, also called Latin-1. It covers most Western European languages.
lib/rt.jar
static final String ENCODING_ISO_8859_2
The encoding ISO-8859-2, also called Latin-2. It covers the Central and Eastern European languages that use the Latin alphabet.
lib/rt.jar
static final String ENCODING_ISO_8859_3
The encoding ISO-8859-3, also called Latin-3. It covers the South European languages.
lib/charsets.jar
static final String ENCODING_ISO_8859_4
The encoding ISO-8859-4, also called Latin-4. It covers the North European languages.
lib/rt.jar
static final String ENCODING_ISO_8859_5
The encoding ISO-8859-5. It covers mostly Slavic languages that use a Cyrillic alphabet.
lib/rt.jar
static final String ENCODING_ISO_8859_6
The encoding ISO-8859-6. It covers common Arabic language characters.
lib/charsets.jar
static final String ENCODING_ISO_8859_7
The encoding ISO-8859-7. It covers modern Greek.
lib/rt.jar
static final String ENCODING_ISO_8859_8
The encoding ISO-8859-8. It covers modern Hebrew (used in Israel).
lib/charsets.jar
static final String ENCODING_ISO_8859_9
The encoding ISO-8859-9, also called Latin-5. It covers Turkish and Kurdish.
lib/rt.jar
static final String ENCODING_ISO_8859_10
The encoding ISO-8859-10, also called Latin-6. It is used for Nordic languages.
static final String ENCODING_ISO_8859_11
The encoding ISO-8859-11. The canonical name, however, is x-iso-8859-11. It covers common Thai language characters.
@Deprecated static final String ENCODING_ISO_8859_12
The encoding ISO-8859-12. The work on this encoding for Devanagari was stopped, so it does NOT exist at all.
static final String ENCODING_ISO_8859_13
The encoding ISO-8859-13, also called Latin-7. It covers Baltic languages.
lib/rt.jar
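Because ISO-8859-12 does not exist and other encodings may be missing from a given runtime, it can be useful to check availability before requesting a charset. The following is a minimal sketch using only `java.nio.charset.Charset`; the class `CharsetAvailability` and its method are hypothetical and not part of EncodingUtil.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetAvailability {

  /**
   * Returns the charset with the given name if this JVM supports it,
   * otherwise the supplied fallback.
   */
  static Charset charsetOrFallback(String encoding, Charset fallback) {
    return Charset.isSupported(encoding) ? Charset.forName(encoding) : fallback;
  }

  public static void main(String[] args) {
    // UTF-8 must be supported by every Java platform implementation.
    System.out.println(charsetOrFallback("UTF-8", StandardCharsets.US_ASCII));
    // ISO-8859-12 was never finished, so the fallback is returned here.
    System.out.println(charsetOrFallback("ISO-8859-12", StandardCharsets.ISO_8859_1));
  }
}
```

Note that `Charset.isSupported` throws `IllegalCharsetNameException` for syntactically illegal names, so untrusted input may need extra validation.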
static final String ENCODING_ISO_8859_14
The encoding ISO-8859-14, also called Latin-8. It covers Celtic languages.
static final String ENCODING_ISO_8859_15
The encoding ISO-8859-15, also called Latin-9. It is very similar to Latin-1 but adds the euro sign and 7 other characters by replacing rarely used ones.
lib/rt.jar
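The difference between Latin-9 and Latin-1 regarding the euro sign can be observed by encoding that character with both charsets. This is only a sketch: the class `EuroSignDemo` is hypothetical, the literal charset names are assumed to match the constant values, and the runtime is assumed to provide ISO-8859-15.

```java
import java.nio.charset.Charset;

final class EuroSignDemo {

  /** Returns the unsigned value of the first byte the charset produces for the character. */
  static int encodeSingle(char c, String encoding) {
    byte[] bytes = String.valueOf(c).getBytes(Charset.forName(encoding));
    return bytes[0] & 0xFF;
  }

  public static void main(String[] args) {
    char euro = '\u20AC';
    // Latin-9 assigns the euro sign to position 0xA4.
    System.out.printf("ISO-8859-15: 0x%02X%n", encodeSingle(euro, "ISO-8859-15")); // 0xA4
    // Latin-1 cannot represent it, so String.getBytes substitutes '?' (0x3F).
    System.out.printf("ISO-8859-1:  0x%02X%n", encodeSingle(euro, "ISO-8859-1"));  // 0x3F
  }
}
```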
static final String ENCODING_ISO_8859_16
The encoding ISO-8859-16, also called Latin-10. It covers South-Eastern European languages and includes the euro sign.
static final String ENCODING_KOI8_R
The encoding KOI8-R. It covers Russian and Bulgarian. It is therefore related to ENCODING_ISO_8859_5 and ENCODING_WINDOWS_1251.
lib/rt.jar
static final String ENCODING_KOI8_U
The encoding KOI8-U. It covers Ukrainian. It is related to ENCODING_KOI8_R, ENCODING_ISO_8859_5 and ENCODING_WINDOWS_1251.
static final String ENCODING_CP_437
The encoding CP437, also called DOS-US. It is used by MS-DOS and is based on ENCODING_US_ASCII but NOT completely compatible.
static final String ENCODING_CP_737
The encoding CP737. It is used by MS-DOS for Greek and is therefore related to ENCODING_CP_869 and ENCODING_ISO_8859_7.
static final String ENCODING_CP_850
The encoding CP850. It is used by MS-DOS for Western European languages and is therefore related to ENCODING_ISO_8859_1.
static final String ENCODING_CP_852
The encoding CP852. It is used by MS-DOS for Central European languages and is therefore related to ENCODING_ISO_8859_2.
static final String ENCODING_CP_855
The encoding CP855. It is used by MS-DOS for Cyrillic letters and is therefore related to ENCODING_ISO_8859_5.
static final String ENCODING_CP_857
The encoding CP857. It is used by MS-DOS for Turkish and is therefore related to ENCODING_ISO_8859_9.
static final String ENCODING_CP_858
The encoding CP858. It is used by MS-DOS for Western European languages and is like ENCODING_CP_850 but replaces one character with the euro sign. It is therefore related to ENCODING_ISO_8859_15.
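The single position in which CP858 differs from CP850 can be shown by decoding the same byte with both charsets. This is a sketch under assumptions: the class `Cp858Demo` is hypothetical, `IBM850` and `IBM00858` are the JDK's canonical charset names (not necessarily the values of the constants above), and both charsets are assumed to be installed in the runtime.

```java
import java.nio.charset.Charset;

final class Cp858Demo {

  /** Decodes one byte (given as unsigned value) with the named charset. */
  static String decodeSingle(int unsignedByte, String encoding) {
    return new String(new byte[] {(byte) unsignedByte}, Charset.forName(encoding));
  }

  public static void main(String[] args) {
    // In CP850 the byte 0xD5 is the dotless i (U+0131).
    System.out.println(decodeSingle(0xD5, "IBM850").equals("\u0131"));   // true
    // In CP858 the same byte is the euro sign (U+20AC).
    System.out.println(decodeSingle(0xD5, "IBM00858").equals("\u20AC")); // true
  }
}
```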
static final String ENCODING_CP_860
The encoding CP860. It is used by MS-DOS for Portuguese and is therefore related to ENCODING_ISO_8859_1.
static final String ENCODING_CP_861
The encoding CP861. It is used by MS-DOS for Nordic languages, especially for Icelandic, and is therefore related to ENCODING_ISO_8859_10.
static final String ENCODING_CP_863
The encoding CP863. It is used by MS-DOS for French and is therefore related to ENCODING_ISO_8859_15.
static final String ENCODING_CP_865
The encoding CP865. It is used by MS-DOS for Nordic languages except Icelandic, for which ENCODING_CP_861 is used. It is therefore related to ENCODING_ISO_8859_10.
static final String ENCODING_CP_866
The encoding CP866. It is used by MS-DOS for Cyrillic letters and is therefore related to ENCODING_CP_855 and ENCODING_ISO_8859_5.
static final String ENCODING_CP_869
The encoding CP869. It is used by MS-DOS for Greek and is therefore related to ENCODING_CP_737 and ENCODING_ISO_8859_7.
static final String ENCODING_WINDOWS_1250
The encoding CP1250, also called Windows-1250. It is used by Microsoft Windows for Central European languages and is similar to ENCODING_ISO_8859_2.
lib/rt.jar
static final String ENCODING_WINDOWS_1251
The encoding CP1251, also called Windows-1251. It is used by Microsoft Windows for Cyrillic letters and is similar to ENCODING_ISO_8859_5.
lib/rt.jar
static final String ENCODING_WINDOWS_1252
The encoding CP1252, also called Windows-1252. It is used by Microsoft Windows for Western European languages and is similar to ENCODING_ISO_8859_1.
lib/rt.jar
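"Similar" does not mean identical: Windows-1252 assigns printable characters to positions that ISO-8859-1 reserves for control codes. A minimal sketch with the JDK's `CharsetEncoder.canEncode` makes this visible; the class `Windows1252Demo` is hypothetical and the literal charset names are assumed to match the constant values.

```java
import java.nio.charset.Charset;

final class Windows1252Demo {

  /** Tells whether the named charset has a byte representation for the character. */
  static boolean canEncode(char c, String encoding) {
    return Charset.forName(encoding).newEncoder().canEncode(c);
  }

  public static void main(String[] args) {
    char euro = '\u20AC';
    char leftQuote = '\u201C';
    char eAcute = '\u00E9';
    System.out.println(canEncode(euro, "windows-1252"));      // true
    System.out.println(canEncode(leftQuote, "windows-1252")); // true
    System.out.println(canEncode(euro, "ISO-8859-1"));        // false
    System.out.println(canEncode(leftQuote, "ISO-8859-1"));   // false
    // Characters from the shared range work in both.
    System.out.println(canEncode(eAcute, "windows-1252") && canEncode(eAcute, "ISO-8859-1")); // true
  }
}
```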
static final String ENCODING_WINDOWS_1253
The encoding CP1253, also called Windows-1253. It is used by Microsoft Windows for Greek and is similar to ENCODING_ISO_8859_7.
lib/rt.jar
static final String ENCODING_WINDOWS_1254
The encoding CP1254, also called Windows-1254. It is used by Microsoft Windows for Turkish and is similar to ENCODING_ISO_8859_9.
lib/rt.jar
static final String ENCODING_WINDOWS_1255
The encoding CP1255, also called Windows-1255. It is used by Microsoft Windows for Hebrew and is similar to ENCODING_ISO_8859_8.
static final String ENCODING_WINDOWS_1256
The encoding CP1256, also called Windows-1256. It is used by Microsoft Windows for Arabic and is similar to ENCODING_ISO_8859_6.
static final String ENCODING_WINDOWS_1257
The encoding CP1257, also called Windows-1257. It is used by Microsoft Windows for Baltic languages and is similar to ENCODING_ISO_8859_13.
lib/rt.jar
static final String ENCODING_WINDOWS_1258
The encoding CP1258, also called Windows-1258. It is used by Microsoft Windows for Vietnamese and is similar to ENCODING_WINDOWS_1252.
EncodingDetectionReader createUtfDetectionReader(InputStream inputStream, String nonUtfEncoding)
This method creates a new Reader for the given inputStream. The EncodingDetectionReader automatically detects UTF (Unicode Transformation Format) encodings. If the data provided by inputStream is NOT in such an encoding, it will use the given nonUtfEncoding as fallback.
EncodingDetectionReader will behave like InputStreamReader but with an encoding that is automatically detected whilst reading. It will use a lookahead buffer to detect the encoding. As long as no UTF characteristic was detected and only ASCII characters (<128) are hit, the encoding remains ENCODING_US_ASCII. As soon as a UTF sequence is detected (e.g. ENCODING_UTF_8 or ENCODING_UTF_16_BE), the encoding switches to that encoding. If a non-ASCII character is hit and no UTF encoding is detected, the EncodingDetectionReader switches to the given nonUtfEncoding.
Parameters:
inputStream - is the InputStream to decode and read.
nonUtfEncoding - is the encoding to use in case the data is NOT encoded in UTF (e.g. ENCODING_ISO_8859_15). It is pointless to use a UTF-based encoding or ENCODING_US_ASCII here.
Returns:
the EncodingDetectionReader that can be used to read the inputStream.
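The decision rule described above can be approximated with plain JDK classes. The following sketch is NOT the implementation of EncodingDetectionReader: it inspects a complete byte array instead of a lookahead buffer, recognizes only the UTF-16 ByteOrderMarks and valid UTF-8 (UTF-32 is ignored), and the class `UtfFallbackSketch` with its methods is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class UtfFallbackSketch {

  /** Returns the name of the encoding this simplified rule picks for the data. */
  static String detect(byte[] data, String nonUtfEncoding) {
    if (startsWith(data, 0xFE, 0xFF)) {
      return "UTF-16BE";
    }
    if (startsWith(data, 0xFF, 0xFE)) {
      return "UTF-16LE";
    }
    boolean asciiOnly = true;
    for (byte b : data) {
      if (b < 0) { // byte value >= 128
        asciiOnly = false;
        break;
      }
    }
    if (asciiOnly) {
      return "US-ASCII";
    }
    try {
      // Strict decoding: any malformed sequence means the data is not UTF-8.
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(data));
      return "UTF-8";
    } catch (CharacterCodingException e) {
      return nonUtfEncoding;
    }
  }

  private static boolean startsWith(byte[] data, int first, int second) {
    return data.length >= 2 && (data[0] & 0xFF) == first && (data[1] & 0xFF) == second;
  }

  public static void main(String[] args) {
    System.out.println(detect("abc".getBytes(StandardCharsets.US_ASCII), "ISO-8859-15")); // US-ASCII
    System.out.println(detect("\u00E9".getBytes(StandardCharsets.UTF_8), "ISO-8859-15")); // UTF-8
    System.out.println(detect(new byte[] {(byte) 0xE9}, "ISO-8859-15"));                  // ISO-8859-15
  }
}
```

The last call shows why a non-UTF fallback is needed: the single byte 0xE9 is a truncated UTF-8 sequence, so the sketch falls back to the given encoding.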
Copyright © 2001–2016 mmm-Team. All rights reserved.