MIME / IANA | ISO-8859-2 |
---|---|
Alias(es) | iso-ir-101, csISOLatin2, latin2, l2, IBM1111 |
Language(s) | (see below) |
Standard | ECMA-94:1986, ISO/IEC 8859 |
Classification | Extended ASCII, ISO/IEC 8859 |
Extends | US-ASCII |
Based on | ISO-8859-1 |
Other related encoding(s) | Windows-1250, MacCroatian |
ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central[1] or "Eastern European" languages that are written in the Latin script. ISO/IEC 8859-2 is very different from code page 852 (MS-DOS Latin 2, PC Latin 2), which is also referred to as "Latin-2" in Czech and Slovak regions.[2] Almost half of the use of the encoding is for Polish, for which it is the main legacy encoding, although on the web virtually all use of it has been replaced by UTF-8.
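The gap between the two "Latin-2" encodings can be illustrated with a short Python sketch. It assumes the standard-library codecs `iso8859_2` and `cp852`; the sample sentence is only an illustrative Polish pangram, not something taken from the standard.

```python
# Encode the same Polish text in ISO-8859-2 and in code page 852.
text = "Zażółć gęślą jaźń"

latin2_bytes = text.encode("iso8859_2")
cp852_bytes = text.encode("cp852")

# Both are single-byte encodings, so each character becomes exactly one byte.
same_length = len(latin2_bytes) == len(cp852_bytes) == len(text)

# The byte values chosen for the accented letters are different, however.
same_bytes = latin2_bytes == cp852_bytes

# Reading ISO-8859-2 bytes as CP852 does not fail; it silently yields wrong text.
misread = latin2_bytes.decode("cp852", errors="replace")

print(latin2_bytes.hex(" "))
print(cp852_bytes.hex(" "))
print(same_length, same_bytes, misread != text)
```

Because neither decoder reports an error, a mislabeled file typically shows up as box-drawing characters or wrong letters rather than as a decoding failure.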
ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. As of October 2022, less than 0.04% of all web pages use ISO-8859-2.[3][4] Microsoft has assigned code page 28592, also known as Windows-28592, to ISO-8859-2 in Windows. IBM assigned code page 912 to ISO 8859-2[5] until that code page was extended in 1999.[6] Code page 1111 is similar, but replaces byte B0 ° (degree sign) with U+02DA ˚ (ring above).
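Python ships no codec for code page 1111, so the following is only a sketch built on the statement above that byte B0 is the single difference. The helper name `decode_cp1111` is hypothetical and is not part of any library.

```python
def decode_cp1111(data: bytes) -> str:
    """Hypothetical helper: approximate code page 1111 via ISO-8859-2.

    Assumes, as described above, that the only difference is byte 0xB0.
    """
    # Byte 0xB0 is the only byte that ISO-8859-2 maps to U+00B0, so
    # remapping the decoded character is equivalent to remapping the byte.
    return data.decode("iso8859_2").translate({0x00B0: 0x02DA})


sample = bytes([0x31, 0x30, 0xB0])
print(repr(sample.decode("iso8859_2")))  # degree sign at the end
print(repr(decode_cp1111(sample)))       # ring above at the end
```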
Windows-1250 is similar to ISO-8859-2 and has all the printable characters it has and more. However, a few of them are rearranged (unlike Windows-1252, which keeps all printable characters from ISO-8859-1 in the same place).
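Both claims can be checked with a small Python sketch, assuming the standard-library codecs `iso8859_2` and `cp1250`. The helper `decode_byte` is introduced here for illustration only; it returns `None` for bytes that a codec leaves undefined.

```python
def decode_byte(value: int, codec: str):
    """Decode a single byte, or return None if the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


# Printable part of ISO-8859-2 (0xA0-0xFF) and the upper half of Windows-1250.
latin2 = {b: decode_byte(b, "iso8859_2") for b in range(0xA0, 0x100)}
cp1250 = {b: decode_byte(b, "cp1250") for b in range(0x80, 0x100)}

# Characters of ISO-8859-2 that Windows-1250 lacks entirely.
missing = set(latin2.values()) - set(cp1250.values())

# Byte positions where the two encodings assign different characters.
moved = {b: (ch, cp1250[b]) for b, ch in latin2.items() if cp1250[b] != ch}

print("missing from Windows-1250:", missing)
for b, (l2_char, win_char) in sorted(moved.items()):
    print(f"0x{b:02X}: ISO-8859-2 {l2_char!r} vs Windows-1250 {win_char!r}")
```

An empty `missing` set together with a non-empty `moved` mapping corresponds to "same repertoire, partly different positions".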
Language coverage
These code values can be used for the following languages:
- The missing letter Å is officially a part of the Finnish alphabet; however, it has no native use and appears only in foreign names.
- In 2017, the Council for German Orthography officially added a capital ẞ, but it is not actually required, as SS can be used instead.
- This character set unifies Ș and Ț (S and T with commas below) with Ş and Ţ (S and T with cedillas), as did virtually all other character sets, including Microsoft's Windows-1250 and the first version of Unicode. Unicode subsequently disunified them; however, this complicated the processing of Romanian data, because pre-existing data and input methods would still contain the older cedilla code points, complicating text searching.[citation needed]
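The search problem described in the last note can be sketched in Python. This is one possible approach, not a prescribed one: text decoded from ISO-8859-2 is mapped from the cedilla code points to the comma-below ones before comparison. The names `CEDILLA_TO_COMMA` and `normalize_romanian` are hypothetical.

```python
# Map the legacy cedilla letters to the comma-below letters used in modern text.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
})


def normalize_romanian(text: str) -> str:
    """Replace cedilla forms with comma-below forms (hypothetical helper)."""
    return text.translate(CEDILLA_TO_COMMA)


# ISO-8859-2 can only produce the cedilla code points when decoded.
legacy = b"\xaatiin\xfe\xe3".decode("iso8859_2")
modern = "\u0218tiin\u021b\u0103"

print(legacy == modern)                      # the raw strings do not match
print(normalize_romanian(legacy) == modern)  # they match after normalization

# The comma-below letters cannot be encoded back into ISO-8859-2.
try:
    modern.encode("iso8859_2")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

Writing data back to ISO-8859-2 would need the reverse mapping, since the comma-below letters have no byte in this encoding.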
Code page layout
Differences from ISO-8859-1 have the Unicode code point number underneath.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
0x | ||||||||||||||||
1x | ||||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
8x | ||||||||||||||||
9x | ||||||||||||||||
Ax | NBSP | Ą 0104 | ˘ 02D8 | Ł 0141 | ¤ | Ľ 013D | Ś 015A | § | ¨ | Š 0160 | Ş 015E | Ť 0164 | Ź 0179 | SHY | Ž 017D | Ż 017B |
Bx | ° | ą 0105 | ˛ 02DB | ł 0142 | ´ | ľ 013E | ś 015B | ˇ 02C7 | ¸ | š 0161 | ş 015F | ť 0165 | ź 017A | ˝ 02DD | ž 017E | ż 017C |
Cx | Ŕ 0154 | Á | Â | Ă 0102 | Ä | Ĺ 0139 | Ć 0106 | Ç | Č 010C | É | Ę 0118 | Ë | Ě 011A | Í | Î | Ď 010E |
Dx | Đ 0110 | Ń 0143 | Ň 0147 | Ó | Ô | Ő 0150 | Ö | × | Ř 0158 | Ů 016E | Ú | Ű 0170 | Ü | Ý | Ţ 0162 | ß |
Ex | ŕ 0155 | á | â | ă 0103 | ä | ĺ 013A | ć 0107 | ç | č 010D | é | ę 0119 | ë | ě 011B | í | î | ď 010F |
Fx | đ 0111 | ń 0144 | ň 0148 | ó | ô | ő 0151 | ö | ÷ | ř 0159 | ů 016F | ú | ű 0171 | ü | ý | ţ 0163 | ˙ 02D9 |
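The cells annotated with a code point in the chart above can be recomputed with a short Python sketch, assuming the standard-library codecs `iso8859_1` and `iso8859_2`. It compares the two encodings byte by byte over the range A0 to FF.

```python
# Collect every byte in 0xA0-0xFF that ISO-8859-2 maps differently
# from ISO-8859-1, together with both characters.
diffs = {}
for b in range(0xA0, 0x100):
    latin1_char = bytes([b]).decode("iso8859_1")
    latin2_char = bytes([b]).decode("iso8859_2")
    if latin1_char != latin2_char:
        diffs[b] = (latin1_char, latin2_char)

# Print each difference with the ISO-8859-2 code point, as in the chart.
for b, (latin1_char, latin2_char) in diffs.items():
    print(f"0x{b:02X}: {latin1_char} -> {latin2_char} U+{ord(latin2_char):04X}")
```

Bytes that do not appear in `diffs`, such as A4 (¤) or E9 (é), are the positions the two parts of the standard share.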
See also
References
- "Microsoft Outlook Message Encodings". 10 January 2017.
- "The Czech and Slovak Character Encoding Mess Explained". luki.sdf-eu.org. Retrieved 2025-08-04.
- "Usage Statistics and Market Share of ISO-8859-2 for Websites, October 2022". w3techs.com. Retrieved 2025-08-04.
- "Historical trends in the usage statistics of character encodings for websites, February 2022".
- "Icu-data/Charset/Data/XML/Ibm-912_P100-1995.XML at main · unicode-org/Icu-data". GitHub.
- "Icu-data/Charset/Data/Ucm/Ibm-912_P100-1999.ucm at main · unicode-org/Icu-data". GitHub.
External links
- ISO/IEC 8859-2:1999
- Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)
- ISO-IR 101 Right-Hand Part of Latin Alphabet No.2 (February 1, 1986)
- ISO 8859-2 (Latin 2) Resources