Changes 60.1:
* Unicode 10.0: 8,518 new characters, including four new scripts, 7,494 new Han characters, and 56 new emoji characters.
- Properties newly supported in ICU: Emoji_Component, Regional_Indicator, Prepended_Concatenation_Mark
* CLDR 32:
- Data for several (mostly Asian) new languages, date formatting patterns using colloquial day period formats ("h:mm B" → “1:30 in the afternoon”), and many other data improvements.
- See the CLDR download page for other CLDR features and migration issues in CLDR 32.
* NumberFormatter, a new number formatting API: A long-overdue refresh of number formatting in ICU with a focus on usability, robustness, and performance. The 30+ settings in DecimalFormat are reduced to 8 in NumberFormatter; all NumberFormatter objects are thread-safe and immutable; and the code is efficient in both the client-side (constant locale) and server-side (variable locale) use cases.
- New users are encouraged to use the new API for number formatting. However, preexisting code can continue using the old API, which has been partially made into a wrapper over the new API.
- Documentation: in Java, see com.ibm.icu.number.NumberFormatter, and in C++, see i18n/unicode/numberformatter.h.
* New options for titlecasing:
- Sentence titlecasing and whole-string titlecasing without custom BreakIterator instances.
- The default index adjustment has been changed from "find first cased character" to "find first letter, number, or symbol"; a new option is available for selecting the previous adjustment behavior.
* Smaller data files for BreakIterator.
- Reverse rules no longer used: Easier updates, easier to conform to Unicode Standard.
- Old source rule files continue to work; reverse rules are ignored.
- Rule-based data files: 1.2MB→0.8MB.
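
To illustrate the titlecasing index adjustment change listed above, here is a rough Python sketch. It is not ICU code: the function name, the adjust_to_cased flag, and the general-category approximations of "cased" and "letter, number, or symbol" are all assumptions made for illustration, and ICU's real rules handle more cases.

```python
import unicodedata

def titlecase_whole_string(text, adjust_to_cased=False):
    """Titlecase text as one segment (hypothetical helper, not ICU API)."""
    def is_cased(ch):
        # Approximation: upper-, lower-, or titlecase letters count as cased.
        return unicodedata.category(ch) in ("Lu", "Ll", "Lt")

    def is_letter_number_symbol(ch):
        # Approximation of "letter, number, or symbol" via general category.
        return unicodedata.category(ch)[0] in ("L", "N", "S")

    pick = is_cased if adjust_to_cased else is_letter_number_symbol
    for i, ch in enumerate(text):
        if pick(ch):
            # Titlecase the adjusted start character and lowercase the rest.
            return text[:i] + ch.title() + text[i + 1:].lower()
    return text

print(titlecase_whole_string("49ers"))                        # 49ers
print(titlecase_whole_string("49ers", adjust_to_cased=True))  # 49Ers
```

With the new default the adjustment stops at the digit, which has no case, so the string is left as "49ers"; the previous adjustment skips ahead to the first cased letter.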
ICU4C Specific Changes
* New API for direct-UTF-8 normalization.
- It also optionally records changes, for source-to-result index mapping and tracking of text metadata.
* More convenient case mapping API (StringPiece→ByteSink).
* ICU now handles ill-formed UTF-8 byte sequences as specified in the W3C Encoding Standard.
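
As a language-neutral illustration of replacing ill-formed UTF-8 bytes with U+FFFD, here is a minimal Python sketch using Python's built-in decoder rather than ICU. The helper name is hypothetical, and it is only an assumption that Python's replacement granularity matches the Encoding Standard in every case.

```python
def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, replacing ill-formed bytes (hypothetical helper)."""
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8.
    return data.decode("utf-8", errors="replace")

# 0xFF can never appear in well-formed UTF-8, so it becomes U+FFFD.
print(repr(decode_lenient(b"a\xffb")))  # 'a\ufffdb'
```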
NetBSD and Linux do not have it (glibc had it but removed it in 2.26, and
locale.h has always been sufficient, if their release notes are to be believed).
This should cover BSDs other than NetBSD, etc.
This was a glibc header, whereas locale.h is a POSIX one.
glibc went ahead and removed it in the new version.
change suggested by Thomas Orgis on tech-pkg but probably not applied
exactly.
* Emoji 5.0 data
* Includes bidi data files from Unicode 10 beta.
* Includes segmentation data files and rules from Unicode 10 beta and CLDR 31.0.1.
* Does not yet include the Emoji_Component property.
* Otherwise ICU 59 continues to use Unicode 9 data.
CLDR 31.0.1
* Including updates for emoji 5.0, for example local names for England, Scotland, and Wales.
* GMT and UTC are no longer unified, and CLDR provides distinct UTC display names, avoiding confusion with standard (winter) time in Britain.
* See the CLDR download page for other CLDR features and migration issues in CLDR v31.
New case mapping API (C++ & Java classes CaseMap) supports styled text.
Common Changes
* CLDR 30.0.3
* Time zone database version 2016j
* ICU SVN repository structure change. See the note on the Source Code Access page for more information.
ICU4C Fixes
* 12815 uspoof_getSkeleton sets backwards-incompatible illegal argument exception
* 12822 digitlist.cpp won't compile on msvc under Node.js
* 12825 uspoof_check goes into an "infinite loop" when U+30FB is in an input string
* 12832 GreekUpper::toUpper skips the final character on a non-terminated UTF-8 string
* 12849 u_strToTitle returns incorrect length if destination is NULL
* 12868 uprv_convertToPosix() Windows bug
* Fix regression with upstream patch,
https://ssl.icu-project.org/trac/ticket/12827
Changelog:
Common Changes
CLDR 30.0.2: For details of the many changes in CLDR, see CLDR 30. Some things to note:
For some combinations of numbering system (arab, arabext, latn) and/or locale (ar, fa, he), there were changes to the bidirectional control characters used with certain symbols (percent, minus, plus), and changes to number patterns (currency and/or percent, including addition of bidirectional control characters in some cases).
New in this release, the bidirectional controls used for such purposes include U+061C ARABIC LETTER MARK (ALM), which requires use of the bidirectional algorithm from Unicode 6.3 or later.
The time separator for Norwegian locales (nb, nn) was changed to be ':' throughout.
Unicode 9.0: Version 9.0 adds exactly 7,500 characters, for a total of 128,172 characters. These additions include six new scripts, 19 symbols for the new 4K TV standard, and 72 new emoji characters.
Draft Emoji 4.0 data
Emoji updates for word & line breaking. (#12664 & Unicode 9 update #12526)
UBiDiTransform/BidiTransform API for convenient transformation of text between different Bidi layouts. (#11679)
MeasureFormat API for measurement unit display names. (#12029)
Most COUNT and LIMIT enum constants have been deprecated. (#12420)
SpoofChecker: Handling of "whole script confusables" has been removed from ICU, in accordance with its removal from UTS #39 Version 9.0.0 and the removal of the corresponding Unicode data file. (#12549)
Greek uppercasing ("el" locale ID) removes most diacritics. (#5456)
More robust locale data loading across ICU implementation code.
Reduced heap memory usage in DateTimePatternGenerator. (#11782)
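
The Greek uppercasing change listed above can be approximated with a short Python sketch. This is not ICU's implementation: it only strips the tonos (combining acute accent) after uppercasing, whereas ICU's actual rules cover more cases. The function name is hypothetical.

```python
import unicodedata

TONOS = "\u0301"  # combining acute accent, the decomposed form of the tonos

def greek_upper_sketch(text):
    """Uppercase Greek text and drop the tonos (rough approximation)."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", text.upper())
    # Drop only the tonos; other marks such as the dialytika are kept.
    stripped = "".join(ch for ch in decomposed if ch != TONOS)
    return unicodedata.normalize("NFC", stripped)

print(greek_upper_sketch("Άνθρωπος"))  # ΑΝΘΡΩΠΟΣ
```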
ICU4C Specific Changes
The layout engine code has been removed; the ParagraphLayout is not deprecated and remains (and must now be built on top of HarfBuzz). See http://userguide.icu-project.org/layoutengine (#12708)
Windows: Supports & requires Visual Studio 2015.
Changelog:
Common Changes
CLDR 29: For details of the many changes in CLDR, see CLDR 29.
Grapheme/word/line breaking for emoji sequences, based on Unicode 9 proposed rules. See the Unicode emoji break proposal and the Unicode Emoji Technical Report Proposed Update describing the new emoji sequences. (#12081).
Four new Unicode emoji properties (#11802).
DateFormat day period formatting of "noon", "at night", etc. via new pattern characters b & B, and DateTimePatternGenerator support of C for selecting the customary form (#11872).
Exception: Formatting of "0:00 midnight" has been disabled because it is confusing everywhere except at the end of an interval.
RelativeDateTimeFormatter: Simpler formatting API (#12072).
More robust CLDR data loading for MeasureFormat (#11986, #12030), RelativeDateTimeFormatter (#12018), and DateIntervalFormat/DateIntervalInfo (#12013).
New simple & fast SimpleFormatter class for a trivial subset of MessageFormat as used in CLDR data, e.g., "{0} {1}" (#10896).
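
To show the kind of pattern SimpleFormatter handles, here is a minimal Python sketch of numbered-placeholder substitution. It is an illustration only, not the ICU class: the function name is hypothetical, and any quoting or escaping rules that the real class applies are not modeled.

```python
import re

def simple_format(pattern, *args):
    """Replace {0}, {1}, ... in pattern with positional arguments."""
    def substitute(match):
        # The placeholder number selects the argument to insert.
        return str(args[int(match.group(1))])
    return re.sub(r"\{(\d+)\}", substitute, pattern)

print(simple_format("{0} {1}", "Jan", "2016"))  # Jan 2016
```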
ICU4C Specific Changes
C API support for RelativeDateTimeFormatter (#12072).
Clang annotations for intended switch case fallthroughs; the code can now compile with -Wimplicit-fallthrough (#12166).
Internal header files can be compiled by themselves, for simpler alternative build scripts (#12141).
Changelog:
Release Overview
The features for this release include support of CLDR 28 and Unicode 8.0.
For more details, including migration issues, see below.
Common Changes
CLDR 28: For details of the many changes in CLDR, see CLDR 28.
Unicode data updated to Unicode 8.0: 41 new emoji characters, 5,771 new ideographs for Chinese/Japanese/Korean, 6 new scripts, improved character properties data, etc.
ICU data size reduced by about 7.2% (1.8MB) via sharing string values across resource bundles. [#11537]
DateIntervalFormat now handles intervals with seconds, and sets FieldPosition more consistently. [#11706, #11726]
DateFormat::createInstanceForSkeleton() caches DateFormat patterns rather than DateTimePatternGenerator instances, for better performance (for cache hits) and lower heap memory consumption. [#11780]
StringSearch (based on collation) defaults to matches on normalization boundaries rather than grapheme cluster boundaries, which yields more matches on Indic text. [#11750]
RuleBasedNumberFormat (spelled-out numbers) now handles rounding (Java only), infinity, NaN. [#11653, #11760, #8223]
Most of the old Normalizer/unorm.h had been replaced by (and reimplemented via) Normalizer2, and is now deprecated. [#7303]
COLON has been withdrawn as a date pattern character corresponding to the date field [UDAT_]TIME_SEPARATOR_FIELD; there is currently no pattern character corresponding to that field. [#11773]
Support for locale key "cf" to specify currency format style, and interaction with NumberFormat values for UNumberFormatStyle: [#11787]
For NumberFormat style UNUM_CURRENCY / CURRENCYSTYLE, the default is "standard" currency style (typically using minus sign for negative numbers), but the new locale key "cf" may be used with values "standard" or "account" to specify currency format style ("account" indicates accounting style, often using parentheses for negative numbers).
For other NumberFormat styles, the locale key "cf" is ignored (they override the locale preference):
UNUM_CURRENCY_ISO / ISOCURRENCYSTYLE
UNUM_CURRENCY_PLURAL / PLURALCURRENCYSTYLE
UNUM_CURRENCY_ACCOUNTING / ACCOUNTINGCURRENCYSTYLE
UNUM_CASH_CURRENCY / CASHCURRENCYSTYLE
A new NumberFormat style is available to explicitly specify standard style, ignoring the locale key "cf":
UNUM_CURRENCY_STANDARD / STANDARDCURRENCYSTYLE
ICU4C Specific Changes
C API support for CompactDecimalFormat via UNumberFormatStyle additions: UNUM_DECIMAL_COMPACT_SHORT, UNUM_DECIMAL_COMPACT_LONG [#11693]
Larger UnicodeString object stores more characters inside the object without heap allocation; the UnicodeString object size is now build-time-configurable. [#11551]
On 64-bit machines, increase from object size 40 bytes with 15 internal UChars to a new default of 64 bytes with 27 UChars.
Some C++ classes now have swap() and moveFrom() methods, and support C++11 move semantics on compilers that support them. [#10086]
UnicodeString, LocalPointer, LocalArray
DecimalFormat code refactored to fix bugs, improve maintainability, and improve performance. [#10458]
New FilteredBreakIterator suppresses certain segment boundaries. For example, it can suppress the sentence boundary in the middle of "Mr. Smith". [#11248]
The internal, shared cache has been changed from unbounded to bounded. [#11767]
For [U]BreakIterator with type UBRK_SENTENCE, the locale key "ss" can now be used with value "standard" to specify that standard sentence break suppression data should be used, or with value "none" to indicate that no break suppression data should be used (the default). [#11770]
Collator: first-time startup time improved 20% due to precalculated unsafe-backward table [#11886]
A number of memory leaks and buffer overruns have been fixed based on static code analysis, mostly in data build tools
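
The UnicodeString size figures listed above (40 bytes with 15 UChars, 64 bytes with 27 UChars) can be checked with a little arithmetic. The sketch below assumes a UChar is a 2-byte UTF-16 code unit and treats the remaining bytes as per-object overhead; that interpretation and the helper name are assumptions, not taken from the ICU sources.

```python
UCHAR_BYTES = 2  # assumption: a UChar is one 16-bit UTF-16 code unit

def overhead(object_bytes, inline_uchars):
    """Bytes in the object not used by the inline character buffer."""
    return object_bytes - inline_uchars * UCHAR_BYTES

old_overhead = overhead(40, 15)  # previous 64-bit layout
new_overhead = overhead(64, 27)  # new default layout
print(old_overhead, new_overhead)  # 10 10
```

Under these assumptions both layouts leave the same 10 bytes outside the buffer, so the 24 extra bytes translate directly into 12 more inline UChars.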
The features for this release include support of CLDR 27 (with a major cleanup of region locales, among many other improvements), formatting for scientific notation ("1.2 × 10³"), an update to Unicode 7.0 data for spoof-checking, narrow AM/PM markers ("7:45p"), and various performance enhancements. For C/C++, there are new methods for flexible dates ("Nov 10", or "Sept 2015"), named capture groups for regular expressions, formatting of compound units ("3.5 meters per second"), new C wrappers, and independent timezone resource loading. ICU4J has been improved and tested for using ICU4C data and for running on Android.
Unicode 6.3: New bidi control codes, new Bidi_Class property values, two new bidi "bracket" properties; for other property value changes see the UAX 44 summary.
The bidi algorithm implementation has also been updated to support the new properties and to match the updated algorithm in the Unicode 6.3 version of UAX 9.
Note: ICU 52 still uses collation root data based on Unicode Collation Algorithm 6.2 (UCA 6.2). (However, ICU 52 does use CLDR 24 collation tailoring data.)
CLDR 24: Improved coverage for top 70+ languages, fractional plural rules and forms, many new measurement units, major simplification of collation rule syntax, preliminary version of European Ordering Rules, new relative fields such as “last Sunday” and “now”, and much more.
Time zone data: 2013g.
Support new variants of Islamic calendar:
"islamic-umalqura": Umm al-Qura.
"islamic-tbla": Tabular (fixed intercalary years), with astronomical epoch.
Made Calendar getDayOfWeekType behave as documented.
New API for converting between Windows time zone ID and IANA tz database ID.
Technology Preview: New API for more granular control of DateFormat parse leniency.
DateTimePatternGenerator:
Support recently-added time zone pattern characters O, X, x and updated support for V, Z.
Support newly-defined skeleton character ‘J’ to generate preferred hour cycle without any day period indicator (such as AM/PM for h).
Implement support for plurals that depend on displayed fractional values.
MessageFormat and currency formatting etc. select appropriate plural forms for values with decimal digits (after the decimal point).
Segmentation:
Add dictionary-based word & line break for Lao.
As "DSO_LIBDIR" is now always set (and must be always set because of all
the changes that refer to it) we cannot use it to check for the Cygwin
case anymore. Instead check whether "OPSYS" is (not) equal to "Cygwin".
Common Changes
==============
CLDR 23: Collation tailorings put native script first; non-Gregorian calendar formats are more consistent; much improved data for Armenian (hy), Georgian (ka), Mongolian (mn), and Welsh (cy); …
Time zone data: 2013b
Date format/parse now supports CLDR short weekday names ("EEEEEE", "cccccc").
Support DisplayContext for date formatting, locale display names.
DateTimePatternGenerator behavior is now much more consistent between C and J.
Support new timezone pattern characters in LDML spec: X+, x+, O, OOOO, V, VV, VVV.
Updated SpoofChecker for v5 of UTS39.
AlphabeticIndex enhancements:
New thread-safe ImmutableIndex sub-API
Build an index for a custom Collator.
Make data-driven for Chinese collations.
New API for CLDR script metadata.
ICU4C Specific Changes
======================
Support for “dangi” Korean luni-solar calendar (already in ICU4J).
Add CompactDecimalFormat (already in ICU4J).
Add TerritoryContainment APIs (already in ICU4J).
UnicodeString default constructor and destructor now inline.
Layout engine now supports 'morx' tables.
Fixed some ICU 50 regressions:
Affixes set with e.g. DecimalFormat::setPositivePrefix were ignored for parse.
UNUM_PARSE_INT_ONLY no longer handled grouping separator.
Add ucal_getTimeZoneID.
The C++ AlphabeticIndex implementation is now on par with Java, including full support for all Chinese collation tailorings.
U8_NEXT() and similar low-level macros now support NUL-terminated UTF-8 strings.
New macros like U8_NEXT_OR_FFFD() return U+FFFD for an ill-formed sequence.
Conversion: New "good one-way" mapping type, for example for Variation Selector sequences.
* Unicode 6.2: Turkish Lira Sign, improved word & line segmentation (BreakIterator) for symbols
* CLDR 22.1: Data coverage & quality improved across all major languages; new short width type for weekday names; new zhuyin (Bopomofo) collation for Chinese; improved data for CompactDecimalFormat & RBNF
* Time zone data: 2012h
* Ordinal-number support in MessageFormat & PluralRules
* Deprecate setLocale(locale) in PluralFormat
* Dictionary-based break iterators (word segmentation):
* Support Chinese & Japanese, use more compact dictionary format, port all but Khmer support to Java
* Update Khmer dictionary
* Change Java util.ListFormat to text.ListFormatter and other updates, use CLDR data, port to C++
* Add updated IBM-eucJP and IBM-5233 converter
* Improve number formatting performance
* C++ GenderInfo: Effective combined gender of a list of people's genders (ported from Java)
* Thread safety support cannot be removed (see the Readme)
* Default compilers: Clang is now used if available (see the Readme)
* C++ Collator API cleanup, subclassing-API-breaking changes (see the Readme)
* Add option to genrb tool for writing java resource bundle files
* Time zone format APIs
* 9242 ICU4C fails to parse pattern containing EEE properly whilst ICU4J parses it successfully
* 9258 Number format performance
* 9283 uregex_open fails for look-behind assertion + case-insensitive
* 9284 Date format roundtrip test failure
* 9295 HPPA endianness detection
* 9313 Problem building ICU4C with Cygwin/MSVC
* 9332 Linux s390 endianness detection
* 9336 Problem building ICU4C 49.1.1 on zOS
* Unicode 6.1: New scripts & blocks; changes to grapheme break & line break
property values; some characters change from symbol to Po or No; etc.
* CLDR 21.0.1: Changes in segmentation data to match Unicode 6.1; new structures
for support of Chinese calendar, for context-dependent capitalization, for
gender of lists of people, for ordinal categories, and for multiple number
systems per locale; deprecation of "commonlyUsed" element in timezone names;
removal of "whole-locale" aliases; major cleanups of timezone names,
delimiter data, abbreviated number data.
* Normalizer2 API additions
* Easier-to-use getInstance() variants; e.g., getNFDInstance()
* Getter for the combining-class value for a code point
* Getter for the raw Decomposition_Mapping
* Pairwise composition
* TimeZone class: (C++) Getter for unknown time zone, (Java) fields for GMT &
unknown zone
* Support for deprecation of the "commonlyUsed" element for CLDR metazones
* DateTimePatternGenerator can now use separate patterns for skeletons that
differ only in MMM vs MMMM or EEE vs EEEE, etc.
* Support for custom DecimalFormatSymbols in RuleBasedNumberFormat
* Format and parse Chinese calendar dates including support for intercalary
months
* Context Transforms for context-dependent capitalization behavior
* APIs for TimeZoneNames and TimeZoneFormat
* Support for new date format pattern "ZZZZZ" for ISO 8601 zone format
* Options for ambiguous local time resolution in Calendar
* Support for ISO 4217 numeric currency code
* CLDR 2.0: The CLDR 2.0 release contains numerous improvements and bug fixes
approved by the CLDR committee, including much additional data for many
languages.
* Explicit parent locale support in data imported from CLDR.
* MessageFormat and related classes (choice/plural/select) have been
reimplemented, with several improvements and some incompatible changes.
* Extended PluralFormat pattern syntax supports explicit-value forms and
offsets.
* Utility APIs in PluralRules (get some/all/unique keyword values)
* Time zone API to return a list of available canonical system time zone IDs.
* Time zone API to return a region.
* Collation: Full implementation & public API for script reordering
* Dictionary-type trie
* GB18030-2005 update