pkgsrc/textproc/icu/PLIST

@comment $NetBSD: PLIST,v 1.17 2009/07/25 13:02:05 jdolecek Exp $
Update to version 2.4. Based on a PR pkg/20825 by Hiramatsu Yoshifumi, modified by me.
- follow PKG_SYSCONFDIR

List of major changes for this release:

* Regular Expressions Phase 1
  ICU 2.4 introduces a Regular Expression C++ API that is modeled after the JDK 1.4 API. ICU 2.4's Regular Expression API supports Unicode level 1 regular expressions (see Unicode Regular Expression Guidelines) but not all pattern metacharacters and features are supported yet. Regular expressions leverage all of the UnicodeSet support, including all Unicode 3.2 property names and property value names. Future ICU releases will complete the pattern support, add support for higher Unicode regex levels, and improve performance. For more details see the API References and the User Guide.
* Modularized ICU library building
  ICU 2.4 provides build-time switches to prune parts of the library code, for smaller custom distributions. For details see the readme file.
* Character set alias management support
  Additional APIs map alias+standard to a unique charset name (e.g., "Shift-JIS"+"IANA"->"ibm-943_P14A-2000") and enumerate all charset names in the alias table, not just the installed ones. See convrtrs.txt and ucnv.h. These APIs allow programmers to avoid data corruption problems when different platforms use the same names for different character conversion mappings.
* EBCDIC-z/OS converter option
  The EBCDIC converter now handles swapped LF/NL mappings algorithmically instead of with modified .ucm/.cnv conversion table files. This makes this behavior available for all supported EBCDIC conversions without adding to the data package size. See "swaplfnl" in convrtrs.txt.
* Additional converter
  A new converter implementation has been added for the encoding of IMAP mailbox names. See RFC 2060/5.1.3. Mailbox International Naming Convention and "IMAP-mailbox-name" in convrtrs.txt.
* Customizable break iteration
  ICU 2.4 allows registration of a BreakIterator with a locale ID. This allows applications to provide more sophisticated word/sentence break engines and use them seamlessly with the ICU APIs. In future releases, this registration mechanism will be extended to all relevant ICU services. If you are interested in ICU customization, please try out this feature.
* Collation performance
  ICU 2.4 collation was improved in several areas, with an emphasis on performance:
  * Latin-1: Improved performance of u_strcoll().
  * Russian/Cyrillic: Improved performance by tailoring collation for cyrillic-script languages, removing UCA contractions that are not used for modern Russian (this uses the [suppressContractions] tailoring option).
  * Korean: Improved performance by resolving collation elements for modern Hangul syllables at build time (this uses the [optimize] tailoring option).
  * Japanese: The default strength for Japanese was reduced from quaternary to tertiary as in all other locales.
* UnicodeSet performance
  UnicodeSet performance is significantly improved, especially for add(codePoint) and contains(codePoint).
* Unicode property aliases
  ICU 2.4 introduces APIs for mapping between all appropriate Unicode property aliases and property value aliases and ICU property enumeration constants. See u_getPropertyName() etc. in uchar.h.
* Unicode string functions
  * There are new C functions for searching for last occurrences of characters and partial strings. See u_strrstr(), u_strrchr32() etc.
  * New C/C++/Java functions for efficient checking if a string contains more than a certain number of code points. See hasMoreChar32Than().
  * Copying UnicodeStrings via the standard assignment operator and copy constructor does not preserve readonly aliasing any more because this can sometimes have unexpected and dangerous effects. A new fastCopyFrom() member function provides the old copy semantics. See Jitterbug 1794 for more details.
* UTF macros simplified
  The low-level C macros for handling code points in 8-bit and 16-bit Unicode strings have been replaced by a simpler, more consistent set with more concise names. For details see utf_old.h and utf.h. Similarly, ICU 2.4 defines the UChar32 consistently (now always as int32_t) and adds a U_SENTINEL non-code point value for new APIs.
* Performance tests
  ICU 2.4 has a new performance test framework and additional performance tests using this framework. This is not currently documented, but it is available as part of the source distribution at source/test/perf/.
2003-03-22 00:44:05 +01:00
bin/derb
bin/genbrk
bin/gencnval
update to ICU 3.6

Major changes in ICU 3.6 include the following:

- Unicode: ICU uses and supports Unicode 5.0, which is the latest major release of Unicode. Unicode 5.0 will be used in many operating systems and applications, and this version of ICU is important to maintain interoperability with these new operating systems and applications. More information about Unicode 5.0 can be found in the Unicode press release.
- Locale Data: ICU uses and supports data from Common Locale Data Repository (CLDR) 1.4, which includes many improvements in quality and quantity of data. There is 25% more CLDR locale data in 245 locales in ICU.
- ICU4C Specific Changes
  - Charset Detection: A charset detection framework was added, which provides heuristics for detecting the charset for unlabeled sequences of bytes.
  - Layout: The font layout engine has support added for Tibetan, Sinhala and Old Hangul.
  - BiDi: The BiDi algorithm was enhanced to be more flexible and efficient.
  - ICU Data Management: The new icupkg tool provides an easier way to manage ICU's data library. This tool allows you to add, update or remove data from ICU's data archive.
  - Time Zones: The time zone data is modularized to allow easier building and updating of the data.
  - Word Boundaries: The Thai word break iteration was improved to be more accurate. Also dictionary based detection of Thai word boundaries is now active for all locales.
  - UText
    - The BreakIterator uses UText for abstract text processing.
    - 64-bit indexing is now used to allow access to larger chunks of text.
    - API for read-only locking for security and robustness was added.
  - Performance
    - The u_sprintf/u_sscanf performance from the icuio library has been improved for number formatting/parsing.
    - Constructing a DateFormat is significantly faster for many locales.
    - Opening and closing a charset converter is significantly faster.
    - The UTF-8 transformation functions and macros are faster.
    - The UText API was improved for performance.
    - The collation open and close functions have a small performance improvement.
2007-03-23 13:51:13 +01:00
bin/genctd
bin/genrb
bin/icu-config
bin/makeconv
bin/pkgdata
bin/uconv
include/layout/LEFontInstance.h
include/layout/LEGlyphFilter.h
update to icu-3.0

major changes:

ICU 3.0 includes the latest bug fixes, locale/charset updates, and performance/build/porting enhancements.

- Collation
  Collation data is in a separate data tree, allowing for easier modularization and maintenance. getFunctionalEquivalent API allows for better caching and UI support.
- Unicode 4.0.1
  ICU is updated to the latest version of Unicode standard, which had significant property changes.
- CLDR 1.1
  Updates to CLDR 1.1, with many updates to locale data, and special emphasis on collation data.
- Formatting
  As an aid to migration of traditional C (stdio) and C++ (iostream) formatting, the POSIX-like input/output library, icuio, is officially supported. Significant digits now supported in DecimalFormat, for general use and %g support.
- RFC822 time zone format support in DateFormat for compatibility.
- Currency formatting/parsing improvements
  Allows parsing multiple currencies with one formatter, without knowing the currency in advance. Much cleaner design allowing extensibility to other measurement units in the future.
- Regular expressions (C)
  The regular expressions framework now features a C API, instead of just C++.
- Locales
  Locale canonicalization spec defined and implemented. Provides interoperability with POSIX and .NET locale IDs, more RFC 3066 support.
- Layout engine
  Layout engine now supports using different canonically-equivalent Unicode forms of the same text: e.g. a + ´ or á. This is especially important for non-Latin scripts.
- Build Environment
  ICU can now build its data library much faster on most platforms.

For a complete list see: http://oss.software.ibm.com/cvs/icu/~checkout~/icu/readme.html?tag=release-3-0
2004-06-26 22:18:50 +02:00
include/layout/LEGlyphStorage.h
include/layout/LEInsertionList.h
include/layout/LELanguages.h
include/layout/LEScripts.h
include/layout/LESwaps.h
include/layout/LETypes.h
include/layout/LayoutEngine.h
include/layout/ParagraphLayout.h
include/layout/RunArrays.h
Update from version 3.6nb2 to 4.0.1.

Pkgsrc changes:
 o New MASTER_SITE
 o Adjust PLIST
 o Remove no-longer-needed patches, since corresponding changes have been adopted upstream
 o BUILDLINK_ABI_DEPENDS bumped to >=4.0, since a new shared library version is installed
 o Fixes security vulnerability, ref. below.

Dependent pkgsrc packages will have their revisions bumped shortly due to the (possibly/probably) changed ABI.

Upstream changes:

4.0.1:

ICU4C 4.0.1 is a maintenance release of ICU4J 4.0. The primary changes of this release were:
 * Updated time zone data to 2008i
 * Technical preview of string search implementation using Boyer-Moore algorithm (#6286). For detail information, please see the tech note here.
 * #5691 Conversion: consistent illegal sequences
 * #6435 Bad @stable ICU4.0 tags
 * #6597 TestDisplayNamesMeta failure
 * #6670 Test failure in format/TimeZoneTest/TestShortZoneIDs

4.0:

Major changes in ICU 4.0 include the following:
 * Common Changes
   o Unicode 5.1 (#5696)
   o Locale Data: ICU uses and supports data from Common Locale Data Repository (CLDR) 1.6, which includes many improvements in quality and quantity of data.
   o add/removeLikelySubtags (#6124)
   o Charset converter file size improvement (#5987)
   o Date Interval Formatting (#6157)
     Note: Calendar type supported by this feature is Gregorian only in this release.
   o Improved Plural support
 * ICU4C Specific Changes
   o Additional Calendars
     + Chinese (#4081)
     + Coptic/Ethiopic (#4571)
 * ICU4J Specific Changes
   o Charset
     + Graduated from Technology Preview status
     + ICU2022 Converter (#5791)
     + HZ Converter (#6128)
     + SCSU/BOCU-1 Converter (#2147)
     + Charset Converter Callback (#6144)
   o Thai Dictionary break iterator (#5385)
   o JDK TimeZone support (#5975)
   o Locale Service Provider (#5976)
   o More convenient formatting of year+month, day+month, and other combinations (#6304)
   o Simple Duration Formatting (#6303)
 * ICU4C Security Fixes
   ICU4C 4.0 resolves the vulnerabilities CVE-2007-4770 and CVE-2007-4771 which were found in earlier versions of ICU. The standard ICU tests verify that these have been corrected, however, the updated versions of the previous tests may be run by applying the following patch to ICU 4.0: r24324. As well, ICU4C and ICU4J 4.0 resolve the issue underlying CVE-2008-1036.
2009-03-25 23:30:19 +01:00
include/layout/loengine.h
include/layout/playout.h
include/layout/plruns.h
include/unicode/basictz.h
include/unicode/bms.h
include/unicode/bmsearch.h
include/unicode/brkiter.h
include/unicode/calendar.h
include/unicode/caniter.h
include/unicode/chariter.h
include/unicode/choicfmt.h
include/unicode/coleitr.h
include/unicode/coll.h
include/unicode/colldata.h
include/unicode/curramt.h
include/unicode/currunit.h
include/unicode/datefmt.h
include/unicode/dbbi.h
include/unicode/dcfmtsym.h
include/unicode/decimfmt.h
include/unicode/docmain.h
include/unicode/dtfmtsym.h
include/unicode/dtintrv.h
include/unicode/dtitvfmt.h
include/unicode/dtitvinf.h
include/unicode/dtptngen.h
include/unicode/dtrule.h
include/unicode/fieldpos.h
include/unicode/fmtable.h
include/unicode/format.h
include/unicode/gregocal.h
include/unicode/locid.h
include/unicode/measfmt.h
include/unicode/measunit.h
include/unicode/measure.h
include/unicode/msgfmt.h
include/unicode/normlzr.h
include/unicode/numfmt.h
include/unicode/parseerr.h
include/unicode/parsepos.h
include/unicode/platform.h
include/unicode/plurfmt.h
include/unicode/plurrule.h
include/unicode/ppalmos.h
include/unicode/putil.h
include/unicode/pwin32.h
include/unicode/rbbi.h
include/unicode/rbnf.h
include/unicode/rbtz.h
include/unicode/regex.h
include/unicode/rep.h
include/unicode/resbund.h
include/unicode/schriter.h
include/unicode/search.h
include/unicode/simpletz.h
include/unicode/smpdtfmt.h
include/unicode/sortkey.h
include/unicode/strenum.h
include/unicode/stsearch.h
include/unicode/symtable.h
include/unicode/tblcoll.h
include/unicode/timezone.h
include/unicode/translit.h
include/unicode/tzrule.h
include/unicode/tztrans.h
include/unicode/ubidi.h
include/unicode/ubrk.h
include/unicode/ucal.h
include/unicode/ucasemap.h
include/unicode/ucat.h
include/unicode/uchar.h
include/unicode/uchriter.h
include/unicode/uclean.h
include/unicode/ucnv.h
include/unicode/ucnv_cb.h
include/unicode/ucnv_err.h
include/unicode/ucol.h
include/unicode/ucoleitr.h
include/unicode/uconfig.h
include/unicode/ucsdet.h
include/unicode/ucurr.h
include/unicode/udat.h
include/unicode/udata.h
include/unicode/udatpg.h
include/unicode/udeprctd.h
include/unicode/udraft.h
include/unicode/uenum.h
include/unicode/uidna.h
include/unicode/uintrnal.h
include/unicode/uiter.h
include/unicode/uloc.h
include/unicode/ulocdata.h
include/unicode/umachine.h
include/unicode/umisc.h
include/unicode/umsg.h
include/unicode/unifilt.h
include/unicode/unifunct.h
include/unicode/unimatch.h
include/unicode/unirepl.h
include/unicode/uniset.h
include/unicode/unistr.h
include/unicode/unorm.h
include/unicode/unum.h
include/unicode/uobject.h
update to icu-3.0

major changes:
ICU 3.0 includes the latest bug fixes, locale/charset updates, and performance/build/porting enhancements.
- Collation
  Collation data is in a separate data tree, allowing for easier modularization and maintenance. getFunctionalEquivalent API allows for better caching and UI support.
- Unicode 4.0.1
  ICU is updated to the latest version of Unicode standard, which had significant property changes.
- CLDR 1.1
  Updates to CLDR 1.1, with many updates to locale data, and special emphasis on collation data.
- Formatting
  As an aid to migration of traditional C (stdio) and C++ (iostream) formatting, the POSIX-like input/output library, icuio, is officially supported. Significant digits now supported in DecimalFormat, for general use and %g support.
- RFC822 time zone format support in DateFormat for compatibility.
- Currency formatting/parsing improvements
  Allows parsing multiple currencies with one formatter, without knowing the currency in advance. Much cleaner design allowing extensibility to other measurement units in the future.
- Regular expressions (C)
  The regular expressions framework now features a C API, instead of just C++.
- Locales
  Locale canonicalization spec defined and implemented. Provides interoperability with POSIX and .NET locale IDs, more RFC 3066 support.
- Layout engine
  Layout engine now supports using different canonically-equivalent Unicode forms of the same text: e.g. a + ´ or á. This is especially important for non-Latin scripts.
- Build Environment
  ICU can now build its data library much faster on most platforms.

For a complete list see: http://oss.software.ibm.com/cvs/icu/~checkout~/icu/readme.html?tag=release-3-0
2004-06-26 22:18:50 +02:00
include/unicode/uobslete.h
include/unicode/uregex.h
include/unicode/urename.h
include/unicode/urep.h
include/unicode/ures.h
include/unicode/uscript.h
include/unicode/usearch.h
include/unicode/uset.h
include/unicode/usetiter.h
include/unicode/ushape.h
include/unicode/usprep.h
include/unicode/ustdio.h
include/unicode/ustream.h
include/unicode/ustring.h
update to ICU 3.6

Major changes in ICU 3.6 include the following:
- Unicode: ICU uses and supports Unicode 5.0, which is the latest major release of Unicode. Unicode 5.0 will be used in many operating systems and applications, and this version of ICU is important to maintain interoperability with these new operating systems and applications. More information about Unicode 5.0 can be found in the Unicode press release.
- Locale Data: ICU uses and supports data from Common Locale Data Repository (CLDR) 1.4, which includes many improvements in quality and quantity of data. There is 25% more CLDR locale data in 245 locales in ICU.
- ICU4C Specific Changes
  - Charset Detection: A charset detection framework was added, which provides heuristics for detecting the charset for unlabeled sequences of bytes.
  - Layout: The font layout engine has support added for Tibetan, Sinhala and Old Hangul.
  - BiDi: The BiDi algorithm was enhanced to be more flexible and efficient.
  - ICU Data Management: The new icupkg tool provides an easier way to manage ICU's data library. This tool allows you to add, update or remove data from ICU's data archive.
  - Time Zones: The time zone data is modularized to allow easier building and updating of the data.
  - Word Boundaries: The Thai word break iteration was improved to be more accurate. Also dictionary based detection of Thai word boundaries is now active for all locales.
  - UText
    - The BreakIterator uses UText for abstract text processing.
    - 64-bit indexing is now used to allow access to larger chunks of text.
    - API for read-only locking for security and robustness was added.
  - Performance
    - The u_sprintf/u_sscanf performance from the icuio library has been improved for number formatting/parsing.
    - Constructing a DateFormat is significantly faster for many locales.
    - Opening and closing a charset converter is significantly faster.
    - The UTF-8 transformation functions and macros are faster.
    - The UText API was improved for performance.
    - The collation open and close functions have a small performance improvement.
2007-03-23 13:51:13 +01:00
include/unicode/usystem.h
include/unicode/utext.h
include/unicode/utf.h
include/unicode/utf16.h
include/unicode/utf32.h
include/unicode/utf8.h
include/unicode/utf_old.h
include/unicode/utmscale.h
include/unicode/utrace.h
include/unicode/utrans.h
include/unicode/utypes.h
include/unicode/uversion.h
Update from version 3.6nb2 to 4.0.1.

Pkgsrc changes:
 o New MASTER_SITE
 o Adjust PLIST
 o Remove no-longer-needed patches, since corresponding changes have been adopted upstream
 o BUILDLINK_ABI_DEPENDS bumped to >=4.0, since a new shared library version is installed
 o Fixes security vulnerability, ref. below.

Dependent pkgsrc packages will have their revisions bumped shortly due to the (possibly/probably) changed ABI.

Upstream changes:

4.0.1:
ICU4C 4.0.1 is a maintenance release of ICU4J 4.0. The primary changes of this release were:
 * Updated time zone data to 2008i
 * Technical preview of string search implementation using Boyer-Moore algorithm (#6286). For detail information, please see the tech note here.
 * #5691 Conversion: consistent illegal sequences
 * #6435 Bad @stable ICU4.0 tags
 * #6597 TestDisplayNamesMeta failure
 * #6670 Test failure in format/TimeZoneTest/TestShortZoneIDs

4.0:
Major changes in ICU 4.0 include the following:
 * Common Changes
    o Unicode 5.1 (#5696)
    o Locale Data: ICU uses and supports data from Common Locale Data Repository (CLDR) 1.6, which includes many improvements in quality and quantity of data.
    o add/removeLikelySubtags (#6124)
    o Charset converter file size improvement (#5987)
    o Date Interval Formatting (#6157)
      Note: Calendar type supported by this feature is Gregorian only in this release.
    o Improved Plural support
 * ICU4C Specific Changes
    Additional Calendars
       + Chinese (#4081)
       + Coptic/Ethiopic (#4571)
 * ICU4J Specific Changes
    o Charset
       + Graduated from Technology Preview status
       + ICU2022 Converter (#5791)
       + HZ Converter (#6128)
       + SCSU/BOCU-1 Converter (#2147)
       + Charset Converter Callback (#6144)
    o Thai Dictionary break iterator (#5385)
    o JDK TimeZone support (#5975)
    o Locale Service Provider (#5976)
    o More convenient formatting of year+month, day+month, and other combinations (#6304)
    o Simple Duration Formatting (#6303)
 * ICU4C Security Fixes
    ICU4C 4.0 resolves the vulnerabilities CVE-2007-4770 and CVE-2007-4771 which were found in earlier versions of ICU. The standard ICU tests verify that these have been corrected, however, the updated versions of the previous tests may be run by applying the following patch to ICU 4.0: r24324. As well, ICU4C and ICU4J 4.0 resolve the issue underlying CVE-2008-1036.
2009-03-25 23:30:19 +01:00
include/unicode/vtzone.h
lib/icu/${PKGVERSION}/Makefile.inc
lib/icu/Makefile.inc
lib/icu/current
lib/libicudata${SO_EXT}${SO_SUFFIX}
lib/libicudata${SO_EXT}.40${SO_SUFFIX}
lib/libicudata${SO_EXT}.40.1${SO_SUFFIX}
lib/libicui18n${SO_EXT}${SO_SUFFIX}
lib/libicui18n${SO_EXT}.40${SO_SUFFIX}
lib/libicui18n${SO_EXT}.40.1${SO_SUFFIX}
lib/libicuio${SO_EXT}${SO_SUFFIX}
lib/libicuio${SO_EXT}.40${SO_SUFFIX}
lib/libicuio${SO_EXT}.40.1${SO_SUFFIX}
lib/libicule${SO_EXT}${SO_SUFFIX}
lib/libicule${SO_EXT}.40${SO_SUFFIX}
lib/libicule${SO_EXT}.40.1${SO_SUFFIX}
lib/libiculx${SO_EXT}${SO_SUFFIX}
lib/libiculx${SO_EXT}.40${SO_SUFFIX}
lib/libiculx${SO_EXT}.40.1${SO_SUFFIX}
lib/libicutu${SO_EXT}${SO_SUFFIX}
lib/libicutu${SO_EXT}.40${SO_SUFFIX}
lib/libicutu${SO_EXT}.40.1${SO_SUFFIX}
lib/libicuuc${SO_EXT}${SO_SUFFIX}
lib/libicuuc${SO_EXT}.40${SO_SUFFIX}
lib/libicuuc${SO_EXT}.40.1${SO_SUFFIX}
lib/libsicudata.a
lib/libsicui18n.a
lib/libsicuio.a
lib/libsicule.a
lib/libsiculx.a
lib/libsicuuc.a
update to icu-3.0
major changes:
ICU 3.0 includes the latest bug fixes, locale/charset updates, and performance/build/porting enhancements.
- Collation: Collation data is in a separate data tree, allowing for easier modularization and maintenance. The getFunctionalEquivalent API allows for better caching and UI support.
- Unicode 4.0.1: ICU is updated to the latest version of the Unicode standard, which had significant property changes.
- CLDR 1.1: Updates to CLDR 1.1, with many updates to locale data, and special emphasis on collation data.
- Formatting: As an aid to migration of traditional C (stdio) and C++ (iostream) formatting, the POSIX-like input/output library, icuio, is officially supported. Significant digits are now supported in DecimalFormat, for general use and %g support.
- RFC822 time zone format support in DateFormat for compatibility.
- Currency formatting/parsing improvements: Allows parsing multiple currencies with one formatter, without knowing the currency in advance. Much cleaner design allowing extensibility to other measurement units in the future.
- Regular expressions (C): The regular expressions framework now features a C API, instead of just C++.
- Locales: Locale canonicalization spec defined and implemented. Provides interoperability with POSIX and .NET locale IDs, more RFC 3066 support.
- Layout engine: Layout engine now supports using different canonically-equivalent Unicode forms of the same text: e.g. a + ´ or á. This is especially important for non-Latin scripts.
- Build Environment: ICU can now build its data library much faster on most platforms.
For a complete list see: http://oss.software.ibm.com/cvs/icu/~checkout~/icu/readme.html?tag=release-3-0
2004-06-26 22:18:50 +02:00
man/man1/derb.1
update to ICU 3.6
Major changes in ICU 3.6 include the following:
- Unicode: ICU uses and supports Unicode 5.0, which is the latest major release of Unicode. Unicode 5.0 will be used in many operating systems and applications, and this version of ICU is important to maintain interoperability with these new operating systems and applications. More information about Unicode 5.0 can be found in the Unicode press release.
- Locale Data: ICU uses and supports data from Common Locale Data Repository (CLDR) 1.4, which includes many improvements in quality and quantity of data. There is 25% more CLDR locale data in 245 locales in ICU.
- ICU4C Specific Changes
  - Charset Detection: A charset detection framework was added, which provides heuristics for detecting the charset for unlabeled sequences of bytes.
  - Layout: The font layout engine has support added for Tibetan, Sinhala and Old Hangul.
  - BiDi: The BiDi algorithm was enhanced to be more flexible and efficient.
  - ICU Data Management: The new icupkg tool provides an easier way to manage ICU's data library. This tool allows you to add, update or remove data from ICU's data archive.
  - Time Zones: The time zone data is modularized to allow easier building and updating of the data.
  - Word Boundaries: The Thai word break iteration was improved to be more accurate. Also, dictionary-based detection of Thai word boundaries is now active for all locales.
  - UText
    - The BreakIterator uses UText for abstract text processing.
    - 64-bit indexing is now used to allow access to larger chunks of text.
    - API for read-only locking for security and robustness was added.
  - Performance
    - The u_sprintf/u_sscanf performance from the icuio library has been improved for number formatting/parsing.
    - Constructing a DateFormat is significantly faster for many locales.
    - Opening and closing a charset converter is significantly faster.
    - The UTF-8 transformation functions and macros are faster.
    - The UText API was improved for performance.
    - The collation open and close functions have a small performance improvement.
2007-03-23 13:51:13 +01:00
man/man1/genbrk.1
man/man1/gencnval.1
man/man1/genctd.1
man/man1/genrb.1
man/man1/icu-config.1
man/man1/makeconv.1
man/man1/pkgdata.1
man/man1/uconv.1
man/man8/genccode.8
man/man8/gencmn.8
man/man8/gensprep.8
man/man8/genuca.8
man/man8/icupkg.8
sbin/genccode
sbin/gencmn
sbin/gensprep
sbin/genuca
sbin/icupkg
sbin/icuswap
share/icu/${PKGVERSION}/config/${MH_NAME}
Update from version 3.6nb2 to 4.0.1.
Pkgsrc changes:
 o New MASTER_SITE
 o Adjust PLIST
 o Remove no-longer-needed patches, since corresponding changes have been adopted upstream
 o BUILDLINK_ABI_DEPENDS bumped to >=4.0, since a new shared library version is installed
 o Fixes security vulnerability, ref. below.
Dependent pkgsrc packages will have their revisions bumped shortly due to the (possibly/probably) changed ABI.
Upstream changes:
4.0.1:
ICU4C 4.0.1 is a maintenance release of ICU4C 4.0. The primary changes of this release were:
 * Updated time zone data to 2008i
 * Technical preview of string search implementation using Boyer-Moore algorithm (#6286). For detailed information, please see the tech note.
 * #5691 Conversion: consistent illegal sequences
 * #6435 Bad @stable ICU4.0 tags
 * #6597 TestDisplayNamesMeta failure
 * #6670 Test failure in format/TimeZoneTest/TestShortZoneIDs
4.0:
Major changes in ICU 4.0 include the following:
 * Common Changes
   o Unicode 5.1 (#5696)
   o Locale Data: ICU uses and supports data from Common Locale Data Repository (CLDR) 1.6, which includes many improvements in quality and quantity of data.
   o add/removeLikelySubtags (#6124)
   o Charset converter file size improvement (#5987)
   o Date Interval Formatting (#6157) Note: Calendar type supported by this feature is Gregorian only in this release.
   o Improved Plural support
 * ICU4C Specific Changes
   o Additional Calendars
     + Chinese (#4081)
     + Coptic/Ethiopic (#4571)
 * ICU4J Specific Changes
   o Charset
     + Graduated from Technology Preview status
     + ICU2022 Converter (#5791)
     + HZ Converter (#6128)
     + SCSU/BOCU-1 Converter (#2147)
     + Charset Converter Callback (#6144)
   o Thai Dictionary break iterator (#5385)
   o JDK TimeZone support (#5975)
   o Locale Service Provider (#5976)
   o More convenient formatting of year+month, day+month, and other combinations (#6304)
   o Simple Duration Formatting (#6303)
 * ICU4C Security Fixes
   ICU4C 4.0 resolves the vulnerabilities CVE-2007-4770 and CVE-2007-4771, which were found in earlier versions of ICU. The standard ICU tests verify that these have been corrected; however, the updated versions of the previous tests may be run by applying the following patch to ICU 4.0: r24324. In addition, ICU4C and ICU4J 4.0 resolve the issue underlying CVE-2008-1036.
2009-03-25 23:30:19 +01:00
share/icu/${PKGVERSION}/install-sh
share/icu/${PKGVERSION}/license.html
2003-03-22 00:44:05 +01:00
share/icu/${PKGVERSION}/mkinstalldirs