pkgsrc/textproc/icu/DESCR

The International Components for Unicode (ICU) is a C and C++ library that
provides robust and full-featured Unicode support on a wide variety of
platforms. The library provides:

- Calendar support
- Character set conversions
- Collation (language-sensitive)
- Date & time formatting
- Locales (140+ supported)
- Message catalogs (resources)
- Message formatting
- Normalization
- Number & currency formatting
- Time zones
- Transliteration
- Word, line & sentence breaks

Update to version 2.4.
Based on a PR pkg/20825 by Hiramatsu Yoshifumi, modified by me.
- follow PKG_SYSCONFDIR

List of major changes for this release:

* Regular Expressions Phase 1
  ICU 2.4 introduces a Regular Expression C++ API that is modeled after
  the JDK 1.4 API. ICU 2.4's Regular Expression API supports Unicode
  level 1 regular expressions (see Unicode Regular Expression
  Guidelines) but not all pattern metacharacters and features are
  supported yet. Regular expressions leverage all of the UnicodeSet
  support, including all Unicode 3.2 property names and property value
  names. Future ICU releases will complete the pattern support, add
  support for higher Unicode regex levels, and improve performance.
  For more details see the API References and the User Guide.

* Modularized ICU library building
  ICU 2.4 provides build-time switches to prune parts of the library
  code, for smaller custom distributions. For details see the readme
  file.

* Character set alias management support
  Additional APIs map alias+standard to a unique charset name (e.g.,
  "Shift-JIS"+"IANA"->"ibm-943_P14A-2000") and enumerate all charset
  names in the alias table, not just the installed ones. See
  convrtrs.txt and ucnv.h. These APIs allow programmers to avoid data
  corruption problems when different platforms use the same names for
  different character conversion mappings.

* EBCDIC-z/OS converter option
  The EBCDIC converter now handles swapped LF/NL mappings
  algorithmically instead of with modified .ucm/.cnv conversion table
  files. This makes this behavior available for all supported EBCDIC
  conversions without adding to the data package size. See "swaplfnl"
  in convrtrs.txt.

* Additional converter
  A new converter implementation has been added for the encoding of
  IMAP mailbox names. See RFC 2060/5.1.3. Mailbox International Naming
  Convention and "IMAP-mailbox-name" in convrtrs.txt.

* Customizable break iteration
  ICU 2.4 allows registration of a BreakIterator with a locale ID.
  This allows applications to provide more sophisticated word/sentence
  break engines and use them seamlessly with the ICU APIs. In future
  releases, this registration mechanism will be extended to all
  relevant ICU services. If you are interested in ICU customization,
  please try out this feature.

* Collation performance
  ICU 2.4 collation was improved in several areas, with an emphasis on
  performance:
  * Latin-1: Improved performance of u_strcoll().
  * Russian/Cyrillic: Improved performance by tailoring collation for
    cyrillic-script languages, removing UCA contractions that are not
    used for modern Russian (this uses the [suppressContractions]
    tailoring option).
  * Korean: Improved performance by resolving collation elements for
    modern Hangul syllables at build time (this uses the [optimize]
    tailoring option).
  * Japanese: The default strength for Japanese was reduced from
    quaternary to tertiary as in all other locales.

* UnicodeSet performance
  UnicodeSet performance is significantly improved, especially for
  add(codePoint) and contains(codePoint).

* Unicode property aliases
  ICU 2.4 introduces APIs for mapping between all appropriate Unicode
  property aliases and property value aliases and ICU property
  enumeration constants. See u_getPropertyName() etc. in uchar.h.

* Unicode string functions
  * There are new C functions for searching for last occurrences of
    characters and partial strings. See u_strrstr(), u_strrchr32() etc.
  * New C/C++/Java functions for efficient checking if a string
    contains more than a certain number of code points. See
    hasMoreChar32Than().
  * Copying UnicodeStrings via the standard assignment operator and
    copy constructor does not preserve readonly aliasing any more
    because this can sometimes have unexpected and dangerous effects.
    A new fastCopyFrom() member function provides the old copy
    semantics. See Jitterbug 1794 for more details.

* UTF macros simplified
  The low-level C macros for handling code points in 8-bit and 16-bit
  Unicode strings have been replaced by a simpler, more consistent set
  with more concise names. For details see utf_old.h and utf.h.
  Similarly, ICU 2.4 defines the UChar32 consistently (now always as
  int32_t) and adds a U_SENTINEL non-code point value for new APIs.

* Performance tests
  ICU 2.4 has a new performance test framework and additional
  performance tests using this framework. This is not currently
  documented, but it is available as part of the source distribution
  at source/test/perf/.

2003-03-22 00:44:05 +01:00

Import of new ICU package: Robust and full-featured unicode support 2000-12-20 19:27:59 +01:00
Drop trailing whitespace. Ok'ed by wiz. 2003-05-06 19:40:18 +02:00