e97a889232
what is new for perl v5.30.0

Core Enhancements

Limited variable length lookbehind in regular expression pattern matching is now experimentally supported

Using a lookbehind assertion (like "(?<=foo?)" or "(?<!ba{1,9}r)") previously would generate an error and refuse to compile. Now it compiles (if the maximum lookbehind is at most 255 characters), but raises a warning in the new "experimental::vlb" warnings category. This is to caution you that the precise behavior is subject to change based on feedback from use in the field.

See "(?<=pattern)" in perlre and "(?<!pattern)" in perlre.

The upper limit "n" specifiable in a regular expression quantifier of the form "{m,n}" has been doubled to 65534

The meaning of an unbounded upper quantifier "{m,}" remains unchanged. It matches 2**31 - 1 times on most platforms, and more on ones where a C language short variable is more than 4 bytes long.

Unicode 12.1 is supported

Because of a change in Unicode release cycles, Perl jumps from Unicode 10.0 in Perl 5.28 to Unicode 12.1 in Perl 5.30.

For details on the Unicode changes, see <https://www.unicode.org/versions/Unicode11.0.0/> for 11.0; <https://www.unicode.org/versions/Unicode12.0.0/> for 12.0; and <https://www.unicode.org/versions/Unicode12.1.0/> for 12.1. (Unicode 12.1 differs from 12.0 only in the addition of a single character, that for the new Japanese era name.)

The Word_Break property, as in past Perl releases, remains tailored to behave more in line with expectations of Perl users. This means that sequential runs of horizontal white space characters are not broken apart, but kept as a single run. Unicode 11 changed from past versions to be more in line with Perl, but it left several white space characters as causing breaks: TAB, NO BREAK SPACE, and FIGURE SPACE (U+2007). We have decided to continue to use the previous Perl tailoring with regard to these.
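The variable length lookbehind support described earlier in this section can be tried with a short script. This is only a sketch: it assumes perl 5.30 or later, the helper name follows_fo_or_foo is made up for illustration, and the pattern reuses the "(?<=foo?)" assertion quoted above. Because the feature is experimental, the script silences the "experimental::vlb" warning explicitly.

```perl
use 5.030;      # the experimental::vlb category does not exist before 5.30
use strict;
use warnings;
no warnings 'experimental::vlb';

# Hypothetical helper: true when "-bar" is immediately preceded by "fo" or "foo".
sub follows_fo_or_foo {
    my ($text) = @_;
    return $text =~ /(?<=foo?)-bar/ ? 1 : 0;
}

for my $word (qw(fo-bar foo-bar fx-bar)) {
    printf "%-8s %s\n", $word,
        follows_fo_or_foo($word) ? 'matches' : 'does not match';
}
```

The first two words satisfy the assertion with lookbehinds of two and three characters respectively; the third does not. Since the behavior is subject to change, code relying on it is probably best kept behind a version check like the one above.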
Wildcards in Unicode property value specifications are now partially supported

You can now do something like this in a regular expression pattern

qr! \p{nv= /(?x) \A [0-5] \z / }!

which matches all Unicode code points whose numeric value is between 0 and 5 inclusive. So, it could match the Thai or Bengali digits whose numeric values are 0, 1, 2, 3, 4, or 5.

This marks another step in implementing the regular expression features the Unicode Consortium suggests.

Most properties are supported, with the remainder planned for 5.32. Details are in "Wildcards in Property Values" in perlunicode.

qr'\N{name}' is now supported

Previously it was an error to evaluate a named character "\N{...}" within a single quoted regular expression pattern (whose evaluation is deferred from the normal place). This restriction is now removed.

Turkic UTF-8 locales are now seamlessly supported

Turkic languages have different casing rules than other languages for the characters "i" and "I". The uppercase of "i" is LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130); and the lowercase of "I" is LATIN SMALL LETTER DOTLESS I (U+0131). Unicode furnishes alternate casing rules for use with Turkic languages. Previously, Perl ignored these, but now, it uses them when it detects that it is operating under a Turkic UTF-8 locale.

It is now possible to compile perl to always use thread-safe locale operations.

Previously, these calls were only used when perl was compiled to be multi-threaded. To always enable them, add

-Accflags='-DUSE_THREAD_SAFE_LOCALE'

to your Configure flags.

Eliminate opASSIGN macro usage from core

This macro is still defined but no longer used in core.

"-Drv" now means something on "-DDEBUGGING" builds

Now, adding the verbose flag ("-Dv") to the "-Dr" flag turns on all possible regular expression debugging.

Incompatible Changes

Assigning non-zero to $[ is fatal

Setting $[ to a non-zero value has been deprecated since Perl 5.12 and now throws a fatal error.
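Code that used a non-zero $[ to get 1-based arrays can usually do the offset arithmetic itself. The following is a minimal sketch of that approach, not a prescribed migration path; the @months data and the month_name helper are invented for illustration. The string eval at the end only demonstrates that the assignment is rejected on perl 5.30 and later without aborting the script.

```perl
use strict;
use warnings;

my @months = qw(Jan Feb Mar Apr);

# Hypothetical helper: 1-based lookup that leaves $[ alone.
sub month_name {
    my ($n) = @_;
    die "month out of range: $n\n" if $n < 1 || $n > @months;
    return $months[$n - 1];
}

print month_name(1), "\n";

# On perl 5.30+ this fails at compile time; the string eval traps the error.
my $ok = eval q{ $[ = 1; 1 };
print 'assignment to $[ rejected: ', $@ unless $ok;
```

Keeping the offset inside one helper makes the indexing convention explicit at the call site, which the global $[ never did.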
See "Assigning non-zero to $[ is fatal" in perldeprecation.

Delimiters must now be graphemes

See "Use of unassigned code point or non-standalone grapheme for a delimiter." in perldeprecation.

Some formerly deprecated uses of an unescaped left brace "{" in regular expression patterns are now illegal

But to avoid breaking code unnecessarily, most instances that issued a deprecation warning remain legal and now have a non-deprecation warning raised. See "Unescaped left braces in regular expressions" in perldeprecation.

Previously deprecated sysread()/syswrite() on :utf8 handles is now fatal

Calling sysread(), syswrite(), send() or recv() on a ":utf8" handle, whether applied explicitly or implicitly, is now fatal. This was deprecated in perl 5.24.

There were two problems with calling these functions on ":utf8" handles:

o All four functions only paid attention to the ":utf8" flag. Other layers were completely ignored, so a handle with an ":encoding(UTF-16LE)" layer would be treated as UTF-8. Other layers, such as compression, are completely ignored with or without the ":utf8" flag.

o sysread() and recv() would read from the handle, skipping any validation by the layers, and do no validation of their own. This could lead to invalidly encoded perl scalars.

my() in false conditional prohibited

Declarations such as "my $x if 0" are no longer permitted.

Fatalize $* and $#

These special variables, long deprecated, now throw exceptions when used.

Fatalize unqualified use of dump()

The "dump()" function, long discouraged, may no longer be used unless it is fully qualified, i.e., "CORE::dump()".

Remove File::Glob::glob()

The "File::Glob::glob()" function, long deprecated, has been removed and now throws an exception which advises use of "File::Glob::bsd_glob()" instead.

"pack()" no longer can return malformed UTF-8

It croaks if it would otherwise return a UTF-8 string that contains malformed UTF-8. This protects against potential security threats.
This is considered a bug fix as well.

Any set of digits in the Common script are legal in a script run of another script

There are several sets of digits in the Common script. "[0-9]" is the most familiar. But there are also "[\x{FF10}-\x{FF19}]" (FULLWIDTH DIGIT ZERO - FULLWIDTH DIGIT NINE), and several sets for use in mathematical notation, such as the MATHEMATICAL DOUBLE-STRUCK DIGITs. Any of these sets should be able to appear in script runs of, say, Greek. But the design of 5.28 overlooked all but the ASCII digits "[0-9]", so the design was flawed. This has been fixed, so this is both a bug fix and an incompatibility.

All digits in a run still have to come from the same set of ten digits.

JSON::PP enables allow_nonref by default

As JSON::XS 4.0 changed its policy and enabled allow_nonref by default, JSON::PP also enabled allow_nonref by default.

Deprecations

In XS code, use of various macros dealing with UTF-8.

This deprecation was scheduled to become fatal in 5.30, but has been delayed to 5.32 due to problems that showed up with some CPAN modules. For details of what's affected, see perldeprecation.

Performance Enhancements

o Translating from UTF-8 into the code point it represents now is done via a deterministic finite automaton, speeding it up. As a typical example, "ord("\x7fff")" now requires 12% fewer instructions than before. The performance of checking that a sequence of bytes is valid UTF-8 is similarly improved, again by using a DFA.

o Eliminate recursion from finalize_op().

o A handful of small optimizations related to character folding and character classes in regular expressions.

o Optimization of "IV" to "UV" conversions.

o Speed up of the integer stringification algorithm by processing two digits at a time instead of one.

o Improvements based on LGTM analysis and recommendation.

o Code optimizations in regcomp.c, regcomp.h, regexec.c.
o Regular expression pattern matching of things like "qr/[^a]/" is significantly sped up, where a is any ASCII character. Other classes can also get this speed-up, but which ones do is complicated: it depends on the underlying bit patterns of the characters involved, and so differs between ASCII and EBCDIC platforms. All case pairs, like "qr/[Gg]/", are included, as is "[^01]".
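Two of the incompatible changes listed above have common workarounds, sketched here under stated assumptions. A "state" variable stands in for the old "my $x if 0" persistent-variable trick, and a ":raw" handle plus an explicit Encode::decode() call stands in for sysread() on a ":utf8" handle. The names next_id and slurp_utf8 are hypothetical, and the temporary file exists only to make the demonstration self-contained.

```perl
use strict;
use warnings;
use feature 'state';
use Encode qw(decode);
use File::Temp qw(tempfile);

# Replacement for "my $counter if 0": the value persists across calls.
sub next_id {
    state $counter = 0;
    return ++$counter;
}

# Read raw bytes with sysread(), then decode and validate them explicitly.
sub slurp_utf8 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $bytes = '';
    while (1) {
        my $n = sysread($fh, my $chunk, 8192);
        die "sysread $path: $!" unless defined $n;
        last if $n == 0;
        $bytes .= $chunk;
    }
    close $fh;
    # FB_CROAK makes decode() die on malformed input instead of passing it on.
    return decode('UTF-8', $bytes, Encode::FB_CROAK);
}

# Demonstration: write four characters as five UTF-8 bytes, then read them back.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':raw');
print {$tmp_fh} "caf\xC3\xA9";
close $tmp_fh;

my $text = slurp_utf8($tmp_name);
printf "id %d, %d characters read\n", next_id(), length $text;
```

Decoding after the read puts validation under the caller's control, which addresses the second problem noted for ":utf8" handles, where sysread() performed no validation of its own.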
||
---|---|---|
.. | ||
a60 | ||
abcl | ||
algol68g | ||
asn1c | ||
awka | ||
baci | ||
basic256 | ||
boomerang | ||
brandybasic | ||
bwbasic | ||
caml-light | ||
camlp4 | ||
camlp5 | ||
cbmbasic | ||
ccsh | ||
cdl3 | ||
Cg-compiler | ||
chicken | ||
chicken5 | ||
cim | ||
cint | ||
clang | ||
clang-static-analyzer | ||
clang-tools-extra | ||
classpath | ||
classpath-gui | ||
clisp | ||
clojure | ||
compiler-rt | ||
coq | ||
coreclr | ||
cparser | ||
cu-prolog | ||
duktape | ||
eag | ||
ecl | ||
eieio | ||
elisp-manual | ||
elixir | ||
elk | ||
embryo | ||
erlang | ||
erlang-doc | ||
erlang-luerl | ||
erlang-man | ||
f2c | ||
ficl | ||
focal | ||
fort77 | ||
forth-retro | ||
g95 | ||
gambc | ||
gauche | ||
gawk | ||
gcc-aux | ||
gcc2 | ||
gcc3 | ||
gcc3-c | ||
gcc3-c++ | ||
gcc3-f77 | ||
gcc3-objc | ||
gcc5 | ||
gcc5-aux | ||
gcc5-libs | ||
gcc6 | ||
gcc6-aux | ||
gcc6-libs | ||
gcc7 | ||
gcc7-libs | ||
gcc8 | ||
gcc8-libs | ||
gcc34 | ||
gcc44 | ||
gcc48 | ||
gcc48-libs | ||
gcc49 | ||
gcc49-libs | ||
gforth | ||
ghc | ||
ghc-bootstrap | ||
ghc7 | ||
gnat_util | ||
gnucobol | ||
go | ||
go-hcl | ||
go14 | ||
go19 | ||
go110 | ||
go111 | ||
go112 | ||
gpc | ||
gprolog | ||
guile | ||
guile20 | ||
guile22 | ||
gwydion-dylan | ||
heirloom-awk | ||
hugs | ||
icon | ||
inform | ||
intercal | ||
ja-gawk | ||
jamvm | ||
japhar | ||
jasmin | ||
java-lang-spec | ||
java-vm-spec | ||
jikes | ||
jimtcl | ||
joos | ||
js | ||
kaffe | ||
kaffe-esound | ||
kaffe-x11 | ||
kali | ||
konoha | ||
ksi | ||
libBlocksRuntime | ||
libcxx | ||
libcxxabi | ||
libduktape | ||
libLLVM | ||
libLLVM4 | ||
libLLVM34 | ||
librep | ||
libtcl-nothread | ||
libunwind | ||
likepython | ||
llvm | ||
lua | ||
lua51 | ||
lua52 | ||
lua53 | ||
LuaJIT | ||
LuaJIT2 | ||
lush | ||
maude | ||
mawk | ||
mercury | ||
micropython | ||
minischeme | ||
mit-scheme-bin | ||
mono | ||
mono-basic | ||
mono2 | ||
moscow_ml | ||
mpd | ||
nawk | ||
newlisp | ||
newsqueak | ||
nhc98 | ||
nim | ||
nodejs | ||
nodejs6 | ||
nodejs8 | ||
npm | ||
nqp | ||
nuitka | ||
objc | ||
ocaml | ||
oo2c | ||
open-cobol-ce | ||
opencobol | ||
openjdk-bin | ||
openjdk7 | ||
openjdk8 | ||
opensource-cobol | ||
oracle-jdk8 | ||
oracle-jre8 | ||
ossp-js | ||
owl-lisp | ||
p2c | ||
p5-Switch | ||
parrot | ||
pc-lisp | ||
pcc | ||
pcc-current | ||
pear | ||
perl5 | ||
pfe | ||
pforth | ||
php | ||
php56 | ||
php71 | ||
php72 | ||
php73 | ||
picoc | ||
pict | ||
polyml | ||
py-asttokens | ||
py-basicproperty | ||
py-byterun | ||
py-cxfreeze | ||
py-execjs | ||
py-hy | ||
py-js2py | ||
py-jsparser | ||
py-mypy | ||
py-mypy_extensions | ||
py-parso | ||
py-paver | ||
py-pyrex | ||
py-python-lua | ||
py-pythonz | ||
py-six | ||
py-spark-parser | ||
py-uncompyle6 | ||
py27-html-docs | ||
py36-html-docs | ||
py37-html-docs | ||
python | ||
python27 | ||
python36 | ||
python37 | ||
qore | ||
R-codetools | ||
R-sourcetools | ||
racket | ||
racket-textual | ||
rakudo | ||
rakudo-star | ||
rcfunge | ||
rexx-imc | ||
rexx-regina | ||
ruby | ||
ruby-coffee-script | ||
ruby-coffee-script-source | ||
ruby-doc-stdlib | ||
ruby-execjs | ||
ruby-gherkin | ||
ruby-rkelly-remix | ||
ruby22 | ||
ruby22-base | ||
ruby24 | ||
ruby24-base | ||
ruby25 | ||
ruby25-base | ||
ruby26 | ||
ruby26-base | ||
runawk | ||
rust | ||
sablevm | ||
sablevm-classpath | ||
sablevm-classpath-gui | ||
sather | ||
sbcl | ||
scala | ||
scala-sbt | ||
scheme48 | ||
scm | ||
see | ||
sigscheme | ||
siod | ||
smalltalk | ||
smlnj | ||
smlnj11072 | ||
snobol | ||
spidermonkey | ||
spidermonkey52 | ||
spidermonkey185 | ||
spl | ||
squeak | ||
squeak-vm | ||
sr | ||
sr-examples | ||
stalin | ||
STk | ||
sun-jdk7 | ||
sun-jre7 | ||
swi-prolog | ||
swi-prolog-jpl | ||
swi-prolog-lite | ||
swi-prolog-packages | ||
tcl | ||
tcl-expect | ||
tcl-otcl | ||
tcl85 | ||
tinyscheme | ||
ucblogo | ||
umb-scheme | ||
utilisp | ||
vala | ||
vscm | ||
vslisp | ||
wsbasic | ||
yabasic | ||
yap | ||
zenlisp | ||
zig | ||
Makefile |