Commit graph

6 commits

Author SHA1 Message Date
wiz
1b6d0c5a65 Update to 5.0:
Release 5.0 13-Sep-04
---------------------

The licence under which PCRE is released has been changed to the more
conventional "BSD" licence.

In the code, some bugs have been fixed, and there are also some major changes
in this release (which is why I've increased the number to 5.0). Some changes
are internal rearrangements, and some provide a number of new facilities. The
new features are:

1. There's an "automatic callout" feature that inserts callouts before every
   item in the regex, and there's a new callout field that gives the position
   in the pattern - useful for debugging and tracing.

2. The extra_data structure can now be used to pass in a set of character
   tables at exec time. This is useful if compiled regex are saved and re-used
   at a later time when the tables may not be at the same address. If the
   default internal tables are used, the pointer saved with the compiled
   pattern is now set to NULL, which means that you don't need to do anything
   special unless you are using custom tables.

3. It is possible, with some restrictions on the content of the regex, to
   request "partial" matching. A special return code is given if all of the
   subject string matched part of the regex. This could be useful for testing
   an input field as it is being typed.

4. There is now some optional support for Unicode character properties, which
   means that the patterns items such as \p{Lu} and \X can now be used. Only
   the general category properties are supported. If PCRE is compiled with this
   support, an additional 90K data structure is include, which increases the
   size of the library dramatically.

5. There is support for saving compiled patterns and re-using them later.

6. There is support for running regular expressions that were compiled on a
   different host with the opposite endianness.

7. The pcretest program has been extended to accommodate the new features.

The main internal rearrangement is that sequences of literal characters are no
longer handled as strings. Instead, each character is handled on its own. This
makes some UTF-8 handling easier, and makes the support of partial matching
possible. Compiled patterns containing long literal strings will be larger as a
result of this change; I hope that performance will not be much affected.
2004-09-28 15:59:49 +00:00
jmmv
5113a4d55b Update to 4.5:
1. There has been some re-arrangement of the code for the match() function so
    that it can be compiled in a version that does not call itself recursively.
    Instead, it keeps those local variables that need separate instances for
    each "recursion" in a frame on the heap, and gets/frees frames whenever it
    needs to "recurse". Keeping track of where control must go is done by means
    of setjmp/longjmp. The whole thing is implemented by a set of macros that
    hide most of the details from the main code, and operates only if
    NO_RECURSE is defined while compiling pcre.c. If PCRE is built using the
    "configure" mechanism, "--disable-stack-for-recursion" turns on this way of
    operating.

    To make it easier for callers to provide specially tailored get/free
    functions for this usage, two new functions, pcre_stack_malloc, and
    pcre_stack_free, are used. They are always called in strict stacking order,
    and the size of block requested is always the same.

    The PCRE_CONFIG_STACKRECURSE info parameter can be used to find out whether
    PCRE has been compiled to use the stack or the heap for recursion. The
    -C option of pcretest uses this to show which version is compiled.

    A new data escape \S, is added to pcretest; it causes the amounts of store
    obtained and freed by both kinds of malloc/free at match time to be added
    to the output.

 2. Changed the locale test to use "fr_FR" instead of "fr" because that's
    what's available on my current Linux desktop machine.

 3. When matching a UTF-8 string, the test for a valid string at the start has
    been extended. If start_offset is not zero, PCRE now checks that it points
    to a byte that is the start of a UTF-8 character. If not, it returns
    PCRE_ERROR_BADUTF8_OFFSET (-11). Note: the whole string is still checked;
    this is necessary because there may be backward assertions in the pattern.
    When matching the same subject several times, it may save resources to use
    PCRE_NO_UTF8_CHECK on all but the first call if the string is long.

 4. The code for checking the validity of UTF-8 strings has been tightened so
    that it rejects (a) strings containing 0xfe or 0xff bytes and (b) strings
    containing "overlong sequences".

 5. Fixed a bug (appearing twice) that I could not find any way of exploiting!
    I had written "if ((digitab[*p++] && chtab_digit) == 0)" where the "&&"
    should have been "&", but it just so happened that all the cases this let
    through by mistake were picked up later in the function.

 6. I had used a variable called "isblank" - this is a C99 function, causing
    some compilers to warn. To avoid this, I renamed it (as "blankclass").

 7. Cosmetic: (a) only output another newline at the end of pcretest if it is
    prompting; (b) run "./pcretest /dev/null" at the start of the test script
    so the version is shown; (c) stop "make test" echoing "./RunTest".

 8. Added patches from David Burgess to enable PCRE to run on EBCDIC systems.

 9. The prototype for memmove() for systems that don't have it was using
    size_t, but the inclusion of the header that defines size_t was later. I've
    moved the #includes for the C headers earlier to avoid this.

10. Added some adjustments to the code to make it easier to compiler on certain
    special systems:

      (a) Some "const" qualifiers were missing.
      (b) Added the macro EXPORT before all exported functions; by default this
          is defined to be empty.
      (c) Changed the dftables auxiliary program (that builds chartables.c) so
          that it reads its output file name as an argument instead of writing
          to the standard output and assuming this can be redirected.

11. In UTF-8 mode, if a recursive reference (e.g. (?1)) followed a character
    class containing characters with values greater than 255, PCRE compilation
    went into a loop.

12. A recursive reference to a subpattern that was within another subpattern
    that had a minimum quantifier of zero caused PCRE to crash. For example,
    (x(y(?2))z)? provoked this bug with a subject that got as far as the
    recursion. If the recursively-called subpattern itself had a zero repeat,
    that was OK.

13. In pcretest, the buffer for reading a data line was set at 30K, but the
    buffer into which it was copied (for escape processing) was still set at
    1024, so long lines caused crashes.

14. A pattern such as /[ab]{1,3}+/ failed to compile, giving the error
    "internal error: code overflow...". This applied to any character class
    that was followed by a possessive quantifier.

15. Modified the Makefile to add libpcre.la as a prerequisite for
    libpcreposix.la because I was told this is needed for a parallel build to
    work.

16. If a pattern that contained .* following optional items at the start was
    studied, the wrong optimizing data was generated, leading to matching
    errors. For example, studying /[ab]*.*c/ concluded, erroneously, that any
    matching string must start with a or b or c. The correct conclusion for
    this pattern is that a match can start with any character.
2003-12-12 22:33:36 +00:00
wiz
f157ffefb0 Update to 4.3.
Version 4.3 21-May-03

Refactoring for code improvements. POSIX compat fix (constification).
UTF-8 fixes.

Version 4.2 14-Apr-03

Build fixes. Removed some compiler warnings. UTF-8 fixes.

Version 4.1 12-Mar-03

Compilation fixes. A bug fix, and two optimization fixes.

Highlights of the 4.0 release:
1. Support for Perl's \Q...\E escapes.

2. "Possessive quantifiers" ?+, *+, ++, and {,}+ which come from Sun's Java
package. They provide some syntactic sugar for simple cases of "atomic
grouping".

3. Support for the \G assertion. It is true when the current matching position
is at the start point of the match.

4. A new feature that provides some of the functionality that Perl provides
with (?{...}). The facility is termed a "callout". The way it is done in PCRE
is for the caller to provide an optional function, by setting pcre_callout to
its entry point. To get the function called, the regex must include (?C) at
appropriate points.

5. Support for recursive calls to individual subpatterns. This makes it really
easy to get totally confused.

6. Support for named subpatterns. The Python syntax (?P<name>...) is used to
name a group.

7. Several extensions to UTF-8 support; it is now fairly complete. There is an
option for pcregrep to make it operate in UTF-8 mode.

8. The single man page has been split into a number of separate man pages.
These also give rise to individual HTML pages which are put in a separate
directory. There is an index.html page that lists them all. Some hyperlinking
between the pages has been installed.
2003-08-05 10:18:39 +00:00
cjep
103ce209c0 Add -Wl,-R... flags in pcre-config (same as we do for gtk-config). Bump pkg revision. 2002-12-30 18:08:01 +00:00
martti
816d169300 Updated to version 3.7. Changes since 3.4:
Version 3.7 29-Oct-01
---------------------

1. In updating pcretest to check change 1 of version 3.6, I screwed up.
This caused pcretest, when used on the test data, to segfault. Unfortunately,
this didn't happen under Solaris 8, where I normally test things.

Version 3.6 23-Oct-01
---------------------

1. Crashed with /(sens|respons)e and \1ibility/ and "sense and sensibility" if
offsets passed as NULL with zero offset count.

2. The config.guess and config.sub files had not been updated when I moved to
the latest autoconf.

Version 3.5 15-Aug-01
---------------------

1. Added some missing #if !defined NOPOSIX conditionals in pcretest.c that
had been forgotten.

2. By using declared but undefined structures, we can avoid using "void"
definitions in pcre.h while keeping the internal definitions of the structures
private.

3. The distribution is now built using autoconf 2.50 and libtool 1.4. From a
user point of view, this means that both static and shared libraries are built
by default, but this can be individually controlled. More of the work of
handling this static/shared cases is now inside libtool instead of PCRE's make
file.

4. The pcretest utility is now installed along with pcregrep because it is
useful for users (to test regexs) and by doing this, it automatically gets
relinked by libtool. The documentation has been turned into a man page, so
there are now .1, .txt, and .html versions in /doc.

5. Upgrades to pcregrep:
   (i)   Added long-form option names like gnu grep.
   (ii)  Added --help to list all options with an explanatory phrase.
   (iii) Added -r, --recursive to recurse into sub-directories.
   (iv)  Added -f, --file to read patterns from a file.

6. pcre_exec() was referring to its "code" argument before testing that
argument for NULL (and giving an error if it was NULL).

7. Upgraded Makefile.in to allow for compiling in a different directory from
the source directory.

8. Tiny buglet in pcretest: when pcre_fullinfo() was called to retrieve the
options bits, the pointer it was passed was to an int instead of to an unsigned
long int. This mattered only on 64-bit systems.

9. Fixed typo (3.4/1) in pcre.h again. Sigh. I had changed pcre.h (which is
generated) instead of pcre.in, which it its source. Also made the same change
in several of the .c files.

10. A new release of gcc defines printf() as a macro, which broke pcretest
because it had an ifdef in the middle of a string argument for printf(). Fixed
by using separate calls to printf().

11. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
script, to force use of CR or LF instead of \n in the source. On non-Unix
systems, the value can be set in config.h.

12. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
absolute limit. Changed the text of the error message to make this clear, and
likewise updated the man page.

13. The limit of 99 on the number of capturing subpatterns has been removed.
The new limit is 65535, which I hope will not be a "real" limit.
2001-11-30 10:20:01 +00:00
zuntum
5543dc6c28 Update pcre to version 3.4
Version 3.4 22-Aug-00
---------------------

1. Fixed typo in pcre.h: unsigned const char * changed to const unsigned char *.

2. Diagnose condition (?(0) as an error instead of crashing on matching.


Version 3.3 01-Aug-00
---------------------

1. If an octal character was given, but the value was greater than \377, it
was not getting masked to the least significant bits, as documented. This could
lead to crashes in some systems.

2. Perl 5.6 (if not earlier versions) accepts classes like [a-\d] and treats
the hyphen as a literal. PCRE used to give an error; it now behaves like Perl.

3. Added the functions pcre_free_substring() and pcre_free_substring_list().
These just pass their arguments on to (pcre_free)(), but they are provided
because some uses of PCRE bind it to non-C systems that can call its functions,
but cannot call free() or pcre_free() directly.

4. Add "make test" as a synonym for "make check". Corrected some comments in
the Makefile.

5. Add $(DESTDIR)/ in front of all the paths in the "install" target in the
Makefile.

6. Changed the name of pgrep to pcregrep, because Solaris has introduced a
command called pgrep for grepping around the active processes.

7. Added the beginnings of support for UTF-8 character strings.

8. Arranged for the Makefile to pass over the settings of CC, CFLAGS, and
RANLIB to ./ltconfig so that they are used by libtool. I think these are all
the relevant ones. (AR is not passed because ./ltconfig does its own figuring
out for the ar command.)

Version 3.2 12-May-00
---------------------

This is purely a bug fixing release.

1. If the pattern /((Z)+|A)*/ was matched agained ZABCDEFG it matched Z instead
of ZA. This was just one example of several cases that could provoke this bug,
which was introduced by change 9 of version 2.00. The code for breaking
infinite loops after an iteration that matches an empty string was't working
correctly.

2. The pcretest program was not imitating Perl correctly for the pattern /a*/g
when matched against abbab (for example). After matching an empty string, it
wasn't forcing anchoring when setting PCRE_NOTEMPTY for the next attempt; this
caused it to match further down the string than it should.

3. The code contained an inclusion of sys/types.h. It isn't clear why this
was there because it doesn't seem to be needed, and it causes trouble on some
systems, as it is not a Standard C header. It has been removed.

4. Made 4 silly changes to the source to avoid stupid compiler warnings that
were reported on the Macintosh. The changes were from

  while ((c = *(++ptr)) != 0 && c != '\n');
to
  while ((c = *(++ptr)) != 0 && c != '\n') ;

Totally extraordinary, but if that's what it takes...

5. PCRE is being used in one environment where neither memmove() nor bcopy() is
available. Added HAVE_BCOPY and an autoconf test for it; if neither
HAVE_MEMMOVE nor HAVE_BCOPY is set, use a built-in emulation function which
assumes the way PCRE uses memmove() (always moving upwards).

6. PCRE is being used in one environment where strchr() is not available. There
was only one use in pcre.c, and writing it out to avoid strchr() probably gives
faster code anyway.


Version 3.1 09-Feb-00
---------------------
The only change in this release is the fixing of some bugs in Makefile.in for
the "install" target:

(1) It was failing to install pcreposix.h.

(2) It was overwriting the pcre.3 man page with the pcreposix.3 man page.


Version 3.0 01-Feb-00
---------------------

1. Add support for the /+ modifier to perltest (to output $` like it does in
pcretest).

2. Add support for the /g modifier to perltest.

3. Fix pcretest so that it behaves even more like Perl for /g when the pattern
matches null strings.

4. Fix perltest so that it doesn't do unwanted things when fed an empty
pattern. Perl treats empty patterns specially - it reuses the most recent
pattern, which is not what we want. Replace // by /(?#)/ in order to avoid this
effect.

5. The POSIX interface was broken in that it was just handing over the POSIX
captured string vector to pcre_exec(), but (since release 2.00) PCRE has
required a bigger vector, with some working space on the end. This means that
the POSIX wrapper now has to get and free some memory, and copy the results.

6. Added some simple autoconf support, placing the test data and the
documentation in separate directories, re-organizing some of the
information files, and making it build pcre-config (a GNU standard). Also added
libtool support for building PCRE as a shared library, which is now the
default.

7. Got rid of the leading zero in the definition of PCRE_MINOR because 08 and
09 are not valid octal constants. Single digits will be used for minor values
less than 10.

8. Defined REG_EXTENDED and REG_NOSUB as zero in the POSIX header, so that
existing programs that set these in the POSIX interface can use PCRE without
modification.

9. Added a new function, pcre_fullinfo() with an extensible interface. It can
return all that pcre_info() returns, plus additional data. The pcre_info()
function is retained for compatibility, but is considered to be obsolete.

10. Added experimental recursion feature (?R) to handle one common case that
Perl 5.6 will be able to do with (?p{...}).

11. Added support for POSIX character classes like [:alpha:], which Perl is
adopting.
2001-05-14 12:55:30 +00:00