This is part of the japanese category retirement. CATEGORIES and pathnames to japanese/ adjusted.
$NetBSD: patch-aa,v 1.1.1.1 2002/05/31 13:00:43 seb Exp $
●●●●● GNU grep version 2.0 + multi-byte extension 1.04 ●●●●●
●●●●● Jun. 2, 1994 by t^2 ●●●●●
This file contains the diffs needed to generate the source code of
grep-2.0+mb1.04, the multi-byte character version, from the source code of
GNU grep version 2.0 (grep-2.0). In the directory where the grep-2.0
source is unpacked, apply the patch with, for example,
% patch -p1 < this-file
and then read README.MB.
7-207 Baikoen Danchi, Chuo-ku, Fukuoka 810, Japan
TEL/FAX: 092-731-4025 (automatic TEL/FAX switching)
092-724-6342 (TEL only)
E-mail: NBC02362@niftyserve.or.jp t^2 (Takahiro Tanimoto, 谷本孝浩)
diff -ru2N grep-2.0/ChangeLog.MB grep+mb1.04/ChangeLog.MB
--- grep-2.0/ChangeLog.MB Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/ChangeLog.MB Thu Jun 2 17:01:42 1994
@@ -0,0 +1,219 @@
+Thu Jun 2 16:58:03 1994 Takahiro Tanimoto (tt@isaac)
+
+ * Version 2.0 + multi-byte extension 1.04 released.
+
+Sat Mar 5 16:30:16 1994 Takahiro Tanimoto (tt@isaac)
+
+ * README.MSC: Worked around a bug in the wildcard expansion routine
+ of MS-C 6.00A for the PC-9800 series. The earlier stdargv.diff was
+ merged into this file and removed. (Thanks to 福浩邦 <GFE00522@niftyserve.or.jp>)
+
+Thu Aug 19 04:26:09 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (re_compile_fastmap): The fastmap for charset_not was built
+ incorrectly, so when the regex fastmap was in use (e?grep does not
+ use the fastmap, so the problem never surfaced there), a leading
+ [^Ａ] or [^a] in a regular expression failed to match, for example, Ｂ.
+ (Thanks to 小屋良祐 <JAE03716@niftyserve.or.jp>)
+
+Tue Aug 10 01:29:05 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (set_list_bits): The optimization of multi-byte characters
+ in a character class had a bug in the code that updates a range's end
+ point: [Ａ-ＣＥ-ＧＢ-Ｄ] should optimize to [Ａ-Ｇ] but became [Ａ-Ｅ].
+ The bug does not surface when dfa is used instead of
+ regex.
+
+Fri Jul 23 03:22:13 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 2.0 + multi-byte extension 1.03 released.
+
+ * DEFS.dos: Changed strcmpi to stricmp.
+
+ * grep.c (main): On MS-DOS, also store a pointer to the processed
+ argv[0] string back into argv[0], so that messages printed by getopt
+ use the processed string as well.
+
+ * grep.c (main): Removed the handling of argv[0] == "", because the
+ bug in stdargv.asm of MS-C 6.00A has been fixed.
+
+ * stdargv.diff: Added.
+
+Tue Jul 13 07:04:13 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 2.0 + multi-byte extension 1.02 released.
+
+Mon Jul 12 00:20:36 1993 Takahiro Tanimoto (tt@isaac)
+
+ * grep.c: When HAVE_STRCASECMP was not #defined, a function was defined
+ that lowercased only one of the two strings before comparing, and it
+ was being misused. Replaced it with an equivalent of strcasecmp().
+
+ * DEFS.dos: Instead of #defining HAVE_STRCMPI, #define
+ HAVE_STRCASECMP and #define strcasecmp as
+ strcmpi.
+
+Sat Jul 10 01:05:04 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 2.0 + multi-byte extension 1.01 released.
+
+ * grep.c (main): On MSDOS, lowercase argv[0] and store it in prog.
+ The file extension is also stripped.
+
+ * obstack.h: Changed the type of chunk_size from size_t to unsigned,
+ because size_t ended up undefined under old-style C.
+
+ * regex.h: Constants with a U or UL suffix do not compile under
+ old-style C. Changed them to casts.
+
+ * regex.h: Rewrote the definition of RE_DUP_MAX so that it does not
+ overflow even on machines with a 16-bit int.
+
+ * obstack.h: Made the type of the struct obstack member chunk_size
+ size_t. PTR_INT_TYPE can now be #defined externally. On MSDOS
+ with a memory model other than SMALL, __PTR_TO_INT and __INT_TO_PTR
+ both convert between pointers and long.
+
+ * grep.c (fillbuf, grep): Error checks on the return value of read()
+ now compare against -1 instead of testing the sign.
+
+ * grep.c: Changed totalcc and totalnl to unsigned long and adjusted
+ the printf() formats in prline() to match.
+
+ * DEFS.dos: #define BUFSALLOC as 4096. (See reset() in
+ grep.c.)
+
+ * getpagesize.h: On MSDOS, the page size is taken to be 4096.
+
+ * dfa.c: When STDC_HEADERS or HAVE_STRING_H is defined, bcopy and
+ bzero are expanded as macros.
+
+Fri Jul 9 13:16:50 1993 Takahiro Tanimoto (tt@isaac)
+
+ * mbc.c, mbc.h, ...: Stopped defining ismbchar() separately in each
+ module and added a dedicated module instead.
+
+ * search.c (Fexecute): Made fgrep handle multi-byte characters.
+
+Wed Jul 7 17:02:33 1993 Takahiro Tanimoto (tt@isaac)
+
+ * kwset.c (bmexec): Fixed code that was not 8-bit clean.
+
+ * Changed the base to grep-2.0.
+
+Sun Jul 4 08:48:12 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (re_match_2): Bug in the original. While handling
+ maybe_finalize_jump, the code that skips start_memory/stop_memory did
+ not skip their arguments; fixed. For example, "([a-n]+).*\1" now
+ correctly matches "abcxyzab".
+
+Sat Jul 3 06:51:33 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 1.6 + multi-byte extension 1.00 released.
+
+Sat Jul 3 04:29:14 1993 Takahiro Tanimoto (tt at pc98)
+
+ * grep.c (bufprev): Made bufprev a long so that the byte offset shown
+ by the -b option is a long. Nothing other than bufprev was changed,
+ so a line whose size exceeds the range of int is not handled
+ correctly. Also, on DOS, CR+LF is counted as 1 byte.
+ (A shortcut.)
+
+ * regex.c (re_match_2): Fixed the part of the character class handling
+ that did not work correctly with a 16-bit int.
+
+ * regex.c (re_exec): Changed the last argument to re_search() from 0
+ to NULL.
+
+ * regex.c (re_match): Changed the second argument to re_match_2() from
+ 0 to NULL.
+
+ * regex.c (re_search): Changed the second argument to re_search_2() from
+ 0 to NULL.
+
+ * grep.c (main): Because of a bug in MS-C's setargv, running grep "\\" foo
+ leaves argv[0] == "". When argv[0] == "", it is now forcibly set
+ to "grep" or "egrep".
+
+Fri Jul 2 19:25:58 1993 Takahiro Tanimoto (tt at pc98)
+
+ * grep.c (main): Adapted the setting of the variable prog for DOS.
+ The original approach was flawed, so it was corrected in the process.
+
+ * grep.c: On MSDOS, errno and sys_errlist are no longer
+ declared.
+
+ * regex.c (set_list_bits): Removed an unused variable.
+
+ * Makefile.msc: Added for DOS support.
+
+Fri Jun 11 04:14:22 1993 Takahiro Tanimoto (tt@isaac)
+
+ * grep.c: The version string was still out of date.
+
+Tue May 25 00:10:49 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 1.6 + multi-byte extension 0.02 released.
+
+Mon May 24 15:57:31 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (re_search_2): Fixed a bug when advancing backward.
+
+Sat May 22 02:03:41 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (re_compile_fastmap): Stopped translating in exactn;
+ re_compile_pattern should already have translated once.
+
+ * regex.c (re_match_2): In the exactn handling, the code was incorrect
+ when #if 0 was changed to #if 1; fixed.
+
+Fri May 21 20:04:07 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.[ch]: Removed mbcharset and mbcharset_not. Instead,
+ charset and charset_not now hold multi-byte characters as well.
+
+ * grep.c (main): Following the change below, removed the code that
+ wraps the pattern in "^.*(" ... ")".
+
+ * dfa.c (regcompile): When searchflag is ON, the regular expression is
+ now compiled as "^.*(" ... ")". Previously grep.c did the same
+ thing itself.
+
+ * dfa.c (lex): In character classes, added code that excludes
+ single-byte characters from the set of multi-byte first bytes.
+
+ * dfa.c (lex): Fixed an incorrect upper bound for single-byte
+ characters in character classes.
+
+Wed May 19 01:27:07 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c: #define const when !__STDC__.
+
+ * dfa.h: The original #defined const when !STDC_HEADERS;
+ changed to #define it when !__STDC__.
+
+ * configure.in: Added checks for bcopy() and memmove().
+
+ * dfa.c (reginit): Added initialization of cs_tok[]. Fixes a
+ malfunction when the -i flag is given.
+
+Tue May 18 18:14:04 1993 Takahiro Tanimoto (tt@albert)
+
+ * dfa.h: Made the values match those of RE_MBCTYPE_??? in regex.h.
+
+ * regex.[ch] (RE_TRANSLATED_RANGE): Imported the extension made in
+ mbsed-0.01.
+
+Sat May 15 04:27:32 1993 Takahiro Tanimoto (tt@isaac)
+
+ * The multi-byte character version is now essentially complete.
+
+
+Local Variables:
+mode: indented-text
+left-margin: 8
+fill-column: 72
+fill-prefix: " "
+version-control: never
+End:
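Editorial sketch (not part of the patch). The set_list_bits entry of Tue Aug 10 1993 in the ChangeLog above concerns collapsing the multi-byte ranges of a character class, where [Ａ-ＣＥ-ＧＢ-Ｄ] should become [Ａ-Ｇ]. The C fragment below only illustrates that kind of range merging and is not the code in regex.c. The names `mb_range` and `merge_ranges` are hypothetical, and it assumes 16-bit character codes in which overlapping or directly adjacent ranges are joined.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type: an inclusive range of 16-bit character codes. */
struct mb_range { unsigned short lo, hi; };

/* qsort comparison: order ranges by their start point. */
static int cmp_lo(const void *a, const void *b)
{
    const struct mb_range *x = a, *y = b;
    return (x->lo > y->lo) - (x->lo < y->lo);
}

/* Sort r[0..n-1] by start point, then merge ranges that overlap or touch.
   Returns the number of merged ranges left at the front of r. */
size_t merge_ranges(struct mb_range *r, size_t n)
{
    size_t i, out = 0;

    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        assert(r[i].lo <= r[i].hi);
    qsort(r, n, sizeof r[0], cmp_lo);
    for (i = 1; i < n; i++) {
        /* Widen to unsigned long so that hi + 1 cannot wrap at 0xffff. */
        if ((unsigned long)r[i].lo <= (unsigned long)r[out].hi + 1) {
            /* Only ever extend the end point; never shrink it. */
            if (r[i].hi > r[out].hi)
                r[out].hi = r[i].hi;
        } else {
            r[++out] = r[i];
        }
    }
    return out + 1;
}
```

With the EUC codes for Ａ to Ｇ (0xA3C1 to 0xA3C7), the three ranges Ａ-Ｃ, Ｅ-Ｇ and Ｂ-Ｄ collapse into the single range 0xA3C1-0xA3C7. Extending the end point only when the new range reaches further is the step the ChangeLog entry describes as having been wrong.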
diff -ru2N grep-2.0/DEFS.dos grep+mb1.04/DEFS.dos
--- grep-2.0/DEFS.dos Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/DEFS.dos Fri Jul 23 03:23:31 1993
@@ -0,0 +1,15 @@
+#define STDC_HEADERS 1
+#define HAVE_STRING_H 1
+#define HAVE_MEMCHR 1
+#define HAVE_STRERROR 1
+#define HAVE_MEMMOVE 1
+#define HAVE_STRCASECMP 1
+#define strcasecmp stricmp
+
+#define BUFSALLOC 4096
+
+#ifndef M_I86SM
+#define __PTR_TO_INT(P) ((long)(P))
+#define __INT_TO_PTR(P) ((char *)(P))
+#define PTR_INT_TYPE long
+#endif
diff -ru2N grep-2.0/MANIFEST.MB grep+mb1.04/MANIFEST.MB
--- grep-2.0/MANIFEST.MB Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/MANIFEST.MB Sat Mar 5 16:37:46 1994
@@ -0,0 +1,11 @@
+ChangeLog.MB Revision history of multi-byte extension to grep.
+DEFS.dos Definitions for DOS.
+MANIFEST.MB This file.
+Makefile.msc Makefile for MS-C version 6.
+README.MB Documentation for multi-byte extension.
+README.MSC Patch for source/startup/... of MS-C 6.00A
+mbc.c Multi-byte char handler.
+mbc.h Interface to mbc.c.
+tests/batgen.awk DOS version of scriptgen.awk.
+tests/check.bat DOS version of check.sh
+tests/spencer.dos Input for batgen.
diff -ru2N grep-2.0/Makefile.in grep+mb1.04/Makefile.in
--- grep-2.0/Makefile.in Mon May 3 05:54:24 1993
+++ grep+mb1.04/Makefile.in Mon Jul 12 02:02:28 1993
@@ -16,4 +16,7 @@
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+# Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+# Last change: Jul. 12, 1993 by t^2
+
SHELL = /bin/sh
@@ -40,4 +43,11 @@
DEFS=-DGREP @DEFS@
+# Things you might set to MBCTYPE_DEF to spec. default multi-byte char type.
+# -DEUC will make default multi-byte char type EUC and
+# -DSJIS SJIS.
+# If you do not set EUC/SJIS, grep assumes no multi-byte
+# char as default.
+MBCTYPE_DEF=-DEUC
+
# Extra libraries.
LIBS=@LIBS@
@@ -69,9 +79,9 @@
#### End of system configuration section. ####
-SRCS=grep.c getopt.c regex.c dfa.c kwset.c obstack.c search.c
-OBJS=grep.o getopt.o regex.o dfa.o kwset.o obstack.o search.o
+SRCS=grep.c getopt.c regex.c dfa.c kwset.c obstack.c search.c mbc.c
+OBJS=grep.o getopt.o regex.o dfa.o kwset.o obstack.o search.o mbc.o
.c.o:
- $(CC) $(CFLAGS) $(DEFS) -I$(srcdir) -c $<
+ $(CC) $(CFLAGS) $(DEFS) $(MBCTYPE_DEF) -I$(srcdir) -c $<
all: grep check.done
@@ -120,7 +130,9 @@
dist:
V=`sed -n '/version\\[/s/.*\\([0-9][0-9]*\\.[0-9]*\\).*/\\1/p' \
+ grep.c`+mb`sed -n '/^ + multi-byte/s/[^0-9]*\\([0-9.]*\\).*/\\1/p' \
grep.c`; \
mkdir grep-$$V; mkdir grep-$$V/tests; \
- for f in `awk '{print $$1}' MANIFEST`; do ln $$f grep-$$V/$$f; done; \
+ for f in `awk '{print $$1}' MANIFEST MANIFEST.MB`; \
+ do ln $$f grep-$$V/$$f; done; \
tar cvhf - grep-$$V | gzip > grep-$$V.tar.z; \
rm -fr grep-$$V
@@ -132,2 +144,3 @@
kwset.o obstack.o: obstack.h
regex.o search.o: regex.h
+grep.o regex.o dfa.o search.o mbc.o: mbc.h
diff -ru2N grep-2.0/Makefile.msc grep+mb1.04/Makefile.msc
--- grep-2.0/Makefile.msc Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/Makefile.msc Fri Jul 23 04:03:17 1993
@@ -0,0 +1,138 @@
+# Generated automatically from Makefile.in by configure.
+# Makefile for GNU grep
+# Copyright (C) 1992 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+# Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+# Last change: Jul. 23, 1993 by t^2
+
+#### Start of system configuration section. ####
+
+srcdir=.
+VPATH=.
+
+AWK=gawk
+INSTALL=cp
+INSTALL_PROGRAM=$(INSTALL)
+INSTALL_DATA=$(INSTALL)
+
+CC=cl -nologo -D__STDC__ -AL
+LINT=lint
+
+# Things you might add to DEFS:
+# -DSTDC_HEADERS If you have ANSI C headers and libraries.
+# -DHAVE_UNISTD_H If you have unistd.h.
+# -DUSG If you have System V/ANSI C string
+# and memory functions and headers.
+# -D__CHAR_UNSIGNED__ If type `char' is unsigned.
+# gcc defines this automatically.
+#
+# For DOS, add those to DEFS.dos.
+
+# Things you might set to MBCTYPE_DEF to spec. default multi-byte char type.
+# -DEUC will make default multi-byte char type EUC and
+# -DSJIS SJIS.
+# If you do not set EUC/SJIS, grep assumes no multi-byte
+# char as default.
+MBCTYPE_DEF=-DSJIS
+
+# Extra libraries.
+LIBS=setargv/noe/st:30000
+ALLOCA=
+
+CFLAGS=-Ox
+LDFLAGS=$(CFLAGS)
+
+prefix=
+exec_prefix=$(prefix)
+
+# Prefix for installed program, normally empty or `g'.
+binprefix=
+# Prefix for installed man page, normally empty or `g'.
+manprefix=
+
+# Where to install executables.
+bindir=$(exec_prefix)/bin
+
+# Where to install man pages.
+mandir=$(prefix)/man/man1
+
+# Extension for man pages.
+manext=1
+
+# How to make a hard link.
+LN=cp
+
+#### End of system configuration section. ####
+
+SRCS=grep.c getopt.c regex.c dfa.c kwset.c obstack.c search.c mbc.c
+OBJS=grep.obj getopt.obj regex.obj dfa.obj kwset.obj obstack.obj search.obj mbc.obj
+
+.c.obj:
+ cat DEFS.dos $< > $*_.c
+ $(CC) $(CFLAGS) $(MBCTYPE_DEF) -I$(srcdir) -c -Fo$@ $*_.c
+ rm $*_.c
+
+all: grep.exe check.don
+
+# For Saber C.
+grep.loa: $(SRCS)
+ #load $(CFLAGS) $(DEFS) -I$(srcdir) (SRCS)
+
+# For Lint.
+grep.lin: $(SRCS)
+ $(LINT) $(CFLAGS) $(DEFS) -I$(srcdir) $(SRCS)
+
+install: all
+ $(INSTALL_PROGRAM) grep.exe $(bindir)/$(binprefix)grep.exe
+ rm -f $(bindir)/$(binprefix)egrep.exe
+ $(LN) $(bindir)/$(binprefix)grep.exe $(bindir)/$(binprefix)egrep.exe
+ rm -f $(bindir)/$(binprefix)fgrep.exe
+ $(LN) $(bindir)/$(binprefix)grep.exe $(bindir)/$(binprefix)fgrep.exe
+
+check:
+ tests\check
+ echo done >check.don
+
+check.don: grep.exe
+ tests\check
+ echo done >check.don
+
+grep.exe: $(OBJS)
+ echo $(OBJS:.obj =.obj+)+>link.tmp
+ echo $(LIBS)>>link.tmp
+ echo $@/noi;>>link.tmp
+ link @link.tmp
+ rm link.tmp
+
+clean:
+ rm -f grep.exe *.obj check.don tmp.bat tmp.in khadafy.out
+
+mostlycl: clean
+
+distclea: clean
+ rm -f Makefile config.sta
+
+realclea: distclea
+ rm -f TAGS
+
+# Some header file dependencies that really ought to be automatically deduced.
+dfa.obj search.obj: dfa.h
+grep.obj search.obj: grep.h
+kwset.obj search.obj: kwset.h
+kwset.obj obstack.obj: obstack.h
+regex.obj search.obj: regex.h
+grep.obj regex.obj dfa.obj search.obj mbc.obj: mbc.h
diff -ru2N grep-2.0/README.MB grep+mb1.04/README.MB
--- grep-2.0/README.MB Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/README.MB Thu Jun 2 17:03:37 1994
@@ -0,0 +1,327 @@
+●●●●● GNU grep version 2.0 + multi-byte extension 1.04 ●●●●●
+●●●●● Jun. 2, 1994 by t^2 ●●●●●
+
+ grep-2.0+mb1.04 -- GNU grep with multi-byte character support
+
+●Overview
+
+ This is grep, egrep, and fgrep from the GNU project (hereafter simply
+ grep), extended to handle multi-byte characters.
+
+●Usage
+
+ Only the extensions to grep are described here.
+
+ The added options are as follows.
+
+ -Wctype=ASCII
+ Multi-byte characters are not taken into account. With this
+ option, grep should behave the same as the original.
+
+ -Wctype=EUC
+ Recognizes EUC as the multi-byte character encoding.
+
+ -Wctype=SJIS
+ Recognizes Shift-JIS as the multi-byte character encoding.
+
+ On systems other than MS-DOS, if you install without editing
+ Makefile(.in)?, EUC is recognized by default. On MS-DOS,
+ Shift-JIS is recognized by default.
+
+●About GREM104.LZH (the archive that includes the MS-DOS executable)
+ (ignore this section if you obtained the software in any other form)
+
+ 1. Files contained in the archive
+
+ Files completely unchanged from the original
+
+ AUTHORS AUTHORS from the original source
+ CHANGELO ChangeLog from the original source
+ COPYING COPYING from the original source
+ MANIFEST MANIFEST from the original source
+ NEWS NEWS from the original source
+ PROJECTS PROJECTS from the original source
+ README README from the original source
+
+ Files for grep+mb
+
+ CHANGELO.MB Change history of grep+mb
+ README.MB This file
+
+ Files for the MS-DOS version of grep+mb
+
+ GREP.CAT Manual page from the original source:
+ grep.man formatted with GNU roff.
+ GREP.EXE Executable of grep-2.0+mb1.04 for MS-DOS
+ READMAN.SED A bonus for people who have sed
+ (sed -f readman.sed grep.cat)
+
+ 2. About GREP.EXE
+
+ This is grep-2.0+mb1.04 compiled with MS-C 6.00A.
+
+ By default it can process patterns and text that contain Shift-JIS
+ kanji codes.
+
+ setargv.obj is linked in, so the kind of wildcards familiar on
+ MS-DOS can be used. I considered providing a wildcard expansion
+ routine in the style of the UNIX csh, but it would have been
+ inconsistent with other MS-DOS commands, and I wanted to respect the
+ original as far as possible, so I abandoned the idea.
+
+ 3. Installation
+
+ GREP.EXE provides the functionality of egrep and fgrep as well as
+ grep. Giving grep the -E option makes it behave as egrep, and the
+ -F option makes it behave as fgrep. In addition, if you copy
+ GREP.EXE to EGREP.EXE and FGREP.EXE and run it as egrep or fgrep,
+ it behaves as that name suggests. Unless you are short of hard
+ disk space, I recommend installing it with commands such as
+
+ A>copy grep.exe a:\bin
+ A>copy grep.exe a:\bin\egrep.exe
+ A>copy grep.exe a:\bin\fgrep.exe
+
+ If you really do not want to waste hard disk space, another
+ option is to create a batch file such as
+
+ @echo off
+ grep -E %1 %2 %3 %4 %5 %6 %7 %8 %9
+
+ in place of each copy.
+
+ 4. Command-line arguments
+
+ As mentioned above, MS-C's setargv.obj is linked in, so its
+ conventions must be followed.
+
+ Arguments are separated by whitespace. An argument containing
+ whitespace, ", \, <, >, or | must be quoted. Quoting is quite tricky,
+ owing to the buggy-looking behavior of COMMAND.COM and to further
+ problems in setargv.obj, so it is not explained here; please work it
+ out yourself. The simplest way is to put the search pattern in a file:
+
+ grep -f filename
+
+ and pass that file's name as shown.
+
+ 5. Manual
+
+ For those who cannot use a roff-family formatter, a manual already
+ formatted with GNU roff is provided. Read it with a pager such as
+ less that supports boldface and underlining. In an editor it is
+ probably hard to read because it contains ^H characters.
+
+ s/.^H//g
+
+ Running it through the sed program above produces an ordinary text
+ file. (^H here means that the control code itself is embedded
+ directly.)
+
+●Installation (other than MS-DOS)
+
+ The default multi-byte character type is specified in Makefile.in.
+ To make Shift-JIS the default, or to use no multi-byte characters by
+ default, change the definition of the MBCTYPE_DEF macro in Makefile.in
+ as follows, respectively.
+
+ MBCTYPE_DEF=-DSJIS (Shift-JIS by default)
+ MBCTYPE_DEF= (no multi-byte characters by default)
+
+ In either case, the multi-byte character code can still be selected
+ with an option at startup.
+
+ The remaining steps are the same as for the original grep, so please
+ read INSTALL.
+
+●Installation (MS-DOS version; "installation" here means installing
+ from source)
+
+ To build a grep that recognizes Shift-JIS by default with MS-C 6.00A,
+ read through README.MSC, patch the library if necessary, and then
+ run
+
+ A>nmake -f makefile.msc
+
+ That is all. Once grep.exe has been built, the tests run automatically.
+ grep prints several error messages during the tests, but that is not
+ a failure: the tests are only confirming that the exit status is 2
+ when a pattern containing an error is passed. If something is really
+ wrong, "Spencer test #nn faild" (nn is a number) is displayed.
+
+ When the tests pass, copy grep.exe to a suitable directory. Simply
+ changing the name to egrep.exe or fgrep.exe makes it behave as egrep
+ or fgrep, respectively. So, to install into a:\bin, for example,
+ run
+
+ A>copy grep.exe a:\bin
+ A>copy grep.exe a:\bin\egrep.exe
+ A>copy grep.exe a:\bin\fgrep.exe
+
+ or similar.
+
+ If you use another compiler, or want a default other than Shift-JIS,
+ write a Makefile using Makefile.msc as a reference. Note that the
+ tests require awk (gawk).
+
+●Bugs
+
+ 1. The so-called JIS encoding is not supported, and support is not planned.
+
+ 2. Multi-byte character codes are not checked very strictly. Assumed ranges:
+
+ EUC first byte ... 0x80 - 0xff
+ EUC second byte ... 0x01 - 0xff (except 0x0a)
+
+ Shift-JIS first byte ... 0x80 - 0x9f, 0xe0 - 0xff
+ Shift-JIS second byte ... 0x01 - 0xff (except 0x0a)
+
+ Half-width kana can also be used. Three-byte EUC codes beginning
+ with SS3 (0x8f) cannot be used. (I have never seen a system that
+ supports them...)
+
+ 3. On DOS, the byte offset displayed by the -b option counts each
+ CR+LF as 1. (A shortcut.)
+
+●Algorithm (multi-byte support in dfa.[ch])
+
+ I used to assume, vaguely, that making a DFA work directly with code
+ sets that have many characters, such as EUC or Shift-JIS, would be very
+ hard. Then one day, while writing a simple program that converts regular
+ expressions into DFAs to test a library of my own, a good idea struck
+ me. A multi-byte character is, after all, just a sequence of bytes: it
+ suffices to split each one into bytes and build the regular expression.
+
+ This is hard to explain in words, so the examples below use the
+ following symbols to show how the split into bytes is done.
+
+ a, b, c ... single-byte characters.
+ x, y, z ... first bytes of multi-byte characters.
+
+ . (any one character)
+ ==> [a-c]|[x-z][a-z]
+
+ (A single-byte character, or the first byte of a multi-byte
+ character followed by any one byte.)
+
+ [xb-zx] (multi-byte characters in the range xb to zx)
+ ==> x[b-z]|y[a-z]|z[a-x]
+
+ yb*
+ ==> (yb)*
+
+ In practice no regular expression text is produced; the tokens of the
+ decomposed regular expression are generated directly. If you are
+ interested, reading the source is probably quickest. (It is not very
+ elegant, so I am a little embarrassed to have it examined closely...)
+
+ This alone is not enough: searching a text for the character xy, for
+ example, would also react to a character sequence such as xxyy.
+ Therefore, in multi-byte mode the pattern is always processed as
+ "^.*(" + user pattern + ")". Because '.' never matches part of a
+ multi-byte character, the '.*' keeps matches on character boundaries.
+
+●Extended specification of dfa.[ch] and regex.[ch]
+
+ The dfa.[ch] and regex.[ch] modules depend on the mbc.[ch] module.
+ Also, as in the original, using dfa.[ch] requires the definitions in
+ regex.h.
+
+ The multi-byte character type is set with mbcinit() in mbc.[ch].
+ Pass one of the macros MBCTYPE_ASCII, MBCTYPE_EUC, or MBCTYPE_SJIS,
+ which are defined in mbc.h, to mbcinit().
+
+ dfa.[ch] consults this mbc.[ch] setting only when a pattern is
+ compiled. During pattern matching it searches using the multi-byte
+ character type that was set at compile time.
+
+ regex.[ch], by contrast, consults the mbc.[ch] setting both when a
+ pattern is compiled and when it is matched. The setting must not be
+ changed between the two, however. In other words, you cannot search
+ EUC text for a pattern written in Shift-JIS. Please keep this in mind.
+
+ Regular expressions that need care with multi-byte support follow.
+
+ . Matches any single-byte character or valid multi-byte character.
+ A "valid multi-byte character" is the first byte of a multi-byte
+ character followed by anything other than '\0' or '\n'.
+
+ [x-y] Matches any one character whose character code (internal
+ representation) lies in the range x to y. Like ., it does not
+ match invalid characters.
+
+ [^x-y] Matches any one character whose character code (internal
+ representation) is outside x to y. It also matches invalid characters.
+
+ The internal representation of a multi-byte character is simply a 16-bit
+ unsigned integer: first byte high, second byte low. In both Shift-JIS and EUC,
+
+ single-byte ASCII characters < half-width kana < full-width characters
+
+ is the ordering that holds.
+
+●Miscellaneous
+
+ 1. The copyright on the original GNU grep is held by the Free Software
+ Foundation, Inc. The copyright on the patch (grep-mb.diff) is held
+ by me (t^2).
+
+ 2. The GNU grep source code is available from ftp sites in various
+ places or from the FUNIX data library on Nifty-serve. The diff from
+ GNU grep to grep+mb, grep-mb.diff, is registered on FUNIX by me and
+ kindly posted to fj.sources by 堂園和郎
+ (dohzono@sdsft.kme.mei.co.jp).
+
+ 3. The diff grep-mb.diff may be redistributed freely; for the diff
+ itself there is no need to follow the FSF's terms. However, when
+ redistributing the source code that results from applying the diff, or
+ executables, follow the GNU GENERAL PUBLIC LICENSE (see COPYING).
+
+ When redistributing a modified version of grep+mb, likewise take care
+ to follow the GNU GENERAL PUBLIC LICENSE. When distributing a program
+ that uses code contained in grep+mb (such as dfa.[ch] or regex.[ch]),
+ also follow the relevant parts of the GNU GENERAL PUBLIC LICENSE.
+
+ Although it is not an obligation, if you redistribute this, please let
+ me know, even after the fact. As far as possible, please also keep up
+ with new versions and arrange for messages from users to reach me.
+
+ 4. This program comes with no warranty.
+
+ 5. If you run into a problem with grep+mb, contact me (not the FSF or
+ the original authors). If the person who distributed it to you so
+ wishes, contact that person.
+
+ 6. Questions, requests, complaints, and anything else are all very
+ welcome. I will provide support as far as I can.
+
+●Acknowledgments
+
+ My thanks to the original authors and the FSF.
+
+ My thanks to 堂園和郎
+ (dohzono@sdsft.kme.mei.co.jp) for advice on writing the documentation.
+
+ My thanks to everyone who has reposted the software or sent bug
+ reports so far. I wanted to list your names, but a hard disk
+ failure cost me most of my mail.
+
+ Finally, my thanks to all the users who set aside precious disk
+ space for grep+mb.
+
+●Contacting "me"
+
+ 7-207 Baikoen Danchi, Chuo-ku, Fukuoka 810, Japan (note: I have moved)
+ TEL/FAX: 092-731-4025 (automatic TEL/FAX switching)
+ 092-724-6342 (TEL only)
+ E-mail: NBC02362@niftyserve.or.jp Takahiro Tanimoto (谷本孝浩)
+
+# Local variables:
+# mode: indented-text
+# indent-tabs-mode: nil
+# tab-stop-list: (4 8 16 24 32 40 48 56 64 72 80)
+# left-margin: 4
+# fill-column: 72
+# fill-prefix: " "
+# version-control: never
+# End:
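Editorial sketch (not part of the patch). README.MB above describes splitting a multi-byte range such as [xb-zx] into the byte-level alternatives x[b-z]|y[a-z]|z[a-x], and gives the first-byte ranges and the 16-bit internal representation. The C fragment below shows one way this could look under those stated ranges. Every name in it (`is_euc_lead`, `is_sjis_lead`, `mb_code`, `struct byte_alt`, `split_mb_range`) is hypothetical and is not taken from mbc.c or dfa.c, and the exclusion of 0x0a from second bytes is left out for brevity. Where the README example lists one alternative per first byte, this sketch groups all fully covered first bytes into a single alternative.

```c
#include <assert.h>

#define TRAIL_MIN 0x01u /* lowest second byte in the README table */
#define TRAIL_MAX 0xffu /* highest second byte in the README table */

/* First-byte tests using the (deliberately loose) README ranges. */
int is_euc_lead(unsigned c)
{
    return c >= 0x80 && c <= 0xff;
}

int is_sjis_lead(unsigned c)
{
    return (c >= 0x80 && c <= 0x9f) || (c >= 0xe0 && c <= 0xff);
}

/* Internal code: first byte in the high 8 bits, second byte in the low 8. */
unsigned mb_code(unsigned char b1, unsigned char b2)
{
    return ((unsigned)b1 << 8) | b2;
}

/* One alternative: [lead_lo-lead_hi][trail_lo-trail_hi]. */
struct byte_alt { unsigned lead_lo, lead_hi, trail_lo, trail_hi; };

static void set_alt(struct byte_alt *a, unsigned l1, unsigned l2,
                    unsigned t1, unsigned t2)
{
    a->lead_lo = l1;
    a->lead_hi = l2;
    a->trail_lo = t1;
    a->trail_hi = t2;
}

/* Split the code range lo..hi into at most three alternatives.
   Returns how many entries of out[] were filled in. */
int split_mb_range(unsigned lo, unsigned hi, struct byte_alt out[3])
{
    unsigned l1 = lo >> 8, t1 = lo & 0xff;
    unsigned l2 = hi >> 8, t2 = hi & 0xff;
    int n = 0;

    assert(lo <= hi && hi <= 0xffffu);
    if (l1 == l2) {
        /* Same first byte: a single alternative is enough. */
        set_alt(&out[n++], l1, l1, t1, t2);
        return n;
    }
    /* Partial first row, like x[b-z]. */
    set_alt(&out[n++], l1, l1, t1, TRAIL_MAX);
    /* Fully covered rows in between, like y[a-z]. */
    if (l1 + 1 < l2)
        set_alt(&out[n++], l1 + 1, l2 - 1, TRAIL_MIN, TRAIL_MAX);
    /* Partial last row, like z[a-x]. */
    set_alt(&out[n++], l2, l2, TRAIL_MIN, t2);
    return n;
}
```

For the range 0xb0a2 to 0xb2a1, for instance, the function fills in three alternatives corresponding to 0xb0[0xa2-0xff], 0xb1[0x01-0xff] and 0xb2[0x01-0xa1]. Keeping the first byte in the high half of the code is also what makes plain integer comparison agree with the ordering stated in the README.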
diff -ru2N grep-2.0/README.MSC grep+mb1.04/README.MSC
--- grep-2.0/README.MSC Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/README.MSC Sat Mar 5 16:14:14 1994
@@ -0,0 +1,99 @@
+The argument setup routine of MS-C version 6.00A for the PC-9801 has
+a bug.
+
+#include <stdio.h>
+
+int
+main(int argc, char **argv)
+{
+ int i;
+
+ for (i = 0; i < argc; i++)
+ printf("argv[%d] == %s\n", i, argv[i]);
+ return 0;
+}
+
+Compile and link this into FOO.EXE, then run it with arguments such as
+
+ A>foo "\\" abc
+
+and the bug can be observed. The wildcard expansion routine also has a
+bug: if the program above is linked together with SETARGV.OBJ and run
+with arguments such as
+
+ A>foo \DOS\*.com
+
+the expansion comes out wrong.
+
+These bugs can apparently be fixed by applying the patches below to
+DOS/STDARGV.ASM and WILD.C under SOURCE/STARTUP. Apply the patches and
+compile with STARTUP.BAT. Then, to fix the large-model library, for
+example, rename L/DOS/STDARGV.OBJ, L/DOS/_SETARGV.OBJ, and L/WILD.OBJ to
+KSTDARGV.OBJ, _KSTARGV.OBJ, and KWILD.OBJ, respectively, and run
+
+ lib \msc6\lib\llibce.lib-+dos\kstdargv.obj-+dos\_kstargv.obj-+kwild.obj;
+
+or similar to update the modules. To be safe, before doing this, run
+
+ lib \msc6\lib\llibce.lib*kstdargv.obj*_kstargv.obj*kwild.obj;
+
+or similar to back up kstdargv.obj, _kstargv.obj, and kwild.obj
+first.
+
+Needless to say, this patch comes with no warranty.
+
+Mar. 5, 1994 t^2
+
+*** stdargv.org Mon Oct 8 19:50:46 1990
+--- stdargv.asm Thu Jul 22 17:50:44 1993
+***************
+*** 409,415 ****
+ shr cx,1
+ adc dx,cx ; add 1 for every pair of backslashes
+ test al,1 ; plus 1 for the " if odd number of \
+! jz arg310 ; [J1]
+ jmp arg210 ; [J1]
+ ;
+ ; Command line is fully parsed - compute number of bytes needed
+--- 409,415 ----
+ shr cx,1
+ adc dx,cx ; add 1 for every pair of backslashes
+ test al,1 ; plus 1 for the " if odd number of \
+! jnz arg310 ; ! Jul.21.93 t^2
+ jmp arg210 ; [J1]
+ ;
+ ; Command line is fully parsed - compute number of bytes needed
+
+*** wild.org Mon Oct 8 19:49:48 1990
+--- wild.c Sat Mar 5 00:42:12 1994
+***************
+*** 186,197 ****
+ char *ptr2 = arg; // [J1]
+
+ if(ptr != arg) { // [J1]
+! while(ptr2 + 1 != ptr && *ptr2 != SLASHCHAR && *ptr2 != FWDSLASHCHAR
+! && *ptr2 != ':') { // [J1]
+ if(iskanji(*ptr2)) ptr2++; // [J1]
+ ptr2++; // [J1]
+ } // [J1]
+! ptr = ptr2; // [J1]
+ } // [J1]
+
+ if (*ptr == ':' && ptr != arg+1) /* weird name, just add it as is */
+--- 186,201 ----
+ char *ptr2 = arg; // [J1]
+
+ if(ptr != arg) { // [J1]
+! char *ptr3 = arg;
+!
+! while (ptr2 < ptr) {
+! if (*ptr2 == SLASHCHAR || *ptr2 == FWDSLASHCHAR
+! || *ptr2 == ':')
+! ptr3 = ptr2;
+ if(iskanji(*ptr2)) ptr2++; // [J1]
+ ptr2++; // [J1]
+ } // [J1]
+! ptr = ptr3;
+ } // [J1]
+
+ if (*ptr == ':' && ptr != arg+1) /* weird name, just add it as is */
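Editorial sketch (not part of the patch). The wild.c change above scans forward through the argument, remembers the last '\', '/' or ':' it sees, and steps over the second byte of every kanji. A forward scan matters because the second byte of a Shift-JIS character can itself be 0x5c, the code for '\'. The C fragment below illustrates that idea in isolation. `sjis_lead` and `last_separator` are hypothetical names, and `sjis_lead` uses the common Shift-JIS first-byte ranges as an assumed stand-in for MS-C's iskanji().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed first-byte test for Shift-JIS; stands in for iskanji(). */
static int sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/* Return a pointer to the last '\\', '/' or ':' in path that is a
   character in its own right, or NULL if there is none. */
const char *last_separator(const char *path)
{
    const char *p = path, *last = NULL;

    assert(path != NULL);
    while (*p != '\0') {
        if (*p == '\\' || *p == '/' || *p == ':')
            last = p;
        /* Step over the second byte of a two-byte character, if present,
           so that a 0x5c there is never mistaken for a separator. */
        if (sjis_lead((unsigned char)*p) && p[1] != '\0')
            p++;
        p++;
    }
    return last;
}
```

For a path whose bytes are `\DOS\` followed by 0x95 0x5c and `.com`, the function returns the backslash after DOS, whereas a plain backward search for '\' would stop inside the two-byte character.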
diff -ru2N grep-2.0/configure grep+mb1.04/configure
--- grep-2.0/configure Sat May 22 13:20:23 1993
+++ grep+mb1.04/configure Fri Jul 9 13:05:45 1993
@@ -566,5 +566,5 @@
fi
-for func in getpagesize memchr strerror valloc
+for func in getpagesize memchr strerror valloc bcopy memmove strcasecmp
do
trfunc=HAVE_`echo $func | tr '[a-z]' '[A-Z]'`
diff -ru2N grep-2.0/configure.in grep+mb1.04/configure.in
--- grep-2.0/configure.in Sat May 22 13:20:16 1993
+++ grep+mb1.04/configure.in Fri Jul 9 13:05:32 1993
@@ -12,5 +12,5 @@
AC_SIZE_T
AC_ALLOCA
-AC_HAVE_FUNCS(getpagesize memchr strerror valloc)
+AC_HAVE_FUNCS(getpagesize memchr strerror valloc bcopy memmove strcasecmp)
AC_CHAR_UNSIGNED
AC_CONST
diff -ru2N grep-2.0/dfa.c grep+mb1.04/dfa.c
--- grep-2.0/dfa.c Mon May 31 08:02:20 1993
+++ grep+mb1.04/dfa.c Sat Jul 10 01:17:14 1993
@@ -18,4 +18,6 @@
/* Written June, 1988 by Mike Haertel
Modified July, 1988 by Arthur David Olson to assist BMG speedups */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 10, 1993 by t^2 */
#include <assert.h>
@@ -35,4 +37,8 @@
#undef index
#define index strchr
+#undef bcopy
+#define bcopy(s, d, n) memcpy(d, s, n)
+#undef bzero
+#define bzero(d, n) memset(d, 0, n)
#else
#include <strings.h>
@@ -71,4 +77,5 @@
#include "dfa.h"
#include "regex.h"
+#include "mbc.h"
#if __STDC__
@@ -141,5 +148,8 @@
fprintf(stderr, "END");
else if (t < NOTCHAR)
- fprintf(stderr, "%c", t);
+ if (t & 0x80)
+ fprintf(stderr, "0x%02x", (unsigned char)t);
+ else
+ fprintf(stderr, "%c", t);
else
{
@@ -239,4 +249,16 @@
}
+static int
+isemptyset(s)
+ charclass s;
+{
+ int i;
+
+ for (i = 0; i < CHARCLASS_INTS; i++)
+ if (s[i])
+ return 0;
+ return 1;
+}
+
/* A pointer to the current dfa is kept here during parsing. */
static struct dfa *dfa;
@@ -259,5 +281,6 @@
/* Syntax bits controlling the behavior of the lexical analyzer. */
-static int syntax_bits, syntax_bits_set;
+static unsigned long syntax_bits;
+static int syntax_bits_set;
/* Flag for case-folding letters into sets. */
@@ -267,5 +290,5 @@
void
dfasyntax(bits, fold)
- int bits;
+ unsigned long bits;
int fold;
{
@@ -289,4 +312,66 @@
static int minrep, maxrep; /* Repeat counts for {m,n}. */
+static charclass cs_cset[8];
+static token cs_tok[8] = {0, 0, 0, 0, 0, 0, 0, 0};
+
+static enum {
+ MBEXTTOK_NONE = -1,
+ MBEXTTOK_NOTCHAR = 256,
+ MBEXTTOK_ORMBC = MBEXTTOK_NOTCHAR,
+ MBEXTTOK_ORMBC_NL,
+ MBEXTTOK_CLASS,
+ MBEXTTOK_INVCLASS,
+} mbexttok = MBEXTTOK_NONE;
+
+static charclass mbcset_set;
+static charclass mbcset_all;
+static charclass mbcset[128]; /* 128*256/8 = 4 Kbytes */
+
+/* Return frequently used (or presumably so) character sets as tokens.
+ n = 0 ... the set of all single-byte characters.
+ 1 ... the set of all first bytes of two-byte characters.
+ 2 ... the set of all second bytes of two-byte characters.
+ +4 ... do not exclude '\n'. */
+static token
+setcodeset(n)
+ int n;
+{
+ token c;
+
+ if (!cs_tok[n]) {
+ zeroset(cs_cset[n]);
+ switch (n) {
+ case 0:
+ case 4:
+ /* The set of all single-byte characters. */
+ for (c = 0; c < NOTCHAR; c++)
+ if (ismbchar(c))
+ setbit(c, cs_cset[n]);
+ notset(cs_cset[n]);
+ break;
+ case 1:
+ case 5:
+ /* The set of all first bytes of two-byte characters. */
+ for (c = 0; c < NOTCHAR; c++)
+ if (ismbchar(c))
+ setbit(c, cs_cset[n]);
+ break;
+ case 2:
+ case 6:
+ /* The set of all second bytes of two-byte characters. */
+ notset(cs_cset[n]);
+ break;
+ }
+ if (!(n & 4)) {
+ if (syntax_bits & RE_DOT_NOT_NULL || n != 0)
+ clrbit('\0', cs_cset[n]);
+ if (!(syntax_bits & RE_DOT_NEWLINE) || n != 0)
+ clrbit('\n', cs_cset[n]);
+ }
+ cs_tok[n] = CSET + charclass_index(cs_cset[n]);
+ }
+ return cs_tok[n];
+}
+
/* Note that characters become unsigned here. */
#define FETCH(c, eoferr) \
@@ -362,4 +447,5 @@
it means that just about every case begins with
"if (backslash) ...". */
+ mbexttok = MBEXTTOK_NONE;
for (i = 0; i < 2; ++i)
{
@@ -543,14 +629,19 @@
if (backslash)
goto normal_char;
+ if (current_mbctype != MBCTYPE_ASCII)
+ mbexttok = MBEXTTOK_ORMBC;
+ laststart = 0;
+ return setcodeset(0);
+
+ case 'w':
+ if (!backslash)
+ goto normal_char;
zeroset(ccl);
- notset(ccl);
- if (!(syntax_bits & RE_DOT_NEWLINE))
- clrbit('\n', ccl);
- if (syntax_bits & RE_DOT_NOT_NULL)
- clrbit('\0', ccl);
+ for (c2 = 0; c2 < NOTCHAR; ++c2)
+ if (ISALNUM(c2))
+ setbit(c2, ccl);
laststart = 0;
return lasttok = CSET + charclass_index(ccl);
- case 'w':
case 'W':
if (!backslash)
@@ -558,8 +649,7 @@
zeroset(ccl);
for (c2 = 0; c2 < NOTCHAR; ++c2)
- if (ISALNUM(c2))
+ if (!ISALNUM(c2) && !ismbchar(c2))
setbit(c2, ccl);
- if (c == 'W')
- notset(ccl);
+ mbexttok = MBEXTTOK_ORMBC_NL;
laststart = 0;
return lasttok = CSET + charclass_index(ccl);
@@ -579,4 +669,6 @@
do
{
+ unsigned char ch = 0, c2h = 0;
+
/* Nobody ever said this had to be fast. :-)
Note that if we're looking at some other [:...:]
@@ -599,4 +691,8 @@
if (c == '\\' && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
FETCH(c, "Unbalanced [");
+ if (ismbchar(c)) {
+ ch = (unsigned char)c;
+ FETCH(c, "Multi-byte char incomplete");
+ }
FETCH(c1, "Unbalanced [");
if (c1 == '-')
@@ -616,19 +712,83 @@
&& (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
FETCH(c2, "Unbalanced [");
+ if (ismbchar(c2)) {
+ c2h = (unsigned char)c2;
+ FETCH(c2, "Multi-byte char incomplete");
+ }
FETCH(c1, "Unbalanced [");
}
}
- else
+ else {
+ c2h = ch;
c2 = c;
- while (c <= c2)
- {
- setbit(c, ccl);
- if (case_fold)
- if (ISUPPER(c))
- setbit(tolower(c), ccl);
- else if (ISLOWER(c))
- setbit(toupper(c), ccl);
- ++c;
+ }
+ if (ch < c2h || (ch == c2h && c <= c2)) {
+ if (ch == 0) {
+ ch = (unsigned char)c2;
+ if (c2h > 0)
+ ch = NOTCHAR - 1;
+ for (; (unsigned char)c <= ch; c++) {
+ setbit(c, ccl);
+ if (case_fold) {
+ if (ISUPPER(c))
+ setbit(tolower(c), ccl);
+ else if (ISLOWER(c))
+ setbit(toupper(c), ccl);
+ }
+ }
+ ch = 0x80;
+ c = 0x00;
}
+ if (ch <= c2h) {
+ if (mbexttok < 0) {
+ mbexttok = MBEXTTOK_CLASS;
+ zeroset(mbcset_set);
+ zeroset(mbcset_all);
+ }
+ if (ch < c2h && c != 0x00) { /* leading partial row */
+ int t;
+
+ if (ismbchar(ch)
+ && ((t = tstbit(ch, mbcset_set))
+ || !tstbit(ch, mbcset_all))) {
+ if (!t) {
+ setbit(ch, mbcset_set);
+ zeroset(mbcset[ch - 0x80]);
+ }
+ for (; c < NOTCHAR; c++)
+ setbit(c, mbcset[ch - 0x80]);
+ }
+ ch++;
+ c = 0x00;
+ }
+ if (ch < c2h || (ch == c2h && c == 0x00 && c2 == 0xff)) {
+ if (c == 0x00 && c2 == 0xff)
+ c2h++;
+ for (; ch < c2h; ch++)
+ if (ismbchar(ch)) {
+ clrbit(ch, mbcset_set);
+ setbit(ch, mbcset_all);
+ }
+ if (c == 0x00 && c2 == 0xff)
+ c2h--;
+ c = 0x00;
+ }
+ if (ch <= c2h) {
+ int t;
+
+ /* Here c <= c2 always holds. */
+ if (ismbchar(ch)
+ && ((t = tstbit(ch, mbcset_set))
+ || !tstbit(ch, mbcset_all))) {
+ if (!t) {
+ setbit(ch, mbcset_set);
+ zeroset(mbcset[ch - 0x80]);
+ }
+ for (; c <= c2; c++)
+ setbit(c, mbcset[ch - 0x80]);
+ }
+ }
+ }
+ }
skip:
;
@@ -640,5 +800,20 @@
if (syntax_bits & RE_HAT_LISTS_NOT_NEWLINE)
clrbit('\n', ccl);
+ if (mbexttok == MBEXTTOK_CLASS) {
+ mbexttok = MBEXTTOK_INVCLASS;
+ if (!isemptyset(mbcset_set)) {
+ for (c = 0x80; c <= 0xff; c++)
+ if (tstbit(c, mbcset_set))
+ notset(mbcset[c - 0x80]);
+ }
+ notset(mbcset_all);
+ }
+ else
+ mbexttok = MBEXTTOK_ORMBC_NL;
}
+ if (current_mbctype != MBCTYPE_ASCII)
+ for (c = 0x80; c <= 0xff; c++)
+ if (ismbchar(c))
+ clrbit(c, ccl);
laststart = 0;
return lasttok = CSET + charclass_index(ccl);
@@ -647,4 +822,8 @@
normal_char:
laststart = 0;
+ if (ismbchar(c)) {
+ FETCH(mbexttok, "Multi-byte char incomplete");
+ return c;
+ }
if (case_fold && ISALPHA(c))
{
@@ -746,5 +925,67 @@
atom()
{
- if ((tok >= 0 && tok < NOTCHAR) || tok >= CSET || tok == BACKREF
+ if (mbexttok >= 0) {
+ if (mbexttok < MBEXTTOK_NOTCHAR) {
+ addtok(tok);
+ addtok(mbexttok);
+ addtok(CAT);
+ }
+ else
+ switch (mbexttok) {
+ case MBEXTTOK_ORMBC:
+ case MBEXTTOK_ORMBC_NL:
+ addtok(tok);
+ if (mbexttok == MBEXTTOK_ORMBC) {
+ addtok(setcodeset(1));
+ addtok(setcodeset(2));
+ }
+ else {
+ addtok(setcodeset(5));
+ addtok(setcodeset(6));
+ }
+ addtok(CAT);
+ addtok(OR);
+ break;
+ case MBEXTTOK_CLASS:
+ case MBEXTTOK_INVCLASS:
+ {
+ token c;
+
+ addtok(tok);
+ if (!isemptyset(mbcset_set))
+ for (c = 0x80; c <= 0xff; c++)
+ if (tstbit(c, mbcset_set)) {
+ /* Make sure all bits in mbcset_all are valid. */
+ clrbit(c, mbcset_all);
+ addtok(c);
+ if (mbexttok == MBEXTTOK_CLASS) {
+ clrbit('\n', mbcset[c - 0x80]);
+ clrbit('\0', mbcset[c - 0x80]);
+ }
+ else {
+ setbit('\n', mbcset[c - 0x80]);
+ setbit('\0', mbcset[c - 0x80]);
+ }
+ addtok(CSET + charclass_index(mbcset[c - 0x80]));
+ addtok(CAT);
+ addtok(OR);
+ }
+ if (!isemptyset(mbcset_all)) {
+ addtok(CSET + charclass_index(mbcset_all));
+ if (mbexttok == MBEXTTOK_CLASS)
+ addtok(setcodeset(2));
+ else
+ addtok(setcodeset(6));
+ addtok(CAT);
+ addtok(OR);
+ }
+ }
+ break;
+ default:
+ break;
+ }
+ tok = lex();
+ }
+ else if ((tok >= 0 && tok < NOTCHAR) || tok >= CSET || tok == BACKREF
|| tok == BEGLINE || tok == ENDLINE || tok == BEGWORD
|| tok == ENDWORD || tok == LIMWORD || tok == NOTLIMWORD)
@@ -1904,4 +2145,6 @@
d->musts = 0;
+
+ bzero(cs_tok, sizeof cs_tok);
}
@@ -1916,8 +2159,8 @@
if (case_fold) /* dummy folding in service of dfamust() */
{
- char *copy;
+ char *copy, *p;
int i;
- copy = malloc(len);
+ p = copy = malloc(len + 7);
if (!copy)
dfaerror("out of memory");
@@ -1925,23 +2168,61 @@
/* This is a kludge. */
case_fold = 0;
+ if (current_mbctype != MBCTYPE_ASCII && searchflag) {
+ *p++ = '^';
+ *p++ = '.';
+ *p++ = '*';
+ if (!(syntax_bits & RE_NO_BK_PARENS))
+ *p++ = '\\';
+ *p++ = '(';
+ }
for (i = 0; i < len; ++i)
if (ISUPPER(s[i]))
- copy[i] = tolower(s[i]);
+ *p++ = tolower((unsigned char)s[i]);
else
- copy[i] = s[i];
+ *p++ = s[i];
+ if (current_mbctype != MBCTYPE_ASCII && searchflag) {
+ if (!(syntax_bits & RE_NO_BK_PARENS))
+ *p++ = '\\';
+ *p++ = ')';
+ }
dfainit(d);
- dfaparse(copy, len, d);
- free(copy);
+ dfaparse(copy, p - copy, d);
dfamust(d);
d->cindex = d->tindex = d->depth = d->nleaves = d->nregexps = 0;
+ bzero(cs_tok, sizeof cs_tok);
case_fold = 1;
- dfaparse(s, len, d);
+ if (current_mbctype != MBCTYPE_ASCII && searchflag) {
+ bcopy(s, copy + (syntax_bits & RE_NO_BK_PARENS ? 4 : 5), len);
+ dfaparse(copy, p - copy, d);
+ }
+ else
+ dfaparse(s, len, d);
dfaanalyze(d, searchflag);
+ free(copy);
}
else
{
dfainit(d);
- dfaparse(s, len, d);
+ if (current_mbctype != MBCTYPE_ASCII && searchflag) {
+ char *copy, *p;
+
+ p = copy = malloc(len + 7);
+ *p++ = '^';
+ *p++ = '.';
+ *p++ = '*';
+ if (!(syntax_bits & RE_NO_BK_PARENS))
+ *p++ = '\\';
+ *p++ = '(';
+ bcopy(s, p, len);
+ p += len;
+ if (!(syntax_bits & RE_NO_BK_PARENS))
+ *p++ = '\\';
+ *p++ = ')';
+ dfaparse(copy, p - copy, d);
+ free(copy);
+ }
+ else
+ dfaparse(s, len, d);
dfamust(d);
dfaanalyze(d, searchflag);
diff -ru2N grep-2.0/dfa.h grep+mb1.04/dfa.h
--- grep-2.0/dfa.h Mon Apr 12 06:17:22 1993
+++ grep+mb1.04/dfa.h Wed Jul 7 17:02:13 1993
@@ -15,4 +15,6 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 7, 1993 by t^2 */
/* Written June, 1988 by Mike Haertel */
@@ -306,5 +308,5 @@
/* dfasyntax() takes two arguments; the first sets the syntax bits described
earlier in this file, and the second sets the case-folding flag. */
-extern void dfasyntax(int, int);
+extern void dfasyntax(unsigned long, int);
/* Compile the given string of the given length into the given struct dfa.
diff -ru2N grep-2.0/getpagesize.h grep+mb1.04/getpagesize.h
--- grep-2.0/getpagesize.h Fri May 21 14:18:58 1993
+++ grep+mb1.04/getpagesize.h Sat Jul 10 02:19:10 1993
@@ -1,2 +1,4 @@
+/* Multi-byte extension added Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 10, 1993 by t^2 */
#ifdef BSD
#ifndef BSD4_1
@@ -35,5 +37,9 @@
#endif /* no EXEC_PAGESIZE */
#else /* !HAVE_SYS_PARAM_H */
+#ifndef MSDOS
#define getpagesize() 8192 /* punt totally */
+#else
+#define getpagesize() 4096
+#endif
#endif /* !HAVE_SYS_PARAM_H */
#endif /* no _SC_PAGESIZE */
diff -ru2N grep-2.0/grep.c grep+mb1.04/grep.c
--- grep-2.0/grep.c Sun May 23 14:52:52 1993
+++ grep+mb1.04/grep.c Thu Jun 2 17:01:53 1994
@@ -17,4 +17,6 @@
Written July 1992 by Mike Haertel. */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jun. 2, 1994 by t^2 */
#include <errno.h>
@@ -22,6 +24,8 @@
#ifndef errno
+#ifndef MSDOS
extern int errno;
#endif
+#endif
#ifdef STDC_HEADERS
@@ -59,4 +63,5 @@
#include "getpagesize.h"
#include "grep.h"
+#include "mbc.h"
#undef MAX
@@ -315,6 +320,6 @@
cc = read(bufdesc, buffer + bufsalloc, bufalloc - bufsalloc);
#endif
- if (cc > 0)
- buflim = buffer + bufsalloc + cc;
+ if (cc != -1)
+ buflim = buffer + bufsalloc + (unsigned)cc;
else
buflim = buffer + bufsalloc;
@@ -332,10 +337,10 @@
/* Internal variables to keep track of byte count, context, etc. */
-static size_t totalcc; /* Total character count before bufbeg. */
+static unsigned long totalcc; /* Total character count before bufbeg. */
static char *lastnl; /* Pointer after last newline counted. */
static char *lastout; /* Pointer after last character output;
NULL if no character has been output
or if it's conceptually before bufbeg. */
-static size_t totalnl; /* Total newline count before lastnl. */
+static unsigned long totalnl; /* Total newline count before lastnl. */
static int pending; /* Pending lines of output. */
@@ -363,5 +368,5 @@
{
nlscan(beg);
- printf("%d%c", ++totalnl, sep);
+ printf("%lu%c", ++totalnl, sep);
lastnl = lim;
}
@@ -519,5 +524,5 @@
for (;;)
{
- if (fillbuf(save) < 0)
+ if (fillbuf(save) == -1)
{
error(filename, errno);
@@ -564,8 +569,10 @@
}
-static char version[] = "GNU grep version 2.0";
+static char version[] = "GNU grep version 2.0\
+ + multi-byte extension 1.04";
#define USAGE \
- "usage: %s [-[[AB] ]<num>] [-[CEFGVchilnqsvwx]] [-[ef]] <expr> [<files...>]\n"
+ "usage: %s [-[[AB] ]<num>] [-[CEFGVchilnqsvwx]] [-W ctype=...]\n\
+ [-[ef]] <expr> [<files...>]\n"
static void
@@ -594,4 +601,32 @@
}
+#ifndef HAVE_STRCASECMP
+
+static int
+#ifdef __STDC__
+strcasecmp(const char *s1, const char *s2)
+#else
+strcasecmp(s1, s2)
+ char *s1, *s2;
+#endif
+{
+ unsigned char c1, c2;
+
+ while ((c1 = *s1++)) {
+ if ((unsigned char)(c1 - 'A') <= (unsigned char)('Z' - 'A'))
+ c1 += 'a' - 'A';
+ c2 = *s2++;
+ if ((unsigned char)(c2 - 'A') <= (unsigned char)('Z' - 'A'))
+ c2 += 'a' - 'A';
+ if (c1 != c2) {
+ --s2;
+ break;
+ }
+ }
+ return c1 - (unsigned char)*s2;
+}
+
+#endif
+
int
main(argc, argv)
@@ -607,7 +642,27 @@
extern int optind;
- prog = argv[0];
- if (prog && strrchr(prog, '/'))
- prog = strrchr(prog, '/') + 1;
+ if ((prog = argv[0]) && prog[0]) {
+ char c, *p;
+#ifdef MSDOS
+ static char progname[8 + 1];
+#endif
+
+ for (p = prog; (c = *p++); )
+ if (c == '/'
+#ifdef MSDOS
+ || c == '\\' || c == ':'
+#endif
+ )
+ prog = p;
+#ifdef MSDOS
+ for (p = progname; p < &progname[8] && (c = *prog++) && c != '.'; ) {
+ if ((unsigned char)(c - 'A') <= (unsigned char)('Z' - 'A'))
+ c += 'a' - 'A';
+ *p++ = c;
+ }
+ *p++ = '\0';
+ prog = argv[0] = progname;
+#endif
+ }
keys = NULL;
@@ -620,5 +675,5 @@
matcher = NULL;
- while ((opt = getopt(argc, argv, "0123456789A:B:CEFGVX:bce:f:hiLlnqsvwxy"))
+ while ((opt = getopt(argc, argv, "0123456789A:B:CEFGVX:bce:f:hiLlnqsvwxyW:"))
!= EOF)
switch (opt)
@@ -747,4 +802,19 @@
case 'x':
match_lines = 1;
+ break;
+ case 'W':
+ if (strcasecmp(optarg, "ctype=ASCII") == 0) {
+ mbcinit(MBCTYPE_ASCII);
+ break;
+ }
+ if (strcasecmp(optarg, "ctype=EUC") == 0) {
+ mbcinit(MBCTYPE_EUC);
+ break;
+ }
+ if (strcasecmp(optarg, "ctype=SJIS") == 0) {
+ mbcinit(MBCTYPE_SJIS);
+ break;
+ }
+ fatal("unknown argument to -Wctype", 0);
break;
default:
diff -ru2N grep-2.0/kwset.c grep+mb1.04/kwset.c
--- grep-2.0/kwset.c Mon May 3 04:26:20 1993
+++ grep+mb1.04/kwset.c Fri Jul 9 14:54:46 1993
@@ -19,4 +19,6 @@
The author may be reached (Email) at the address mike@ai.mit.edu,
or (US mail) as Mike Haertel c/o Free Software Foundation. */
+/* Multi-byte extension added Jul, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 9, 1993 by t^2 */
/* The algorithm implemented by these routines bears a startling resemblence
@@ -592,5 +594,5 @@
if (d != 0)
continue;
- if (tp[-2] == gc)
+ if (U(tp[-2]) == gc)
{
for (i = 3; i <= len && U(tp[-i]) == U(sp[-i]); ++i)
diff -ru2N grep-2.0/mbc.c grep+mb1.04/mbc.c
--- grep-2.0/mbc.c Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/mbc.c Fri Jul 9 14:38:28 1993
@@ -0,0 +1,98 @@
+/* Functions for multi-byte support.
+ Created for grep multi-byte extension Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 9, 1993 by t^2 */
+#include "mbc.h"
+
+static const unsigned char mbctab_ascii[] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+};
+
+static const unsigned char mbctab_euc[] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
+};
+
+static const unsigned char mbctab_sjis[] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
+};
+
+#ifdef EUC
+const unsigned char *mbctab = mbctab_euc;
+int current_mbctype = MBCTYPE_EUC;
+#else
+#ifdef SJIS
+const unsigned char *mbctab = mbctab_sjis;
+int current_mbctype = MBCTYPE_SJIS;
+#else
+const unsigned char *mbctab = mbctab_ascii;
+int current_mbctype = MBCTYPE_ASCII;
+#endif
+#endif
+
+void
+#ifdef __STDC__
+mbcinit(int mbctype)
+#else
+mbcinit(mbctype)
+ int mbctype;
+#endif
+{
+ switch (mbctype) {
+ case MBCTYPE_ASCII:
+ mbctab = mbctab_ascii;
+ current_mbctype = MBCTYPE_ASCII;
+ break;
+ case MBCTYPE_EUC:
+ mbctab = mbctab_euc;
+ current_mbctype = MBCTYPE_EUC;
+ break;
+ case MBCTYPE_SJIS:
+ mbctab = mbctab_sjis;
+ current_mbctype = MBCTYPE_SJIS;
+ break;
+ }
+}
diff -ru2N grep-2.0/mbc.h grep+mb1.04/mbc.h
--- grep-2.0/mbc.h Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/mbc.h Fri Jul 9 14:40:03 1993
@@ -0,0 +1,38 @@
+#ifndef MBC_H
+#define MBC_H 1
+/* Definitions for multi-byte support.
+ Created for grep multi-byte extension Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 9, 1993 by t^2 */
+
+#ifndef const
+#ifndef __STDC__
+#ifdef __GNUC__
+#define const __const__
+#define volatile __volatile__
+#else
+#define const
+#define volatile
+#endif
+#endif
+#endif
+
+#ifndef _
+#ifdef __STDC__
+#define _(x) x
+#else
+#define _(x) ()
+#endif
+#endif
+
+#define MBCTYPE_ASCII 0
+#define MBCTYPE_EUC 1
+#define MBCTYPE_SJIS 2
+
+extern const unsigned char *mbctab;
+extern int current_mbctype;
+
+void mbcinit _((int));
+
+#define ismbchar(c) (mbctab[(unsigned char)(c)])
+
+#endif /* !MBC_H */
diff -ru2N grep-2.0/obstack.h grep+mb1.04/obstack.h
--- grep-2.0/obstack.h Sat May 22 11:55:23 1993
+++ grep+mb1.04/obstack.h Sat Jul 10 04:47:06 1993
@@ -15,4 +15,6 @@
along with this program; if not, write to the Free Software
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* Multi-byte extension added Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 10, 1993 by t^2 */
/* Summary:
@@ -136,4 +138,5 @@
#endif
+#ifndef PTR_INT_TYPE
#ifdef __STDC__
#define PTR_INT_TYPE ptrdiff_t
@@ -141,4 +144,5 @@
#define PTR_INT_TYPE long
#endif
+#endif
struct _obstack_chunk /* Lives at front of each chunk. */
@@ -151,5 +155,5 @@
struct obstack /* control current object in current chunk */
{
- long chunk_size; /* preferred size to allocate chunks in */
+ unsigned chunk_size; /* preferred size to allocate chunks in */
struct _obstack_chunk* chunk; /* address of current struct obstack_chunk */
char *object_base; /* address of object we are building */
diff -ru2N grep-2.0/regex.c grep+mb1.04/regex.c
--- grep-2.0/regex.c Fri May 21 14:11:40 1993
+++ grep+mb1.04/regex.c Thu Aug 19 04:37:03 1993
@@ -19,4 +19,6 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Aug. 19, 1993 by t^2 */
/* AIX requires this to be the first thing in the file. */
@@ -54,6 +56,33 @@
#define bcmp(s1, s2, n) memcmp ((s1), (s2), (n))
#endif
+#ifdef HAVE_MEMMOVE
#ifndef bcopy
-#define bcopy(s, d, n) memcpy ((d), (s), (n))
+#define bcopy(s, d, n) memmove ((d), (s), (n))
+#endif
+#else
+#ifndef HAVE_BCOPY
+static void
+#ifdef __STDC__
+bcopy (const void *s0, void *d0, size_t n)
+#else
+bcopy (s0, d0, n)
+ const void *s0;
+ void *d0;
+ size_t n;
+#endif
+{
+ const char *s = s0;
+ char *d = d0;
+
+ if (s < d) {
+ s += n, d += n;
+ while (n--)
+ *--d = *--s;
+ }
+ else
+ while (n--)
+ *d++ = *s++;
+}
+#endif
#endif
#ifndef bzero
@@ -124,4 +153,5 @@
/* Get the interface, including the syntax bits. */
#include "regex.h"
+#include "mbc.h"
/* isalpha etc. are used for the character classes. */
@@ -462,4 +492,19 @@
#endif /* DEBUG */
+
+#define STORE_MBC(p, c) \
+ ((p)[0] = (unsigned char) ((c) >> 8), (p)[1] = (unsigned char) (c))
+#define STORE_MBC_AND_INCR(p, c) \
+ (*(p)++ = (unsigned char) ((c) >> 8), *(p)++ = (unsigned char) (c))
+
+#define EXTRACT_MBC(p) \
+ ((unsigned char) (p)[0] << 8 | (unsigned char) (p)[1])
+#define EXTRACT_MBC_AND_INCR(p) \
+ ((p) += 2, (unsigned char) (p)[-2] << 8 | (unsigned char) (p)[-1])
+
+#define EXTRACT_UNSIGNED(p) \
+ ((unsigned char) (p)[0] | (unsigned char) (p)[1] << 8)
+#define EXTRACT_UNSIGNED_AND_INCR(p) \
+ ((p) += 2, (unsigned char) (p)[-2] | (unsigned char) (p)[-1] << 8)
/* If DEBUG is defined, Regex prints many voluminous messages about what
@@ -558,4 +603,8 @@
{
putchar ('/');
+ if (ismbchar (*p) && 2 <= mcnt) {
+ printf ("/%.2s", (char *) p), p += 2, --mcnt;
+ continue;
+ }
printchar (*p++);
}
@@ -618,7 +667,14 @@
printchar (last);
- putchar (']');
-
p += 1 + *p;
+ {
+ unsigned short i, size;
+
+ size = EXTRACT_UNSIGNED_AND_INCR (p);
+ for (i = 0; i < size; i++)
+ printf ("%.2s-%.2s", (char *) p, (char *) p + 2),
+ p += 4;
+ }
+ putchar (']');
}
break;
@@ -779,5 +835,5 @@
printf ("not_bol: %d\t", bufp->not_bol);
printf ("not_eol: %d\t", bufp->not_eol);
- printf ("syntax: %d\n", bufp->syntax);
+ printf ("syntax: %lu\n", bufp->syntax);
/* Perhaps we should print the translate table? */
}
@@ -878,5 +934,7 @@
static boolean at_begline_loc_p (), at_endline_loc_p ();
static boolean group_in_compile_stack ();
+#if 0
static reg_errcode_t compile_range ();
+#endif
/* Fetch the next character in the uncompiled pattern---translating it
@@ -887,5 +945,6 @@
do {if (p == pend) return REG_EEND; \
c = (unsigned char) *p++; \
- if (translate) c = translate[c]; \
+ if (translate && !ismbchar (c)) \
+ c = (unsigned char) translate[(unsigned char) c]; \
} while (0)
@@ -905,5 +964,7 @@
`char *', to avoid warnings when a string constant is passed. But
when we use a character as a subscript we must make it unsigned. */
-#define TRANSLATE(d) (translate ? translate[(unsigned char) (d)] : (d))
+#define TRANSLATE(d) (translate \
+ ? (unsigned char) translate[(unsigned char) (d)] \
+ : (d))
@@ -1075,4 +1136,159 @@
|| STREQ (string, "cntrl") || STREQ (string, "blank"))
+/* Handle charset(_not)?.
+
+ Structure of charset(_not)? in compiled pattern.
+
+ struct {
+ unsigned char id; charset(_not)?
+ unsigned char sbc_size;
+ unsigned char sbc_map[sbc_size]; same as original up to here.
+ unsigned short mbc_size; number of intervals.
+ struct {
+ unsigned short beg; beginning of interval.
+ unsigned short end; end of interval.
+ } intervals[mbc_size];
+ }; */
+
+static reg_errcode_t
+#ifdef __STDC__
+set_list_bits (unsigned short c1, unsigned short c2,
+ reg_syntax_t syntax, unsigned char *b, const char *translate)
+#else
+set_list_bits (c1, c2, syntax, b, translate)
+ unsigned short c1, c2;
+ reg_syntax_t syntax;
+ unsigned char *b;
+ const char *translate;
+#endif
+{
+ unsigned char sbc_size = b[-1];
+ unsigned short mbc_size = EXTRACT_UNSIGNED (&b[sbc_size]);
+ unsigned short beg, end, upb;
+
+ if (c1 > c2)
+ return syntax & RE_NO_EMPTY_RANGES ? REG_ERANGE : REG_NOERROR;
+ if (c1 < 1 << BYTEWIDTH) {
+ upb = c2;
+ if (1 << BYTEWIDTH <= upb)
+ upb = (1 << BYTEWIDTH) - 1; /* The last single-byte char */
+ if (sbc_size <= upb / BYTEWIDTH) {
+ /* Allocate maximum size so it never happens again. */
+ /* NOTE: memcpy() would not work here. */
+ bcopy (&b[sbc_size], &b[(1 << BYTEWIDTH) / BYTEWIDTH], 2 + mbc_size*4);
+ bzero (&b[sbc_size], (1 << BYTEWIDTH) / BYTEWIDTH - sbc_size);
+ b[-1] = sbc_size = (1 << BYTEWIDTH) / BYTEWIDTH;
+ }
+ if (!translate) {
+ for (; c1 <= upb; c1++)
+ if (!ismbchar (c1))
+ SET_LIST_BIT (c1);
+ }
+ else
+ for (; c1 <= upb; c1++)
+ if (!ismbchar (c1))
+ SET_LIST_BIT (TRANSLATE (c1));
+ if (c2 < 1 << BYTEWIDTH)
+ return REG_NOERROR;
+ c1 = 0x8000; /* The first wide char */
+ }
+ b = &b[sbc_size + 2];
+
+ /* intervals[beg]
+   *----------*   *----------*
+     c1
+     o----------------------*
+
+ Determine the index beg of the interval positioned as in the figure above. */
+ for (beg = 0, upb = mbc_size; beg < upb; ) {
+ unsigned short mid = (beg + upb) >> 1;
+
+ if (c1 - 1 > EXTRACT_MBC (&b[mid*4 + 2]))
+ beg = mid + 1;
+ else
+ upb = mid;
+ }
+
+ /* intervals[end]
+   *-------*   *----------*
+                 c2
+   *---------------o
+
+ Determine the index end of the interval positioned as in the figure above. */
+ for (end = beg, upb = mbc_size; end < upb; ) {
+ unsigned short mid = (end + upb) >> 1;
+
+ if (c2 >= EXTRACT_MBC (&b[mid*4]) - 1)
+ end = mid + 1;
+ else
+ upb = mid;
+ }
+
+ if (beg != end) {
+ /* When merging at least one existing interval,
+ adjust the start and end points of the interval. */
+ if (c1 > EXTRACT_MBC (&b[beg*4]))
+ c1 = EXTRACT_MBC (&b[beg*4]);
+ if (c2 < EXTRACT_MBC (&b[end*4 - 2]))
+ c2 = EXTRACT_MBC (&b[end*4 - 2]);
+ }
+ if (end < mbc_size && end != beg + 1)
+ /* Move the existing intervals so they follow the interval being added. */
+ /* NOTE: memcpy() would not work here. */
+ bcopy (&b[end*4], &b[(beg + 1)*4], (mbc_size - end)*4);
+ STORE_MBC (&b[beg*4 + 0], c1);
+ STORE_MBC (&b[beg*4 + 2], c2);
+ mbc_size += beg + 1 - end;
+ STORE_NUMBER (&b[-2], mbc_size);
+ return REG_NOERROR;
+}
+
+static int
+#ifdef __STDC__
+is_in_list (unsigned short c, const unsigned char *b)
+#else
+is_in_list (c, b)
+ unsigned short c;
+ const unsigned char *b;
+#endif
+{
+ unsigned short size;
+ int in = (re_opcode_t) b[-1] == charset_not;
+
+ size = *b++;
+ if (c < 1 << BYTEWIDTH) {
+ if (c / BYTEWIDTH < size && b[c / BYTEWIDTH] & 1 << c % BYTEWIDTH)
+ in = !in;
+ }
+ else {
+ unsigned short i, j;
+
+ b += size + 2;
+ size = EXTRACT_UNSIGNED (&b[-2]);
+
+ /* intervals[i]
+   *-------*   *--------*
+     c
+     o----------------*
+
+ Determine the index i of the interval positioned as in the figure above. */
+ for (i = 0, j = size; i < j; ) {
+ unsigned short k = (i + j) >> 1;
+
+ if (c > EXTRACT_MBC (&b[k*4 + 2]))
+ i = k + 1;
+ else
+ j = k;
+ }
+ if (i < size && EXTRACT_MBC (&b[i*4]) <= c
+ /* Exclude invalid multi-byte characters from [...]. For simplicity,
+ only those whose second byte is '\n' or '\0' are treated as invalid here.
+ For [^...], conversely, invalid multi-byte characters are made to match. */
+ && ((unsigned char) c != '\n' && (unsigned char) c != '\0'))
+ in = !in;
+ }
+ return in;
+}
+
/* `regex_compile' compiles PATTERN (of length SIZE) according to SYNTAX.
Returns one of error codes defined in `regex.h', or zero for success.
@@ -1385,4 +1601,6 @@
{
boolean had_char_class = false;
+ unsigned short c, c1;
+ int last_char = -1;
if (p == pend) return REG_EBRACK;
@@ -1390,5 +1608,6 @@
/* Ensure that we have enough space to push a charset: the
opcode, the length count, and the bitset; 34 bytes in all. */
- GET_BUFFER_SPACE (34);
+ /* + 2 + 4 for mbcharset(_not)? with just one interval. */
+ GET_BUFFER_SPACE (34 + 2 + 4);
laststart = b;
@@ -1407,5 +1626,5 @@
/* Clear the whole map. */
- bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH);
+ bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH + 2);
/* charset_not matches newline according to a syntax bit. */
@@ -1417,7 +1636,14 @@
for (;;)
{
+ int size;
+
if (p == pend) return REG_EBRACK;
- PATFETCH (c);
+ if ((size = EXTRACT_UNSIGNED (&b[(1 << BYTEWIDTH) / BYTEWIDTH])))
+ /* Ensure the space is enough to hold another interval
+ of multi-byte chars in charset(_not)?. */
+ GET_BUFFER_SPACE (32 + 2 + size*4 + 4);
+
+ PATFETCH_RAW (c);
/* \ might escape characters inside [...] and [^...]. */
@@ -1426,6 +1652,16 @@
if (p == pend) return REG_EESCAPE;
- PATFETCH (c1);
- SET_LIST_BIT (c1);
+ PATFETCH_RAW (c1);
+ if (ismbchar (c1)) {
+ unsigned char c2;
+
+ PATFETCH_RAW (c2);
+ c1 = c1 << 8 | c2;
+ (void) set_list_bits (c1, c1, syntax, b, translate);
+ last_char = c1;
+ continue;
+ }
+ SET_LIST_BIT (TRANSLATE (c1));
+ last_char = c1;
continue;
}
@@ -1442,4 +1678,11 @@
return REG_ERANGE;
+ if (ismbchar (c)) {
+ unsigned char c2;
+
+ PATFETCH_RAW (c2);
+ c = c << 8 | c2;
+ }
+
/* Look ahead to see if it's a range when the last thing
was a character: if this is a hyphen not at the
@@ -1447,10 +1690,25 @@
operator. */
if (c == '-'
+#if 0 /* The original was: */
&& !(p - 2 >= pattern && p[-2] == '[')
&& !(p - 3 >= pattern && p[-3] == '[' && p[-2] == '^')
+#else /* I wonder why he did not write like this.
+ Have we got any problems? */
+ && p != p1 + 1
+#endif
&& *p != ']')
{
- reg_errcode_t ret
- = compile_range (&p, pend, translate, syntax, b);
+ reg_errcode_t ret;
+
+ assert (last_char >= 0);
+ PATFETCH_RAW (c1);
+ if (ismbchar (c1)) {
+ unsigned char c2;
+
+ PATFETCH_RAW (c2);
+ c1 = c1 << 8 | c2;
+ }
+ ret = set_list_bits (last_char, c1, syntax, b, translate);
+ last_char = c1;
if (ret != REG_NOERROR) return ret;
}
@@ -1461,7 +1719,15 @@
/* Move past the `-'. */
- PATFETCH (c1);
-
- ret = compile_range (&p, pend, translate, syntax, b);
+ PATFETCH_RAW (c1);
+
+ PATFETCH_RAW (c1);
+ if (ismbchar (c1)) {
+ unsigned char c2;
+
+ PATFETCH_RAW (c2);
+ c1 = c1 << 8 | c2;
+ }
+ ret = set_list_bits (c, c1, syntax, b, translate);
+ last_char = c1;
if (ret != REG_NOERROR) return ret;
}
@@ -1474,5 +1740,5 @@
char str[CHAR_CLASS_MAX_LENGTH + 1];
- PATFETCH (c);
+ PATFETCH_RAW (c);
c1 = 0;
@@ -1534,4 +1800,7 @@
}
had_char_class = true;
+#ifdef DEBUG
+ last_char = -1;
+#endif
}
else
@@ -1540,7 +1809,13 @@
while (c1--)
PATUNFETCH;
+#if 0 /* The original was: */
SET_LIST_BIT ('[');
SET_LIST_BIT (':');
+#else /* I think this is the right way. */
+ SET_LIST_BIT (TRANSLATE ('['));
+ SET_LIST_BIT (TRANSLATE (':'));
+#endif
had_char_class = false;
+ last_char = ':';
}
}
@@ -1548,5 +1823,6 @@
{
had_char_class = false;
- SET_LIST_BIT (c);
+ (void) set_list_bits (c, c, syntax, b, translate);
+ last_char = c;
}
}
@@ -1556,5 +1832,9 @@
while ((int) b[-1] > 0 && b[b[-1] - 1] == 0)
b[-1]--;
- b += b[-1];
+ if (b[-1] != (1 << BYTEWIDTH) / BYTEWIDTH)
+ bcopy (&b[(1 << BYTEWIDTH) / BYTEWIDTH], &b[b[-1]],
+ 2 + EXTRACT_UNSIGNED (&b[(1 << BYTEWIDTH) / BYTEWIDTH])*4);
+ b += b[-1] + 2 + EXTRACT_UNSIGNED (&b[b[-1]])*4;
+ break;
}
break;
@@ -2023,5 +2303,6 @@
not to translate; but if we don't translate it
it will never match anything. */
- c = TRANSLATE (c);
+ if (!ismbchar (c))
+ c = TRANSLATE (c);
goto normal_char;
}
@@ -2032,4 +2313,11 @@
/* Expects the character in `c'. */
normal_char:
+
+ c1 = 0;
+ if (ismbchar (c)) {
+ c1 = c;
+ PATFETCH_RAW (c);
+ }
+
/* If no exactn currently being built. */
if (!pending_exact
@@ -2039,5 +2327,6 @@
/* We have only one byte following the exactn for the count. */
- || *pending_exact == (1 << BYTEWIDTH) - 1
+ || *pending_exact >= (c1 ? (1 << BYTEWIDTH) - 2
+ : (1 << BYTEWIDTH) - 1)
/* If followed by a repetition operator. */
@@ -2059,4 +2348,8 @@
}
+ if (c1) {
+ BUF_PUSH (c1);
+ (*pending_exact)++;
+ }
BUF_PUSH (c);
(*pending_exact)++;
@@ -2184,5 +2477,5 @@
at_endline_loc_p (p, pend, syntax)
const char *p, *pend;
- int syntax;
+ reg_syntax_t syntax;
{
const char *next = p;
@@ -2220,4 +2513,5 @@
+#if 0 /* We use set_list_bits() now. */
/* Read the ending character of a range (in a bracket expression) from the
uncompiled pattern *P_PTR (which ends at PEND). We assume the
@@ -2275,4 +2569,5 @@
return REG_NOERROR;
}
+#endif
/* Failure stack declarations and macros; both re_compile_fastmap and
@@ -2638,18 +2933,65 @@
case charset:
+ /* NOTE: Charset for single-byte chars never contain
+ multi-byte char. See set_list_bits(). */
for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))
fastmap[j] = 1;
+ {
+ unsigned short size;
+ unsigned char c, end;
+
+ p += p[-1] + 2;
+ size = EXTRACT_UNSIGNED (&p[-2]);
+ for (j = 0; j < size; j++)
+ /* set bits for 1st bytes of multi-byte chars. */
+ for (c = (unsigned char) p[j*4],
+ end = (unsigned char) p[j*4 + 2];
+ c <= end; c++)
+ /* NOTE: Charset for multi-byte chars might contain
+ single-byte chars. We must reject them. */
+ if (ismbchar (c))
+ fastmap[c] = 1;
+ }
break;
case charset_not:
+ /* S: set of all single-byte chars.
+ M: set of all first bytes that can start multi-byte chars.
+ s: any set of single-byte chars.
+ m: any set of first bytes that can start multi-byte chars.
+
+ We assume S+M = U.
+ ___ _ _
+ s+m = (S*s+M*m). */
/* Chars beyond end of map must be allowed. */
+ /* NOTE: Charset_not for single-byte chars might contain
+ multi-byte chars. See set_list_bits(). */
for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++)
- fastmap[j] = 1;
+ if (!ismbchar (j))
+ fastmap[j] = 1;
for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))))
- fastmap[j] = 1;
+ if (!ismbchar (j))
+ fastmap[j] = 1;
+ {
+ unsigned short size;
+ unsigned short c, beg;
+
+ p += p[-1] + 2;
+ size = EXTRACT_UNSIGNED (&p[-2]);
+ c = 0x00;
+ for (j = 0; j < size; j++) {
+ for (beg = (unsigned char) p[j*4 + 0]; c <= beg; c++)
+ if (ismbchar (c))
+ fastmap[c] = 1;
+ c = (unsigned char) p[j*4 + 2];
+ }
+ for (beg = 0xff; c <= beg; c++)
+ if (ismbchar (c))
+ fastmap[c] = 1;
+ }
break;
@@ -2964,4 +3306,5 @@
register int lim = 0;
int irange = range;
+ unsigned char c;
if (startpos < size1 && startpos + range >= size1)
@@ -2973,11 +3316,23 @@
inside the loop. */
if (translate)
- while (range > lim
- && !fastmap[(unsigned char)
- translate[(unsigned char) *d++]])
+ while (range > lim) {
+ c = *d++;
+ if (ismbchar (c)) {
+ if (fastmap[c])
+ break;
+ d++;
+ range -= 2;
+ continue;
+ }
+ if (fastmap[(unsigned char) translate[c]])
+ break;
range--;
+ }
else
- while (range > lim && !fastmap[(unsigned char) *d++])
+ while (range > lim && (c = *d++, !fastmap[c])) {
+ if (ismbchar (c))
+ d++, range--;
range--;
+ }
startpos += irange - range;
@@ -3012,11 +3367,34 @@
else if (range > 0)
{
- range--;
- startpos++;
+ const char *d = ((startpos >= size1 ? string2 - size1 : string1)
+ + startpos);
+
+ if (ismbchar (*d)) {
+ range--, startpos++;
+ if (!range)
+ break;
+ }
+ range--, startpos++;
}
else
{
- range++;
- startpos--;
+ range++, startpos--;
+ {
+ const char *s, *d, *p;
+
+ if (startpos < size1)
+ s = string1, d = string1 + startpos;
+ else
+ s = string2, d = string2 + startpos - size1;
+ for (p = d; p-- > s && ismbchar(*p); )
+ /* With `--p >= s' this may not work on the 80[12]?86 (in
+ memory models other than huge, when the offset of s is 0). */
+ ;
+ if (!((d - p) & 1)) {
+ if (!range)
+ break;
+ range++, startpos--;
+ }
+ }
}
}
@@ -3578,6 +3956,19 @@
do
{
+ unsigned char c;
+
PREFETCH ();
- if (translate[(unsigned char) *d++] != (char) *p++)
+ c = *d++;
+ if (ismbchar (c)) {
+ if (c != (unsigned char) *p++
+ || !--mcnt /* As long as the pattern was compiled
+ correctly this check is redundant,
+ but it is kept just in case. */
+ || d == dend
+ || (unsigned char) *d++ != (unsigned char) *p++)
+ goto fail;
+ continue;
+ }
+ if ((unsigned char) translate[c] != (unsigned char) *p++)
goto fail;
}
@@ -3588,6 +3979,26 @@
do
{
+#if 0
+ /* Elsewhere, a multi-byte character is not allowed to
+ straddle string1 and string2. To check for this even at
+ the cost of speed, change this and the next `#if 0' to
+ `#if 1'. */
+ unsigned char c;
+
+#endif
PREFETCH ();
+#if 0
+ c = *d++;
+ if (ismbchar (c)) {
+ if (c != (unsigned char) *p++
+ || !--mcnt
+ || d == dend)
+ goto fail;
+ c = *d++;
+ }
+ if (c != (unsigned char) *p++) goto fail;
+#else
if (*d++ != (char) *p++) goto fail;
+#endif
}
while (--mcnt);
@@ -3602,4 +4013,14 @@
PREFETCH ();
+ if (ismbchar (*d)) {
+ if (d + 1 == dend || d[1] == '\n' || d[1] == '\0')
+ /* Do not match an invalid multi-byte character. For simplicity,
+ only a second byte of '\n' or '\0' is treated as invalid here. */
+ goto fail;
+ SET_REGS_MATCHED ();
+ DEBUG_PRINT2 (" Matched `%d'.\n", EXTRACT_MBC (&d[0]));
+ d += 2;
+ break;
+ }
if ((!(bufp->syntax & RE_DOT_NEWLINE) && TRANSLATE (*d) == '\n')
@@ -3616,19 +4037,23 @@
case charset_not:
{
- register unsigned char c;
- boolean not = (re_opcode_t) *(p - 1) == charset_not;
+ register unsigned short c;
+ boolean not;
- DEBUG_PRINT2 ("EXECUTING charset%s.\n", not ? "_not" : "");
+ DEBUG_PRINT2 ("EXECUTING charset%s.\n",
+ (re_opcode_t) *(p - 1) == charset_not ? "_not" : "");
PREFETCH ();
- c = TRANSLATE (*d); /* The character to match. */
+ c = (unsigned char) *d;
+ if (ismbchar (c)) {
+ c <<= 8;
+ if (d + 1 != dend)
+ c |= (unsigned char) d[1];
+ }
+ else
+ c = TRANSLATE (c); /* The character to match. */
- /* Cast to `unsigned' instead of `unsigned char' in case the
- bit list is a full 32 bytes long. */
- if (c < (unsigned) (*p * BYTEWIDTH)
- && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
- not = !not;
+ not = is_in_list (c, p);
- p += 1 + *p;
+ p += 1 + *p + 2 + EXTRACT_UNSIGNED (&p[1 + *p])*4;
if (!not) goto fail;
@@ -3636,4 +4061,6 @@
SET_REGS_MATCHED ();
d++;
+ if (d != dend && c >= 1 << BYTEWIDTH)
+ d++;
break;
}
@@ -3801,5 +4228,5 @@
/* xx why this test? */
- if ((int) old_regend[r] >= (int) regstart[r])
+ if (old_regend[r] >= regstart[r])
regend[r] = old_regend[r];
}
@@ -4052,5 +4479,5 @@
|| (bufp->newline_anchor && (re_opcode_t) *p2 == endline))
{
- register unsigned char c
+ register unsigned short c
= *p2 == (unsigned char) endline ? '\n' : p2[2];
p1 = p + mcnt;
@@ -4069,13 +4496,10 @@
|| (re_opcode_t) p1[3] == charset_not)
{
- int not = (re_opcode_t) p1[3] == charset_not;
-
- if (c < (unsigned char) (p1[4] * BYTEWIDTH)
- && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
- not = !not;
+ if (ismbchar (c))
+ c = c << 8 | p2[3];
- /* `not' is equal to 1 if c would match, which means
+ /* `is_in_list()' is TRUE if c would match, which means
that we can't change to pop_failure_jump. */
- if (!not)
+ if (!is_in_list (c, p1 + 4))
{
p[-3] = (unsigned char) pop_failure_jump;
@@ -4632,8 +5056,15 @@
char *translate;
{
- register unsigned char *p1 = s1, *p2 = s2;
+ register unsigned char *p1 = s1, *p2 = s2, c;
while (len)
{
- if (translate[*p1++] != translate[*p2++]) return 1;
+ c = *p1++;
+ if (ismbchar(c)) {
+ if (c != *p2++ || !--len || *p1++ != *p2++)
+ return 1;
+ }
+ else
+ if (translate[c] != translate[*p2++])
+ return 1;
len--;
}
@@ -4778,5 +5209,5 @@
{
reg_errcode_t ret;
- unsigned syntax
+ reg_syntax_t syntax
= (cflags & REG_EXTENDED) ?
RE_SYNTAX_POSIX_EXTENDED : RE_SYNTAX_POSIX_BASIC;
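Note on the charset hunks of regex.c: the skip expression `1 + *p + 2 + EXTRACT_UNSIGNED (&p[1 + *p])*4` suggests that a compiled charset is now a length byte, a bitmap of that many bytes, a 2-byte range count, and then 4 bytes per multi-byte range. The sketch below walks that layout. It is an illustration under assumptions, not the patch's own `is_in_list()`: the names `extract_count`, `charset_size` and `in_mb_ranges` are hypothetical, the count is assumed to be stored low byte first, and each range is assumed to be stored as start and end characters with the high byte first (the fastmap code reads the first bytes at offsets 0 and 2).

```c
#include <stddef.h>

/* Hypothetical helper: read the 2-byte range count.
   Assumption: low byte first, as regex.c stores its other numbers. */
unsigned extract_count(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Number of bytes a compiled charset occupies, starting at its
   bitmap-length byte: length byte + bitmap + count + 4 bytes per range. */
size_t charset_size(const unsigned char *p)
{
    size_t bitmap_len = p[0];
    unsigned nranges = extract_count(p + 1 + bitmap_len);

    return 1 + bitmap_len + 2 + (size_t)nranges * 4;
}

/* Return 1 if the two-byte character c (first byte in the high 8 bits)
   lies inside one of the multi-byte ranges, 0 otherwise. */
int in_mb_ranges(const unsigned char *p, unsigned c)
{
    const unsigned char *r = p + 1 + p[0];  /* skip length byte and bitmap */
    unsigned n = extract_count(r);
    unsigned i;

    r += 2;                                 /* skip the count itself */
    for (i = 0; i < n; i++, r += 4) {
        unsigned start = ((unsigned)r[0] << 8) | r[1];
        unsigned end = ((unsigned)r[2] << 8) | r[3];

        if (start <= c && c <= end)
            return 1;
    }
    return 0;
}
```

Keeping the single-byte bitmap in front means the original bitmap test can stay unchanged; only the trailing range table is new.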
diff -ru2N grep-2.0/regex.h grep+mb1.04/regex.h
--- grep-2.0/regex.h Fri May 21 14:11:43 1993
+++ grep+mb1.04/regex.h Sat Jul 10 04:38:03 1993
@@ -17,4 +17,6 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 10, 1993 by t^2 */
#ifndef __REGEXP_LIBRARY_H__
@@ -36,9 +38,9 @@
the definitions shifted by one from the previous bit; thus, when we
add or remove a bit, only one other definition need change. */
-typedef unsigned reg_syntax_t;
+typedef unsigned long reg_syntax_t;
/* If this bit is not set, then \ inside a bracket expression is literal.
If set, then such a \ quotes the following character. */
-#define RE_BACKSLASH_ESCAPE_IN_LISTS (1)
+#define RE_BACKSLASH_ESCAPE_IN_LISTS ((unsigned long)1)
/* If this bit is not set, then + and ? are operators, and \+ and \? are
@@ -206,5 +208,5 @@
#undef RE_DUP_MAX
#endif
-#define RE_DUP_MAX ((1 << 15) - 1)
+#define RE_DUP_MAX ((int)(((unsigned)1 << 15) - 1))
@@ -397,4 +399,10 @@
#define _RE_ARGS(args) ()
+
+#ifdef __GNUC__
+#define const __const__
+#else
+#define const
+#endif
#endif /* not __STDC__ */
diff -ru2N grep-2.0/search.c grep+mb1.04/search.c
--- grep-2.0/search.c Mon May 3 06:02:00 1993
+++ grep+mb1.04/search.c Fri Jul 9 14:55:21 1993
@@ -17,4 +17,6 @@
Written August 1992 by Mike Haertel. */
+/* Multi-byte extension added Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 9, 1993 by t^2 */
#include <ctype.h>
@@ -61,4 +63,5 @@
#include "kwset.h"
#include "regex.h"
+#include "mbc.h"
#define NCHAR (UCHAR_MAX + 1)
@@ -434,8 +437,9 @@
char **endp;
{
- register char *beg, *try, *end;
+ register char *beg, *try, *end, *p, *lim;
register size_t len;
struct kwsmatch kwsmatch;
+ lim = buf;
for (beg = buf; beg <= buf + size; ++beg)
{
@@ -456,4 +460,8 @@
if (try > buf && WCHAR((unsigned char) try[-1]))
break;
+ for (p = try; p-- > lim && ismbchar(*p); )
+ ;
+ if (!((try - p) & 1))
+ break;
if (try + len < buf + size && WCHAR((unsigned char) try[len]))
{
@@ -464,6 +472,12 @@
goto success;
}
- else
- goto success;
+ else {
+ for (p = beg; p-- > lim && ismbchar(*p); )
+ ;
+ if ((beg - p) & 1)
+ goto success;
+ if (lim + 1 < beg)
+ lim = beg - 1;
+ }
}
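Note on the boundary checks in search.c (Fexecute) and regex.c (re_search_2): both scan backwards over bytes for which `ismbchar()` is true and then test the parity of the distance. The sketch below shows the same idea in isolation. It is a minimal illustration, assuming an EUC-style lead-byte range in place of the real `ismbchar()` from mbc.h; `is_lead_byte` and `on_char_boundary` are hypothetical names. It counts with an index rather than a pointer, so it never forms a pointer before the start of the buffer.

```c
#include <stddef.h>

/* Assumption: EUC-style test; the patch itself calls ismbchar() from mbc.h. */
int is_lead_byte(unsigned char c)
{
    return c >= 0xa1 && c <= 0xfe;
}

/* Return 1 if buf[pos] starts a character, 0 if it is the second byte
   of a two-byte character.  An even-length run of lead-range bytes
   immediately before pos pairs up into whole characters; an odd-length
   run leaves one unpaired lead byte, so pos is a second byte. */
int on_char_boundary(const unsigned char *buf, size_t pos)
{
    size_t run = 0;

    while (run < pos && is_lead_byte(buf[pos - 1 - run]))
        run++;
    return (run & 1) == 0;
}
```

In the patch, a keyword match that begins on a second byte is rejected and the search continues; the `lim` variable appears to bound how far back the scan needs to go.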
diff -ru2N grep-2.0/tests/batgen.awk grep+mb1.04/tests/batgen.awk
--- grep-2.0/tests/batgen.awk Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/tests/batgen.awk Sat Jul 10 02:10:24 1993
@@ -0,0 +1,10 @@
+BEGIN { print "@echo off"; }
+$0 !~ /^#/ && NF == 3 {
+ printf "echo #%d --\n", ++n
+ print "set R=0";
+ print "echo " $3 ">tmp.in";
+ print "grep -E -e \"" $2 "\" tmp.in >nul";
+ print "if errorlevel 1 set R=1";
+ print "if errorlevel 2 set R=2";
+ printf "if not %%R%% == " $1 " echo Spencer test #%d failed\n", n
+}
diff -ru2N grep-2.0/tests/check.bat grep+mb1.04/tests/check.bat
--- grep-2.0/tests/check.bat Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/tests/check.bat Fri Jul 9 17:05:30 1993
@@ -0,0 +1,14 @@
+@echo off
+rem
+rem Regression test for GNU e?grep.
+rem
+
+rem The Khadafy test is brought to you by Scott Anderson . . .
+grep -E -f tests/khadafy.reg tests/khadafy.lin > khadafy.out
+fc tests\khadafy.lin khadafy.out
+
+rem . . . and the following by Henry Spencer.
+
+gawk -F: -f tests/batgen.awk tests/spencer.dos > tmp.bat
+
+tmp
diff -ru2N grep-2.0/tests/spencer.dos grep+mb1.04/tests/spencer.dos
--- grep-2.0/tests/spencer.dos Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/tests/spencer.dos Sat Jul 10 02:12:59 1993
@@ -0,0 +1,122 @@
+0:abc:abc
+1:abc:xbc
+1:abc:axc
+1:abc:abx
+0:abc:xabcy
+0:abc:ababc
+0:ab*c:abc
+0:ab*bc:abc
+0:ab*bc:abbc
+0:ab*bc:abbbbc
+0:ab+bc:abbc
+1:ab+bc:abc
+1:ab+bc:abq
+0:ab+bc:abbbbc
+0:ab?bc:abbc
+0:ab?bc:abc
+1:ab?bc:abbbbc
+0:ab?c:abc
+0:^abc$:abc
+1:^abc$:abcc
+0:^abc:abcc
+1:^abc$:aabc
+0:abc$:aabc
+0:^:abc
+0:$:abc
+0:a.c:abc
+0:a.c:axc
+0:a.*c:axyzc
+1:a.*c:axyzd
+1:a[bc]d:abc
+0:a[bc]d:abd
+1:a[b-d]e:abd
+0:a[b-d]e:ace
+0:a[b-d]:aac
+0:a[-b]:a-
+0:a[b-]:a-
+1:a[b-a]:-
+2:a[]b:-
+2:a[:-
+0:a]:a]
+0:a[]]b:a]b
+0:a[^bc]d:aed
+1:a[^bc]d:abd
+0:a[^-b]c:adc
+1:a[^-b]c:a-c
+1:a[^]b]c:a]c
+0:a[^]b]c:adc
+0:ab|cd:abc
+0:ab|cd:abcd
+0:()ef:def
+0:()*:-
+1:*a:-
+0:^*:-
+0:$*:-
+1:(*)b:-
+1:$b:b
+2:a\\:-
+0:a\(b:a(b
+0:a\(*b:ab
+0:a\(*b:a((b
+1:a\x:a\x
+2:abc):-
+2:(abc:-
+0:((a)):abc
+0:(a)b(c):abc
+0:a+b+c:aabbabc
+0:a**:-
+0:a*?:-
+0:(a*)*:-
+0:(a*)+:-
+0:(a|)*:-
+0:(a*|b)*:-
+0:(a+|b)*:ab
+0:(a+|b)+:ab
+0:(a+|b)?:ab
+0:[^ab]*:cde
+0:(^)*:-
+0:(ab|)*:-
+2:)(:-
+1:abc:
+1:abc:
+0:a*:
+0:([abc])*d:abbbcd
+0:([abc])*bcd:abcd
+0:a|b|c|d|e:e
+0:(a|b|c|d|e)f:ef
+0:((a*|b))*:-
+0:abcd*efg:abcdefg
+0:ab*:xabyabbbz
+0:ab*:xayabbbz
+0:(ab|cd)e:abcde
+0:[abhgefdc]ij:hij
+1:^(ab|cd)e:abcde
+0:(abc|)ef:abcdef
+0:(a|b)c*d:abcd
+0:(ab|ab*)bc:abc
+0:a([bc]*)c*:abc
+0:a([bc]*)(c*d):abcd
+0:a([bc]+)(c*d):abcd
+0:a([bc]*)(c+d):abcd
+0:a[bcd]*dcdcde:adcdcde
+1:a[bcd]+dcdcde:adcdcde
+0:(ab|a)b*c:abc
+0:((a)(b)c)(d):abcd
+0:[A-Za-z_][A-Za-z0-9_]*:alpha
+0:^a(bc+|b[eh])g|.h$:abh
+0:(bc+d$|ef*g.|h?i(j|k)):effgz
+0:(bc+d$|ef*g.|h?i(j|k)):ij
+1:(bc+d$|ef*g.|h?i(j|k)):effg
+1:(bc+d$|ef*g.|h?i(j|k)):bcdd
+0:(bc+d$|ef*g.|h?i(j|k)):reffgz
+1:((((((((((a)))))))))):-
+0:(((((((((a))))))))):a
+1:multiple words of text:uh-uh
+0:multiple words:multiple words, yeah
+0:(.*)c(.*):abcde
+1:\((.*),:(.*)\)
+1:[k]:ab
+0:abcd:abcd
+0:a(bc)d:abcd
+0:a[-]?c:ac
+0:(....).*\1:beriberi
$B!|!|!|!|!| GNU grep version 2.0 + multi-byte extension 1.04 $B!|!|!|!|!|
$B!|!|!|!|!| Jun. 2, 1994 by t^2 $B!|!|!|!|!|
$B$3$N%U%!%$%k$O GNU grep version 2.0 (grep-2.0) $B$N%=!<%9%3!<%I$+$i, $B$=
$B$N%^%k%A%P%$%HJ8;zBP1~HG grep-2.0+mb1.04 $B$N%=!<%9%3!<%I$r@8@.$9$k$?$a
$B$N:9J,$r4^$s$G$$$^$9. grep-2.0 $B$N%=!<%9$rE83+$7$F$"$k%G%#%l%/%H%j$G
% patch -p1 < $B$3$N%U%!%$%k
$B$J$I$H$7$F%Q%C%A$rEv$F$F$/$@$5$$. $B$=$N8e README.MB $B$rFI$s$G$/$@$5$$.
$B")810 $BJ!2,;TCf1{6hG_8w1`CDCO 7-207
TEL/FAX: 092-731-4025 (TEL/FAX $B<+F0@ZBX$()
092-724-6342 (TEL $B$N$_)
E-mail: NBC02362@niftyserve.or.jp t^2 ($BC+K\9'9@)
diff -ru2N grep-2.0/ChangeLog.MB grep+mb1.04/ChangeLog.MB
--- grep-2.0/ChangeLog.MB Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/ChangeLog.MB Thu Jun 2 17:01:42 1994
@@ -0,0 +1,219 @@
+Thu Jun 2 16:58:03 1994 Takahiro Tanimoto (tt@isaac)
+
+ * Version 2.0 + multi-byte extension 1.04 released.
+
+Sat Mar 5 16:30:16 1994 Takahiro Tanimoto (tt@isaac)
+
+ * README.MSC: PC-9800 $B%7%j!<%:MQ MS-C 6.00A $B$N, $B%o%$%k%I%+!<%IE8
+ $B3+%k!<%A%s$N%P%0$KBP=h$7$?. $B0JA0$N stdargv.diff $B$r$3$l$KE}9g$7,
+ $B:o=|$7$?. (Thanks to $BJ!9@K.$5$s <GFE00522@niftyserve.or.jp>)
+
+Thu Aug 19 04:26:09 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (re_compile_fastmap): charset_not $B$N fastmap $B$N:n@.=h
+ $BM}$,4V0c$C$F$$$F, regex $B$N fastmap $B$r;HMQ$9$k>l9g (e?grep $B$G$O
+ fastmap $B$r;HMQ$7$F$$$J$$$?$a, $B$3$NLdBj$OI=LL$K$O8=$l$J$$), $B@55,
+ $BI=8=$N@hF,$N [^$B#A] $B$d [^a] $B$KNc$($P $B#B $B$,%^%C%A$7$J$+$C$?.
+ (Thanks to $B>.20NIM4$5$s <JAE03716@niftyserve.or.jp>)
+
+Tue Aug 10 01:29:05 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (set_list_bits): $BJ8;z%/%i%9Cf$N%^%k%A%P%$%HJ8;z$N:GE,
+ $B2=$G, $B6h4V=*E@$N99?7=hM}ItJ,$K%P%0$,$"$j, [$B#A-$B#C#E-$B#G#B-$B#D] $B$r:G
+ $BE,2=$9$k$H [$B#A-$B#G] $B$@$,, $B$3$l$, [$B#A-$B#E] $B$H$J$C$F$7$^$C$F$$$?.
+ $B$?$@$7, regex $B$G$O$J$/ dfa $B$,;HMQ$5$l$k>l9g$K$O$3$N%P%0$OI=LL$K
+ $B$O8=$l$J$$.
+
+Fri Jul 23 03:22:13 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 2.0 + multi-byte extension 1.03 released.
+
+ * DEFS.dos: strcmpi $B$r stricmp $B$KJQ99.
+
+ * grep.c (main): MS-DOS $B$N>l9g, argv[0] $B$r2C9)$7$?J8;zNs$X$N%]%$
+ $B%s%?$r argv[0] $B$X$b%;%C%H$9$k. getopt $B$,=PNO$9$k%a%C%;!<%8$b2C
+ $B9)$5$l$?J8;zNs$H$J$k.
+
+ * grep.c (main): MS-C 6.00A $B$N stdargv.asm $B$N%P%0$r%U%#%C%/%9$7
+ $B$?$?$a, argv[0] == "" $B$N$H$-$N=hM}$r:o=|$7$?.
+
+ * stdargv.diff: $BDI2C.
+
+Tue Jul 13 07:04:13 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 2.0 + multi-byte extension 1.02 released.
+
+Mon Jul 12 00:20:36 1993 Takahiro Tanimoto (tt@isaac)
+
+ * grep.c: HAVE_STRCASECMP $B$, #define $B$5$l$F$$$J$$$H$-, $B0lJ}$NJ8
+ $B;zNs$@$1$r>.J8;z$K$7$F$+$iHf3S$9$k4X?t$rDj5A$7$F$$$?$,, $B;H$$J}$,
+ $B0-$+$C$?. $B$=$l$r strcasecmp() $B$HF1$8$b$N$KJQ99$7$?.
+
+ * DEFS.dos: HAVE_STRCMPI $B$r #define $B$9$kBe$o$j$K,
+ HAVE_STRCASECMP $B$r #define $B$7, strcasecmp $B$r strcmpi $B$K #define
+ $B$7$?.
+
+Sat Jul 10 01:05:04 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 2.0 + multi-byte extension 1.01 released.
+
+ * grep.c (main): MSDOS $B$N>l9g, argv[0] $B$r>.J8;z$K$7$F prog $B$K%;%C
+ $B%H$9$k. $B$^$?, $B3HD%;R$O<h$j=|$/.
+
+ * obstack.h: chunk_size $B$N7?$r size_t $B$+$i unsigned $B$KJQ99.
+ old-C $B$N>l9g, size_t $B$,Dj5A$5$l$F$$$J$$>uBV$H$J$C$?$?$a.
+
+ * regex.h: $BDj?t$N8e$K U, UL $B$r$D$1$k$H old-C $B$G%3%s%Q%$%k$G$-$J
+ $B$$. $B$3$l$i$r%-%c%9%H$KJQ99$7$?.
+
+ * regex.h: RE_DUP_MAX $B$NDj5A$r 16 $B%S%C%H int $B$N%^%7%s$G$b%*!<%P
+ $B%U%m!<$7$J$$=q$-J}$K=$@5.
+
+ * obstack.h: struct obstack $B$N%a%s%P chunk_size $B$N7?$r size_t $B$H
+ $B$7$?. PTR_INT_TYPE $B$r30It$+$i #define $B$G$-$k$h$&$K$7$?. MSDOS
+ $B$G SMALL MODEL $B0J30$N>l9g, __PTR_TO_INT, __INT_TO_PTR $B$H$H$b$K,
+ $B%]%$%s%?$H long $B$rJQ49$9$k$h$&$K$7$?.
+
+ * grep.c (fillbuf, grep): read() $B$NJV$jCM$N@5Ii$K$h$k%(%i!<%A%'%C
+ $B%/$r, -1 $B$KEy$7$$$+$I$&$+$G9T$&$h$&$KJQ99.
+
+ * grep.c: totalcc, totalnl $B$r unsigned long $B$KJQ99$7, prline()
+ $BCf$N printf() $B$N=q<0$r9g$o$;$?.
+
+ * DEFS.dos: BUFSALLOC $B$r 4096 $B$K #define. (See reset() in
+ grep.c.)
+
+ * getpagesize.h: MSDOS $B$N>l9g, $B%Z!<%8%5%$%:$O 4096 $B$H$7$?.
+
+ * dfa.c: STDC_HEADERS $B$^$?$O HAVE_STRING_H $B$N$H$-, bcopy, bzero
+ $B$r%^%/%mE83+$9$k.
+
+Fri Jul 9 13:16:50 1993 Takahiro Tanimoto (tt@isaac)
+
+ * mbc.c, mbc.h, ...: ismbchar() $B$r%b%8%e!<%kKh$KFHN)$7$FDj5A$9$k
+ $B$N$r$d$a, $B%b%8%e!<%k$rDI2C$7$?.
+
+ * search.c (Fexecute): fgrep $B$r%^%k%A%P%$%HJ8;z$KBP1~$5$;$?.
+
+Wed Jul 7 17:02:33 1993 Takahiro Tanimoto (tt@isaac)
+
+ * kwset.c (bmexec): 8 $B%S%C%H%/%j!<%s$G$J$$$H$3$m$r=$@5.
+
+ * $B%Y!<%9$r grep-2.0 $B$XJQ99.
+
+Sun Jul 4 08:48:12 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (re_match_2): $B%*%j%8%J%k$N%P%0. maybe_finalize_jump
+ $B$N=hM}Cf, start_memory/stop_memory $B$r%9%-%C%W$9$k$H$3$m$G, $B0z?t
+ $B$N%9%-%C%W$r$7$F$$$J$$%P%0$r=$@5. $BNc$($P "([a-n]+).*\1" $B$,@5$7
+ $B$/ "abcxyzab" $B$K%^%C%A$9$k$h$&$K$J$C$?.
+
+Sat Jul 3 06:51:33 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 1.6 + multi-byte extension 1.00 released.
+
+Sat Jul 3 04:29:14 1993 Takahiro Tanimoto (tt at pc98)
+
+ * grep.c (bufprev): -b $B%*%W%7%g%s$GI=<($9$k%P%$%H%*%U%;%C%H$r
+ long $B$K$9$k$?$a$K bufprev $B$r long $B$H$7$?. bufprev $B0J30$OJQ99$7
+ $B$F$$$J$$$?$a, 1 $B9T$N%5%$%:$, int $B$NHO0O$r1[$($k$H@5$7$/=hM}$5$l
+ $B$J$$. $B$^$?, DOS $B$G$O CR+LF $B$r 1 $B%P%$%H$H$7$F%+%&%s%H$7$F$7$^$&.
+ ($B<jH4$-)
+
+ * regex.c (re_match_2): $BJ8;z%/%i%9$N=hM}Cf$N 16 $B%S%C%H int $B$G@5
+ $B>oF0:n$7$J$$ItJ,$r=$@5.
+
+ * regex.c (re_exec): re_search() $B$X$N:G8e$N0z?t$r 0 $B$+$i NULL $B$X
+ $B=$@5.
+
+ * regex.c (re_match): re_match_2() $B$X$N#2HVL\$N0z?t$r 0 $B$+$i
+ NULL $B$X=$@5.
+
+ * regex.c (re_search): re_search_2() $B$X$N#2HVL\$N0z?t$r 0 $B$+$i
+ NULL $B$X=$@5.
+
+ * grep.c (main): MS-C $B$N setargv $B$N%P%0$N$;$$$G, grep "\\" foo
+ $B$H$9$k$H argv[0] == "" $B$H$J$C$F$7$^$&. argv[0] == "" $B$N$H$-$O6/
+ $B@)E*$K "grep" $B$^$?$O "egrep" $B$r%;%C%H$9$k$h$&$K$7$?.
+
+Fri Jul 2 19:25:58 1993 Takahiro Tanimoto (tt at pc98)
+
+ * grep.c (main): $BJQ?t prog $B$N@_Dj$r DOS $BMQ$K=$@5$7$?. $B$=$N:],
+ $B%*%j%8%J%k$N$d$jJ}$O$^$:$+$C$?$N$G=$@5$7$?.
+
+ * grep.c: MSDOS $B$N$H$- errno $B$H sys_errlist $B$N@k8@$r$7$J$$$h$&$K
+ $B=$@5$7$?.
+
+ * regex.c (set_list_bits): $B;HMQ$7$F$$$J$+$C$?JQ?t$r:o=|.
+
+ * Makefile.msc: DOS $B%5%]!<%H$N$?$aDI2C.
+
+Fri Jun 11 04:14:22 1993 Takahiro Tanimoto (tt@isaac)
+
+ * grep.c: version $BJ8;zNs$,8E$$$^$^$@$C$?.
+
+Tue May 25 00:10:49 1993 Takahiro Tanimoto (tt@isaac)
+
+ * Version 1.6 + multi-byte extension 0.02 released.
+
+Mon May 24 15:57:31 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (re_search_2): $B8eJ}$X advance $B$9$k:]$N%P%0$r=$@5.
+
+Sat May 22 02:03:41 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c (re_compile_fastmap): exactn $B$G translate $B$9$k$N$r$d$a
+ $B$?. re_compile_pattern $B$G0lEY translate $B$5$l$F$$$k$O$:.
+
+ * regex.c (re_match_2): exactn $B$N=hM}ItJ,$G, #if 0 $B$r #if 1 $B$K$7
+ $B$?>l9g, $B@5$7$$=hM}$r9T$C$F$$$J$+$C$?$N$r=$@5.
+
+Fri May 21 20:04:07 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.[ch]: mbcharset, mbcharset_not $B$rGQ;_. $BBe$o$j$K
+ charset, charset_not $B$,%^%k%A%P%$%HJ8;z$r$bJ];}$9$k.
+
+ * grep.c (main): $B2<5-$NJQ99$KH<$C$F, "^.*(" ... ")" $B$rIU2C$9$k=h
+ $BM}$r:o=|$7$?.
+
+ * dfa.c (regcompile): searchflag $B$, ON $B$N$H$-, $B@55,I=8=$r "^.*("
+ ... ")" $B$H$7$F%3%s%Q%$%k$9$k$h$&$K$7$?. $B0JA0$O grep.c $B$NCf$GF1
+ $B$8$3$H$r9T$C$F$$$?.
+
+ * dfa.c (lex): $BJ8;z%/%i%9$G%^%k%A%P%$%HJ8;z$N#1J8;zL\$N=89g$+$i,
+ $B%7%s%0%k%P%$%HJ8;z$r=|30$9$k=hM}$rDI2C$7$?.
+
+ * dfa.c (lex): $BJ8;z%/%i%9$G%7%s%0%k%P%$%HJ8;z$N>e8B$,4V0c$C$F$$
+ $B$?$N$r=$@5$7$?.
+
+Wed May 19 01:27:07 1993 Takahiro Tanimoto (tt@isaac)
+
+ * regex.c: !__STDC__ $B$N$H$-$K const $B$r #define.
+
+ * dfa.h: $B%*%j%8%J%k$G$O !STDC_HEADERS $B$N$H$-$K const $B$r #define
+ $B$7$F$$$?$,, $B$3$l$r !__STDC__ $B$N$H$-$K #define $B$9$k$h$&$KJQ99$7$?.
+
+ * configure.in: bcopy(), memmove() $B$N%A%'%C%/$rDI2C.
+
+ * dfa.c (reginit): cs_tok[] $B$N=i4|2=$rDI2C$7$?. -i $B%U%i%0$rIU$1
+ $B$?>l9g$NIT6q9g$r=$@5.
+
+Tue May 18 18:14:04 1993 Takahiro Tanimoto (tt@albert)
+
+ * dfa.h: regex.h $B$G$N RE_MBCTYPE_??? $B$NCM$H0lCW$5$;$?.
+
+ * regex.[ch] (RE_TRANSLATED_RANGE): mbsed-0.01 $B$G9T$C$?3HD%$rM"
+ $BF~$7$?.
+
+Sat May 15 04:27:32 1993 Takahiro Tanimoto (tt@isaac)
+
+ * $B%^%k%A%P%$%HJ8;zBP1~HG$,0lDL$j40@.$7$?.
+
+
+Local Variables:
+mode: indented-text
+left-margin: 8
+fill-column: 72
+fill-prefix: " "
+version-control: never
+End:
diff -ru2N grep-2.0/DEFS.dos grep+mb1.04/DEFS.dos
--- grep-2.0/DEFS.dos Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/DEFS.dos Fri Jul 23 03:23:31 1993
@@ -0,0 +1,15 @@
+#define STDC_HEADERS 1
+#define HAVE_STRING_H 1
+#define HAVE_MEMCHR 1
+#define HAVE_STRERROR 1
+#define HAVE_MEMMOVE 1
+#define HAVE_STRCASECMP 1
+#define strcasecmp stricmp
+
+#define BUFSALLOC 4096
+
+#ifndef M_I86SM
+#define __PTR_TO_INT(P) ((long)(P))
+#define __INT_TO_PTR(P) ((char *)(P))
+#define PTR_INT_TYPE long
+#endif
diff -ru2N grep-2.0/MANIFEST.MB grep+mb1.04/MANIFEST.MB
--- grep-2.0/MANIFEST.MB Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/MANIFEST.MB Sat Mar 5 16:37:46 1994
@@ -0,0 +1,11 @@
+ChangeLog.MB Revision history of multi-byte extension to grep.
+DEFS.dos Definitions for DOS.
+MANIFEST.MB This file.
+Makefile.msc Makefile for MS-C version 6.
+README.MB Documentation for multi-byte extension.
+README.MSC Patch for source/startup/... of MS-C 6.00A
+mbc.c Multi-byte char handler.
+mbc.h Interface to mbc.c.
+tests/batgen.awk DOS version of scriptgen.awk.
+tests/check.bat DOS version of check.sh
+tests/spencer.dos Input for batgen.
diff -ru2N grep-2.0/Makefile.in grep+mb1.04/Makefile.in
--- grep-2.0/Makefile.in Mon May 3 05:54:24 1993
+++ grep+mb1.04/Makefile.in Mon Jul 12 02:02:28 1993
@@ -16,4 +16,7 @@
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+# Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+# Last change: Jul. 12, 1993 by t^2
+
SHELL = /bin/sh
@@ -40,4 +43,11 @@
DEFS=-DGREP @DEFS@
+# Things you might set to MBCTYPE_DEF to spec. default multi-byte char type.
+# -DEUC will make default multi-byte char type EUC and
+# -DSJIS SJIS.
+# If you do not set EUC/SJIS, grep assumes no multi-byte
+# char as default.
+MBCTYPE_DEF=-DEUC
+
# Extra libraries.
LIBS=@LIBS@
@@ -69,9 +79,9 @@
#### End of system configuration section. ####
-SRCS=grep.c getopt.c regex.c dfa.c kwset.c obstack.c search.c
-OBJS=grep.o getopt.o regex.o dfa.o kwset.o obstack.o search.o
+SRCS=grep.c getopt.c regex.c dfa.c kwset.c obstack.c search.c mbc.c
+OBJS=grep.o getopt.o regex.o dfa.o kwset.o obstack.o search.o mbc.o
.c.o:
- $(CC) $(CFLAGS) $(DEFS) -I$(srcdir) -c $<
+ $(CC) $(CFLAGS) $(DEFS) $(MBCTYPE_DEF) -I$(srcdir) -c $<
all: grep check.done
@@ -120,7 +130,9 @@
dist:
V=`sed -n '/version\\[/s/.*\\([0-9][0-9]*\\.[0-9]*\\).*/\\1/p' \
+ grep.c`+mb`sed -n '/^ + multi-byte/s/[^0-9]*\\([0-9.]*\\).*/\\1/p' \
grep.c`; \
mkdir grep-$$V; mkdir grep-$$V/tests; \
- for f in `awk '{print $$1}' MANIFEST`; do ln $$f grep-$$V/$$f; done; \
+ for f in `awk '{print $$1}' MANIFEST MANIFEST.MB`; \
+ do ln $$f grep-$$V/$$f; done; \
tar cvhf - grep-$$V | gzip > grep-$$V.tar.z; \
rm -fr grep-$$V
@@ -132,2 +144,3 @@
kwset.o obstack.o: obstack.h
regex.o search.o: regex.h
+grep.o regex.o dfa.o search.o mbc.o: mbc.h
diff -ru2N grep-2.0/Makefile.msc grep+mb1.04/Makefile.msc
--- grep-2.0/Makefile.msc Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/Makefile.msc Fri Jul 23 04:03:17 1993
@@ -0,0 +1,138 @@
+# Generated automatically from Makefile.in by configure.
+# Makefile for GNU grep
+# Copyright (C) 1992 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+# Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+# Last change: Jul. 23, 1993 by t^2
+
+#### Start of system configuration section. ####
+
+srcdir=.
+VPATH=.
+
+AWK=gawk
+INSTALL=cp
+INSTALL_PROGRAM=$(INSTALL)
+INSTALL_DATA=$(INSTALL)
+
+CC=cl -nologo -D__STDC__ -AL
+LINT=lint
+
+# Things you might add to DEFS:
+# -DSTDC_HEADERS If you have ANSI C headers and libraries.
+# -DHAVE_UNISTD_H If you have unistd.h.
+# -DUSG If you have System V/ANSI C string
+# and memory functions and headers.
+# -D__CHAR_UNSIGNED__ If type `char' is unsigned.
+# gcc defines this automatically.
+#
+# For DOS, add those to DEFS.dos.
+
+# Things you might set to MBCTYPE_DEF to spec. default multi-byte char type.
+# -DEUC will make default multi-byte char type EUC and
+# -DSJIS SJIS.
+# If you do not set EUC/SJIS, grep assumes no multi-byte
+# char as default.
+MBCTYPE_DEF=-DSJIS
+
+# Extra libraries.
+LIBS=setargv/noe/st:30000
+ALLOCA=
+
+CFLAGS=-Ox
+LDFLAGS=$(CFLAGS)
+
+prefix=
+exec_prefix=$(prefix)
+
+# Prefix for installed program, normally empty or `g'.
+binprefix=
+# Prefix for installed man page, normally empty or `g'.
+manprefix=
+
+# Where to install executables.
+bindir=$(exec_prefix)/bin
+
+# Where to install man pages.
+mandir=$(prefix)/man/man1
+
+# Extension for man pages.
+manext=1
+
+# How to make a hard link.
+LN=cp
+
+#### End of system configuration section. ####
+
+SRCS=grep.c getopt.c regex.c dfa.c kwset.c obstack.c search.c mbc.c
+OBJS=grep.obj getopt.obj regex.obj dfa.obj kwset.obj obstack.obj search.obj mbc.obj
+
+.c.obj:
+ cat DEFS.dos $< > $*_.c
+ $(CC) $(CFLAGS) $(MBCTYPE_DEF) -I$(srcdir) -c -Fo$@ $*_.c
+ rm $*_.c
+
+all: grep.exe check.don
+
+# For Saber C.
+grep.loa: $(SRCS)
+ #load $(CFLAGS) $(DEFS) -I$(srcdir) (SRCS)
+
+# For Lint.
+grep.lin: $(SRCS)
+ $(LINT) $(CFLAGS) $(DEFS) -I$(srcdir) $(SRCS)
+
+install: all
+ $(INSTALL_PROGRAM) grep.exe $(bindir)/$(binprefix)grep.exe
+ rm -f $(bindir)/$(binprefix)egrep.exe
+ $(LN) $(bindir)/$(binprefix)grep.exe $(bindir)/$(binprefix)egrep.exe
+ rm -f $(bindir)/$(binprefix)fgrep.exe
+ $(LN) $(bindir)/$(binprefix)grep.exe $(bindir)/$(binprefix)fgrep.exe
+
+check:
+ tests\check
+ echo done >check.don
+
+check.don: grep.exe
+ tests\check
+ echo done >check.don
+
+grep.exe: $(OBJS)
+ echo $(OBJS:.obj =.obj+)+>link.tmp
+ echo $(LIBS)>>link.tmp
+ echo $@/noi;>>link.tmp
+ link @link.tmp
+ rm link.tmp
+
+clean:
+ rm -f grep.exe *.obj check.don tmp.bat tmp.in khadafy.out
+
+mostlycl: clean
+
+distclea: clean
+ rm -f Makefile config.sta
+
+realclea: distclea
+ rm -f TAGS
+
+# Some header file dependencies that really ought to be automatically deduced.
+dfa.obj search.obj: dfa.h
+grep.obj search.obj: grep.h
+kwset.obj search.obj: kwset.h
+kwset.obj obstack.obj: obstack.h
+regex.obj search.obj: regex.h
+grep.obj regex.obj dfa.obj search.obj mbc.obj: mbc.h
diff -ru2N grep-2.0/README.MB grep+mb1.04/README.MB
--- grep-2.0/README.MB Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/README.MB Thu Jun 2 17:03:37 1994
@@ -0,0 +1,327 @@
+$B!|!|!|!|!| GNU grep version 2.0 + multi-byte extension 1.04 $B!|!|!|!|!|
+$B!|!|!|!|!| Jun. 2, 1994 by t^2 $B!|!|!|!|!|
+
+ grep-2.0+mb1.04 -- $B%^%k%A%P%$%HJ8;zBP1~HG GNU grep
+
+$B!|35MW
+
+ GNU $B%W%m%8%'%/%H$K$h$k grep, egrep, fgrep ($B0J2<C1$K grep) $B$r%^%k%A%P
+ $B%$%HJ8;zBP1~2=$7$?$b$N$G$9.
+
+$B!|;HMQK!
+
+ grep $B$+$i$N3HD%ItJ,$@$1$r@bL@$7$^$9.
+
+ $BA}$($?%*%W%7%g%s$O0J2<$NDL$j$G$9.
+
+ -Wctype=ASCII
+ $B%^%k%A%P%$%HJ8;z$r9MN8$7$^$;$s. $B$3$N%*%W%7%g%s$r;HMQ$7$?>l
+ $B9g, grep $B$N%*%j%8%J%k$HF1$8F0:n$K$J$k$O$:$G$9.
+
+ -Wctype=EUC
+ $B%^%k%A%P%$%HJ8;z$H$7$F EUC $B$rG'<1$7$^$9.
+
+ -Wctype=SJIS
+ $B%^%k%A%P%$%HJ8;z$H$7$F Shift-JIS $B$rG'<1$7$^$9.
+
+ MS-DOS $B0J30$N%7%9%F%`$G, Makefile(.in)? $B$r=q$-49$($:$K%$%s%9%H!<
+ $B%k$7$?>l9g, $B%G%U%)%k%H$G$O EUC $B$rG'<1$7$^$9. MS-DOS $B$G$O%G%U%)
+ $B%k%H$G Shift-JIS $B$rG'<1$7$^$9.
+
+$B!| GREM104.LZH (MS-DOS $BHG<B9T7A<0$r4^$`%"!<%+%$%V) $B$K$D$$$F ($B$=$l0J30$N
+ $B7ABV$GF~<j$5$l$?J}$OL5;k$7$F$/$@$5$$)
+
+ 1. $B%"!<%+%$%V$K4^$^$l$F$$$k%U%!%$%k
+
+ $B%*%j%8%J%k$+$iA4$/<j$r2C$($F$$$J$$%U%!%$%k
+
+ AUTHORS $B%*%j%8%J%k$N%=!<%9$K4^$^$l$F$$$k AUTHORS
+ CHANGELO $B%*%j%8%J%k$N%=!<%9$K4^$^$l$F$$$k ChangeLog
+ COPYING $B%*%j%8%J%k$N%=!<%9$K4^$^$l$F$$$k COPYING
+ MANIFEST $B%*%j%8%J%k$N%=!<%9$K4^$^$l$F$$$k MANIFEST
+ NEWS $B%*%j%8%J%k$N%=!<%9$K4^$^$l$F$$$k NEWS
+ PROJECTS $B%*%j%8%J%k$N%=!<%9$K4^$^$l$F$$$k PROJECTS
+ README $B%*%j%8%J%k$N%=!<%9$K4^$^$l$F$$$k README
+
+ grep+mb $BMQ$N%U%!%$%k
+
+ CHANGELO.MB grep+mb $B$NJQ99MzNr
+ README.MB $B$3$N%U%!%$%k
+
+ MS-DOS $BHG grep+mb $BMQ$N%U%!%$%k
+
+ GREP.CAT $B%*%j%8%J%k$N%=!<%9$K4^$^$l$F$$$k%^%K%e%"%k%Z!<%8.
+ grep.man $B$r GNU roff $B$G%U%)!<%^%C%H$7$?$b$N.
+ GREP.EXE MS-DOS $BHG grep-2.0+mb1.04 $B$N<B9T7A<0
+ READMAN.SED sed $B$r;}$C$F$$$k?M$X$*$^$1
+ (sed -f readman.sed grep.cat)
+
+ 2. GREP.EXE $B$K$D$$$F
+
+ grep-2.0+mb1.04 $B$r MS-C 6.00A $B$G%3%s%Q%$%k$7$?$b$N$G$9.
+
+ $B%G%U%)%k%H$G Shift-JIS $B4A;z%3!<%I$r4^$`%Q%?!<%s$d%F%-%9%H$r=hM}
+ $B$G$-$^$9.
+
+ setargv.obj $B$rAH$_9~$s$G$"$j$^$9$N$G, MS-DOS $B$G%]%T%e%i!<$J%?%$
+ $B%W$N%o%$%k%I%+!<%I$,;HMQ$G$-$^$9. UNIX $B$N csh $B%i%$%/$J%o%$%k%I
+ $B%+!<%IE83+%k!<%A%s$rMQ0U$7$h$&$+$H$b;W$C$?$N$G$9$,, MS-DOS $B$NB>
+ $B$N%3%^%s%I$H$N@09g@-$,<h$l$J$$$7, $B%*%j%8%J%k$r$J$k$Y$/B:=E$7$?$+$C
+ $B$?$N$GCGG0$7$^$7$?.
+
+ 3. $B%$%s%9%H!<%k
+
+ GREP.EXE $B$O, grep $B$O$b$A$m$s, egrep, fgrep $B$N5!G=$r4^$s$G$$$^$9.
+ grep $B$K -E $B%*%W%7%g%s$rM?$($k$H egrep $B$NF0:n, -F $B%*%W%7%g%s$rM?
+ $B$($k$H fgrep $B$NF0:n$r$7$^$9. $B$^$?, GREP.EXE $B$r EGREP.EXE,
+ FGREP.EXE $B$H$$$&L>A0$G%3%T!<$7$F egrep, fgrep $B$H$7$F5/F0$9$k$H,
+ $B$=$NL>A0$K$U$5$o$7$$F0:n$r$7$^$9. $B%O!<%I%G%#%9%/$KM>M5$N$J$$J}
+ $B0J30$O,
+
+ A>copy grep.exe a:\bin
+ A>copy grep.exe a:\bin\egrep.exe
+ A>copy grep.exe a:\bin\fgrep.exe
+
+ $B$J$I$H$7$F$4;HMQ$K$J$i$l$k$3$H$r$*4+$a$7$^$9. $B$I$&$7$F$b%O!<%I
+ $B%G%#%9%/$NL5BL;H$$$r$7$?$/$J$1$l$P,
+
+ @echo off
+ grep -E %1 %2 %3 %4 %5 %6 %7 %8 %9
+
+ $B$J$I$N%P%C%A%U%!%$%k$r:n@.$9$k$H$$$&<j$b$"$j$^$9.
+
+ 4. $B%3%^%s%I%i%$%s0z?t$K$D$$$F
+
+ $BA0=R$7$?$H$*$j MS-C $B$N setargv.obj $B$r%j%s%/$7$F$$$^$9$N$G, $B$=$N
+ $B;EMM$K=>$o$J$1$l$P$J$j$^$;$s.
+
+ $B#1$D#1$D$N0z?t$O6uGr$G6h@Z$j$^$9. $B0z?t$K6uGr, ", \, <, >, | $B$r
+ $B4^$`$H$-$O%/%)!<%F%#%s%0$,I,MW$G$9. $B$=$NJ}K!$O COMMAND.COM $B$N%P
+ $B%0=-$$;EMM$H, $B$5$i$K setargv.obj $B$K$bLdBj$,$"$j, $B$+$J$jFq$7$$$N
+ $B$G$3$3$G$O@bL@$r>J$-$^$9. $B3F<+8&5f$7$F$/$@$5$$. $B0lHV4JC1$J$N$O,
+ $B8!:w%Q%?!<%s$r%U%!%$%k$K$7$F
+
+ grep -f $B%U%!%$%kL>
+
+ $B$H$9$k$3$H$G$9.
+
+ 5. Manual
+
+ For those who cannot use a roff-family formatter, a manual already
+ formatted with GNU roff is provided. Read it with a pager that
+ handles boldface and underlining, such as less. In an editor it is
+ hard to read because it contains ^H characters. Passing it through
+
+ s/.^H//g
+
+ as a sed program produces an ordinary text file. (Here ^H means
+ that the control code itself is to be embedded, not the two
+ characters ^ and H.)
+
+●Installation (other than MS-DOS)
+
+ The default multi-byte character setting is specified in Makefile.in.
+ To make Shift-JIS the default, or to use no multi-byte characters by
+ default, change the definition of the MBCTYPE_DEF macro in
+ Makefile.in to one of the following.
+
+ MBCTYPE_DEF=-DSJIS (Shift-JIS by default)
+ MBCTYPE_DEF= (no multi-byte characters by default)
+
+ In either case the multi-byte character code can still be selected
+ with an option at startup.
+
+ The remaining steps are the same as for the original grep; see
+ INSTALL.
+
+●Installation (MS-DOS version; "installation" here means installing
+ from source)
+
+ To build a grep that recognizes Shift-JIS by default with MS-C 6.00A,
+ read through README.MSC, patch the library if necessary, and then
+ run
+
+ A>nmake -f makefile.msc
+
+ That is all. Once grep.exe has been built, the tests run
+ automatically. Several error messages from grep appear during the
+ tests; this is normal. They only confirm that the exit status is 2
+ when an erroneous pattern is passed. A real failure is reported as
+ "Spencer test #nn faild" (nn is a number).
+
+ When the tests pass, copy grep.exe to a suitable directory. Simply
+ naming the copies egrep.exe and fgrep.exe makes them behave as egrep
+ and fgrep respectively. To install into a:\bin, for example, use
+ the commands
+
+ A>copy grep.exe a:\bin
+ A>copy grep.exe a:\bin\egrep.exe
+ A>copy grep.exe a:\bin\fgrep.exe
+
+ or similar commands for another directory.
+
+ To use another compiler, or to make the default something other than
+ Shift-JIS, write a Makefile using Makefile.msc as a guide. Note that
+ the tests require awk (gawk).
+
+●Bugs
+
+ 1. So-called JIS is not supported, and there are no plans to support it.
+
+ 2. Multi-byte character codes are not checked very strictly.
+
+ EUC first byte ... 0x80 - 0xff
+ EUC second byte ... 0x01 - 0xff (except 0x0a)
+
+ Shift-JIS first byte ... 0x80 - 0x9f, 0xe0 - 0xff
+ Shift-JIS second byte ... 0x01 - 0xff (except 0x0a)
+
+ These are the ranges accepted. Half-width kana can also be used.
+ Three-byte codes that begin with the EUC SS3 byte (0x8f) cannot be
+ used. (I have never seen a system that supports them...)
+
+ 3. On DOS, the byte offset printed by the -b option counts each CR+LF
+ as 1. (A shortcut on my part.)
+
+●Algorithm (making dfa.[ch] handle multi-byte characters)
+
+ I used to assume vaguely that making a DFA handle code sets with many
+ characters, such as EUC and Shift-JIS, would be very hard. Then one
+ day, while writing a small regexp-to-DFA converter to test a library
+ of my own, a good idea struck me. A multi-byte character is, after
+ all, just a sequence of bytes: split every multi-byte character into
+ bytes and build the regular expression from those.
+
+ This is hard to put into words, so here are examples of the
+ decomposition into bytes, using the following notation.
+
+ a, b, c ... single-byte characters.
+ x, y, z ... first bytes of multi-byte characters.
+
+ . (any one character)
+ ==> [a-c]|[x-z][a-z]
+
+ (A single-byte character, or the first byte of a multi-byte
+ character followed by any one byte.)
+
+ [xb-zx] (multi-byte characters in the range xb to zx)
+ ==> x[b-z]|y[a-z]|z[a-x]
+
+ yb*
+ ==> (yb)*
+
+ In practice no regular expression is generated; the tokens of the
+ decomposed expression are produced directly. If you are interested,
+ reading the source is the quickest way to see how. (It is not very
+ elegant, so I am a little embarrassed to have it studied closely...)
+
+ This alone is not enough: a search for the character xy, for example,
+ would also react to a sequence of characters such as xxyy. So in
+ multi-byte mode the pattern is always handled as "^.*(" + user
+ pattern + ")". Because '.' never matches part of a multi-byte
+ character, the leading '.*' keeps matches on character boundaries.
+
+●Extensions to dfa.[ch] and regex.[ch]
+
+ The dfa.[ch] and regex.[ch] modules depend on the mbc.[ch] module.
+ In addition, as in the original, using dfa.[ch] requires the
+ definitions in regex.h.
+
+ The multi-byte character type is set with mbcinit() in mbc.[ch].
+ Pass mbcinit() one of the macros defined in mbc.h: MBCTYPE_ASCII,
+ MBCTYPE_EUC, or MBCTYPE_SJIS.
+
+ dfa.[ch] consults the mbc.[ch] setting only when a pattern is
+ compiled. Matching then uses the multi-byte character type that was
+ in effect at compile time.
+
+ regex.[ch], by contrast, consults the mbc.[ch] setting both when a
+ pattern is compiled and when it is matched. The setting must not be
+ changed between the two, however. For example, a pattern written in
+ Shift-JIS cannot be searched for in EUC text. Take care.
+
+ Regular expressions that need care under multi-byte support follow.
+
+ . Matches any single-byte character or any valid multi-byte
+ character. A "valid multi-byte character" is a first byte
+ followed by a byte other than '\0' or '\n'.
+
+ [x-y] Matches any one character whose character code (internal
+ representation) lies in the range x to y. Like ., it does
+ not match invalid characters.
+
+ [^x-y] Matches any one character whose code (internal representation)
+ lies outside the range x to y. It also matches invalid characters.
+
+ The internal representation of a multi-byte character is simply a
+ 16-bit unsigned integer: first byte high, second byte low. With both
+ Shift-JIS and EUC, the ordering
+
+ 1-byte ASCII character < half-width kana character < full-width character
+
+ holds between character codes.
+
+●Miscellaneous
+
+ 1. The copyright of the original GNU grep is held by the Free Software
+ Foundation, Inc. The copyright of the patch (grep-mb.diff) is held
+ by me (t^2).
+
+ 2. The GNU grep source code is available from various ftp sites and
+ from the FUNIX data library on Nifty-serve. I register the diff
+ from GNU grep to grep+mb, grep-mb.diff, with FUNIX, and Mr. Dohzono
+ (dohzono@sdsft.kme.mei.co.jp) kindly posts it to
+ fj.sources.
+
+ 3. The diff grep-mb.diff may be redistributed freely; the FSF's terms
+ need not be followed for it. However, source code that results from
+ applying the diff, and executables built from it, must be
+ redistributed under the GNU GENERAL PUBLIC LICENSE (see COPYING).
+
+ Take care to follow the GNU GENERAL PUBLIC LICENSE as well when
+ redistributing a modified grep+mb. Programs that use code contained
+ in grep+mb (dfa.[ch], regex.[ch]) must likewise follow the relevant
+ parts of the GNU GENERAL PUBLIC LICENSE when distributed.
+
+ It is not an obligation, but if you redistribute, please let me know,
+ even after the fact. As far as possible, please also keep up with
+ new versions and arrange for reports from users to reach me.
+
+ 4. This program comes with no warranty.
+
+ 5. If a problem occurs with grep+mb, contact me (not the FSF or the
+ original author). If the person who distributed it to you prefers,
+ contact that person instead.
+
+ 6. Questions, requests, complaints, and anything else are very welcome.
+ I will support it as far as I can.
+
+●Acknowledgments
+
+ I thank the original authors and the FSF.
+
+ I thank Mr. Dohzono (dohzono@sdsft.kme.mei.co.jp) for his advice on
+ preparing the documentation.
+
+ I thank everyone who has reposted the patch or sent bug reports. I
+ wanted to list you by name, but a hard disk failure destroyed most of
+ my mail.
+
+ Finally, I thank all the users who give up precious disk space to
+ use grep+mb.
+
+●Contacting "me"
+
+ 〒810 Fukuoka-shi Chuo-ku Baikoen-danchi 7-207 (note: I have moved)
+ TEL/FAX: 092-731-4025 (automatic TEL/FAX switching)
+ 092-724-6342 (TEL only)
+ E-mail: NBC02362@niftyserve.or.jp Takahiro Tanimoto
+
+# Local variables:
+# mode: indented-text
+# indent-tabs-mode: nil
+# tab-stop-list: (4 8 16 24 32 40 48 56 64 72 80)
+# left-margin: 4
+# fill-column: 72
+# fill-prefix: " "
+# version-control: never
+# End:
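Note on the boundary problem described in README.MB above (a search for
xy also firing inside xxyy): it can be illustrated outside the DFA. The
sketch below is not how grep+mb does it; grep+mb wraps the pattern as
"^.*(" ... ")" so that the DFA consumes whole characters. This is only a
naive scan, under the assumption that the Shift-JIS lead-byte ranges are
those listed under Bugs, and the names is_sjis_lead() and find_aligned()
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lead-byte test for Shift-JIS as README.MB lists it:
   0x80-0x9f and 0xe0-0xff start a two-byte character. */
int is_sjis_lead(unsigned char c)
{
    return (c >= 0x80 && c <= 0x9f) || c >= 0xe0;
}

/* Hypothetical helper: return the byte offset of the first occurrence
   of pat[0..m) in text[0..n) that starts on a character boundary,
   or -1 if there is none. */
long find_aligned(const unsigned char *text, size_t n,
                  const unsigned char *pat, size_t m)
{
    size_t i = 0;

    assert(text != NULL && pat != NULL);
    while (i + m <= n) {
        if (memcmp(text + i, pat, m) == 0)
            return (long)i;
        /* Step over one whole character so that i never lands on
           the second byte of a two-byte character. */
        i += (is_sjis_lead(text[i]) && i + 1 < n) ? 2 : 1;
    }
    return -1;
}
```

With the bytes 88 88 9f 9f (the characters xx and yy), a plain byte
search for 88 9f would hit at offset 1, whereas find_aligned() skips
that position because it is the middle of a character.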
diff -ru2N grep-2.0/README.MSC grep+mb1.04/README.MSC
--- grep-2.0/README.MSC Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/README.MSC Sat Mar 5 16:14:14 1994
@@ -0,0 +1,99 @@
+The argument setup routine of MS-C version 6.00A for the PC-9801 has a
+bug.
+
+#include <stdio.h>
+
+int
+main(int argc, char **argv)
+{
+ int i;
+
+ for (i = 0; i < argc; i++)
+ printf("argv[%d] == %s\n", i, argv[i]);
+ return 0;
+}
+
+Compile and link this into FOO.EXE, then run it with arguments such as
+
+ A>foo "\\" abc
+
+and the bug shows itself. The wildcard expansion routine has a bug too:
+link the program above with SETARGV.OBJ, run it with arguments such as
+the following,
+
+ A>foo \DOS\*.com
+
+and the expansion comes out wrong.
+
+Applying the patches below to DOS/STDARGV.ASM and WILD.C under
+SOURCE/STARTUP appears to fix these bugs. Apply them and recompile with
+STARTUP.BAT. Then, to fix the large-model library for example, rename
+L/DOS/STDARGV.OBJ, L/DOS/_SETARGV.OBJ, and L/WILD.OBJ to KSTDARGV.OBJ,
+_KSTARGV.OBJ, and KWILD.OBJ respectively, and update the modules with
+
+ lib \msc6\lib\llibce.lib-+dos\kstdargv.obj-+dos\_kstargv.obj-+kwild.obj;
+
+or a similar command. To be safe, before doing this, run something like
+
+ lib \msc6\lib\llibce.lib*kstdargv.obj*_kstargv.obj*kwild.obj;
+
+to keep backup copies of kstdargv.obj, _kstargv.obj, and
+kwild.obj.
+
+Needless to say, this patch comes with no warranty.
+
+Mar. 5, 1994 t^2
+
+*** stdargv.org Mon Oct 8 19:50:46 1990
+--- stdargv.asm Thu Jul 22 17:50:44 1993
+***************
+*** 409,415 ****
+ shr cx,1
+ adc dx,cx ; add 1 for every pair of backslashes
+ test al,1 ; plus 1 for the " if odd number of \
+! jz arg310 ; [J1]
+ jmp arg210 ; [J1]
+ ;
+ ; Command line is fully parsed - compute number of bytes needed
+--- 409,415 ----
+ shr cx,1
+ adc dx,cx ; add 1 for every pair of backslashes
+ test al,1 ; plus 1 for the " if odd number of \
+! jnz arg310 ; ! Jul.21.93 t^2
+ jmp arg210 ; [J1]
+ ;
+ ; Command line is fully parsed - compute number of bytes needed
+
+*** wild.org Mon Oct 8 19:49:48 1990
+--- wild.c Sat Mar 5 00:42:12 1994
+***************
+*** 186,197 ****
+ char *ptr2 = arg; // [J1]
+
+ if(ptr != arg) { // [J1]
+! while(ptr2 + 1 != ptr && *ptr2 != SLASHCHAR && *ptr2 != FWDSLASHCHAR
+! && *ptr2 != ':') { // [J1]
+ if(iskanji(*ptr2)) ptr2++; // [J1]
+ ptr2++; // [J1]
+ } // [J1]
+! ptr = ptr2; // [J1]
+ } // [J1]
+
+ if (*ptr == ':' && ptr != arg+1) /* weird name, just add it as is */
+--- 186,201 ----
+ char *ptr2 = arg; // [J1]
+
+ if(ptr != arg) { // [J1]
+! char *ptr3 = arg;
+!
+! while (ptr2 < ptr) {
+! if (*ptr2 == SLASHCHAR || *ptr2 == FWDSLASHCHAR
+! || *ptr2 == ':')
+! ptr3 = ptr2;
+ if(iskanji(*ptr2)) ptr2++; // [J1]
+ ptr2++; // [J1]
+ } // [J1]
+! ptr = ptr3;
+ } // [J1]
+
+ if (*ptr == ':' && ptr != arg+1) /* weird name, just add it as is */
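Note on the WILD.C fix above: it turns on one point. The second byte of
a Shift-JIS character can be 0x5c, the same value as '\', so the
directory part of a path has to be found by scanning forward and
skipping second bytes, remembering the last real separator seen. The
following is a minimal sketch of that scan, not the MS-C source:
is_sjis_lead() stands in for the library's iskanji(), its byte ranges
are an assumption taken from README.MB, and last_separator() is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed Shift-JIS lead-byte ranges: 0x80-0x9f and 0xe0-0xff. */
int is_sjis_lead(unsigned char c)
{
    return (c >= 0x80 && c <= 0x9f) || c >= 0xe0;
}

/* Hypothetical helper: return a pointer to the last '\', '/' or ':'
   in [arg, end) that is a character of its own, or arg if there is
   none.  Second bytes of two-byte characters are never examined. */
const char *last_separator(const char *arg, const char *end)
{
    const char *last = arg;
    const char *p = arg;

    assert(arg != NULL && end != NULL && arg <= end);
    while (p < end) {
        if (*p == '\\' || *p == '/' || *p == ':')
            last = p;
        if (is_sjis_lead((unsigned char)*p) && p + 1 < end)
            p++;                /* skip the second byte */
        p++;
    }
    return last;
}
```

For the path \DOS\ followed by the bytes 83 5c and then *.com, the scan
stops at the backslash after DOS; a scan that ignored lead bytes would
wrongly stop at the 0x5c inside the two-byte character.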
diff -ru2N grep-2.0/configure grep+mb1.04/configure
--- grep-2.0/configure Sat May 22 13:20:23 1993
+++ grep+mb1.04/configure Fri Jul 9 13:05:45 1993
@@ -566,5 +566,5 @@
fi
-for func in getpagesize memchr strerror valloc
+for func in getpagesize memchr strerror valloc bcopy memmove strcasecmp
do
trfunc=HAVE_`echo $func | tr '[a-z]' '[A-Z]'`
diff -ru2N grep-2.0/configure.in grep+mb1.04/configure.in
--- grep-2.0/configure.in Sat May 22 13:20:16 1993
+++ grep+mb1.04/configure.in Fri Jul 9 13:05:32 1993
@@ -12,5 +12,5 @@
AC_SIZE_T
AC_ALLOCA
-AC_HAVE_FUNCS(getpagesize memchr strerror valloc)
+AC_HAVE_FUNCS(getpagesize memchr strerror valloc bcopy memmove strcasecmp)
AC_CHAR_UNSIGNED
AC_CONST
diff -ru2N grep-2.0/dfa.c grep+mb1.04/dfa.c
--- grep-2.0/dfa.c Mon May 31 08:02:20 1993
+++ grep+mb1.04/dfa.c Sat Jul 10 01:17:14 1993
@@ -18,4 +18,6 @@
/* Written June, 1988 by Mike Haertel
Modified July, 1988 by Arthur David Olson to assist BMG speedups */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 10, 1993 by t^2 */
#include <assert.h>
@@ -35,4 +37,8 @@
#undef index
#define index strchr
+#undef bcopy
+#define bcopy(s, d, n) memcpy(d, s, n)
+#undef bzero
+#define bzero(d, n) memset(d, 0, n)
#else
#include <strings.h>
@@ -71,4 +77,5 @@
#include "dfa.h"
#include "regex.h"
+#include "mbc.h"
#if __STDC__
@@ -141,5 +148,8 @@
fprintf(stderr, "END");
else if (t < NOTCHAR)
- fprintf(stderr, "%c", t);
+ if (t & 0x80)
+ fprintf(stderr, "0x%02x", (unsigned char)t);
+ else
+ fprintf(stderr, "%c", t);
else
{
@@ -239,4 +249,16 @@
}
+static int
+isemptyset(s)
+ charclass s;
+{
+ int i;
+
+ for (i = 0; i < CHARCLASS_INTS; i++)
+ if (s[i])
+ return 0;
+ return 1;
+}
+
/* A pointer to the current dfa is kept here during parsing. */
static struct dfa *dfa;
@@ -259,5 +281,6 @@
/* Syntax bits controlling the behavior of the lexical analyzer. */
-static int syntax_bits, syntax_bits_set;
+static unsigned long syntax_bits;
+static int syntax_bits_set;
/* Flag for case-folding letters into sets. */
@@ -267,5 +290,5 @@
void
dfasyntax(bits, fold)
- int bits;
+ unsigned long bits;
int fold;
{
@@ -289,4 +312,66 @@
static int minrep, maxrep; /* Repeat counts for {m,n}. */
+static charclass cs_cset[8];
+static token cs_tok[8] = {0, 0, 0, 0, 0, 0, 0, 0};
+
+static enum {
+ MBEXTTOK_NONE = -1,
+ MBEXTTOK_NOTCHAR = 256,
+ MBEXTTOK_ORMBC = MBEXTTOK_NOTCHAR,
+ MBEXTTOK_ORMBC_NL,
+ MBEXTTOK_CLASS,
+ MBEXTTOK_INVCLASS,
+} mbexttok = MBEXTTOK_NONE;
+
+static charclass mbcset_set;
+static charclass mbcset_all;
+static charclass mbcset[128]; /* 128*256/8 = 4 Kbytes */
+
+/* Return a (presumably) frequently used character set as a token.
+ n = 0 ... the set of all 1-byte characters.
+ 1 ... the set of all first bytes of 2-byte characters.
+ 2 ... the set of all second bytes of 2-byte characters.
+ +4 ... do not exclude '\0' or '\n'. */
+static token
+setcodeset(n)
+ int n;
+{
+ token c;
+
+ if (!cs_tok[n]) {
+ zeroset(cs_cset[n]);
+ switch (n) {
+ case 0:
+ case 4:
+ /* The set of all 1-byte characters. */
+ for (c = 0; c < NOTCHAR; c++)
+ if (ismbchar(c))
+ setbit(c, cs_cset[n]);
+ notset(cs_cset[n]);
+ break;
+ case 1:
+ case 5:
+ /* The set of all first bytes of 2-byte characters. */
+ for (c = 0; c < NOTCHAR; c++)
+ if (ismbchar(c))
+ setbit(c, cs_cset[n]);
+ break;
+ case 2:
+ case 6:
+ /* The set of all second bytes of 2-byte characters. */
+ notset(cs_cset[n]);
+ break;
+ }
+ if (!(n & 4)) {
+ if (syntax_bits & RE_DOT_NOT_NULL || n != 0)
+ clrbit('\0', cs_cset[n]);
+ if (!(syntax_bits & RE_DOT_NEWLINE) || n != 0)
+ clrbit('\n', cs_cset[n]);
+ }
+ cs_tok[n] = CSET + charclass_index(cs_cset[n]);
+ }
+ return cs_tok[n];
+}
+
/* Note that characters become unsigned here. */
#define FETCH(c, eoferr) \
@@ -362,4 +447,5 @@
it means that just about every case begins with
"if (backslash) ...". */
+ mbexttok = MBEXTTOK_NONE;
for (i = 0; i < 2; ++i)
{
@@ -543,14 +629,19 @@
if (backslash)
goto normal_char;
+ if (current_mbctype != MBCTYPE_ASCII)
+ mbexttok = MBEXTTOK_ORMBC;
+ laststart = 0;
+ return setcodeset(0);
+
+ case 'w':
+ if (!backslash)
+ goto normal_char;
zeroset(ccl);
- notset(ccl);
- if (!(syntax_bits & RE_DOT_NEWLINE))
- clrbit('\n', ccl);
- if (syntax_bits & RE_DOT_NOT_NULL)
- clrbit('\0', ccl);
+ for (c2 = 0; c2 < NOTCHAR; ++c2)
+ if (ISALNUM(c2))
+ setbit(c2, ccl);
laststart = 0;
return lasttok = CSET + charclass_index(ccl);
- case 'w':
case 'W':
if (!backslash)
@@ -558,8 +649,7 @@
zeroset(ccl);
for (c2 = 0; c2 < NOTCHAR; ++c2)
- if (ISALNUM(c2))
+ if (!ISALNUM(c2) && !ismbchar(c2))
setbit(c2, ccl);
- if (c == 'W')
- notset(ccl);
+ mbexttok = MBEXTTOK_ORMBC_NL;
laststart = 0;
return lasttok = CSET + charclass_index(ccl);
@@ -579,4 +669,6 @@
do
{
+ unsigned char ch = 0, c2h = 0;
+
/* Nobody ever said this had to be fast. :-)
Note that if we're looking at some other [:...:]
@@ -599,4 +691,8 @@
if (c == '\\' && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
FETCH(c, "Unbalanced [");
+ if (ismbchar(c)) {
+ ch = (unsigned char)c;
+ FETCH(c, "Multi-byte char incomplete");
+ }
FETCH(c1, "Unbalanced [");
if (c1 == '-')
@@ -616,19 +712,83 @@
&& (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
FETCH(c2, "Unbalanced [");
+ if (ismbchar(c2)) {
+ c2h = (unsigned char)c2;
+ FETCH(c2, "Multi-byte char incomplete");
+ }
FETCH(c1, "Unbalanced [");
}
}
- else
+ else {
+ c2h = ch;
c2 = c;
- while (c <= c2)
- {
- setbit(c, ccl);
- if (case_fold)
- if (ISUPPER(c))
- setbit(tolower(c), ccl);
- else if (ISLOWER(c))
- setbit(toupper(c), ccl);
- ++c;
+ }
+ if (ch < c2h || (ch == c2h && c <= c2)) {
+ if (ch == 0) {
+ ch = (unsigned char)c2;
+ if (c2h > 0)
+ ch = NOTCHAR - 1;
+ for (; c <= ch; c++) {
+ setbit(c, ccl);
+ if (case_fold) {
+ if (ISUPPER(c))
+ setbit(tolower(c), ccl);
+ else if (ISLOWER(c))
+ setbit(toupper(c), ccl);
+ }
+ }
+ ch = 0x80;
+ c = 0x00;
}
+ if (ch <= c2h) {
+ if (mbexttok < 0) {
+ mbexttok = MBEXTTOK_CLASS;
+ zeroset(mbcset_set);
+ zeroset(mbcset_all);
+ }
+ if (ch < c2h && c != 0x00) { /* leading partial row */
+ int t;
+
+ if (ismbchar(ch)
+ && ((t = tstbit(ch, mbcset_set))
+ || !tstbit(ch, mbcset_all))) {
+ if (!t) {
+ setbit(ch, mbcset_set);
+ zeroset(mbcset[ch - 0x80]);
+ }
+ for (; c < NOTCHAR; c++)
+ setbit(c, mbcset[ch - 0x80]);
+ }
+ ch++;
+ c = 0x00;
+ }
+ if (ch < c2h || (ch == c2h && c == 0x00 && c2 == 0xff)) {
+ if (c == 0x00 && c2 == 0xff)
+ c2h++;
+ for (; ch < c2h; ch++)
+ if (ismbchar(ch)) {
+ clrbit(ch, mbcset_set);
+ setbit(ch, mbcset_all);
+ }
+ if (c == 0x00 && c2 == 0xff)
+ c2h--;
+ c = 0x00;
+ }
+ if (ch <= c2h) {
+ int t;
+
+ /* Here c <= c2 always holds. */
+ if (ismbchar(ch)
+ && ((t = tstbit(ch, mbcset_set))
+ || !tstbit(ch, mbcset_all))) {
+ if (!t) {
+ setbit(ch, mbcset_set);
+ zeroset(mbcset[ch - 0x80]);
+ }
+ for (; c <= c2; c++)
+ setbit(c, mbcset[ch - 0x80]);
+ }
+ }
+ }
+ }
skip:
;
@@ -640,5 +800,20 @@
if (syntax_bits & RE_HAT_LISTS_NOT_NEWLINE)
clrbit('\n', ccl);
+ if (mbexttok == MBEXTTOK_CLASS) {
+ mbexttok = MBEXTTOK_INVCLASS;
+ if (!isemptyset(mbcset_set)) {
+ for (c = 0x80; c <= 0xff; c++)
+ if (tstbit(c, mbcset_set))
+ notset(mbcset[c - 0x80]);
+ }
+ notset(mbcset_all);
+ }
+ else
+ mbexttok = MBEXTTOK_ORMBC_NL;
}
+ if (current_mbctype != MBCTYPE_ASCII)
+ for (c = 0x80; c <= 0xff; c++)
+ if (ismbchar(c))
+ clrbit(c, ccl);
laststart = 0;
return lasttok = CSET + charclass_index(ccl);
@@ -647,4 +822,8 @@
normal_char:
laststart = 0;
+ if (ismbchar(c)) {
+ FETCH(mbexttok, "Multi-byte char incomplete");
+ return c;
+ }
if (case_fold && ISALPHA(c))
{
@@ -746,5 +925,67 @@
atom()
{
- if ((tok >= 0 && tok < NOTCHAR) || tok >= CSET || tok == BACKREF
+ if (mbexttok >= 0) {
+ if (mbexttok < MBEXTTOK_NOTCHAR) {
+ addtok(tok);
+ addtok(mbexttok);
+ addtok(CAT);
+ }
+ else
+ switch (mbexttok) {
+ case MBEXTTOK_ORMBC:
+ case MBEXTTOK_ORMBC_NL:
+ addtok(tok);
+ if (mbexttok == MBEXTTOK_ORMBC) {
+ addtok(setcodeset(1));
+ addtok(setcodeset(2));
+ }
+ else {
+ addtok(setcodeset(5));
+ addtok(setcodeset(6));
+ }
+ addtok(CAT);
+ addtok(OR);
+ break;
+ case MBEXTTOK_CLASS:
+ case MBEXTTOK_INVCLASS:
+ {
+ token c;
+
+ addtok(tok);
+ if (!isemptyset(mbcset_set))
+ for (c = 0x80; c <= 0xff; c++)
+ if (tstbit(c, mbcset_set)) {
+ /* Make sure all bits in mbcset_all valid. */
+ clrbit(c, mbcset_all);
+ addtok(c);
+ if (mbexttok == MBEXTTOK_CLASS) {
+ clrbit('\n', mbcset[c - 0x80]);
+ clrbit('\0', mbcset[c - 0x80]);
+ }
+ else {
+ setbit('\n', mbcset[c - 0x80]);
+ setbit('\0', mbcset[c - 0x80]);
+ }
+ addtok(CSET + charclass_index(mbcset[c - 0x80]));
+ addtok(CAT);
+ addtok(OR);
+ }
+ if (!isemptyset(mbcset_all)) {
+ addtok(CSET + charclass_index(mbcset_all));
+ if (mbexttok == MBEXTTOK_CLASS)
+ addtok(setcodeset(2));
+ else
+ addtok(setcodeset(6));
+ addtok(CAT);
+ addtok(OR);
+ }
+ }
+ break;
+ default:
+ break;
+ }
+ tok = lex();
+ }
+ else if ((tok >= 0 && tok < NOTCHAR) || tok >= CSET || tok == BACKREF
|| tok == BEGLINE || tok == ENDLINE || tok == BEGWORD
|| tok == ENDWORD || tok == LIMWORD || tok == NOTLIMWORD)
@@ -1904,4 +2145,6 @@
d->musts = 0;
+
+ bzero(cs_tok, sizeof cs_tok);
}
@@ -1916,8 +2159,8 @@
if (case_fold) /* dummy folding in service of dfamust() */
{
- char *copy;
+ char *copy, *p;
int i;
- copy = malloc(len);
+ p = copy = malloc(len + 7);
if (!copy)
dfaerror("out of memory");
@@ -1925,23 +2168,61 @@
/* This is a kludge. */
case_fold = 0;
+ if (current_mbctype != MBCTYPE_ASCII && searchflag) {
+ *p++ = '^';
+ *p++ = '.';
+ *p++ = '*';
+ if (!(syntax_bits & RE_NO_BK_PARENS))
+ *p++ = '\\';
+ *p++ = '(';
+ }
for (i = 0; i < len; ++i)
if (ISUPPER(s[i]))
- copy[i] = tolower(s[i]);
+ *p++ = tolower((unsigned char)s[i]);
else
- copy[i] = s[i];
+ *p++ = s[i];
+ if (current_mbctype != MBCTYPE_ASCII && searchflag) {
+ if (!(syntax_bits & RE_NO_BK_PARENS))
+ *p++ = '\\';
+ *p++ = ')';
+ }
dfainit(d);
- dfaparse(copy, len, d);
- free(copy);
+ dfaparse(copy, p - copy, d);
dfamust(d);
d->cindex = d->tindex = d->depth = d->nleaves = d->nregexps = 0;
+ bzero(cs_tok, sizeof cs_tok);
case_fold = 1;
- dfaparse(s, len, d);
+ if (current_mbctype != MBCTYPE_ASCII && searchflag) {
+ bcopy(s, copy + (syntax_bits & RE_NO_BK_PARENS ? 4 : 5), len);
+ dfaparse(copy, p - copy, d);
+ }
+ else
+ dfaparse(s, len, d);
dfaanalyze(d, searchflag);
+ free(copy);
}
else
{
dfainit(d);
- dfaparse(s, len, d);
+ if (current_mbctype != MBCTYPE_ASCII && searchflag) {
+ char *copy, *p;
+ if (!(p = copy = malloc(len + 7)))
+ dfaerror("out of memory");
+ *p++ = '^';
+ *p++ = '.';
+ *p++ = '*';
+ if (!(syntax_bits & RE_NO_BK_PARENS))
+ *p++ = '\\';
+ *p++ = '(';
+ bcopy(s, p, len);
+ p += len;
+ if (!(syntax_bits & RE_NO_BK_PARENS))
+ *p++ = '\\';
+ *p++ = ')';
+ dfaparse(copy, p - copy, d);
+ free(copy);
+ }
+ else
+ dfaparse(s, len, d);
dfamust(d);
dfaanalyze(d, searchflag);
diff -ru2N grep-2.0/dfa.h grep+mb1.04/dfa.h
--- grep-2.0/dfa.h Mon Apr 12 06:17:22 1993
+++ grep+mb1.04/dfa.h Wed Jul 7 17:02:13 1993
@@ -15,4 +15,6 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 7, 1993 by t^2 */
/* Written June, 1988 by Mike Haertel */
@@ -306,5 +308,5 @@
/* dfasyntax() takes two arguments; the first sets the syntax bits described
earlier in this file, and the second sets the case-folding flag. */
-extern void dfasyntax(int, int);
+extern void dfasyntax(unsigned long, int);
/* Compile the given string of the given length into the given struct dfa.
diff -ru2N grep-2.0/getpagesize.h grep+mb1.04/getpagesize.h
--- grep-2.0/getpagesize.h Fri May 21 14:18:58 1993
+++ grep+mb1.04/getpagesize.h Sat Jul 10 02:19:10 1993
@@ -1,2 +1,4 @@
+/* Multi-byte extension added Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 10, 1993 by t^2 */
#ifdef BSD
#ifndef BSD4_1
@@ -35,5 +37,9 @@
#endif /* no EXEC_PAGESIZE */
#else /* !HAVE_SYS_PARAM_H */
+#ifndef MSDOS
#define getpagesize() 8192 /* punt totally */
+#else
+#define getpagesize() 4096
+#endif
#endif /* !HAVE_SYS_PARAM_H */
#endif /* no _SC_PAGESIZE */
diff -ru2N grep-2.0/grep.c grep+mb1.04/grep.c
--- grep-2.0/grep.c Sun May 23 14:52:52 1993
+++ grep+mb1.04/grep.c Thu Jun 2 17:01:53 1994
@@ -17,4 +17,6 @@
Written July 1992 by Mike Haertel. */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jun. 2, 1994 by t^2 */
#include <errno.h>
@@ -22,6 +24,8 @@
#ifndef errno
+#ifndef MSDOS
extern int errno;
#endif
+#endif
#ifdef STDC_HEADERS
@@ -59,4 +63,5 @@
#include "getpagesize.h"
#include "grep.h"
+#include "mbc.h"
#undef MAX
@@ -315,6 +320,6 @@
cc = read(bufdesc, buffer + bufsalloc, bufalloc - bufsalloc);
#endif
- if (cc > 0)
- buflim = buffer + bufsalloc + cc;
+ if (cc != -1)
+ buflim = buffer + bufsalloc + (unsigned)cc;
else
buflim = buffer + bufsalloc;
@@ -332,10 +337,10 @@
/* Internal variables to keep track of byte count, context, etc. */
-static size_t totalcc; /* Total character count before bufbeg. */
+static unsigned long totalcc; /* Total character count before bufbeg. */
static char *lastnl; /* Pointer after last newline counted. */
static char *lastout; /* Pointer after last character output;
NULL if no character has been output
or if it's conceptually before bufbeg. */
-static size_t totalnl; /* Total newline count before lastnl. */
+static unsigned long totalnl; /* Total newline count before lastnl. */
static int pending; /* Pending lines of output. */
@@ -363,5 +368,5 @@
{
nlscan(beg);
- printf("%d%c", ++totalnl, sep);
+ printf("%lu%c", ++totalnl, sep);
lastnl = lim;
}
@@ -519,5 +524,5 @@
for (;;)
{
- if (fillbuf(save) < 0)
+ if (fillbuf(save) == -1)
{
error(filename, errno);
@@ -564,8 +569,10 @@
}
-static char version[] = "GNU grep version 2.0";
+static char version[] = "GNU grep version 2.0\
+ + multi-byte extension 1.04";
#define USAGE \
- "usage: %s [-[[AB] ]<num>] [-[CEFGVchilnqsvwx]] [-[ef]] <expr> [<files...>]\n"
+ "usage: %s [-[[AB] ]<num>] [-[CEFGVchilnqsvwx]] [-W ctype=...]\n\
+ [-[ef]] <expr> [<files...>]\n"
static void
@@ -594,4 +601,32 @@
}
+#ifndef HAVE_STRCASECMP
+
+static int
+#ifdef __STDC__
+strcasecmp(const char *s1, const char *s2)
+#else
+strcasecmp(s1, s2)
+ char *s1, *s2;
+#endif
+{
+ unsigned char c1, c2;
+
+ while ((c1 = *s1++)) {
+ if ((unsigned char)(c1 - 'A') <= (unsigned char)('Z' - 'A'))
+ c1 += 'a' - 'A';
+ c2 = *s2++;
+ if ((unsigned char)(c2 - 'A') <= (unsigned char)('Z' - 'A'))
+ c2 += 'a' - 'A';
+ if (c1 != c2) {
+ --s2;
+ break;
+ }
+ }
+ return c1 - (unsigned char)*s2;
+}
+
+#endif
+
int
main(argc, argv)
@@ -607,7 +642,27 @@
extern int optind;
- prog = argv[0];
- if (prog && strrchr(prog, '/'))
- prog = strrchr(prog, '/') + 1;
+ if ((prog = argv[0]) && prog[0]) {
+ char c, *p;
+#ifdef MSDOS
+ static char progname[8 + 1];
+#endif
+
+ for (p = prog; (c = *p++); )
+ if (c == '/'
+#ifdef MSDOS
+ || c == '\\' || c == ':'
+#endif
+ )
+ prog = p;
+#ifdef MSDOS
+ for (p = progname; p < &progname[8] && (c = *prog++) && c != '.'; ) {
+ if ((unsigned char)(c - 'A') <= (unsigned char)('Z' - 'A'))
+ c += 'a' - 'A';
+ *p++ = c;
+ }
+ *p++ = '\0';
+ prog = argv[0] = progname;
+#endif
+ }
keys = NULL;
@@ -620,5 +675,5 @@
matcher = NULL;
- while ((opt = getopt(argc, argv, "0123456789A:B:CEFGVX:bce:f:hiLlnqsvwxy"))
+ while ((opt = getopt(argc, argv, "0123456789A:B:CEFGVX:bce:f:hiLlnqsvwxyW:"))
!= EOF)
switch (opt)
@@ -747,4 +802,19 @@
case 'x':
match_lines = 1;
+ break;
+ case 'W':
+ if (strcasecmp(optarg, "ctype=ASCII") == 0) {
+ mbcinit(MBCTYPE_ASCII);
+ break;
+ }
+ if (strcasecmp(optarg, "ctype=EUC") == 0) {
+ mbcinit(MBCTYPE_EUC);
+ break;
+ }
+ if (strcasecmp(optarg, "ctype=SJIS") == 0) {
+ mbcinit(MBCTYPE_SJIS);
+ break;
+ }
+ fatal("unknown argument to -Wctype", 0);
break;
default:
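Note on the -W option added above: the argument is compared without
regard to case against three fixed strings. The sketch below shows the
same mapping as a table-free helper that does not depend on a system
strcasecmp(). The names equal_nocase() and parse_ctype() and the CTYPE_*
constants are hypothetical; the constants merely mirror the MBCTYPE_*
values that mbc.h defines in this patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Mirror of MBCTYPE_ASCII, MBCTYPE_EUC, MBCTYPE_SJIS in mbc.h. */
enum { CTYPE_ASCII = 0, CTYPE_EUC = 1, CTYPE_SJIS = 2 };

/* Return 1 if a and b are equal, ignoring ASCII case; else 0. */
int equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++, b++;
    }
    return *a == *b;            /* both strings must end together */
}

/* Map a -W argument to a CTYPE_* value, or -1 if it is unknown. */
int parse_ctype(const char *arg)
{
    assert(arg != NULL);
    if (equal_nocase(arg, "ctype=ASCII"))
        return CTYPE_ASCII;
    if (equal_nocase(arg, "ctype=EUC"))
        return CTYPE_EUC;
    if (equal_nocase(arg, "ctype=SJIS"))
        return CTYPE_SJIS;
    return -1;
}
```

A caller would pass the result to mbcinit() when it is not -1 and report
an error otherwise, which is what the case 'W' branch does with fatal().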
diff -ru2N grep-2.0/kwset.c grep+mb1.04/kwset.c
--- grep-2.0/kwset.c Mon May 3 04:26:20 1993
+++ grep+mb1.04/kwset.c Fri Jul 9 14:54:46 1993
@@ -19,4 +19,6 @@
The author may be reached (Email) at the address mike@ai.mit.edu,
or (US mail) as Mike Haertel c/o Free Software Foundation. */
+/* Multi-byte extension added Jul, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 9, 1993 by t^2 */
/* The algorithm implemented by these routines bears a startling resemblence
@@ -592,5 +594,5 @@
if (d != 0)
continue;
- if (tp[-2] == gc)
+ if (U(tp[-2]) == gc)
{
for (i = 3; i <= len && U(tp[-i]) == U(sp[-i]); ++i)
diff -ru2N grep-2.0/mbc.c grep+mb1.04/mbc.c
--- grep-2.0/mbc.c Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/mbc.c Fri Jul 9 14:38:28 1993
@@ -0,0 +1,98 @@
+/* Functions for multi-byte support.
+ Created for grep multi-byte extension Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 9, 1993 by t^2 */
+#include "mbc.h"
+
+static const unsigned char mbctab_ascii[] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+};
+
+static const unsigned char mbctab_euc[] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
+};
+
+static const unsigned char mbctab_sjis[] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
+};
+
+#ifdef EUC
+const unsigned char *mbctab = mbctab_euc;
+int current_mbctype = MBCTYPE_EUC;
+#else
+#ifdef SJIS
+const unsigned char *mbctab = mbctab_sjis;
+int current_mbctype = MBCTYPE_SJIS;
+#else
+const unsigned char *mbctab = mbctab_ascii;
+int current_mbctype = MBCTYPE_ASCII;
+#endif
+#endif
+
+void
+#ifdef __STDC__
+mbcinit(int mbctype)
+#else
+mbcinit(mbctype)
+ int mbctype;
+#endif
+{
+ switch (mbctype) {
+ case MBCTYPE_ASCII:
+ mbctab = mbctab_ascii;
+ current_mbctype = MBCTYPE_ASCII;
+ break;
+ case MBCTYPE_EUC:
+ mbctab = mbctab_euc;
+ current_mbctype = MBCTYPE_EUC;
+ break;
+ case MBCTYPE_SJIS:
+ mbctab = mbctab_sjis;
+ current_mbctype = MBCTYPE_SJIS;
+ break;
+ }
+}
diff -ru2N grep-2.0/mbc.h grep+mb1.04/mbc.h
--- grep-2.0/mbc.h Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/mbc.h Fri Jul 9 14:40:03 1993
@@ -0,0 +1,38 @@
+#ifndef MBC_H
+#define MBC_H 1
+/* Definitions for multi-byte support.
+ Created for grep multi-byte extension Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 9, 1993 by t^2 */
+
+#ifndef const
+#ifndef __STDC__
+#ifdef __GNUC__
+#define const __const__
+#define volatile __volatile__
+#else
+#define const
+#define volatile
+#endif
+#endif
+#endif
+
+#ifndef _
+#ifdef __STDC__
+#define _(x) x
+#else
+#define _(x) ()
+#endif
+#endif
+
+#define MBCTYPE_ASCII 0
+#define MBCTYPE_EUC 1
+#define MBCTYPE_SJIS 2
+
+extern const unsigned char *mbctab;
+extern int current_mbctype;
+
+void mbcinit _((int));
+
+#define ismbchar(c) (mbctab[(unsigned char)(c)])
+
+#endif /* !MBC_H */
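Note on the internal representation: README.MB describes a multi-byte
character as a 16-bit unsigned integer with the first byte high and the
second byte low, and says that ASCII, half-width kana and full-width
characters then sort in that order. The sketch below combines a
lead-byte test like ismbchar() with that representation for Shift-JIS.
It is an illustration only: decode_sjis() and is_sjis_lead() are
hypothetical names, and the lead-byte ranges are the ones README.MB
lists.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed Shift-JIS lead-byte ranges: 0x80-0x9f and 0xe0-0xff. */
int is_sjis_lead(unsigned char c)
{
    return (c >= 0x80 && c <= 0x9f) || c >= 0xe0;
}

/* Hypothetical helper: decode the character at p, where n >= 1 bytes
   are available.  Stores the character's length in bytes in *len and
   returns first-byte-high, second-byte-low for two-byte characters,
   or the byte itself for one-byte characters. */
unsigned decode_sjis(const unsigned char *p, size_t n, size_t *len)
{
    assert(p != NULL && len != NULL && n >= 1);
    if (is_sjis_lead(p[0]) && n >= 2) {
        *len = 2;
        return ((unsigned)p[0] << 8) | p[1];
    }
    *len = 1;                   /* ASCII, half-width kana, or a
                                   truncated lead byte */
    return p[0];
}
```

For example, 'A' (0x41), a half-width kana byte (0xb1) and the two-byte
character 82 a0 decode to 0x41, 0xb1 and 0x82a0, which keeps the
ordering stated in README.MB.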
diff -ru2N grep-2.0/obstack.h grep+mb1.04/obstack.h
--- grep-2.0/obstack.h Sat May 22 11:55:23 1993
+++ grep+mb1.04/obstack.h Sat Jul 10 04:47:06 1993
@@ -15,4 +15,6 @@
along with this program; if not, write to the Free Software
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* Multi-byte extension added Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 10, 1993 by t^2 */
/* Summary:
@@ -136,4 +138,5 @@
#endif
+#ifndef PTR_INT_TYPE
#ifdef __STDC__
#define PTR_INT_TYPE ptrdiff_t
@@ -141,4 +144,5 @@
#define PTR_INT_TYPE long
#endif
+#endif
struct _obstack_chunk /* Lives at front of each chunk. */
@@ -151,5 +155,5 @@
struct obstack /* control current object in current chunk */
{
- long chunk_size; /* preferred size to allocate chunks in */
+ unsigned chunk_size; /* preferred size to allocate chunks in */
struct _obstack_chunk* chunk; /* address of current struct obstack_chunk */
char *object_base; /* address of object we are building */
diff -ru2N grep-2.0/regex.c grep+mb1.04/regex.c
--- grep-2.0/regex.c Fri May 21 14:11:40 1993
+++ grep+mb1.04/regex.c Thu Aug 19 04:37:03 1993
@@ -19,4 +19,6 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Aug. 19, 1993 by t^2 */
/* AIX requires this to be the first thing in the file. */
@@ -54,6 +56,33 @@
#define bcmp(s1, s2, n) memcmp ((s1), (s2), (n))
#endif
+#ifdef HAVE_MEMMOVE
#ifndef bcopy
-#define bcopy(s, d, n) memcpy ((d), (s), (n))
+#define bcopy(s, d, n) memmove ((d), (s), (n))
+#endif
+#else
+#ifndef HAVE_BCOPY
+static void
+#ifdef __STDC__
+bcopy (const void *s0, void *d0, size_t n)
+#else
+bcopy (s0, d0, n)
+ const void *s0;
+ void *d0;
+ size_t n;
+#endif
+{
+ const char *s = s0;
+ char *d = d0;
+
+ if (s < d) {
+ s += n, d += n;
+ while (n--)
+ *--d = *--s;
+ }
+ else
+ while (n--)
+ *d++ = *s++;
+}
+#endif
#endif
#ifndef bzero
@@ -124,4 +153,5 @@
/* Get the interface, including the syntax bits. */
#include "regex.h"
+#include "mbc.h"
/* isalpha etc. are used for the character classes. */
@@ -462,4 +492,19 @@
#endif /* DEBUG */
+
+#define STORE_MBC(p, c) \
+ ((p)[0] = (unsigned char) ((c) >> 8), (p)[1] = (unsigned char) (c))
+#define STORE_MBC_AND_INCR(p, c) \
+ (*(p)++ = (unsigned char) ((c) >> 8), *(p)++ = (unsigned char) (c))
+
+#define EXTRACT_MBC(p) \
+ ((unsigned char) (p)[0] << 8 | (unsigned char) (p)[1])
+#define EXTRACT_MBC_AND_INCR(p) \
+ ((p) += 2, (unsigned char) (p)[-2] << 8 | (unsigned char) (p)[-1])
+
+#define EXTRACT_UNSIGNED(p) \
+ ((unsigned char) (p)[0] | (unsigned char) (p)[1] << 8)
+#define EXTRACT_UNSIGNED_AND_INCR(p) \
+ ((p) += 2, (unsigned char) (p)[-2] | (unsigned char) (p)[-1] << 8)
/* If DEBUG is defined, Regex prints many voluminous messages about what
@@ -558,4 +603,8 @@
{
putchar ('/');
+ if (ismbchar (*p) && 2 <= mcnt) {
+ printf ("/%.2s", (char *) p), p += 2, --mcnt;
+ continue;
+ }
printchar (*p++);
}
@@ -618,7 +667,14 @@
printchar (last);
- putchar (']');
-
p += 1 + *p;
+ {
+ unsigned short i, size;
+
+ size = EXTRACT_UNSIGNED_AND_INCR (p);
+ for (i = 0; i < size; i++)
+ printf ("%.2s-%.2s", (char *) p, (char *) p + 2),
+ p += 4;
+ }
+ putchar (']');
}
break;
@@ -779,5 +835,5 @@
printf ("not_bol: %d\t", bufp->not_bol);
printf ("not_eol: %d\t", bufp->not_eol);
- printf ("syntax: %d\n", bufp->syntax);
+ printf ("syntax: %lu\n", bufp->syntax);
/* Perhaps we should print the translate table? */
}
@@ -878,5 +934,7 @@
static boolean at_begline_loc_p (), at_endline_loc_p ();
static boolean group_in_compile_stack ();
+#if 0
static reg_errcode_t compile_range ();
+#endif
/* Fetch the next character in the uncompiled pattern---translating it
@@ -887,5 +945,6 @@
do {if (p == pend) return REG_EEND; \
c = (unsigned char) *p++; \
- if (translate) c = translate[c]; \
+ if (translate && !ismbchar (c)) \
+ c = (unsigned char) translate[(unsigned char) c]; \
} while (0)
@@ -905,5 +964,7 @@
`char *', to avoid warnings when a string constant is passed. But
when we use a character as a subscript we must make it unsigned. */
-#define TRANSLATE(d) (translate ? translate[(unsigned char) (d)] : (d))
+#define TRANSLATE(d) (translate \
+ ? (unsigned char) translate[(unsigned char) (d)] \
+ : (d))
@@ -1075,4 +1136,159 @@
|| STREQ (string, "cntrl") || STREQ (string, "blank"))
+/* Handle charset(_not)?.
+
+ Structure of charset(_not)? in compiled pattern.
+
+ struct {
+ unsigned char id; charset(_not)?
+ unsigned char sbc_size;
+ unsigned char sbc_map[sbc_size]; same as original up to here.
+ unsigned short mbc_size; number of intervals.
+ struct {
+ unsigned short beg; beginning of interval.
+ unsigned short end; end of interval.
+ } intervals[mbc_size];
+ }; */
+
+static reg_errcode_t
+#ifdef __STDC__
+set_list_bits (unsigned short c1, unsigned short c2,
+ reg_syntax_t syntax, unsigned char *b, const char *translate)
+#else
+set_list_bits (c1, c2, syntax, b, translate)
+ unsigned short c1, c2;
+ reg_syntax_t syntax;
+ unsigned char *b;
+ const char *translate;
+#endif
+{
+ unsigned char sbc_size = b[-1];
+ unsigned short mbc_size = EXTRACT_UNSIGNED (&b[sbc_size]);
+ unsigned short beg, end, upb;
+
+ if (c1 > c2)
+ return syntax & RE_NO_EMPTY_RANGES ? REG_ERANGE : REG_NOERROR;
+ if (c1 < 1 << BYTEWIDTH) {
+ upb = c2;
+ if (1 << BYTEWIDTH <= upb)
+ upb = (1 << BYTEWIDTH) - 1; /* The last single-byte char */
+ if (sbc_size <= upb / BYTEWIDTH) {
+ /* Allocate maximum size so it never happens again. */
+ /* NOTE: memcpy() would not work here. */
+ bcopy (&b[sbc_size], &b[(1 << BYTEWIDTH) / BYTEWIDTH], 2 + mbc_size*4);
+ bzero (&b[sbc_size], (1 << BYTEWIDTH) / BYTEWIDTH - sbc_size);
+ b[-1] = sbc_size = (1 << BYTEWIDTH) / BYTEWIDTH;
+ }
+ if (!translate) {
+ for (; c1 <= upb; c1++)
+ if (!ismbchar (c1))
+ SET_LIST_BIT (c1);
+ }
+ else
+ for (; c1 <= upb; c1++)
+ if (!ismbchar (c1))
+ SET_LIST_BIT (TRANSLATE (c1));
+ if (c2 < 1 << BYTEWIDTH)
+ return REG_NOERROR;
+ c1 = 0x8000; /* The first wide char */
+ }
+ b = &b[sbc_size + 2];
+
+  /*                 intervals[beg]
+     ●----------●   ●----------●
+                        c1
+                        ○----------------------●
+
+     Determine the index beg of the interval shown in the figure above. */
+ for (beg = 0, upb = mbc_size; beg < upb; ) {
+ unsigned short mid = (beg + upb) >> 1;
+
+ if (c1 - 1 > EXTRACT_MBC (&b[mid*4 + 2]))
+ beg = mid + 1;
+ else
+ upb = mid;
+ }
+
+  /*                    intervals[end]
+     ●-------●        ●----------●
+                   c2
+   ●---------------○
+
+     Determine the index end of the interval shown in the figure above. */
+ for (end = beg, upb = mbc_size; end < upb; ) {
+ unsigned short mid = (end + upb) >> 1;
+
+ if (c2 >= EXTRACT_MBC (&b[mid*4]) - 1)
+ end = mid + 1;
+ else
+ upb = mid;
+ }
+
+ if (beg != end) {
+ /* When at least one existing interval is merged,
+ adjust the start and end points of the interval. */
+ if (c1 > EXTRACT_MBC (&b[beg*4]))
+ c1 = EXTRACT_MBC (&b[beg*4]);
+ if (c2 < EXTRACT_MBC (&b[end*4 - 2]))
+ c2 = EXTRACT_MBC (&b[end*4 - 2]);
+ }
+ if (end < mbc_size && end != beg + 1)
+ /* Move the existing intervals to just after the interval being added. */
+ /* NOTE: memcpy() would not work here. */
+ bcopy (&b[end*4], &b[(beg + 1)*4], (mbc_size - end)*4);
+ STORE_MBC (&b[beg*4 + 0], c1);
+ STORE_MBC (&b[beg*4 + 2], c2);
+ mbc_size += beg + 1 - end;
+ STORE_NUMBER (&b[-2], mbc_size);
+ return REG_NOERROR;
+}
+
+static int
+#ifdef __STDC__
+is_in_list (unsigned short c, const unsigned char *b)
+#else
+is_in_list (c, b)
+ unsigned short c;
+ const unsigned char *b;
+#endif
+{
+ unsigned short size;
+ int in = (re_opcode_t) b[-1] == charset_not;
+
+ size = *b++;
+ if (c < 1 << BYTEWIDTH) {
+ if (c / BYTEWIDTH < size && b[c / BYTEWIDTH] & 1 << c % BYTEWIDTH)
+ in = !in;
+ }
+ else {
+ unsigned short i, j;
+
+ b += size + 2;
+ size = EXTRACT_UNSIGNED (&b[-2]);
+
+    /*               intervals[i]
+       ●-------●   ●--------●
+                      c
+                      ○----------------●
+
+       Determine the index i of the interval shown in the figure above. */
+ for (i = 0, j = size; i < j; ) {
+ unsigned short k = (i + j) >> 1;
+
+ if (c > EXTRACT_MBC (&b[k*4 + 2]))
+ i = k + 1;
+ else
+ j = k;
+ }
+ if (i < size && EXTRACT_MBC (&b[i*4]) <= c
+ /* Exclude invalid multi-byte characters from [...]. For simplicity,
+ only those whose second byte is '\n' or '\0' are treated as invalid
+ here. For [^...], conversely, invalid multi-byte characters match. */
+ && ((unsigned char) c != '\n' && (unsigned char) c != '\0'))
+ in = !in;
+ }
+ return in;
+}
+
/* `regex_compile' compiles PATTERN (of length SIZE) according to SYNTAX.
Returns one of error codes defined in `regex.h', or zero for success.
@@ -1385,4 +1601,6 @@
{
boolean had_char_class = false;
+ unsigned short c, c1;
+ int last_char = -1;
if (p == pend) return REG_EBRACK;
@@ -1390,5 +1608,6 @@
/* Ensure that we have enough space to push a charset: the
opcode, the length count, and the bitset; 34 bytes in all. */
- GET_BUFFER_SPACE (34);
+ /* + 2 + 4 for mbcharset(_not)? with just one interval. */
+ GET_BUFFER_SPACE (34 + 2 + 4);
laststart = b;
@@ -1407,5 +1626,5 @@
/* Clear the whole map. */
- bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH);
+ bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH + 2);
/* charset_not matches newline according to a syntax bit. */
@@ -1417,7 +1636,14 @@
for (;;)
{
+ int size;
+
if (p == pend) return REG_EBRACK;
- PATFETCH (c);
+ if ((size = EXTRACT_UNSIGNED (&b[(1 << BYTEWIDTH) / BYTEWIDTH])))
+ /* Ensure the space is enough to hold another interval
+ of multi-byte chars in charset(_not)?. */
+ GET_BUFFER_SPACE (32 + 2 + size*4 + 4);
+
+ PATFETCH_RAW (c);
/* \ might escape characters inside [...] and [^...]. */
@@ -1426,6 +1652,16 @@
if (p == pend) return REG_EESCAPE;
- PATFETCH (c1);
- SET_LIST_BIT (c1);
+ PATFETCH_RAW (c1);
+ if (ismbchar (c1)) {
+ unsigned char c2;
+
+ PATFETCH_RAW (c2);
+ c1 = c1 << 8 | c2;
+ (void) set_list_bits (c1, c1, syntax, b, translate);
+ last_char = c1;
+ continue;
+ }
+ SET_LIST_BIT (TRANSLATE (c1));
+ last_char = c1;
continue;
}
@@ -1442,4 +1678,11 @@
return REG_ERANGE;
+ if (ismbchar (c)) {
+ unsigned char c2;
+
+ PATFETCH_RAW (c2);
+ c = c << 8 | c2;
+ }
+
/* Look ahead to see if it's a range when the last thing
was a character: if this is a hyphen not at the
@@ -1447,10 +1690,25 @@
operator. */
if (c == '-'
+#if 0 /* The original was: */
&& !(p - 2 >= pattern && p[-2] == '[')
&& !(p - 3 >= pattern && p[-3] == '[' && p[-2] == '^')
+#else /* I wonder why he did not write like this.
+ Have we got any problems? */
+ && p != p1 + 1
+#endif
&& *p != ']')
{
- reg_errcode_t ret
- = compile_range (&p, pend, translate, syntax, b);
+ reg_errcode_t ret;
+
+ assert (last_char >= 0);
+ PATFETCH_RAW (c1);
+ if (ismbchar (c1)) {
+ unsigned char c2;
+
+ PATFETCH_RAW (c2);
+ c1 = c1 << 8 | c2;
+ }
+ ret = set_list_bits (last_char, c1, syntax, b, translate);
+ last_char = c1;
if (ret != REG_NOERROR) return ret;
}
@@ -1461,7 +1719,15 @@
/* Move past the `-'. */
- PATFETCH (c1);
-
- ret = compile_range (&p, pend, translate, syntax, b);
+ PATFETCH_RAW (c1);
+
+ PATFETCH_RAW (c1);
+ if (ismbchar (c1)) {
+ unsigned char c2;
+
+ PATFETCH_RAW (c2);
+ c1 = c1 << 8 | c2;
+ }
+ ret = set_list_bits (c, c1, syntax, b, translate);
+ last_char = c1;
if (ret != REG_NOERROR) return ret;
}
@@ -1474,5 +1740,5 @@
char str[CHAR_CLASS_MAX_LENGTH + 1];
- PATFETCH (c);
+ PATFETCH_RAW (c);
c1 = 0;
@@ -1534,4 +1800,7 @@
}
had_char_class = true;
+#ifdef DEBUG
+ last_char = -1;
+#endif
}
else
@@ -1540,7 +1809,13 @@
while (c1--)
PATUNFETCH;
+#if 0 /* The original was: */
SET_LIST_BIT ('[');
SET_LIST_BIT (':');
+#else /* I think this is the right way. */
+ SET_LIST_BIT (TRANSLATE ('['));
+ SET_LIST_BIT (TRANSLATE (':'));
+#endif
had_char_class = false;
+ last_char = ':';
}
}
@@ -1548,5 +1823,6 @@
{
had_char_class = false;
- SET_LIST_BIT (c);
+ (void) set_list_bits (c, c, syntax, b, translate);
+ last_char = c;
}
}
@@ -1556,5 +1832,9 @@
while ((int) b[-1] > 0 && b[b[-1] - 1] == 0)
b[-1]--;
- b += b[-1];
+ if (b[-1] != (1 << BYTEWIDTH) / BYTEWIDTH)
+ bcopy (&b[(1 << BYTEWIDTH) / BYTEWIDTH], &b[b[-1]],
+ 2 + EXTRACT_UNSIGNED (&b[(1 << BYTEWIDTH) / BYTEWIDTH])*4);
+ b += b[-1] + 2 + EXTRACT_UNSIGNED (&b[b[-1]])*4;
+ break;
}
break;
@@ -2023,5 +2303,6 @@
not to translate; but if we don't translate it
it will never match anything. */
- c = TRANSLATE (c);
+ if (!ismbchar (c))
+ c = TRANSLATE (c);
goto normal_char;
}
@@ -2032,4 +2313,11 @@
/* Expects the character in `c'. */
normal_char:
+
+ c1 = 0;
+ if (ismbchar (c)) {
+ c1 = c;
+ PATFETCH_RAW (c);
+ }
+
/* If no exactn currently being built. */
if (!pending_exact
@@ -2039,5 +2327,6 @@
/* We have only one byte following the exactn for the count. */
- || *pending_exact == (1 << BYTEWIDTH) - 1
+ || *pending_exact >= (c1 ? (1 << BYTEWIDTH) - 2
+ : (1 << BYTEWIDTH) - 1)
/* If followed by a repetition operator. */
@@ -2059,4 +2348,8 @@
}
+ if (c1) {
+ BUF_PUSH (c1);
+ (*pending_exact)++;
+ }
BUF_PUSH (c);
(*pending_exact)++;
@@ -2184,5 +2477,5 @@
at_endline_loc_p (p, pend, syntax)
const char *p, *pend;
- int syntax;
+ reg_syntax_t syntax;
{
const char *next = p;
@@ -2220,4 +2513,5 @@
+#if 0 /* We use set_list_bits() now. */
/* Read the ending character of a range (in a bracket expression) from the
uncompiled pattern *P_PTR (which ends at PEND). We assume the
@@ -2275,4 +2569,5 @@
return REG_NOERROR;
}
+#endif
/* Failure stack declarations and macros; both re_compile_fastmap and
@@ -2638,18 +2933,65 @@
case charset:
+ /* NOTE: The charset for single-byte chars never contains a
+ multi-byte char. See set_list_bits(). */
for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))
fastmap[j] = 1;
+ {
+ unsigned short size;
+ unsigned char c, end;
+
+ p += p[-1] + 2;
+ size = EXTRACT_UNSIGNED (&p[-2]);
+ for (j = 0; j < size; j++)
+ /* set bits for 1st bytes of multi-byte chars. */
+ for (c = (unsigned char) p[j*4],
+ end = (unsigned char) p[j*4 + 2];
+ c <= end; c++)
+ /* NOTE: Charset for multi-byte chars might contain
+ single-byte chars. We must reject them. */
+ if (ismbchar (c))
+ fastmap[c] = 1;
+ }
break;
case charset_not:
+ /* S: set of all single-byte chars.
+ M: set of all first bytes that can start multi-byte chars.
+ s: any set of single-byte chars.
+ m: any set of first bytes that can start multi-byte chars.
+
+ We assume S+M = U. Writing ~x for the complement of x:
+
+ ~(s+m) = (S*~s + M*~m). */
/* Chars beyond end of map must be allowed. */
+ /* NOTE: Charset_not for single-byte chars might contain
+ multi-byte chars. See set_list_bits(). */
for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++)
- fastmap[j] = 1;
+ if (!ismbchar (j))
+ fastmap[j] = 1;
for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))))
- fastmap[j] = 1;
+ if (!ismbchar (j))
+ fastmap[j] = 1;
+ {
+ unsigned short size;
+ unsigned short c, beg;
+
+ p += p[-1] + 2;
+ size = EXTRACT_UNSIGNED (&p[-2]);
+ c = 0x00;
+ for (j = 0; j < size; j++) {
+ for (beg = (unsigned char) p[j*4 + 0]; c <= beg; c++)
+ if (ismbchar (c))
+ fastmap[c] = 1;
+ c = (unsigned char) p[j*4 + 2];
+ }
+ for (beg = 0xff; c <= beg; c++)
+ if (ismbchar (c))
+ fastmap[c] = 1;
+ }
break;
@@ -2964,4 +3306,5 @@
register int lim = 0;
int irange = range;
+ unsigned char c;
if (startpos < size1 && startpos + range >= size1)
@@ -2973,11 +3316,23 @@
inside the loop. */
if (translate)
- while (range > lim
- && !fastmap[(unsigned char)
- translate[(unsigned char) *d++]])
+ while (range > lim) {
+ c = *d++;
+ if (ismbchar (c)) {
+ if (fastmap[c])
+ break;
+ d++;
+ range -= 2;
+ continue;
+ }
+ if (fastmap[(unsigned char) translate[c]])
+ break;
range--;
+ }
else
- while (range > lim && !fastmap[(unsigned char) *d++])
+ while (range > lim && (c = *d++, !fastmap[c])) {
+ if (ismbchar (c))
+ d++, range--;
range--;
+ }
startpos += irange - range;
@@ -3012,11 +3367,34 @@
else if (range > 0)
{
- range--;
- startpos++;
+ const char *d = ((startpos >= size1 ? string2 - size1 : string1)
+ + startpos);
+
+ if (ismbchar (*d)) {
+ range--, startpos++;
+ if (!range)
+ break;
+ }
+ range--, startpos++;
}
else
{
- range++;
- startpos--;
+ range++, startpos--;
+ {
+ const char *s, *d, *p;
+
+ if (startpos < size1)
+ s = string1, d = string1 + startpos;
+ else
+ s = string2, d = string2 + startpos - size1;
+ for (p = d; p-- > s && ismbchar(*p); )
+ /* Writing `--p >= s' might not work on the 80[12]?86 (outside the
+ huge model, when the offset of s is 0). */
+ ;
+ if (!((d - p) & 1)) {
+ if (!range)
+ break;
+ range++, startpos--;
+ }
+ }
}
}
@@ -3578,6 +3956,19 @@
do
{
+ unsigned char c;
+
PREFETCH ();
- if (translate[(unsigned char) *d++] != (char) *p++)
+ c = *d++;
+ if (ismbchar (c)) {
+ if (c != (unsigned char) *p++
+ || !--mcnt /* As long as the pattern was compiled
+ correctly this check is redundant,
+ but do it just in case. */
+ || d == dend
+ || (unsigned char) *d++ != (unsigned char) *p++)
+ goto fail;
+ continue;
+ }
+ if ((unsigned char) translate[c] != (unsigned char) *p++)
goto fail;
}
@@ -3588,6 +3979,26 @@
do
{
+#if 0
+ /* Elsewhere, a multi-byte character is not allowed to straddle
+ string1 and string2. To check for this even at the cost of
+ speed, change this `#if 0' and the next one to
+ `#if 1'. */
+ unsigned char c;
+
+#endif
PREFETCH ();
+#if 0
+ c = *d++;
+ if (ismbchar (c)) {
+ if (c != (unsigned char) *p++
+ || !--mcnt
+ || d == dend)
+ goto fail;
+ c = *d++;
+ }
+ if (c != (unsigned char) *p++) goto fail;
+#else
if (*d++ != (char) *p++) goto fail;
+#endif
}
while (--mcnt);
@@ -3602,4 +4013,14 @@
PREFETCH ();
+ if (ismbchar (*d)) {
+ if (d + 1 == dend || d[1] == '\n' || d[1] == '\0')
+ /* Do not match invalid multi-byte characters. For simplicity, only
+ those whose second byte is '\n' or '\0' are treated as invalid here. */
+ goto fail;
+ SET_REGS_MATCHED ();
+ DEBUG_PRINT2 (" Matched `%d'.\n", EXTRACT_MBC (&d[0]));
+ d += 2;
+ break;
+ }
if ((!(bufp->syntax & RE_DOT_NEWLINE) && TRANSLATE (*d) == '\n')
@@ -3616,19 +4037,23 @@
case charset_not:
{
- register unsigned char c;
- boolean not = (re_opcode_t) *(p - 1) == charset_not;
+ register unsigned short c;
+ boolean not;
- DEBUG_PRINT2 ("EXECUTING charset%s.\n", not ? "_not" : "");
+ DEBUG_PRINT2 ("EXECUTING charset%s.\n",
+ (re_opcode_t) *(p - 1) == charset_not ? "_not" : "");
PREFETCH ();
- c = TRANSLATE (*d); /* The character to match. */
+ c = (unsigned char) *d;
+ if (ismbchar (c)) {
+ c <<= 8;
+ if (d + 1 != dend)
+ c |= (unsigned char) d[1];
+ }
+ else
+ c = TRANSLATE (c); /* The character to match. */
- /* Cast to `unsigned' instead of `unsigned char' in case the
- bit list is a full 32 bytes long. */
- if (c < (unsigned) (*p * BYTEWIDTH)
- && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
- not = !not;
+ not = is_in_list (c, p);
- p += 1 + *p;
+ p += 1 + *p + 2 + EXTRACT_UNSIGNED (&p[1 + *p])*4;
if (!not) goto fail;
@@ -3636,4 +4061,6 @@
SET_REGS_MATCHED ();
d++;
+ if (d != dend && c >= 1 << BYTEWIDTH)
+ d++;
break;
}
@@ -3801,5 +4228,5 @@
/* xx why this test? */
- if ((int) old_regend[r] >= (int) regstart[r])
+ if (old_regend[r] >= regstart[r])
regend[r] = old_regend[r];
}
@@ -4052,5 +4479,5 @@
|| (bufp->newline_anchor && (re_opcode_t) *p2 == endline))
{
- register unsigned char c
+ register unsigned short c
= *p2 == (unsigned char) endline ? '\n' : p2[2];
p1 = p + mcnt;
@@ -4069,13 +4496,10 @@
|| (re_opcode_t) p1[3] == charset_not)
{
- int not = (re_opcode_t) p1[3] == charset_not;
-
- if (c < (unsigned char) (p1[4] * BYTEWIDTH)
- && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
- not = !not;
+ if (ismbchar (c))
+ c = c << 8 | p2[3];
- /* `not' is equal to 1 if c would match, which means
+ /* `is_in_list()' is TRUE if c would match, which means
that we can't change to pop_failure_jump. */
- if (!not)
+ if (!is_in_list (c, p1 + 4))
{
p[-3] = (unsigned char) pop_failure_jump;
@@ -4632,8 +5056,15 @@
char *translate;
{
- register unsigned char *p1 = s1, *p2 = s2;
+ register unsigned char *p1 = s1, *p2 = s2, c;
while (len)
{
- if (translate[*p1++] != translate[*p2++]) return 1;
+ c = *p1++;
+ if (ismbchar(c)) {
+ if (c != *p2++ || !--len || *p1++ != *p2++)
+ return 1;
+ }
+ else
+ if (translate[c] != translate[*p2++])
+ return 1;
len--;
}
@@ -4778,5 +5209,5 @@
{
reg_errcode_t ret;
- unsigned syntax
+ reg_syntax_t syntax
= (cflags & REG_EXTENDED) ?
RE_SYNTAX_POSIX_EXTENDED : RE_SYNTAX_POSIX_BASIC;
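The interval lookup that is_in_list() performs on the multi-byte part of a
charset can be sketched in isolation as below. This is an illustration under
assumptions, not the patch's code: it uses a plain array of structs instead
of the packed byte layout in the compiled pattern, and every demo_ name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unpacked form of one charset interval. */
struct demo_interval {
    unsigned short beg;   /* first character code in the interval */
    unsigned short end;   /* last character code in the interval  */
};

/* Return 1 if c lies in one of the n intervals, else 0.
   The intervals must be sorted and non-overlapping. */
static int demo_in_intervals(unsigned short c,
                             const struct demo_interval *iv, size_t n)
{
    size_t lo = 0, hi = n;

    assert(iv != NULL || n == 0);
    /* Binary search for the first interval whose end is not below c. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (c > iv[mid].end)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* c is a member only if that interval also starts at or before c. */
    return lo < n && iv[lo].beg <= c;
}
```

Keeping the intervals sorted and merged, as set_list_bits() does, is what
makes a binary search sufficient here; an unsorted list would need a linear
scan per character.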
diff -ru2N grep-2.0/regex.h grep+mb1.04/regex.h
--- grep-2.0/regex.h Fri May 21 14:11:43 1993
+++ grep+mb1.04/regex.h Sat Jul 10 04:38:03 1993
@@ -17,4 +17,6 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* Multi-byte extension added May, 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 10, 1993 by t^2 */
#ifndef __REGEXP_LIBRARY_H__
@@ -36,9 +38,9 @@
the definitions shifted by one from the previous bit; thus, when we
add or remove a bit, only one other definition need change. */
-typedef unsigned reg_syntax_t;
+typedef unsigned long reg_syntax_t;
/* If this bit is not set, then \ inside a bracket expression is literal.
If set, then such a \ quotes the following character. */
-#define RE_BACKSLASH_ESCAPE_IN_LISTS (1)
+#define RE_BACKSLASH_ESCAPE_IN_LISTS ((unsigned long)1)
/* If this bit is not set, then + and ? are operators, and \+ and \? are
@@ -206,5 +208,5 @@
#undef RE_DUP_MAX
#endif
-#define RE_DUP_MAX ((1 << 15) - 1)
+#define RE_DUP_MAX ((int)(((unsigned)1 << 15) - 1))
@@ -397,4 +399,10 @@
#define _RE_ARGS(args) ()
+
+#ifdef __GNUC__
+#define const __const__
+#else
+#define const
+#endif
#endif /* not __STDC__ */
diff -ru2N grep-2.0/search.c grep+mb1.04/search.c
--- grep-2.0/search.c Mon May 3 06:02:00 1993
+++ grep+mb1.04/search.c Fri Jul 9 14:55:21 1993
@@ -17,4 +17,6 @@
Written August 1992 by Mike Haertel. */
+/* Multi-byte extension added Jul., 1993 by t^2 (Takahiro Tanimoto)
+ Last change: Jul. 9, 1993 by t^2 */
#include <ctype.h>
@@ -61,4 +63,5 @@
#include "kwset.h"
#include "regex.h"
+#include "mbc.h"
#define NCHAR (UCHAR_MAX + 1)
@@ -434,8 +437,9 @@
char **endp;
{
- register char *beg, *try, *end;
+ register char *beg, *try, *end, *p, *lim;
register size_t len;
struct kwsmatch kwsmatch;
+ lim = buf;
for (beg = buf; beg <= buf + size; ++beg)
{
@@ -456,4 +460,8 @@
if (try > buf && WCHAR((unsigned char) try[-1]))
break;
+ for (p = try; p-- > lim && ismbchar(*p); )
+ ;
+ if (!((try - p) & 1))
+ break;
if (try + len < buf + size && WCHAR((unsigned char) try[len]))
{
@@ -464,6 +472,12 @@
goto success;
}
- else
- goto success;
+ else {
+ for (p = beg; p-- > lim && ismbchar(*p); )
+ ;
+ if ((beg - p) & 1)
+ goto success;
+ if (lim + 1 < beg)
+ lim = beg - 1;
+ }
}
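The backward scans added to search.c above decide whether a candidate match
starts on a character boundary by counting the run of lead-range bytes just
before it. A minimal sketch of that parity test follows. The demo_ names and
the 0xA1..0xFE lead-byte range are assumptions for illustration, and the
loop is written so that it never forms a pointer below the start of the
buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: EUC-like encoding, both bytes of a wide char in 0xA1..0xFE. */
static int demo_is_lead(char c)
{
    unsigned char u = (unsigned char) c;

    return u >= 0xA1 && u <= 0xFE;
}

/* Return 1 if pos is on a character boundary within [start, ...), else 0.
   An odd run of lead-range bytes directly before pos means pos points at
   the second byte of a two-byte character. */
static int demo_at_char_boundary(const char *start, const char *pos)
{
    const char *p = pos;
    size_t run = 0;

    assert(start != NULL && pos != NULL && start <= pos);
    while (p > start && demo_is_lead(p[-1])) {
        p--;
        run++;
    }
    return (run & 1) == 0;
}
```

The scan is only as reliable as the assumption that trail bytes fall in the
lead-byte range; for encodings whose trail bytes overlap ASCII the parity of
the run can give a different answer.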
diff -ru2N grep-2.0/tests/batgen.awk grep+mb1.04/tests/batgen.awk
--- grep-2.0/tests/batgen.awk Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/tests/batgen.awk Sat Jul 10 02:10:24 1993
@@ -0,0 +1,10 @@
+BEGIN { print "@echo off"; }
+$0 !~ /^#/ && NF == 3 {
+ printf "echo #%d --\n", ++n
+ print "set R=0";
+ print "echo " $3 ">tmp.in";
+ print "grep -E -e \"" $2 "\" tmp.in >nul";
+ print "if errorlevel 1 set R=1";
+ print "if errorlevel 2 set R=2";
+ printf "if not %R% == " $1 " echo Spencer test #%d failed\n", n
+}
diff -ru2N grep-2.0/tests/check.bat grep+mb1.04/tests/check.bat
--- grep-2.0/tests/check.bat Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/tests/check.bat Fri Jul 9 17:05:30 1993
@@ -0,0 +1,14 @@
+@echo off
+rem
+rem Regression test for GNU e?grep.
+rem
+
+rem The Khadafy test is brought to you by Scott Anderson . . .
+grep -E -f tests/khadafy.reg tests/khadafy.lin > khadafy.out
+fc tests\khadafy.lin khadafy.out
+
+rem . . . and the following by Henry Spencer.
+
+gawk -F: -f tests/batgen.awk tests/spencer.dos > tmp.bat
+
+tmp
diff -ru2N grep-2.0/tests/spencer.dos grep+mb1.04/tests/spencer.dos
--- grep-2.0/tests/spencer.dos Thu Jan 1 09:00:00 1970
+++ grep+mb1.04/tests/spencer.dos Sat Jul 10 02:12:59 1993
@@ -0,0 +1,122 @@
+0:abc:abc
+1:abc:xbc
+1:abc:axc
+1:abc:abx
+0:abc:xabcy
+0:abc:ababc
+0:ab*c:abc
+0:ab*bc:abc
+0:ab*bc:abbc
+0:ab*bc:abbbbc
+0:ab+bc:abbc
+1:ab+bc:abc
+1:ab+bc:abq
+0:ab+bc:abbbbc
+0:ab?bc:abbc
+0:ab?bc:abc
+1:ab?bc:abbbbc
+0:ab?c:abc
+0:^abc$:abc
+1:^abc$:abcc
+0:^abc:abcc
+1:^abc$:aabc
+0:abc$:aabc
+0:^:abc
+0:$:abc
+0:a.c:abc
+0:a.c:axc
+0:a.*c:axyzc
+1:a.*c:axyzd
+1:a[bc]d:abc
+0:a[bc]d:abd
+1:a[b-d]e:abd
+0:a[b-d]e:ace
+0:a[b-d]:aac
+0:a[-b]:a-
+0:a[b-]:a-
+1:a[b-a]:-
+2:a[]b:-
+2:a[:-
+0:a]:a]
+0:a[]]b:a]b
+0:a[^bc]d:aed
+1:a[^bc]d:abd
+0:a[^-b]c:adc
+1:a[^-b]c:a-c
+1:a[^]b]c:a]c
+0:a[^]b]c:adc
+0:ab|cd:abc
+0:ab|cd:abcd
+0:()ef:def
+0:()*:-
+1:*a:-
+0:^*:-
+0:$*:-
+1:(*)b:-
+1:$b:b
+2:a\\:-
+0:a\(b:a(b
+0:a\(*b:ab
+0:a\(*b:a((b
+1:a\x:a\x
+2:abc):-
+2:(abc:-
+0:((a)):abc
+0:(a)b(c):abc
+0:a+b+c:aabbabc
+0:a**:-
+0:a*?:-
+0:(a*)*:-
+0:(a*)+:-
+0:(a|)*:-
+0:(a*|b)*:-
+0:(a+|b)*:ab
+0:(a+|b)+:ab
+0:(a+|b)?:ab
+0:[^ab]*:cde
+0:(^)*:-
+0:(ab|)*:-
+2:)(:-
+1:abc:
+1:abc:
+0:a*:
+0:([abc])*d:abbbcd
+0:([abc])*bcd:abcd
+0:a|b|c|d|e:e
+0:(a|b|c|d|e)f:ef
+0:((a*|b))*:-
+0:abcd*efg:abcdefg
+0:ab*:xabyabbbz
+0:ab*:xayabbbz
+0:(ab|cd)e:abcde
+0:[abhgefdc]ij:hij
+1:^(ab|cd)e:abcde
+0:(abc|)ef:abcdef
+0:(a|b)c*d:abcd
+0:(ab|ab*)bc:abc
+0:a([bc]*)c*:abc
+0:a([bc]*)(c*d):abcd
+0:a([bc]+)(c*d):abcd
+0:a([bc]*)(c+d):abcd
+0:a[bcd]*dcdcde:adcdcde
+1:a[bcd]+dcdcde:adcdcde
+0:(ab|a)b*c:abc
+0:((a)(b)c)(d):abcd
+0:[A-Za-z_][A-Za-z0-9_]*:alpha
+0:^a(bc+|b[eh])g|.h$:abh
+0:(bc+d$|ef*g.|h?i(j|k)):effgz
+0:(bc+d$|ef*g.|h?i(j|k)):ij
+1:(bc+d$|ef*g.|h?i(j|k)):effg
+1:(bc+d$|ef*g.|h?i(j|k)):bcdd
+0:(bc+d$|ef*g.|h?i(j|k)):reffgz
+1:((((((((((a)))))))))):-
+0:(((((((((a))))))))):a
+1:multiple words of text:uh-uh
+0:multiple words:multiple words, yeah
+0:(.*)c(.*):abcde
+1:\((.*),:(.*)\)
+1:[k]:ab
+0:abcd:abcd
+0:a(bc)d:abcd
+0:a[-]?c:ac
+0:(....).*\1:beriberi