This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

Copyright (c) 1986 by University of Toronto.
Written by Henry Spencer. Not derived from licensed software.

Permission is granted to anyone to use this software for any
purpose on any computer system, and to redistribute it freely,
subject to the following restrictions:

1. The author is not responsible for the consequences of use of
   this software, no matter how awful, even if they arise
   from defects in it.

2. The origin of this software must not be misrepresented, either
   by explicit claim or by omission.

3. Altered versions must be plainly marked as such, and must not
   be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8. It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

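As a rough illustration of the binary-compatibility remark above, the sketch
below mirrors what the public part of "struct regexp" is assumed to look like:
two arrays, startp and endp, of NSUBEXP pointers each, placed first. The names
regexp_sketch, public_fields_lead and check_layout, the value of NSUBEXP, and
the placeholder internal field are assumptions made for illustration; consult
the regexp.h in this distribution for the real declaration.

```c
#include <assert.h>
#include <stddef.h>

#define NSUBEXP 10	/* assumed value; check your regexp.h */

/* Hypothetical mirror of struct regexp: public fields first. */
struct regexp_sketch {
	char *startp[NSUBEXP];	/* assumed: start of each matched substring */
	char *endp[NSUBEXP];	/* assumed: one past the end of each match */
	char internal[1];	/* placeholder for the private fields */
};

/* Returns 1 if the public fields occupy the very start of the struct. */
int public_fields_lead(void)
{
	return offsetof(struct regexp_sketch, startp) == 0 &&
	       offsetof(struct regexp_sketch, endp) ==
	           sizeof(char *) * NSUBEXP;
}

/* Aborts if the layout assumption above does not hold. */
void check_layout(void)
{
	assert(public_fields_lead());
}
```

Code that touches only fields laid out this way does not care what follows
them, which is the sense in which two implementations could be
binary-compatible, sort of.
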
This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee. This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points. I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically: put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r". This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests. If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile        instructions to make everything
regexp.3        manual page
regexp.h        header file, for /usr/include
regexp.c        source for regcomp() and regexec()
regsub.c        source for regsub()
regerror.c      source for default regerror()
regmagic.h      internal header file
try.c           source for test program
timer.c         source for timing program
tests           test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them. In theory, anyway. This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly. In general,
if you want blazing speed you're in the wrong place. Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work. But if you want to use regular
expressions a little bit in something else, you're in luck. Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.

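To give a feel for the tradeoff described above, here is a minimal sketch of
a backtracking matcher. It is NOT code from this package: it follows the
well-known tiny matcher by Rob Pike (published in Kernighan & Pike, "The
Practice of Programming"), handles only literal characters, ".", "*", "^"
and "$", and interprets the pattern directly instead of compiling it. The
function names match, matchhere, matchstar and match_selftest belong to the
sketch, not to regexp(3).

```c
#include <assert.h>

static int matchhere(const char *re, const char *text);

/* matchstar: match c* followed by re at the start of text. */
static int matchstar(int c, const char *re, const char *text)
{
	do {
		/* Try the rest of the pattern with the repetitions used so far. */
		if (matchhere(re, text))
			return 1;
		/* Otherwise consume one more c (or any char for '.') and retry. */
	} while (*text != '\0' && (*text++ == c || c == '.'));
	return 0;
}

/* matchhere: match re at the start of text, one alternative at a time. */
static int matchhere(const char *re, const char *text)
{
	if (re[0] == '\0')
		return 1;
	if (re[1] == '*')
		return matchstar(re[0], re + 2, text);
	if (re[0] == '$' && re[1] == '\0')
		return *text == '\0';
	if (*text != '\0' && (re[0] == '.' || re[0] == *text))
		return matchhere(re + 1, text + 1);
	return 0;
}

/* match: return 1 if re matches anywhere in text, else 0. */
int match(const char *re, const char *text)
{
	if (re[0] == '^')
		return matchhere(re + 1, text);
	do {	/* must try even when text is empty */
		if (matchhere(re, text))
			return 1;
	} while (*text++ != '\0');
	return 0;
}

/* Small self-check of the behaviour described in the comments. */
void match_selftest(void)
{
	assert(match("^ab*c$", "abbbc") == 1);
	assert(match("^ab*c$", "abd") == 0);
	assert(match("x.y", "aaxzyb") == 1);
}
```

There is almost nothing to "compile" here, which is why this style is small
and quick to set up; the price is that a failed attempt is retried from the
next alternative, so awkward patterns can cost far more than one pass over
the text.
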
This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized. There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line. Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.
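
The portability points about char width and the top bit can be checked with
a few lines of C. This is only a sketch: uchar_at and uchar_selftest are
hypothetical names, not functions from regexp.c. Note that ISO C guarantees
CHAR_BIT is at least 8, so the compile-time guard matters only for pre-ANSI
or otherwise unusual compilers.

```c
#include <assert.h>
#include <limits.h>

/* Refuse to build where chars are narrower than 8 bits. */
#if CHAR_BIT < 8
#error "chars narrower than 8 bits: automaton representation needs changes"
#endif

/*
 * Hypothetical helper: fetch one char as a value in 0..UCHAR_MAX, so that
 * a byte with the top bit set is not sign-extended to a negative int.
 */
int uchar_at(const char *p)
{
	return (int)*(const unsigned char *)p;
}

/* Self-check: a top-bit byte reads back as an ordinary positive value. */
void uchar_selftest(void)
{
	char s[2];

	s[0] = (char)0xE9;	/* a byte with the top bit set */
	s[1] = '\0';		/* NUL, the only reserved value */
	assert(uchar_at(s) == 0xE9);
	assert(uchar_at(s + 1) == 0);
}
```

Treating every byte other than NUL as plain data in this way is one simple
means of keeping the top bit free of special meaning.
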