OpenCores
URL https://opencores.org/ocsvn/openrisc_me/openrisc_me/trunk

Subversion Repositories openrisc_me

[/] [openrisc/] [trunk/] [gnu-src/] [newlib-1.18.0/] [newlib/] [libc/] [posix/] [regex.3] - Blame information for rev 252

Go to most recent revision | Details | Compare with Previous | View Log

Line No. Rev Author Line
1 207 jeremybenn
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
2
.\" Copyright (c) 1992, 1993, 1994
3
.\"     The Regents of the University of California.  All rights reserved.
4
.\"
5
.\" This code is derived from software contributed to Berkeley by
6
.\" Henry Spencer.
7
.\"
8
.\" Redistribution and use in source and binary forms, with or without
9
.\" modification, are permitted provided that the following conditions
10
.\" are met:
11
.\" 1. Redistributions of source code must retain the above copyright
12
.\"    notice, this list of conditions and the following disclaimer.
13
.\" 2. Redistributions in binary form must reproduce the above copyright
14
.\"    notice, this list of conditions and the following disclaimer in the
15
.\"    documentation and/or other materials provided with the distribution.
16
.\" 3. All advertising materials mentioning features or use of this software
17
.\"    must display the following acknowledgement:
18
.\"     This product includes software developed by the University of
19
.\"     California, Berkeley and its contributors.
20
.\" 4. Neither the name of the University nor the names of its contributors
21
.\"    may be used to endorse or promote products derived from this software
22
.\"    without specific prior written permission.
23
.\"
24
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
25
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
26
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
27
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
28
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
29
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
30
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
31
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
32
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
33
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
34
.\" SUCH DAMAGE.
35
.\"
36
.\"     @(#)regex.3     8.4 (Berkeley) 3/20/94
37
.\" $FreeBSD: src/lib/libc/regex/regex.3,v 1.9 2001/10/01 16:08:58 ru Exp $
38
.\"
39
.Dd March 20, 1994
40
.Dt REGEX 3
41
.Os
42
.Sh NAME
43
.Nm regcomp ,
44
.Nm regexec ,
45
.Nm regerror ,
46
.Nm regfree
47
.Nd regular-expression library
48
.Sh LIBRARY
49
.Lb libc
50
.Sh SYNOPSIS
51
.In sys/types.h
52
.In regex.h
53
.Ft int
54
.Fn regcomp "regex_t *preg" "const char *pattern" "int cflags"
55
.Ft int
56
.Fo regexec
57
.Fa "const regex_t *preg" "const char *string"
58
.Fa "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
59
.Fc
60
.Ft size_t
61
.Fo regerror
62
.Fa "int errcode" "const regex_t *preg"
63
.Fa "char *errbuf" "size_t errbuf_size"
64
.Fc
65
.Ft void
66
.Fn regfree "regex_t *preg"
67
.Sh DESCRIPTION
68
These routines implement
69
.St -p1003.2
70
regular expressions
71
.Pq Do RE Dc Ns s ;
72
see
73
.Xr re_format 7 .
74
.Fn Regcomp
75
compiles an RE written as a string into an internal form,
76
.Fn regexec
77
matches that internal form against a string and reports results,
78
.Fn regerror
79
transforms error codes from either into human-readable messages,
80
and
81
.Fn regfree
82
frees any dynamically-allocated storage used by the internal form
83
of an RE.
84
.Pp
85
The header
86
.Aq Pa regex.h
87
declares two structure types,
88
.Ft regex_t
89
and
90
.Ft regmatch_t ,
91
the former for compiled internal forms and the latter for match reporting.
92
It also declares the four functions,
93
a type
94
.Ft regoff_t ,
95
and a number of constants with names starting with
96
.Dq Dv REG_ .
97
.Pp
98
.Fn Regcomp
99
compiles the regular expression contained in the
100
.Fa pattern
101
string,
102
subject to the flags in
103
.Fa cflags ,
104
and places the results in the
105
.Ft regex_t
106
structure pointed to by
107
.Fa preg .
108
.Fa Cflags
109
is the bitwise OR of zero or more of the following flags:
110
.Bl -tag -width REG_EXTENDED
111
.It Dv REG_EXTENDED
112
Compile modern
113
.Pq Dq extended
114
REs,
115
rather than the obsolete
116
.Pq Dq basic
117
REs that
118
are the default.
119
.It Dv REG_BASIC
120
This is a synonym for 0,
121
provided as a counterpart to
122
.Dv REG_EXTENDED
123
to improve readability.
124
.It Dv REG_NOSPEC
125
Compile with recognition of all special characters turned off.
126
All characters are thus considered ordinary,
127
so the
128
.Dq RE
129
is a literal string.
130
This is an extension,
131
compatible with but not specified by
132
.St -p1003.2 ,
133
and should be used with
134
caution in software intended to be portable to other systems.
135
.Dv REG_EXTENDED
136
and
137
.Dv REG_NOSPEC
138
may not be used
139
in the same call to
140
.Fn regcomp .
141
.It Dv REG_ICASE
142
Compile for matching that ignores upper/lower case distinctions.
143
See
144
.Xr re_format 7 .
145
.It Dv REG_NOSUB
146
Compile for matching that need only report success or failure,
147
not what was matched.
148
.It Dv REG_NEWLINE
149
Compile for newline-sensitive matching.
150
By default, newline is a completely ordinary character with no special
151
meaning in either REs or strings.
152
With this flag,
153
.Ql [^
154
bracket expressions and
155
.Ql .\&
156
never match newline,
157
a
158
.Ql ^\&
159
anchor matches the null string after any newline in the string
160
in addition to its normal function,
161
and the
162
.Ql $\&
163
anchor matches the null string before any newline in the
164
string in addition to its normal function.
165
.It Dv REG_PEND
166
The regular expression ends,
167
not at the first NUL,
168
but just before the character pointed to by the
169
.Va re_endp
170
member of the structure pointed to by
171
.Fa preg .
172
The
173
.Va re_endp
174
member is of type
175
.Ft "const char *" .
176
This flag permits inclusion of NULs in the RE;
177
they are considered ordinary characters.
178
This is an extension,
179
compatible with but not specified by
180
.St -p1003.2 ,
181
and should be used with
182
caution in software intended to be portable to other systems.
183
.El
184
.Pp
185
When successful,
186
.Fn regcomp
187
returns 0 and fills in the structure pointed to by
188
.Fa preg .
189
One member of that structure
190
(other than
191
.Va re_endp )
192
is publicized:
193
.Va re_nsub ,
194
of type
195
.Ft size_t ,
196
contains the number of parenthesized subexpressions within the RE
197
(except that the value of this member is undefined if the
198
.Dv REG_NOSUB
199
flag was used).
200
If
201
.Fn regcomp
202
fails, it returns a non-zero error code;
203
see
204
.Sx DIAGNOSTICS .
205
.Pp
206
.Fn Regexec
207
matches the compiled RE pointed to by
208
.Fa preg
209
against the
210
.Fa string ,
211
subject to the flags in
212
.Fa eflags ,
213
and reports results using
214
.Fa nmatch ,
215
.Fa pmatch ,
216
and the returned value.
217
The RE must have been compiled by a previous invocation of
218
.Fn regcomp .
219
The compiled form is not altered during execution of
220
.Fn regexec ,
221
so a single compiled RE can be used simultaneously by multiple threads.
222
.Pp
223
By default,
224
the NUL-terminated string pointed to by
225
.Fa string
226
is considered to be the text of an entire line, minus any terminating
227
newline.
228
The
229
.Fa eflags
230
argument is the bitwise OR of zero or more of the following flags:
231
.Bl -tag -width REG_STARTEND
232
.It Dv REG_NOTBOL
233
The first character of
234
the string
235
is not the beginning of a line, so the
236
.Ql ^\&
237
anchor should not match before it.
238
This does not affect the behavior of newlines under
239
.Dv REG_NEWLINE .
240
.It Dv REG_NOTEOL
241
The NUL terminating
242
the string
243
does not end a line, so the
244
.Ql $\&
245
anchor should not match before it.
246
This does not affect the behavior of newlines under
247
.Dv REG_NEWLINE .
248
.It Dv REG_STARTEND
249
The string is considered to start at
250
.Fa string
251
+
252
.Fa pmatch Ns [0]. Ns Va rm_so
253
and to have a terminating NUL located at
254
.Fa string
255
+
256
.Fa pmatch Ns [0]. Ns Va rm_eo
257
(there need not actually be a NUL at that location),
258
regardless of the value of
259
.Fa nmatch .
260
See below for the definition of
261
.Fa pmatch
262
and
263
.Fa nmatch .
264
This is an extension,
265
compatible with but not specified by
266
.St -p1003.2 ,
267
and should be used with
268
caution in software intended to be portable to other systems.
269
Note that a non-zero
270
.Va rm_so
271
does not imply
272
.Dv REG_NOTBOL ;
273
.Dv REG_STARTEND
274
affects only the location of the string,
275
not how it is matched.
276
.El
277
.Pp
278
See
279
.Xr re_format 7
280
for a discussion of what is matched in situations where an RE or a
281
portion thereof could match any of several substrings of
282
.Fa string .
283
.Pp
284
Normally,
285
.Fn regexec
286
returns 0 for success and the non-zero code
287
.Dv REG_NOMATCH
288
for failure.
289
Other non-zero error codes may be returned in exceptional situations;
290
see
291
.Sx DIAGNOSTICS .
292
.Pp
293
If
294
.Dv REG_NOSUB
295
was specified in the compilation of the RE,
296
or if
297
.Fa nmatch
298
is 0,
299
.Fn regexec
300
ignores the
301
.Fa pmatch
302
argument (but see below for the case where
303
.Dv REG_STARTEND
304
is specified).
305
Otherwise,
306
.Fa pmatch
307
points to an array of
308
.Fa nmatch
309
structures of type
310
.Ft regmatch_t .
311
Such a structure has at least the members
312
.Va rm_so
313
and
314
.Va rm_eo ,
315
both of type
316
.Ft regoff_t
317
(a signed arithmetic type at least as large as an
318
.Ft off_t
319
and a
320
.Ft ssize_t ) ,
321
containing respectively the offset of the first character of a substring
322
and the offset of the first character after the end of the substring.
323
Offsets are measured from the beginning of the
324
.Fa string
325
argument given to
326
.Fn regexec .
327
An empty substring is denoted by equal offsets,
328
both indicating the character following the empty substring.
329
.Pp
330
The 0th member of the
331
.Fa pmatch
332
array is filled in to indicate what substring of
333
.Fa string
334
was matched by the entire RE.
335
Remaining members report what substring was matched by parenthesized
336
subexpressions within the RE;
337
member
338
.Va i
339
reports subexpression
340
.Va i ,
341
with subexpressions counted (starting at 1) by the order of their opening
342
parentheses in the RE, left to right.
343
Unused entries in the array (corresponding either to subexpressions that
344
did not participate in the match at all, or to subexpressions that do not
345
exist in the RE (that is,
346
.Va i
347
>
348
.Fa preg Ns -> Ns Va re_nsub ) )
349
have both
350
.Va rm_so
351
and
352
.Va rm_eo
353
set to -1.
354
If a subexpression participated in the match several times,
355
the reported substring is the last one it matched.
356
(Note, as an example in particular, that when the RE
357
.Ql "(b*)+"
358
matches
359
.Ql bbb ,
360
the parenthesized subexpression matches each of the three
361
.So Li b Sc Ns s
362
and then
363
an infinite number of empty strings following the last
364
.Ql b ,
365
so the reported substring is one of the empties.)
366
.Pp
367
If
368
.Dv REG_STARTEND
369
is specified,
370
.Fa pmatch
371
must point to at least one
372
.Ft regmatch_t
373
(even if
374
.Fa nmatch
375
is 0 or
376
.Dv REG_NOSUB
377
was specified),
378
to hold the input offsets for
379
.Dv REG_STARTEND .
380
Use for output is still entirely controlled by
381
.Fa nmatch ;
382
if
383
.Fa nmatch
384
is 0 or
385
.Dv REG_NOSUB
386
was specified,
387
the value of
388
.Fa pmatch Ns [0]
389
will not be changed by a successful
390
.Fn regexec .
391
.Pp
392
.Fn Regerror
393
maps a non-zero
394
.Fa errcode
395
from either
396
.Fn regcomp
397
or
398
.Fn regexec
399
to a human-readable, printable message.
400
If
401
.Fa preg
402
is
403
.No non\- Ns Dv NULL ,
404
the error code should have arisen from use of
405
the
406
.Ft regex_t
407
pointed to by
408
.Fa preg ,
409
and if the error code came from
410
.Fn regcomp ,
411
it should have been the result from the most recent
412
.Fn regcomp
413
using that
414
.Ft regex_t .
415
.No ( Fn Regerror
416
may be able to supply a more detailed message using information
417
from the
418
.Ft regex_t . )
419
.Fn Regerror
420
places the NUL-terminated message into the buffer pointed to by
421
.Fa errbuf ,
422
limiting the length (including the NUL) to at most
423
.Fa errbuf_size
424
bytes.
425
If the whole message won't fit,
426
as much of it as will fit before the terminating NUL is supplied.
427
In any case,
428
the returned value is the size of buffer needed to hold the whole
429
message (including terminating NUL).
430
If
431
.Fa errbuf_size
432
is 0,
433
.Fa errbuf
434
is ignored but the return value is still correct.
435
.Pp
436
If the
437
.Fa errcode
438
given to
439
.Fn regerror
440
is first ORed with
441
.Dv REG_ITOA ,
442
the
443
.Dq message
444
that results is the printable name of the error code,
445
e.g.\&
446
.Dq Dv REG_NOMATCH ,
447
rather than an explanation thereof.
448
If
449
.Fa errcode
450
is
451
.Dv REG_ATOI ,
452
then
453
.Fa preg
454
shall be
455
.No non\- Ns Dv NULL
456
and the
457
.Va re_endp
458
member of the structure it points to
459
must point to the printable name of an error code;
460
in this case, the result in
461
.Fa errbuf
462
is the decimal digits of
463
the numeric value of the error code
464
(0 if the name is not recognized).
465
.Dv REG_ITOA
466
and
467
.Dv REG_ATOI
468
are intended primarily as debugging facilities;
469
they are extensions,
470
compatible with but not specified by
471
.St -p1003.2 ,
472
and should be used with
473
caution in software intended to be portable to other systems.
474
Be warned also that they are considered experimental and changes are possible.
475
.Pp
476
.Fn Regfree
477
frees any dynamically-allocated storage associated with the compiled RE
478
pointed to by
479
.Fa preg .
480
The remaining
481
.Ft regex_t
482
is no longer a valid compiled RE
483
and the effect of supplying it to
484
.Fn regexec
485
or
486
.Fn regerror
487
is undefined.
488
.Pp
489
None of these functions references global variables except for tables
490
of constants;
491
all are safe for use from multiple threads if the arguments are safe.
492
.Sh IMPLEMENTATION CHOICES
493
There are a number of decisions that
494
.St -p1003.2
495
leaves up to the implementor,
496
either by explicitly saying
497
.Dq undefined
498
or by virtue of them being
499
forbidden by the RE grammar.
500
This implementation treats them as follows.
501
.Pp
502
See
503
.Xr re_format 7
504
for a discussion of the definition of case-independent matching.
505
.Pp
506
There is no particular limit on the length of REs,
507
except insofar as memory is limited.
508
Memory usage is approximately linear in RE size, and largely insensitive
509
to RE complexity, except for bounded repetitions.
510
See
511
.Sx BUGS
512
for one short RE using them
513
that will run almost any system out of memory.
514
.Pp
515
A backslashed character other than one specifically given a magic meaning
516
by
517
.St -p1003.2
518
(such magic meanings occur only in obsolete
519
.Bq Dq basic
520
REs)
521
is taken as an ordinary character.
522
.Pp
523
Any unmatched
524
.Ql [\&
525
is a
526
.Dv REG_EBRACK
527
error.
528
.Pp
529
Equivalence classes cannot begin or end bracket-expression ranges.
530
The endpoint of one range cannot begin another.
531
.Pp
532
.Dv RE_DUP_MAX ,
533
the limit on repetition counts in bounded repetitions, is 255.
534
.Pp
535
A repetition operator
536
.Ql ( ?\& ,
537
.Ql *\& ,
538
.Ql +\& ,
539
or bounds)
540
cannot follow another
541
repetition operator.
542
A repetition operator cannot begin an expression or subexpression
543
or follow
544
.Ql ^\&
545
or
546
.Ql |\& .
547
.Pp
548
.Ql |\&
549
cannot appear first or last in a (sub)expression or after another
550
.Ql |\& ,
551
i.e. an operand of
552
.Ql |\&
553
cannot be an empty subexpression.
554
An empty parenthesized subexpression,
555
.Ql "()" ,
556
is legal and matches an
557
empty (sub)string.
558
An empty string is not a legal RE.
559
.Pp
560
A
561
.Ql {\&
562
followed by a digit is considered the beginning of bounds for a
563
bounded repetition, which must then follow the syntax for bounds.
564
A
565
.Ql {\&
566
.Em not
567
followed by a digit is considered an ordinary character.
568
.Pp
569
.Ql ^\&
570
and
571
.Ql $\&
572
beginning and ending subexpressions in obsolete
573
.Pq Dq basic
574
REs are anchors, not ordinary characters.
575
.Sh SEE ALSO
576
.Xr grep 1 ,
577
.Xr re_format 7
578
.Pp
579
.St -p1003.2 ,
580
sections 2.8 (Regular Expression Notation)
581
and
582
B.5 (C Binding for Regular Expression Matching).
583
.Sh DIAGNOSTICS
584
Non-zero error codes from
585
.Fn regcomp
586
and
587
.Fn regexec
588
include the following:
589
.Pp
590
.Bl -tag -width REG_ECOLLATE -compact
591
.It Dv REG_NOMATCH
592
.Fn regexec
593
failed to match
594
.It Dv REG_BADPAT
595
invalid regular expression
596
.It Dv REG_ECOLLATE
597
invalid collating element
598
.It Dv REG_ECTYPE
599
invalid character class
600
.It Dv REG_EESCAPE
601
.Ql \e
602
applied to unescapable character
603
.It Dv REG_ESUBREG
604
invalid backreference number
605
.It Dv REG_EBRACK
606
brackets
607
.Ql "[ ]"
608
not balanced
609
.It Dv REG_EPAREN
610
parentheses
611
.Ql "( )"
612
not balanced
613
.It Dv REG_EBRACE
614
braces
615
.Ql "{ }"
616
not balanced
617
.It Dv REG_BADBR
618
invalid repetition count(s) in
619
.Ql "{ }"
620
.It Dv REG_ERANGE
621
invalid character range in
622
.Ql "[ ]"
623
.It Dv REG_ESPACE
624
ran out of memory
625
.It Dv REG_BADRPT
626
.Ql ?\& ,
627
.Ql *\& ,
628
or
629
.Ql +\&
630
operand invalid
631
.It Dv REG_EMPTY
632
empty (sub)expression
633
.It Dv REG_ASSERT
634
can't happen - you found a bug
635
.It Dv REG_INVARG
636
invalid argument, e.g. negative-length string
637
.El
638
.Sh HISTORY
639
Originally written by
640
.An Henry Spencer .
641
Altered for inclusion in the
642
.Bx 4.4
643
distribution.
644
.Sh BUGS
645
This is an alpha release with known defects.
646
Please report problems.
647
.Pp
648
The back-reference code is subtle and doubts linger about its correctness
649
in complex cases.
650
.Pp
651
.Fn Regexec
652
performance is poor.
653
This will improve with later releases.
654
.Fa Nmatch
655
exceeding 0 is expensive;
656
.Fa nmatch
657
exceeding 1 is worse.
658
.Fn Regexec
659
is largely insensitive to RE complexity
660
.Em except
661
that back
662
references are massively expensive.
663
RE length does matter; in particular, there is a strong speed bonus
664
for keeping RE length under about 30 characters,
665
with most special characters counting roughly double.
666
.Pp
667
.Fn Regcomp
668
implements bounded repetitions by macro expansion,
669
which is costly in time and space if counts are large
670
or bounded repetitions are nested.
671
An RE like, say,
672
.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}"
673
will (eventually) run almost any existing machine out of swap space.
674
.Pp
675
There are suspected problems with response to obscure error conditions.
676
Notably,
677
certain kinds of internal overflow,
678
produced only by truly enormous REs or by multiply nested bounded repetitions,
679
are probably not handled well.
680
.Pp
681
Due to a mistake in
682
.St -p1003.2 ,
683
things like
684
.Ql "a)b"
685
are legal REs because
686
.Ql )\&
687
is
688
a special character only in the presence of a previous unmatched
689
.Ql (\& .
690
This can't be fixed until the spec is fixed.
691
.Pp
692
The standard's definition of back references is vague.
693
For example, does
694
.Ql "a\e(\e(b\e)*\e2\e)*d"
695
match
696
.Ql "abbbd" ?
697
Until the standard is clarified,
698
behavior in such cases should not be relied on.
699
.Pp
700
The implementation of word-boundary matching is a bit of a kludge,
701
and bugs may lurk in combinations of word-boundary matching and anchoring.

powered by: WebSVN 2.1.0

© copyright 1999-2025 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.