1 |
706 |
jeremybenn |
------------------------------------------------------------------------------
|
2 |
|
|
-- --
|
3 |
|
|
-- GNAT LIBRARY COMPONENTS --
|
4 |
|
|
-- --
|
5 |
|
|
-- G N A T . S P I T B O L . P A T T E R N S --
|
6 |
|
|
-- --
|
7 |
|
|
-- S p e c --
|
8 |
|
|
-- --
|
9 |
|
|
-- Copyright (C) 1997-2010, AdaCore --
|
10 |
|
|
-- --
|
11 |
|
|
-- GNAT is free software; you can redistribute it and/or modify it under --
|
12 |
|
|
-- terms of the GNU General Public License as published by the Free Soft- --
|
13 |
|
|
-- ware Foundation; either version 3, or (at your option) any later ver- --
|
14 |
|
|
-- sion. GNAT is distributed in the hope that it will be useful, but WITH- --
|
15 |
|
|
-- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY --
|
16 |
|
|
-- or FITNESS FOR A PARTICULAR PURPOSE. --
|
17 |
|
|
-- --
|
18 |
|
|
-- As a special exception under Section 7 of GPL version 3, you are granted --
|
19 |
|
|
-- additional permissions described in the GCC Runtime Library Exception, --
|
20 |
|
|
-- version 3.1, as published by the Free Software Foundation. --
|
21 |
|
|
-- --
|
22 |
|
|
-- You should have received a copy of the GNU General Public License and --
|
23 |
|
|
-- a copy of the GCC Runtime Library Exception along with this program; --
|
24 |
|
|
-- see the files COPYING3 and COPYING.RUNTIME respectively. If not, see --
|
25 |
|
|
-- <http://www.gnu.org/licenses/>. --
|
26 |
|
|
-- --
|
27 |
|
|
-- GNAT was originally developed by the GNAT team at New York University. --
|
28 |
|
|
-- Extensive contributions were provided by Ada Core Technologies Inc. --
|
29 |
|
|
-- --
|
30 |
|
|
------------------------------------------------------------------------------
|
31 |
|
|
|
32 |
|
|
-- SPITBOL-like pattern construction and matching
|
33 |
|
|
|
34 |
|
|
-- This child package of GNAT.SPITBOL provides a complete implementation
|
35 |
|
|
-- of the SPITBOL-like pattern construction and matching operations. This
|
36 |
|
|
-- package is based on Macro-SPITBOL created by Robert Dewar.
|
37 |
|
|
|
38 |
|
|
------------------------------------------------------------
|
39 |
|
|
-- Summary of Pattern Matching Packages in GNAT Hierarchy --
|
40 |
|
|
------------------------------------------------------------
|
41 |
|
|
|
42 |
|
|
-- There are three related packages that perform pattern matching functions.
|
43 |
|
|
-- the following is an outline of these packages, to help you determine
|
44 |
|
|
-- which is best for your needs.
|
45 |
|
|
|
46 |
|
|
-- GNAT.Regexp (files g-regexp.ads/g-regexp.adb)
|
47 |
|
|
-- This is a simple package providing Unix-style regular expression
|
48 |
|
|
-- matching with the restriction that it matches entire strings. It
|
49 |
|
|
-- is particularly useful for file name matching, and in particular
|
50 |
|
|
-- it provides "globbing patterns" that are useful in implementing
|
51 |
|
|
-- unix or DOS style wild card matching for file names.
|
52 |
|
|
|
53 |
|
|
-- GNAT.Regpat (files g-regpat.ads/g-regpat.adb)
|
54 |
|
|
-- This is a more complete implementation of Unix-style regular
|
55 |
|
|
-- expressions, copied from the original V7 style regular expression
|
56 |
|
|
-- library written in C by Henry Spencer. It is functionally the
|
57 |
|
|
-- same as this library, and uses the same internal data structures
|
58 |
|
|
-- stored in a binary compatible manner.
|
59 |
|
|
|
60 |
|
|
-- GNAT.Spitbol.Patterns (files g-spipat.ads/g-spipat.adb)
|
61 |
|
|
-- This is a completely general patterm matching package based on the
|
62 |
|
|
-- pattern language of SNOBOL4, as implemented in SPITBOL. The pattern
|
63 |
|
|
-- language is modeled on context free grammars, with context sensitive
|
64 |
|
|
-- extensions that provide full (type 0) computational capabilities.
|
65 |
|
|
|
66 |
|
|
with Ada.Strings.Maps; use Ada.Strings.Maps;
|
67 |
|
|
with Ada.Text_IO; use Ada.Text_IO;
|
68 |
|
|
|
69 |
|
|
package GNAT.Spitbol.Patterns is
|
70 |
|
|
pragma Elaborate_Body;
|
71 |
|
|
|
72 |
|
|
-------------------------------
|
73 |
|
|
-- Pattern Matching Tutorial --
|
74 |
|
|
-------------------------------
|
75 |
|
|
|
76 |
|
|
-- A pattern matching operation (a call to one of the Match subprograms)
|
77 |
|
|
-- takes a subject string and a pattern, and optionally a replacement
|
78 |
|
|
-- string. The replacement string option is only allowed if the subject
|
79 |
|
|
-- is a variable.
|
80 |
|
|
|
81 |
|
|
-- The pattern is matched against the subject string, and either the
|
82 |
|
|
-- match fails, or it succeeds matching a contiguous substring. If a
|
83 |
|
|
-- replacement string is specified, then the subject string is modified
|
84 |
|
|
-- by replacing the matched substring with the given replacement.
|
85 |
|
|
|
86 |
|
|
-- Concatenation and Alternation
|
87 |
|
|
-- =============================
|
88 |
|
|
|
89 |
|
|
-- A pattern consists of a series of pattern elements. The pattern is
|
90 |
|
|
-- built up using either the concatenation operator:
|
91 |
|
|
|
92 |
|
|
-- A & B
|
93 |
|
|
|
94 |
|
|
-- which means match A followed immediately by matching B, or the
|
95 |
|
|
-- alternation operator:
|
96 |
|
|
|
97 |
|
|
-- A or B
|
98 |
|
|
|
99 |
|
|
-- which means first attempt to match A, and then if that does not
|
100 |
|
|
-- succeed, match B.
|
101 |
|
|
|
102 |
|
|
-- There is full backtracking, which means that if a given pattern
|
103 |
|
|
-- element fails to match, then previous alternatives are matched.
|
104 |
|
|
-- For example if we have the pattern:
|
105 |
|
|
|
106 |
|
|
-- (A or B) & (C or D) & (E or F)
|
107 |
|
|
|
108 |
|
|
-- First we attempt to match A, if that succeeds, then we go on to try
|
109 |
|
|
-- to match C, and if that succeeds, we go on to try to match E. If E
|
110 |
|
|
-- fails, then we try F. If F fails, then we go back and try matching
|
111 |
|
|
-- D instead of C. Let's make this explicit using a specific example,
|
112 |
|
|
-- and introducing the simplest kind of pattern element, which is a
|
113 |
|
|
-- literal string. The meaning of this pattern element is simply to
|
114 |
|
|
-- match the characters that correspond to the string characters. Now
|
115 |
|
|
-- let's rewrite the above pattern form with specific string literals
|
116 |
|
|
-- as the pattern elements:
|
117 |
|
|
|
118 |
|
|
-- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")
|
119 |
|
|
|
120 |
|
|
-- The following strings will be attempted in sequence:
|
121 |
|
|
|
122 |
|
|
-- ABC . DEF . GH
|
123 |
|
|
-- ABC . DEF . IJ
|
124 |
|
|
-- ABC . CDE . GH
|
125 |
|
|
-- ABC . CDE . IJ
|
126 |
|
|
-- AB . DEF . GH
|
127 |
|
|
-- AB . DEF . IJ
|
128 |
|
|
-- AB . CDE . GH
|
129 |
|
|
-- AB . CDE . IJ
|
130 |
|
|
|
131 |
|
|
-- Here we use the dot simply to separate the pieces of the string
|
132 |
|
|
-- matched by the three separate elements.
|
133 |
|
|
|
134 |
|
|
-- Moving the Start Point
|
135 |
|
|
-- ======================
|
136 |
|
|
|
137 |
|
|
-- A pattern is not required to match starting at the first character
|
138 |
|
|
-- of the string, and is not required to match to the end of the string.
|
139 |
|
|
-- The first attempt does indeed attempt to match starting at the first
|
140 |
|
|
-- character of the string, trying all the possible alternatives. But
|
141 |
|
|
-- if all alternatives fail, then the starting point of the match is
|
142 |
|
|
-- moved one character, and all possible alternatives are attempted at
|
143 |
|
|
-- the new anchor point.
|
144 |
|
|
|
145 |
|
|
-- The entire match fails only when every possible starting point has
|
146 |
|
|
-- been attempted. As an example, suppose that we had the subject
|
147 |
|
|
-- string
|
148 |
|
|
|
149 |
|
|
-- "ABABCDEIJKL"
|
150 |
|
|
|
151 |
|
|
-- matched using the pattern in the previous example:
|
152 |
|
|
|
153 |
|
|
-- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")
|
154 |
|
|
|
155 |
|
|
-- would succeed, after two anchor point moves:
|
156 |
|
|
|
157 |
|
|
-- "ABABCDEIJKL"
|
158 |
|
|
-- ^^^^^^^
|
159 |
|
|
-- matched
|
160 |
|
|
-- section
|
161 |
|
|
|
162 |
|
|
-- This mode of pattern matching is called the unanchored mode. It is
|
163 |
|
|
-- also possible to put the pattern matcher into anchored mode by
|
164 |
|
|
-- setting the global variable Anchored_Mode to True. This will cause
|
165 |
|
|
-- all subsequent matches to be performed in anchored mode, where the
|
166 |
|
|
-- match is required to start at the first character.
|
167 |
|
|
|
168 |
|
|
-- We will also see later how the effect of an anchored match can be
|
169 |
|
|
-- obtained for a single specified anchor point if this is desired.
|
170 |
|
|
|
171 |
|
|
-- Other Pattern Elements
|
172 |
|
|
-- ======================
|
173 |
|
|
|
174 |
|
|
-- In addition to strings (or single characters), there are many special
|
175 |
|
|
-- pattern elements that correspond to special predefined alternations:
|
176 |
|
|
|
177 |
|
|
-- Arb Matches any string. First it matches the null string, and
|
178 |
|
|
-- then on a subsequent failure, matches one character, and
|
179 |
|
|
-- then two characters, and so on. It only fails if the
|
180 |
|
|
-- entire remaining string is matched.
|
181 |
|
|
|
182 |
|
|
-- Bal Matches a non-empty string that is parentheses balanced
|
183 |
|
|
-- with respect to ordinary () characters. Examples of
|
184 |
|
|
-- balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E".
|
185 |
|
|
-- Bal matches the shortest possible balanced string on the
|
186 |
|
|
-- first attempt, and if there is a subsequent failure,
|
187 |
|
|
-- attempts to extend the string.
|
188 |
|
|
|
189 |
|
|
-- Cancel Immediately aborts the entire pattern match, signalling
|
190 |
|
|
-- failure. This is a specialized pattern element, which is
|
191 |
|
|
-- useful in conjunction with some of the special pattern
|
192 |
|
|
-- elements that have side effects.
|
193 |
|
|
|
194 |
|
|
-- Fail The null alternation. Matches no possible strings, so it
|
195 |
|
|
-- always signals failure. This is a specialized pattern
|
196 |
|
|
-- element, which is useful in conjunction with some of the
|
197 |
|
|
-- special pattern elements that have side effects.
|
198 |
|
|
|
199 |
|
|
-- Fence Matches the null string at first, and then if a failure
|
200 |
|
|
-- causes alternatives to be sought, aborts the match (like
|
201 |
|
|
-- a Cancel). Note that using Fence at the start of a pattern
|
202 |
|
|
-- has the same effect as matching in anchored mode.
|
203 |
|
|
|
204 |
|
|
-- Rest Matches from the current point to the last character in
|
205 |
|
|
-- the string. This is a specialized pattern element, which
|
206 |
|
|
-- is useful in conjunction with some of the special pattern
|
207 |
|
|
-- elements that have side effects.
|
208 |
|
|
|
209 |
|
|
-- Succeed Repeatedly matches the null string (it is equivalent to
|
210 |
|
|
-- the alternation ("" or "" or "" ....). This is a special
|
211 |
|
|
-- pattern element, which is useful in conjunction with some
|
212 |
|
|
-- of the special pattern elements that have side effects.
|
213 |
|
|
|
214 |
|
|
-- Pattern Construction Functions
|
215 |
|
|
-- ==============================
|
216 |
|
|
|
217 |
|
|
-- The following functions construct additional pattern elements
|
218 |
|
|
|
219 |
|
|
-- Any(S) Where S is a string, matches a single character that is
|
220 |
|
|
-- any one of the characters in S. Fails if the current
|
221 |
|
|
-- character is not one of the given set of characters.
|
222 |
|
|
|
223 |
|
|
-- Arbno(P) Where P is any pattern, matches any number of instances
|
224 |
|
|
-- of the pattern, starting with zero occurrences. It is
|
225 |
|
|
-- thus equivalent to ("" or (P & ("" or (P & ("" ....)))).
|
226 |
|
|
-- The pattern P may contain any number of pattern elements
|
227 |
|
|
-- including the use of alternation and concatenation.
|
228 |
|
|
|
229 |
|
|
-- Break(S) Where S is a string, matches a string of zero or more
|
230 |
|
|
-- characters up to but not including a break character
|
231 |
|
|
-- that is one of the characters given in the string S.
|
232 |
|
|
-- Can match the null string, but cannot match the last
|
233 |
|
|
-- character in the string, since a break character is
|
234 |
|
|
-- required to be present.
|
235 |
|
|
|
236 |
|
|
-- BreakX(S) Where S is a string, behaves exactly like Break(S) when
|
237 |
|
|
-- it first matches, but if a string is successfully matched,
|
238 |
|
|
-- then a subsequent failure causes an attempt to extend the
|
239 |
|
|
-- matched string.
|
240 |
|
|
|
241 |
|
|
-- Fence(P) Where P is a pattern, attempts to match the pattern P
|
242 |
|
|
-- including trying all possible alternatives of P. If none
|
243 |
|
|
-- of these alternatives succeeds, then the Fence pattern
|
244 |
|
|
-- fails. If one alternative succeeds, then the pattern
|
245 |
|
|
-- match proceeds, but on a subsequent failure, no attempt
|
246 |
|
|
-- is made to search for alternative matches of P. The
|
247 |
|
|
-- pattern P may contain any number of pattern elements
|
248 |
|
|
-- including the use of alternation and concatenation.
|
249 |
|
|
|
250 |
|
|
-- Len(N) Where N is a natural number, matches the given number of
|
251 |
|
|
-- characters. For example, Len(10) matches any string that
|
252 |
|
|
-- is exactly ten characters long.
|
253 |
|
|
|
254 |
|
|
-- NotAny(S) Where S is a string, matches a single character that is
|
255 |
|
|
-- not one of the characters of S. Fails if the current
|
256 |
|
|
-- character is one of the given set of characters.
|
257 |
|
|
|
258 |
|
|
-- NSpan(S) Where S is a string, matches a string of zero or more
|
259 |
|
|
-- characters that is among the characters given in the
|
260 |
|
|
-- string. Always matches the longest possible such string.
|
261 |
|
|
-- Always succeeds, since it can match the null string.
|
262 |
|
|
|
263 |
|
|
-- Pos(N) Where N is a natural number, matches the null string
|
264 |
|
|
-- if exactly N characters have been matched so far, and
|
265 |
|
|
-- otherwise fails.
|
266 |
|
|
|
267 |
|
|
-- Rpos(N) Where N is a natural number, matches the null string
|
268 |
|
|
-- if exactly N characters remain to be matched, and
|
269 |
|
|
-- otherwise fails.
|
270 |
|
|
|
271 |
|
|
-- Rtab(N) Where N is a natural number, matches characters from
|
272 |
|
|
-- the current position until exactly N characters remain
|
273 |
|
|
-- to be matched in the string. Fails if fewer than N
|
274 |
|
|
-- unmatched characters remain in the string.
|
275 |
|
|
|
276 |
|
|
-- Tab(N) Where N is a natural number, matches characters from
|
277 |
|
|
-- the current position until exactly N characters have
|
278 |
|
|
-- been matched in all. Fails if more than N characters
|
279 |
|
|
-- have already been matched.
|
280 |
|
|
|
281 |
|
|
-- Span(S) Where S is a string, matches a string of one or more
|
282 |
|
|
-- characters that is among the characters given in the
|
283 |
|
|
-- string. Always matches the longest possible such string.
|
284 |
|
|
-- Fails if the current character is not one of the given
|
285 |
|
|
-- set of characters.
|
286 |
|
|
|
287 |
|
|
-- Recursive Pattern Matching
|
288 |
|
|
-- ==========================
|
289 |
|
|
|
290 |
|
|
-- The plus operator (+P) where P is a pattern variable, creates
|
291 |
|
|
-- a recursive pattern that will, at pattern matching time, follow
|
292 |
|
|
-- the pointer to obtain the referenced pattern, and then match this
|
293 |
|
|
-- pattern. This may be used to construct recursive patterns. Consider
|
294 |
|
|
-- for example:
|
295 |
|
|
|
296 |
|
|
-- P := ("A" or ("B" & (+P)))
|
297 |
|
|
|
298 |
|
|
-- On the first attempt, this pattern attempts to match the string "A".
|
299 |
|
|
-- If this fails, then the alternative matches a "B", followed by an
|
300 |
|
|
-- attempt to match P again. This second attempt first attempts to
|
301 |
|
|
-- match "A", and so on. The result is a pattern that will match a
|
302 |
|
|
-- string of B's followed by a single A.
|
303 |
|
|
|
304 |
|
|
-- This particular example could simply be written as NSpan('B') & 'A',
|
305 |
|
|
-- but the use of recursive patterns in the general case can construct
|
306 |
|
|
-- complex patterns which could not otherwise be built.
|
307 |
|
|
|
308 |
|
|
-- Pattern Assignment Operations
|
309 |
|
|
-- =============================
|
310 |
|
|
|
311 |
|
|
-- In addition to the overall result of a pattern match, which indicates
|
312 |
|
|
-- success or failure, it is often useful to be able to keep track of
|
313 |
|
|
-- the pieces of the subject string that are matched by individual
|
314 |
|
|
-- pattern elements, or subsections of the pattern.
|
315 |
|
|
|
316 |
|
|
-- The pattern assignment operators allow this capability. The first
|
317 |
|
|
-- form is the immediate assignment:
|
318 |
|
|
|
319 |
|
|
-- P * S
|
320 |
|
|
|
321 |
|
|
-- Here P is an arbitrary pattern, and S is a variable of type VString
|
322 |
|
|
-- that will be set to the substring matched by P. This assignment
|
323 |
|
|
-- happens during pattern matching, so if P matches more than once,
|
324 |
|
|
-- then the assignment happens more than once.
|
325 |
|
|
|
326 |
|
|
-- The deferred assignment operation:
|
327 |
|
|
|
328 |
|
|
-- P ** S
|
329 |
|
|
|
330 |
|
|
-- avoids these multiple assignments by deferring the assignment to the
|
331 |
|
|
-- end of the match. If the entire match is successful, and if the
|
332 |
|
|
-- pattern P was part of the successful match, then at the end of the
|
333 |
|
|
-- matching operation the assignment to S of the string matching P is
|
334 |
|
|
-- performed.
|
335 |
|
|
|
336 |
|
|
-- The cursor assignment operation:
|
337 |
|
|
|
338 |
|
|
-- Setcur(N'Access)
|
339 |
|
|
|
340 |
|
|
-- assigns the current cursor position to the natural variable N. The
|
341 |
|
|
-- cursor position is defined as the count of characters that have been
|
342 |
|
|
-- matched so far (including any start point moves).
|
343 |
|
|
|
344 |
|
|
-- Finally the operations * and ** may be used with values of type
|
345 |
|
|
-- Text_IO.File_Access. The effect is to do a Put_Line operation of
|
346 |
|
|
-- the matched substring. These are particularly useful in debugging
|
347 |
|
|
-- pattern matches.
|
348 |
|
|
|
349 |
|
|
-- Deferred Matching
|
350 |
|
|
-- =================
|
351 |
|
|
|
352 |
|
|
-- The pattern construction functions (such as Len and Any) all permit
|
353 |
|
|
-- the use of pointers to natural or string values, or functions that
|
354 |
|
|
-- return natural or string values. These forms cause the actual value
|
355 |
|
|
-- to be obtained at pattern matching time. This allows interesting
|
356 |
|
|
-- possibilities for constructing dynamic patterns as illustrated in
|
357 |
|
|
-- the examples section.
|
358 |
|
|
|
359 |
|
|
-- In addition the (+S) operator may be used where S is a pointer to
|
360 |
|
|
-- string or function returning string, with a similar deferred effect.
|
361 |
|
|
|
362 |
|
|
-- A special use of deferred matching is the construction of predicate
|
363 |
|
|
-- functions. The element (+P) where P is an access to a function that
|
364 |
|
|
-- returns a Boolean value, causes the function to be called at the
|
365 |
|
|
-- time the element is matched. If the function returns True, then the
|
366 |
|
|
-- null string is matched, if the function returns False, then failure
|
367 |
|
|
-- is signalled and previous alternatives are sought.
|
368 |
|
|
|
369 |
|
|
-- Deferred Replacement
|
370 |
|
|
-- ====================
|
371 |
|
|
|
372 |
|
|
-- The simple model given for pattern replacement (where the matched
|
373 |
|
|
-- substring is replaced by the string given as the third argument to
|
374 |
|
|
-- Match) works fine in simple cases, but this approach does not work
|
375 |
|
|
-- in the case where the expression used as the replacement string is
|
376 |
|
|
-- dependent on values set by the match.
|
377 |
|
|
|
378 |
|
|
-- For example, suppose we want to find an instance of a parenthesized
|
379 |
|
|
-- character, and replace the parentheses with square brackets. At first
|
380 |
|
|
-- glance it would seem that:
|
381 |
|
|
|
382 |
|
|
-- Match (Subject, '(' & Len (1) * Char & ')', '[' & Char & ']');
|
383 |
|
|
|
384 |
|
|
-- would do the trick, but that does not work, because the third
|
385 |
|
|
-- argument to Match gets evaluated too early, before the call to
|
386 |
|
|
-- Match, and before the pattern match has had a chance to set Char.
|
387 |
|
|
|
388 |
|
|
-- To solve this problem we provide the deferred replacement capability.
|
389 |
|
|
-- With this approach, which of course is only needed if the pattern
|
390 |
|
|
-- involved has side effects, is to do the match in two stages. The
|
391 |
|
|
-- call to Match sets a pattern result in a variable of the private
|
392 |
|
|
-- type Match_Result, and then a subsequent Replace operation uses
|
393 |
|
|
-- this Match_Result object to perform the required replacement.
|
394 |
|
|
|
395 |
|
|
-- Using this approach, we can now write the above operation properly
|
396 |
|
|
-- in a manner that will work:
|
397 |
|
|
|
398 |
|
|
-- M : Match_Result;
|
399 |
|
|
-- ...
|
400 |
|
|
-- Match (Subject, '(' & Len (1) * Char & ')', M);
|
401 |
|
|
-- Replace (M, '[' & Char & ']');
|
402 |
|
|
|
403 |
|
|
-- As with other Match cases, there is a function and procedure form
|
404 |
|
|
-- of this match call. A call to Replace after a failed match has no
|
405 |
|
|
-- effect. Note that Subject should not be modified between the calls.
|
406 |
|
|
|
407 |
|
|
-- Examples of Pattern Matching
|
408 |
|
|
-- ============================
|
409 |
|
|
|
410 |
|
|
-- First a simple example of the use of pattern replacement to remove
|
411 |
|
|
-- a line number from the start of a string. We assume that the line
|
412 |
|
|
-- number has the form of a string of decimal digits followed by a
|
413 |
|
|
-- period, followed by one or more spaces.
|
414 |
|
|
|
415 |
|
|
-- Digs : constant Pattern := Span("0123456789");
|
416 |
|
|
|
417 |
|
|
-- Lnum : constant Pattern := Pos(0) & Digs & '.' & Span(' ');
|
418 |
|
|
|
419 |
|
|
-- Now to use this pattern we simply do a match with a replacement:
|
420 |
|
|
|
421 |
|
|
-- Match (Line, Lnum, "");
|
422 |
|
|
|
423 |
|
|
-- which replaces the line number by the null string. Note that it is
|
424 |
|
|
-- also possible to use an Ada.Strings.Maps.Character_Set value as an
|
425 |
|
|
-- argument to Span and similar functions, and in particular all the
|
426 |
|
|
-- useful constants 'in Ada.Strings.Maps.Constants are available. This
|
427 |
|
|
-- means that we could define Digs as:
|
428 |
|
|
|
429 |
|
|
-- Digs : constant Pattern := Span(Decimal_Digit_Set);
|
430 |
|
|
|
431 |
|
|
-- The style we use here, of defining constant patterns and then using
|
432 |
|
|
-- them is typical. It is possible to build up patterns dynamically,
|
433 |
|
|
-- but it is usually more efficient to build them in pieces in advance
|
434 |
|
|
-- using constant declarations. Note in particular that although it is
|
435 |
|
|
-- possible to construct a pattern directly as an argument for the
|
436 |
|
|
-- Match routine, it is much more efficient to preconstruct the pattern
|
437 |
|
|
-- as we did in this example.
|
438 |
|
|
|
439 |
|
|
-- Now let's look at the use of pattern assignment to break a
|
440 |
|
|
-- string into sections. Suppose that the input string has two
|
441 |
|
|
-- unsigned decimal integers, separated by spaces or a comma,
|
442 |
|
|
-- with spaces allowed anywhere. Then we can isolate the two
|
443 |
|
|
-- numbers with the following pattern:
|
444 |
|
|
|
445 |
|
|
-- Num1, Num2 : aliased VString;
|
446 |
|
|
|
447 |
|
|
-- B : constant Pattern := NSpan(' ');
|
448 |
|
|
|
449 |
|
|
-- N : constant Pattern := Span("0123456789");
|
450 |
|
|
|
451 |
|
|
-- T : constant Pattern :=
|
452 |
|
|
-- NSpan(' ') & N * Num1 & Span(" ,") & N * Num2;
|
453 |
|
|
|
454 |
|
|
-- The match operation Match (" 124, 257 ", T) would assign the
|
455 |
|
|
-- string 124 to Num1 and the string 257 to Num2.
|
456 |
|
|
|
457 |
|
|
-- Now let's see how more complex elements can be built from the
|
458 |
|
|
-- set of primitive elements. The following pattern matches strings
|
459 |
|
|
-- that have the syntax of Ada 95 based literals:
|
460 |
|
|
|
461 |
|
|
-- Digs : constant Pattern := Span(Decimal_Digit_Set);
|
462 |
|
|
-- UDigs : constant Pattern := Digs & Arbno('_' & Digs);
|
463 |
|
|
|
464 |
|
|
-- Edig : constant Pattern := Span(Hexadecimal_Digit_Set);
|
465 |
|
|
-- UEdig : constant Pattern := Edig & Arbno('_' & Edig);
|
466 |
|
|
|
467 |
|
|
-- Bnum : constant Pattern := Udigs & '#' & UEdig & '#';
|
468 |
|
|
|
469 |
|
|
-- A match against Bnum will now match the desired strings, e.g.
|
470 |
|
|
-- it will match 16#123_abc#, but not a#b#. However, this pattern
|
471 |
|
|
-- is not quite complete, since it does not allow colons to replace
|
472 |
|
|
-- the pound signs. The following is more complete:
|
473 |
|
|
|
474 |
|
|
-- Bchar : constant Pattern := Any("#:");
|
475 |
|
|
-- Bnum : constant Pattern := Udigs & Bchar & UEdig & Bchar;
|
476 |
|
|
|
477 |
|
|
-- but that is still not quite right, since it allows # and : to be
|
478 |
|
|
-- mixed, and they are supposed to be used consistently. We solve
|
479 |
|
|
-- this by using a deferred match.
|
480 |
|
|
|
481 |
|
|
-- Temp : aliased VString;
|
482 |
|
|
|
483 |
|
|
-- Bnum : constant Pattern :=
|
484 |
|
|
-- Udigs & Bchar * Temp & UEdig & (+Temp)
|
485 |
|
|
|
486 |
|
|
-- Here the first instance of the base character is stored in Temp, and
|
487 |
|
|
-- then later in the pattern we rematch the value that was assigned.
|
488 |
|
|
|
489 |
|
|
-- For an example of a recursive pattern, let's define a pattern
|
490 |
|
|
-- that is like the built in Bal, but the string matched is balanced
|
491 |
|
|
-- with respect to square brackets or curly brackets.
|
492 |
|
|
|
493 |
|
|
-- The language for such strings might be defined in extended BNF as
|
494 |
|
|
|
495 |
|
|
-- ELEMENT ::= <any character other than [] or {}>
|
496 |
|
|
-- | '[' BALANCED_STRING ']'
|
497 |
|
|
-- | '{' BALANCED_STRING '}'
|
498 |
|
|
|
499 |
|
|
-- BALANCED_STRING ::= ELEMENT {ELEMENT}
|
500 |
|
|
|
501 |
|
|
-- Here we use {} to indicate zero or more occurrences of a term, as
|
502 |
|
|
-- is common practice in extended BNF. Now we can translate the above
|
503 |
|
|
-- BNF into recursive patterns as follows:
|
504 |
|
|
|
505 |
|
|
-- Element, Balanced_String : aliased Pattern;
|
506 |
|
|
-- .
|
507 |
|
|
-- .
|
508 |
|
|
-- .
|
509 |
|
|
-- Element := NotAny ("[]{}")
|
510 |
|
|
-- or
|
511 |
|
|
-- ('[' & (+Balanced_String) & ']')
|
512 |
|
|
-- or
|
513 |
|
|
-- ('{' & (+Balanced_String) & '}');
|
514 |
|
|
|
515 |
|
|
-- Balanced_String := Element & Arbno (Element);
|
516 |
|
|
|
517 |
|
|
-- Note the important use of + here to refer to a pattern not yet
|
518 |
|
|
-- defined. Note also that we use assignments precisely because we
|
519 |
|
|
-- cannot refer to as yet undeclared variables in initializations.
|
520 |
|
|
|
521 |
|
|
-- Now that this pattern is constructed, we can use it as though it
|
522 |
|
|
-- were a new primitive pattern element, and for example, the match:
|
523 |
|
|
|
524 |
|
|
-- Match ("xy[ab{cd}]", Balanced_String * Current_Output & Fail);
|
525 |
|
|
|
526 |
|
|
-- will generate the output:
|
527 |
|
|
|
528 |
|
|
-- x
|
529 |
|
|
-- xy
|
530 |
|
|
-- xy[ab{cd}]
|
531 |
|
|
-- y
|
532 |
|
|
-- y[ab{cd}]
|
533 |
|
|
-- [ab{cd}]
|
534 |
|
|
-- a
|
535 |
|
|
-- ab
|
536 |
|
|
-- ab{cd}
|
537 |
|
|
-- b
|
538 |
|
|
-- b{cd}
|
539 |
|
|
-- {cd}
|
540 |
|
|
-- c
|
541 |
|
|
-- cd
|
542 |
|
|
-- d
|
543 |
|
|
|
544 |
|
|
-- Note that the function of the fail here is simply to force the
|
545 |
|
|
-- pattern Balanced_String to match all possible alternatives. Studying
|
546 |
|
|
-- the operation of this pattern in detail is highly instructive.
|
547 |
|
|
|
548 |
|
|
-- Finally we give a rather elaborate example of the use of deferred
|
549 |
|
|
-- matching. The following declarations build up a pattern which will
|
550 |
|
|
-- find the longest string of decimal digits in the subject string.
|
551 |
|
|
|
552 |
|
|
-- Max, Cur : VString;
|
553 |
|
|
-- Loc : Natural;
|
554 |
|
|
|
555 |
|
|
-- function GtS return Boolean is
|
556 |
|
|
-- begin
|
557 |
|
|
-- return Length (Cur) > Length (Max);
|
558 |
|
|
-- end GtS;
|
559 |
|
|
|
560 |
|
|
-- Digit : constant Character_Set := Decimal_Digit_Set;
|
561 |
|
|
|
562 |
|
|
-- Digs : constant Pattern := Span(Digit);
|
563 |
|
|
|
564 |
|
|
-- Find : constant Pattern :=
|
565 |
|
|
-- "" * Max & Fence & -- initialize Max to null
|
566 |
|
|
-- BreakX (Digit) & -- scan looking for digits
|
567 |
|
|
-- ((Span(Digit) * Cur & -- assign next string to Cur
|
568 |
|
|
-- (+GtS'Unrestricted_Access) & -- check size(Cur) > Size(Max)
|
569 |
|
|
-- Setcur(Loc'Access)) -- if so, save location
|
570 |
|
|
-- * Max) & -- and assign to Max
|
571 |
|
|
-- Fail; -- seek all alternatives
|
572 |
|
|
|
573 |
|
|
-- As we see from the comments here, complex patterns like this take
|
574 |
|
|
-- on aspects of sequential programs. In fact they are sequential
|
575 |
|
|
-- programs with general backtracking. In this pattern, we first use
|
576 |
|
|
-- a pattern assignment that matches null and assigns it to Max, so
|
577 |
|
|
-- that it is initialized for the new match. Now BreakX scans to the
|
578 |
|
|
-- next digit. Arb would do here, but BreakX will be more efficient.
|
579 |
|
|
-- Once we have found a digit, we scan out the longest string of
|
580 |
|
|
-- digits with Span, and assign it to Cur. The deferred call to GtS
|
581 |
|
|
-- tests if the string we assigned to Cur is the longest so far. If
|
582 |
|
|
-- not, then failure is signalled, and we seek alternatives (this
|
583 |
|
|
-- means that BreakX will extend and look for the next digit string).
|
584 |
|
|
-- If the call to GtS succeeds then the matched string is assigned
|
585 |
|
|
-- as the largest string so far into Max and its location is saved
|
586 |
|
|
-- in Loc. Finally Fail forces the match to fail and seek alternatives,
|
587 |
|
|
-- so that the entire string is searched.
|
588 |
|
|
|
589 |
|
|
-- If the pattern Find is matched against a string, the variable Max
|
590 |
|
|
-- at the end of the pattern will have the longest string of digits,
|
591 |
|
|
-- and Loc will be the starting character location of the string. For
|
592 |
|
|
-- example, Match("ab123cd4657ef23", Find) will assign "4657" to Max
|
593 |
|
|
-- and 11 to Loc (indicating that the string ends with the eleventh
|
594 |
|
|
-- character of the string).
|
595 |
|
|
|
596 |
|
|
-- Note: the use of Unrestricted_Access to reference GtS will not
|
597 |
|
|
-- be needed if GtS is defined at the outer level, but definitely
|
598 |
|
|
-- will be necessary if GtS is a nested function (in which case of
|
599 |
|
|
-- course the scope of the pattern Find will be restricted to this
|
600 |
|
|
-- nested scope, and this cannot be checked, i.e. use of the pattern
|
601 |
|
|
-- outside this scope is erroneous). Generally it is a good idea to
|
602 |
|
|
-- define patterns and the functions they call at the outer level
|
603 |
|
|
-- where possible, to avoid such problems.
|
604 |
|
|
|
605 |
|
|
-- Correspondence with Pattern Matching in SPITBOL
|
606 |
|
|
-- ===============================================
|
607 |
|
|
|
608 |
|
|
-- Generally the Ada syntax and names correspond closely to SPITBOL
|
609 |
|
|
-- syntax for pattern matching construction.
|
610 |
|
|
|
611 |
|
|
-- The basic pattern construction operators are renamed as follows:
|
612 |
|
|
|
613 |
|
|
-- Spitbol Ada
|
614 |
|
|
|
615 |
|
|
-- (space) &
|
616 |
|
|
-- | or
|
617 |
|
|
-- $ *
|
618 |
|
|
-- . **
|
619 |
|
|
|
620 |
|
|
-- The Ada operators were chosen so that the relative precedences of
|
621 |
|
|
-- these operators corresponds to that of the Spitbol operators, but
|
622 |
|
|
-- as always, the use of parentheses is advisable to clarify.
|
623 |
|
|
|
624 |
|
|
-- The pattern construction operators all have similar names except for
|
625 |
|
|
|
626 |
|
|
-- Spitbol Ada
|
627 |
|
|
|
628 |
|
|
-- Abort Cancel
|
629 |
|
|
-- Rem Rest
|
630 |
|
|
|
631 |
|
|
-- where we have clashes with Ada reserved names
|
632 |
|
|
|
633 |
|
|
-- Ada requires the use of 'Access to refer to functions used in the
|
634 |
|
|
-- pattern match, and often the use of 'Unrestricted_Access may be
|
635 |
|
|
-- necessary to get around the scope restrictions if the functions
|
636 |
|
|
-- are not declared at the outer level.
|
637 |
|
|
|
638 |
|
|
-- The actual pattern matching syntax is modified in Ada as follows:
|
639 |
|
|
|
640 |
|
|
-- Spitbol Ada
|
641 |
|
|
|
642 |
|
|
-- X Y Match (X, Y);
|
643 |
|
|
-- X Y = Z Match (X, Y, Z);
|
644 |
|
|
|
645 |
|
|
-- and pattern failure is indicated by returning a Boolean result from
|
646 |
|
|
-- the Match function (True for success, False for failure).
|
647 |
|
|
|
648 |
|
|
-----------------------
|
649 |
|
|
-- Type Declarations --
|
650 |
|
|
-----------------------
|
651 |
|
|
|
652 |
|
|
type Pattern is private;
|
653 |
|
|
-- Type representing a pattern. This package provides a complete set of
|
654 |
|
|
-- operations for constructing patterns that can be used in the pattern
|
655 |
|
|
-- matching operations provided.
|
656 |
|
|
|
657 |
|
|
type Boolean_Func is access function return Boolean;
|
658 |
|
|
-- General Boolean function type. When this type is used as a formal
|
659 |
|
|
-- parameter type in this package, it indicates a deferred predicate
|
660 |
|
|
-- pattern. The function will be called when the pattern element is
|
661 |
|
|
-- matched and failure signalled if False is returned.
|
662 |
|
|
|
663 |
|
|
type Natural_Func is access function return Natural;
|
664 |
|
|
-- General Natural function type. When this type is used as a formal
|
665 |
|
|
-- parameter type in this package, it indicates a deferred pattern.
|
666 |
|
|
-- The function will be called when the pattern element is matched
|
667 |
|
|
-- to obtain the currently referenced Natural value.
|
668 |
|
|
|
669 |
|
|
type VString_Func is access function return VString;
|
670 |
|
|
-- General VString function type. When this type is used as a formal
|
671 |
|
|
-- parameter type in this package, it indicates a deferred pattern.
|
672 |
|
|
-- The function will be called when the pattern element is matched
|
673 |
|
|
-- to obtain the currently referenced string value.
|
674 |
|
|
|
675 |
|
|
subtype PString is String;
|
676 |
|
|
-- This subtype is used in the remainder of the package to indicate a
|
677 |
|
|
-- formal parameter that is converted to its corresponding pattern,
|
678 |
|
|
-- i.e. a pattern that matches the characters of the string.
|
679 |
|
|
|
680 |
|
|
subtype PChar is Character;
|
681 |
|
|
-- Similarly, this subtype is used in the remainder of the package to
|
682 |
|
|
-- indicate a formal parameter that is converted to its corresponding
|
683 |
|
|
-- pattern, i.e. a pattern that matches this one character.
|
684 |
|
|
|
685 |
|
|
subtype VString_Var is VString;
|
686 |
|
|
subtype Pattern_Var is Pattern;
|
687 |
|
|
-- These synonyms are used as formal parameter types to a function where,
|
688 |
|
|
-- if the language allowed, we would use in out parameters, but we are
|
689 |
|
|
-- not allowed to have in out parameters for functions. Instead we pass
|
690 |
|
|
-- actuals which must be variables, and with a bit of trickery in the
|
691 |
|
|
-- body, manage to interpret them properly as though they were indeed
|
692 |
|
|
-- in out parameters.
|
693 |
|
|
|
694 |
|
|
pragma Warnings (Off, VString_Var);
|
695 |
|
|
pragma Warnings (Off, Pattern_Var);
|
696 |
|
|
-- We turn off warnings for these two types so that when variables are used
|
697 |
|
|
-- as arguments in this context, warnings about them not being assigned in
|
698 |
|
|
-- the source program will be suppressed.
|
699 |
|
|
|
700 |
|
|
--------------------------------
|
701 |
|
|
-- Basic Pattern Construction --
|
702 |
|
|
--------------------------------
|
703 |
|
|
|
704 |
|
|
function "&" (L : Pattern; R : Pattern) return Pattern;
|
705 |
|
|
function "&" (L : PString; R : Pattern) return Pattern;
|
706 |
|
|
function "&" (L : Pattern; R : PString) return Pattern;
|
707 |
|
|
function "&" (L : PChar; R : Pattern) return Pattern;
|
708 |
|
|
function "&" (L : Pattern; R : PChar) return Pattern;
|
709 |
|
|
|
710 |
|
|
-- Pattern concatenation. Matches L followed by R
|
711 |
|
|
|
712 |
|
|
function "or" (L : Pattern; R : Pattern) return Pattern;
|
713 |
|
|
function "or" (L : PString; R : Pattern) return Pattern;
|
714 |
|
|
function "or" (L : Pattern; R : PString) return Pattern;
|
715 |
|
|
function "or" (L : PString; R : PString) return Pattern;
|
716 |
|
|
function "or" (L : PChar; R : Pattern) return Pattern;
|
717 |
|
|
function "or" (L : Pattern; R : PChar) return Pattern;
|
718 |
|
|
function "or" (L : PChar; R : PChar) return Pattern;
|
719 |
|
|
function "or" (L : PString; R : PChar) return Pattern;
|
720 |
|
|
function "or" (L : PChar; R : PString) return Pattern;
|
721 |
|
|
-- Pattern alternation. Creates a pattern that will first try to match
|
722 |
|
|
-- L and then on a subsequent failure, attempts to match R instead.
|
723 |
|
|
|
724 |
|
|
----------------------------------
|
725 |
|
|
-- Pattern Assignment Functions --
|
726 |
|
|
----------------------------------
|
727 |
|
|
|
728 |
|
|
function "*" (P : Pattern; Var : VString_Var) return Pattern;
|
729 |
|
|
function "*" (P : PString; Var : VString_Var) return Pattern;
|
730 |
|
|
function "*" (P : PChar; Var : VString_Var) return Pattern;
|
731 |
|
|
-- Matches P, and if the match succeeds, assigns the matched substring
|
732 |
|
|
-- to the given VString variable S. This assignment happens as soon as
|
733 |
|
|
-- the substring is matched, and if the pattern P1 is matched more than
|
734 |
|
|
-- once during the course of the match, then the assignment will occur
|
735 |
|
|
-- more than once.
|
736 |
|
|
|
737 |
|
|
function "**" (P : Pattern; Var : VString_Var) return Pattern;
|
738 |
|
|
function "**" (P : PString; Var : VString_Var) return Pattern;
|
739 |
|
|
function "**" (P : PChar; Var : VString_Var) return Pattern;
|
740 |
|
|
-- Like "*" above, except that the assignment happens at most once
|
741 |
|
|
-- after the entire match is completed successfully. If the match
|
742 |
|
|
-- fails, then no assignment takes place.
|
743 |
|
|
|
744 |
|
|
----------------------------------
|
745 |
|
|
-- Deferred Matching Operations --
|
746 |
|
|
----------------------------------
|
747 |
|
|
|
748 |
|
|
function "+" (Str : VString_Var) return Pattern;
|
749 |
|
|
-- Here Str must be a VString variable. This function constructs a
|
750 |
|
|
-- pattern which at pattern matching time will access the current
|
751 |
|
|
-- value of this variable, and match against these characters.
|
752 |
|
|
|
753 |
|
|
function "+" (Str : VString_Func) return Pattern;
|
754 |
|
|
-- Constructs a pattern which at pattern matching time calls the given
|
755 |
|
|
-- function, and then matches against the string or character value
|
756 |
|
|
-- that is returned by the call.
|
757 |
|
|
|
758 |
|
|
function "+" (P : Pattern_Var) return Pattern;
|
759 |
|
|
-- Here P must be a Pattern variable. This function constructs a
|
760 |
|
|
-- pattern which at pattern matching time will access the current
|
761 |
|
|
-- value of this variable, and match against the pattern value.
|
762 |
|
|
|
763 |
|
|
function "+" (P : Boolean_Func) return Pattern;
|
764 |
|
|
-- Constructs a predicate pattern function that at pattern matching time
|
765 |
|
|
-- calls the given function. If True is returned, then the pattern matches.
|
766 |
|
|
-- If False is returned, then failure is signalled.
|
767 |
|
|
|
768 |
|
|
--------------------------------
|
769 |
|
|
-- Pattern Building Functions --
|
770 |
|
|
--------------------------------
|
771 |
|
|
|
772 |
|
|
function Arb return Pattern;
|
773 |
|
|
-- Constructs a pattern that will match any string. On the first attempt,
|
774 |
|
|
-- the pattern matches a null string, then on each successive failure, it
|
775 |
|
|
-- matches one more character, and only fails if matching the entire rest
|
776 |
|
|
-- of the string.
|
777 |
|
|
|
778 |
|
|
function Arbno (P : Pattern) return Pattern;
|
779 |
|
|
function Arbno (P : PString) return Pattern;
|
780 |
|
|
function Arbno (P : PChar) return Pattern;
|
781 |
|
|
-- Pattern repetition. First matches null, then on a subsequent failure
|
782 |
|
|
-- attempts to match an additional instance of the given pattern.
|
783 |
|
|
-- Equivalent to (but more efficient than) P & ("" or (P & ("" or ...
|
784 |
|
|
|
785 |
|
|
function Any (Str : String) return Pattern;
|
786 |
|
|
function Any (Str : VString) return Pattern;
|
787 |
|
|
function Any (Str : Character) return Pattern;
|
788 |
|
|
function Any (Str : Character_Set) return Pattern;
|
789 |
|
|
function Any (Str : not null access VString) return Pattern;
|
790 |
|
|
function Any (Str : VString_Func) return Pattern;
|
791 |
|
|
-- Constructs a pattern that matches a single character that is one of
|
792 |
|
|
-- the characters in the given argument. The pattern fails if the current
|
793 |
|
|
-- character is not in Str.
|
794 |
|
|
|
795 |
|
|
function Bal return Pattern;
|
796 |
|
|
-- Constructs a pattern that will match any non-empty string that is
|
797 |
|
|
-- parentheses balanced with respect to the normal parentheses characters.
|
798 |
|
|
-- Attempts to extend the string if a subsequent failure occurs.
|
799 |
|
|
|
800 |
|
|
function Break (Str : String) return Pattern;
|
801 |
|
|
function Break (Str : VString) return Pattern;
|
802 |
|
|
function Break (Str : Character) return Pattern;
|
803 |
|
|
function Break (Str : Character_Set) return Pattern;
|
804 |
|
|
function Break (Str : not null access VString) return Pattern;
|
805 |
|
|
function Break (Str : VString_Func) return Pattern;
|
806 |
|
|
-- Constructs a pattern that matches a (possibly null) string which
|
807 |
|
|
-- is immediately followed by a character in the given argument. This
|
808 |
|
|
-- character is not part of the matched string. The pattern fails if
|
809 |
|
|
-- the remaining characters to be matched do not include any of the
|
810 |
|
|
-- characters in Str.
|
811 |
|
|
|
812 |
|
|
function BreakX (Str : String) return Pattern;
|
813 |
|
|
function BreakX (Str : VString) return Pattern;
|
814 |
|
|
function BreakX (Str : Character) return Pattern;
|
815 |
|
|
function BreakX (Str : Character_Set) return Pattern;
|
816 |
|
|
function BreakX (Str : not null access VString) return Pattern;
|
817 |
|
|
function BreakX (Str : VString_Func) return Pattern;
|
818 |
|
|
-- Like Break, but the pattern attempts to extend on a failure to find
|
819 |
|
|
-- the next occurrence of a character in Str, and only fails when the
|
820 |
|
|
-- last such instance causes a failure.
|
821 |
|
|
|
822 |
|
|
function Cancel return Pattern;
|
823 |
|
|
-- Constructs a pattern that immediately aborts the entire match
|
824 |
|
|
|
825 |
|
|
function Fail return Pattern;
|
826 |
|
|
-- Constructs a pattern that always fails
|
827 |
|
|
|
828 |
|
|
function Fence return Pattern;
|
829 |
|
|
-- Constructs a pattern that matches null on the first attempt, and then
|
830 |
|
|
-- causes the entire match to be aborted if a subsequent failure occurs.
|
831 |
|
|
|
832 |
|
|
function Fence (P : Pattern) return Pattern;
|
833 |
|
|
-- Constructs a pattern that first matches P. If P fails, then the
|
834 |
|
|
-- constructed pattern fails. If P succeeds, then the match proceeds,
|
835 |
|
|
-- but if subsequent failure occurs, alternatives in P are not sought.
|
836 |
|
|
-- The idea of Fence is that each time the pattern is matched, just
|
837 |
|
|
-- one attempt is made to match P, without trying alternatives.
|
838 |
|
|
|
839 |
|
|
function Len (Count : Natural) return Pattern;
|
840 |
|
|
function Len (Count : not null access Natural) return Pattern;
|
841 |
|
|
function Len (Count : Natural_Func) return Pattern;
|
842 |
|
|
-- Constructs a pattern that matches exactly the given number of
|
843 |
|
|
-- characters. The pattern fails if fewer than this number of characters
|
844 |
|
|
-- remain to be matched in the string.
|
845 |
|
|
|
846 |
|
|
function NotAny (Str : String) return Pattern;
|
847 |
|
|
function NotAny (Str : VString) return Pattern;
|
848 |
|
|
function NotAny (Str : Character) return Pattern;
|
849 |
|
|
function NotAny (Str : Character_Set) return Pattern;
|
850 |
|
|
function NotAny (Str : not null access VString) return Pattern;
|
851 |
|
|
function NotAny (Str : VString_Func) return Pattern;
|
852 |
|
|
-- Constructs a pattern that matches a single character that is not
|
853 |
|
|
-- one of the characters in the given argument. The pattern Fails if
|
854 |
|
|
-- the current character is in Str.
|
855 |
|
|
|
856 |
|
|
function NSpan (Str : String) return Pattern;
|
857 |
|
|
function NSpan (Str : VString) return Pattern;
|
858 |
|
|
function NSpan (Str : Character) return Pattern;
|
859 |
|
|
function NSpan (Str : Character_Set) return Pattern;
|
860 |
|
|
function NSpan (Str : not null access VString) return Pattern;
|
861 |
|
|
function NSpan (Str : VString_Func) return Pattern;
|
862 |
|
|
-- Constructs a pattern that matches the longest possible string
|
863 |
|
|
-- consisting entirely of characters from the given argument. The
|
864 |
|
|
-- string may be empty, so this pattern always succeeds.
|
865 |
|
|
|
866 |
|
|
function Pos (Count : Natural) return Pattern;
|
867 |
|
|
function Pos (Count : not null access Natural) return Pattern;
|
868 |
|
|
function Pos (Count : Natural_Func) return Pattern;
|
869 |
|
|
-- Constructs a pattern that matches the null string if exactly Count
|
870 |
|
|
-- characters have already been matched, and otherwise fails.
|
871 |
|
|
|
872 |
|
|
function Rest return Pattern;
|
873 |
|
|
-- Constructs a pattern that always succeeds, matching the remaining
|
874 |
|
|
-- unmatched characters in the pattern.
|
875 |
|
|
|
876 |
|
|
function Rpos (Count : Natural) return Pattern;
|
877 |
|
|
function Rpos (Count : not null access Natural) return Pattern;
|
878 |
|
|
function Rpos (Count : Natural_Func) return Pattern;
|
879 |
|
|
-- Constructs a pattern that matches the null string if exactly Count
|
880 |
|
|
-- characters remain to be matched in the string, and otherwise fails.
|
881 |
|
|
|
882 |
|
|
function Rtab (Count : Natural) return Pattern;
|
883 |
|
|
function Rtab (Count : not null access Natural) return Pattern;
|
884 |
|
|
function Rtab (Count : Natural_Func) return Pattern;
|
885 |
|
|
-- Constructs a pattern that matches from the current location until
|
886 |
|
|
-- exactly Count characters remain to be matched in the string. The
|
887 |
|
|
-- pattern fails if fewer than Count characters remain to be matched.
|
888 |
|
|
|
889 |
|
|
function Setcur (Var : not null access Natural) return Pattern;
|
890 |
|
|
-- Constructs a pattern that matches the null string, and assigns the
|
891 |
|
|
-- current cursor position in the string. This value is the number of
|
892 |
|
|
-- characters matched so far. So it is zero at the start of the match.
|
893 |
|
|
|
894 |
|
|
function Span (Str : String) return Pattern;
|
895 |
|
|
function Span (Str : VString) return Pattern;
|
896 |
|
|
function Span (Str : Character) return Pattern;
|
897 |
|
|
function Span (Str : Character_Set) return Pattern;
|
898 |
|
|
function Span (Str : not null access VString) return Pattern;
|
899 |
|
|
function Span (Str : VString_Func) return Pattern;
|
900 |
|
|
-- Constructs a pattern that matches the longest possible string
|
901 |
|
|
-- consisting entirely of characters from the given argument. The
|
902 |
|
|
-- string cannot be empty , so the pattern fails if the current
|
903 |
|
|
-- character is not one of the characters in Str.
|
904 |
|
|
|
905 |
|
|
function Succeed return Pattern;
|
906 |
|
|
-- Constructs a pattern that succeeds matching null, both on the first
|
907 |
|
|
-- attempt, and on any rematch attempt, i.e. it is equivalent to an
|
908 |
|
|
-- infinite alternation of null strings.
|
909 |
|
|
|
910 |
|
|
function Tab (Count : Natural) return Pattern;
|
911 |
|
|
function Tab (Count : not null access Natural) return Pattern;
|
912 |
|
|
function Tab (Count : Natural_Func) return Pattern;
|
913 |
|
|
-- Constructs a pattern that from the current location until Count
|
914 |
|
|
-- characters have been matched. The pattern fails if more than Count
|
915 |
|
|
-- characters have already been matched.
|
916 |
|
|
|
917 |
|
|
---------------------------------
|
918 |
|
|
-- Pattern Matching Operations --
|
919 |
|
|
---------------------------------
|
920 |
|
|
|
921 |
|
|
-- The Match function performs an actual pattern matching operation.
|
922 |
|
|
-- The versions with three parameters perform a match without modifying
|
923 |
|
|
-- the subject string and return a Boolean result indicating if the
|
924 |
|
|
-- match is successful or not. The Anchor parameter is set to True to
|
925 |
|
|
-- obtain an anchored match in which the pattern is required to match
|
926 |
|
|
-- the first character of the string. In an unanchored match, which is
|
927 |
|
|
|
928 |
|
|
-- the default, successive attempts are made to match the given pattern
|
929 |
|
|
-- at each character of the subject string until a match succeeds, or
|
930 |
|
|
-- until all possibilities have failed.
|
931 |
|
|
|
932 |
|
|
-- Note that pattern assignment functions in the pattern may generate
|
933 |
|
|
-- side effects, so these functions are not necessarily pure.
|
934 |
|
|
|
935 |
|
|
Anchored_Mode : Boolean := False;
|
936 |
|
|
-- This global variable can be set True to cause all subsequent pattern
|
937 |
|
|
-- matches to operate in anchored mode. In anchored mode, no attempt is
|
938 |
|
|
-- made to move the anchor point, so that if the match succeeds it must
|
939 |
|
|
-- succeed starting at the first character. Note that the effect of
|
940 |
|
|
-- anchored mode may be achieved in individual pattern matches by using
|
941 |
|
|
-- Fence or Pos(0) at the start of the pattern.
|
942 |
|
|
|
943 |
|
|
Pattern_Stack_Overflow : exception;
|
944 |
|
|
-- Exception raised if internal pattern matching stack overflows. This
|
945 |
|
|
-- is typically the result of runaway pattern recursion. If there is a
|
946 |
|
|
-- genuine case of stack overflow, then either the match must be broken
|
947 |
|
|
-- down into simpler steps, or the stack limit must be reset.
|
948 |
|
|
|
949 |
|
|
Stack_Size : constant Positive := 2000;
|
950 |
|
|
-- Size used for internal pattern matching stack. Increase this size if
|
951 |
|
|
-- complex patterns cause Pattern_Stack_Overflow to be raised.
|
952 |
|
|
|
953 |
|
|
-- Simple match functions. The subject is matched against the pattern.
|
954 |
|
|
-- Any immediate or deferred assignments or writes are executed, and
|
955 |
|
|
-- the returned value indicates whether or not the match succeeded.
|
956 |
|
|
|
957 |
|
|
function Match
|
958 |
|
|
(Subject : VString;
|
959 |
|
|
Pat : Pattern) return Boolean;
|
960 |
|
|
|
961 |
|
|
function Match
|
962 |
|
|
(Subject : VString;
|
963 |
|
|
Pat : PString) return Boolean;
|
964 |
|
|
|
965 |
|
|
function Match
|
966 |
|
|
(Subject : String;
|
967 |
|
|
Pat : Pattern) return Boolean;
|
968 |
|
|
|
969 |
|
|
function Match
|
970 |
|
|
(Subject : String;
|
971 |
|
|
Pat : PString) return Boolean;
|
972 |
|
|
|
973 |
|
|
-- Replacement functions. The subject is matched against the pattern.
|
974 |
|
|
-- Any immediate or deferred assignments or writes are executed, and
|
975 |
|
|
-- the returned value indicates whether or not the match succeeded.
|
976 |
|
|
-- If the match succeeds, then the matched part of the subject string
|
977 |
|
|
-- is replaced by the given Replace string.
|
978 |
|
|
|
979 |
|
|
function Match
|
980 |
|
|
(Subject : VString_Var;
|
981 |
|
|
Pat : Pattern;
|
982 |
|
|
Replace : VString) return Boolean;
|
983 |
|
|
|
984 |
|
|
function Match
|
985 |
|
|
(Subject : VString_Var;
|
986 |
|
|
Pat : PString;
|
987 |
|
|
Replace : VString) return Boolean;
|
988 |
|
|
|
989 |
|
|
function Match
|
990 |
|
|
(Subject : VString_Var;
|
991 |
|
|
Pat : Pattern;
|
992 |
|
|
Replace : String) return Boolean;
|
993 |
|
|
|
994 |
|
|
function Match
|
995 |
|
|
(Subject : VString_Var;
|
996 |
|
|
Pat : PString;
|
997 |
|
|
Replace : String) return Boolean;
|
998 |
|
|
|
999 |
|
|
-- Simple match procedures. The subject is matched against the pattern.
|
1000 |
|
|
-- Any immediate or deferred assignments or writes are executed. No
|
1001 |
|
|
-- indication of success or failure is returned.
|
1002 |
|
|
|
1003 |
|
|
procedure Match
|
1004 |
|
|
(Subject : VString;
|
1005 |
|
|
Pat : Pattern);
|
1006 |
|
|
|
1007 |
|
|
procedure Match
|
1008 |
|
|
(Subject : VString;
|
1009 |
|
|
Pat : PString);
|
1010 |
|
|
|
1011 |
|
|
procedure Match
|
1012 |
|
|
(Subject : String;
|
1013 |
|
|
Pat : Pattern);
|
1014 |
|
|
|
1015 |
|
|
procedure Match
|
1016 |
|
|
(Subject : String;
|
1017 |
|
|
Pat : PString);
|
1018 |
|
|
|
1019 |
|
|
-- Replacement procedures. The subject is matched against the pattern.
|
1020 |
|
|
-- Any immediate or deferred assignments or writes are executed. No
|
1021 |
|
|
-- indication of success or failure is returned. If the match succeeds,
|
1022 |
|
|
-- then the matched part of the subject string is replaced by the given
|
1023 |
|
|
-- Replace string.
|
1024 |
|
|
|
1025 |
|
|
procedure Match
|
1026 |
|
|
(Subject : in out VString;
|
1027 |
|
|
Pat : Pattern;
|
1028 |
|
|
Replace : VString);
|
1029 |
|
|
|
1030 |
|
|
procedure Match
|
1031 |
|
|
(Subject : in out VString;
|
1032 |
|
|
Pat : PString;
|
1033 |
|
|
Replace : VString);
|
1034 |
|
|
|
1035 |
|
|
procedure Match
|
1036 |
|
|
(Subject : in out VString;
|
1037 |
|
|
Pat : Pattern;
|
1038 |
|
|
Replace : String);
|
1039 |
|
|
|
1040 |
|
|
procedure Match
|
1041 |
|
|
(Subject : in out VString;
|
1042 |
|
|
Pat : PString;
|
1043 |
|
|
Replace : String);
|
1044 |
|
|
|
1045 |
|
|
-- Deferred Replacement
|
1046 |
|
|
|
1047 |
|
|
type Match_Result is private;
|
1048 |
|
|
-- Type used to record result of pattern match
|
1049 |
|
|
|
1050 |
|
|
subtype Match_Result_Var is Match_Result;
|
1051 |
|
|
-- This synonyms is used as a formal parameter type to a function where,
|
1052 |
|
|
-- if the language allowed, we would use an in out parameter, but we are
|
1053 |
|
|
-- not allowed to have in out parameters for functions. Instead we pass
|
1054 |
|
|
-- actuals which must be variables, and with a bit of trickery in the
|
1055 |
|
|
-- body, manage to interpret them properly as though they were indeed
|
1056 |
|
|
-- in out parameters.
|
1057 |
|
|
|
1058 |
|
|
function Match
|
1059 |
|
|
(Subject : VString_Var;
|
1060 |
|
|
Pat : Pattern;
|
1061 |
|
|
Result : Match_Result_Var) return Boolean;
|
1062 |
|
|
|
1063 |
|
|
procedure Match
|
1064 |
|
|
(Subject : in out VString;
|
1065 |
|
|
Pat : Pattern;
|
1066 |
|
|
Result : out Match_Result);
|
1067 |
|
|
|
1068 |
|
|
procedure Replace
|
1069 |
|
|
(Result : in out Match_Result;
|
1070 |
|
|
Replace : VString);
|
1071 |
|
|
-- Given a previous call to Match which set Result, performs a pattern
|
1072 |
|
|
-- replacement if the match was successful. Has no effect if the match
|
1073 |
|
|
-- failed. This call should immediately follow the Match call.
|
1074 |
|
|
|
1075 |
|
|
------------------------
|
1076 |
|
|
-- Debugging Routines --
|
1077 |
|
|
------------------------
|
1078 |
|
|
|
1079 |
|
|
-- Debugging pattern matching operations can often be quite complex,
|
1080 |
|
|
-- since there is no obvious way to trace the progress of the match.
|
1081 |
|
|
-- The declarations in this section provide some debugging assistance.
|
1082 |
|
|
|
1083 |
|
|
Debug_Mode : Boolean := False;
|
1084 |
|
|
-- This global variable can be set True to generate debugging on all
|
1085 |
|
|
-- subsequent calls to Match. The debugging output is a full trace of
|
1086 |
|
|
-- the actions of the pattern matcher, written to Standard_Output. The
|
1087 |
|
|
-- level of this information is intended to be comprehensible at the
|
1088 |
|
|
-- abstract level of this package declaration. However, note that the
|
1089 |
|
|
-- use of this switch often generates large amounts of output.
|
1090 |
|
|
|
1091 |
|
|
function "*" (P : Pattern; Fil : File_Access) return Pattern;
|
1092 |
|
|
function "*" (P : PString; Fil : File_Access) return Pattern;
|
1093 |
|
|
function "*" (P : PChar; Fil : File_Access) return Pattern;
|
1094 |
|
|
function "**" (P : Pattern; Fil : File_Access) return Pattern;
|
1095 |
|
|
function "**" (P : PString; Fil : File_Access) return Pattern;
|
1096 |
|
|
function "**" (P : PChar; Fil : File_Access) return Pattern;
|
1097 |
|
|
-- These are similar to the corresponding pattern assignment operations
|
1098 |
|
|
-- except that instead of setting the value of a variable, the matched
|
1099 |
|
|
-- substring is written to the appropriate file. This can be useful in
|
1100 |
|
|
-- following the progress of a match without generating the full amount
|
1101 |
|
|
-- of information obtained by setting Debug_Mode to True.
|
1102 |
|
|
|
1103 |
|
|
Terminal : constant File_Access := Standard_Error;
|
1104 |
|
|
Output : constant File_Access := Standard_Output;
|
1105 |
|
|
-- Two handy synonyms for use with the above pattern write operations
|
1106 |
|
|
|
1107 |
|
|
-- Finally we have some routines that are useful for determining what
|
1108 |
|
|
-- patterns are in use, particularly if they are constructed dynamically.
|
1109 |
|
|
|
1110 |
|
|
function Image (P : Pattern) return String;
|
1111 |
|
|
function Image (P : Pattern) return VString;
|
1112 |
|
|
-- This procedures yield strings that corresponds to the syntax needed
|
1113 |
|
|
-- to create the given pattern using the functions in this package. The
|
1114 |
|
|
-- form of this string is such that it could actually be compiled and
|
1115 |
|
|
-- evaluated to yield the required pattern except for references to
|
1116 |
|
|
-- variables and functions, which are output using one of the following
|
1117 |
|
|
-- forms:
|
1118 |
|
|
--
|
1119 |
|
|
-- access Natural NP(16#...#)
|
1120 |
|
|
-- access Pattern PP(16#...#)
|
1121 |
|
|
-- access VString VP(16#...#)
|
1122 |
|
|
--
|
1123 |
|
|
-- Natural_Func NF(16#...#)
|
1124 |
|
|
-- VString_Func VF(16#...#)
|
1125 |
|
|
--
|
1126 |
|
|
-- where 16#...# is the hex representation of the integer address that
|
1127 |
|
|
-- corresponds to the given access value
|
1128 |
|
|
|
1129 |
|
|
procedure Dump (P : Pattern);
|
1130 |
|
|
-- This procedure writes information about the pattern to Standard_Out.
|
1131 |
|
|
-- The format of this information is keyed to the internal data structures
|
1132 |
|
|
-- used to implement patterns. The information provided by Dump is thus
|
1133 |
|
|
-- more precise than that yielded by Image, but is also a bit more obscure
|
1134 |
|
|
-- (i.e. it cannot be interpreted solely in terms of this spec, you have
|
1135 |
|
|
-- to know something about the data structures).
|
1136 |
|
|
|
1137 |
|
|
------------------
|
1138 |
|
|
-- Private Part --
|
1139 |
|
|
------------------
|
1140 |
|
|
|
1141 |
|
|
private
|
1142 |
|
|
type PE;
|
1143 |
|
|
-- Pattern element, a pattern is a complex structure of PE's. This type
|
1144 |
|
|
-- is defined and described in the body of this package.
|
1145 |
|
|
|
1146 |
|
|
type PE_Ptr is access all PE;
|
1147 |
|
|
-- Pattern reference. PE's use PE_Ptr values to reference other PE's
|
1148 |
|
|
|
1149 |
|
|
type Pattern is new Controlled with record
|
1150 |
|
|
Stk : Natural := 0;
|
1151 |
|
|
-- Maximum number of stack entries required for matching this
|
1152 |
|
|
-- pattern. See description of pattern history stack in body.
|
1153 |
|
|
|
1154 |
|
|
P : PE_Ptr := null;
|
1155 |
|
|
-- Pointer to initial pattern element for pattern
|
1156 |
|
|
end record;
|
1157 |
|
|
|
1158 |
|
|
pragma Finalize_Storage_Only (Pattern);
|
1159 |
|
|
|
1160 |
|
|
procedure Adjust (Object : in out Pattern);
|
1161 |
|
|
-- Adjust routine used to copy pattern objects
|
1162 |
|
|
|
1163 |
|
|
procedure Finalize (Object : in out Pattern);
|
1164 |
|
|
-- Finalization routine used to release storage allocated for a pattern
|
1165 |
|
|
|
1166 |
|
|
type VString_Ptr is access all VString;
|
1167 |
|
|
|
1168 |
|
|
type Match_Result is record
|
1169 |
|
|
Var : VString_Ptr;
|
1170 |
|
|
-- Pointer to subject string. Set to null if match failed
|
1171 |
|
|
|
1172 |
|
|
Start : Natural := 1;
|
1173 |
|
|
-- Starting index position (1's origin) of matched section of
|
1174 |
|
|
-- subject string. Only valid if Var is non-null.
|
1175 |
|
|
|
1176 |
|
|
Stop : Natural := 0;
|
1177 |
|
|
-- Ending index position (1's origin) of matched section of
|
1178 |
|
|
-- subject string. Only valid if Var is non-null.
|
1179 |
|
|
|
1180 |
|
|
end record;
|
1181 |
|
|
|
1182 |
|
|
pragma Volatile (Match_Result);
|
1183 |
|
|
-- This ensures that the Result parameter is passed by reference, so
|
1184 |
|
|
-- that we can play our games with the bogus Match_Result_Var parameter
|
1185 |
|
|
-- in the function case to treat it as though it were an in out parameter.
|
1186 |
|
|
|
1187 |
|
|
end GNAT.Spitbol.Patterns;
|