dryruby / ebnf / 11355616381

15 Oct 2024 10:53PM UTC coverage: 94.066% (-0.2%) from 94.25%
11355616381

push

github

gkellogg
Check for terminals that also match a string production but match longer than that string; they should not match.
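
The check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the library's code: `TERMINAL_REGEXPS`, `longer_terminal_match?`, and `scan_literal` are hypothetical names, and the terminal table is made up for the example. It relies only on `StringScanner#match?`, which reports the match length without advancing the scan pointer.

```ruby
require 'strscan'

# Hypothetical terminal table; in the real parser these come from the grammar.
TERMINAL_REGEXPS = {
  IDENT: /[a-z]+/,
  NUMBER: /[0-9]+/
}.freeze

# True when the literal matches at the current position, but some terminal
# matches there too and would consume more characters than the literal.
def longer_terminal_match?(input, literal, terminals = TERMINAL_REGEXPS)
  literal_re = Regexp.new(Regexp.quote(literal))
  return false unless input.match?(literal_re)

  terminals.any? do |_name, re|
    len = input.match?(re) # match length, or nil; does not move the pointer
    len && len > literal.length
  end
end

# Scan the literal only if no terminal claims a longer match.
def scan_literal(input, literal, terminals = TERMINAL_REGEXPS)
  return :unmatched if longer_terminal_match?(input, literal, terminals)

  input.scan(Regexp.new(Regexp.quote(literal))) || :unmatched
end
```

With this table, the literal `"if"` is accepted in front of `" x"`, but rejected at the start of `"iffy"` because `IDENT` would match all four characters.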

7 of 7 new or added lines in 1 file covered. (100.0%)

23 existing lines in 2 files now uncovered.

2140 of 2275 relevant lines covered (94.07%)

26130.0 hits per line

Source File
/lib/ebnf/peg/rule.rb (89.71% covered)
1
module EBNF::PEG
2✔
2
  # Behavior for parsing a PEG rule
3
  module Rule
2✔
4
    include ::EBNF::Unescape
2✔
5

6
    ##
7
    # Initialized by parser when loading rules.
8
    # Used for finding rules and invoking elements of the parse process.
9
    #
10
    # @return [EBNF::PEG::Parser] parser
11
    attr_accessor :parser
2✔
12

13
    ##
14
    # Parse a rule or terminal, invoking callbacks as appropriate.
15
    #
16
    # If there are `start_production` and/or `production` handlers,
17
    # they are invoked with a `prod_data` stack, the input stream and offset.
18
    # Otherwise, the results are added as an array value
19
    # to a hash indexed by the rule name.
20
    #
21
    # If matched, the input position is updated and the results returned in a Hash.
22
    #
23
    # * `alt`: returns the value of the matched production or `:unmatched`.
24
    # * `diff`: returns the value matched, or `:unmatched`.
25
    # * `hex`: returns a string composed of the matched hex character, or `:unmatched`.
26
    # * `opt`: returns the value matched, or `nil` if unmatched.
27
    # * `plus`: returns an array of the values matched for the specified production, or `:unmatched`, if none are matched. For Terminals, these are concatenated into a single string.
28
    # * `range`: returns a string composed of the values matched, or `:unmatched`, if fewer than `min` are matched.
29
    # * `rept`: returns an array of the values matched for the specified production, or `:unmatched`, if none are matched. For Terminals, these are concatenated into a single string.
30
    # * `seq`: returns an array composed of single-entry hashes for each matched production indexed by the production name, or `:unmatched` if any production fails to match. For Terminals, returns a string created by concatenating these values. Via option in a `production` or definition, the result can be a single hash with values for each matched production; note that this is not always possible due to the possibility of repeated productions within the sequence.
31
    # * `star`: returns an array of the values matched for the specified production. For Terminals, these are concatenated into a single string.
32
    #
33
    # @param [Scanner] input
34
    # @param [Hash] **options Other data that may be passed to handlers.
35
    # @return [Hash{Symbol => Object}, :unmatched] A hash with keys for each matched component of the expression. Returns :unmatched if the input does not match the production.
36
    def parse(input, **options)
2✔
37
      # Save position and line number for backtracking
38
      pos, lineno = input.pos, input.lineno
766,416✔
39

40
      parser.packrat[sym] ||= {}
766,416✔
41
      if parser.packrat[sym][pos]
766,416✔
42
        parser.debug("#{sym}(:memo)", lineno: lineno) { "#{parser.packrat[sym][pos].inspect}(@#{pos})"}
25,120✔
43
        input.pos, input.lineno = parser.packrat[sym][pos][:pos], parser.packrat[sym][pos][:lineno]
25,070✔
44
        return parser.packrat[sym][pos][:result]
25,070✔
45
      end
46

47
      if terminal?
741,346✔
48
        # If the terminal is defined with a regular expression,
49
        # use that to match the input,
50
        # otherwise, fall through and evaluate the rule's expression below.
51
        if regexp = parser.terminal_regexp(sym)
300,000✔
52
          term_opts = parser.terminal_options(sym)
295,252✔
53
          if matched = input.scan(regexp)
295,252✔
54
            # Optionally map matched
55
            matched = term_opts.fetch(:map, {}).fetch(matched.downcase, matched)
57,754✔
56

57
            # Optionally unescape matched
58
            matched = unescape(matched) if term_opts[:unescape]
57,754✔
59
          end
60

61
          result = parser.onTerminal(sym, (matched ? matched : :unmatched))
295,252✔
62

63
          # Update furthest failure for strings and terminals
64
          parser.update_furthest_failure(input.pos, input.lineno, sym) if result == :unmatched
295,252✔
65
          parser.packrat[sym][pos] = {
295,252✔
66
            pos: input.pos,
67
            lineno: input.lineno,
68
            result: result
69
          }
70
          return parser.packrat[sym][pos][:result]
295,252✔
71
        end
72
      else
73
        eat_whitespace(input)
441,346✔
74
      end
75
      start_options = options.merge(parser.onStart(sym, **options))
446,094✔
76
      string_regexp_opts = start_options[:insensitive_strings] ? Regexp::IGNORECASE : 0
446,094✔
77

78
      result = case expr.first
446,094✔
79
      when :alt
80
        # Return the first expression to match. Look at strings before terminals before non-terminals, with strings ordered by longest first
81
        # Result is either :unmatched, or the value of the matching rule
82
        alt = :unmatched
73,374✔
83
        expr[1..-1].each do |prod|
73,374✔
84
          alt = case prod
300,922✔
85
          when Symbol
86
            rule = parser.find_rule(prod)
294,524✔
87
            raise "No rule found for #{prod}" unless rule
294,524✔
88
            rule.parse(input, **options)
294,524✔
89
          when String
90
            # If the input matches a terminal for which the string is a prefix, don't match the string
91
            if terminal_also_matches(input, prod, string_regexp_opts)
6,398✔
UNCOV
92
              :unmatched
×
93
            else
94
              s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
6,398✔
95
              case start_options[:insensitive_strings]
6,398✔
UNCOV
96
              when :lower then s && s.downcase
×
UNCOV
97
              when :upper then s && s.upcase
×
98
              else s
6,398✔
99
              end || :unmatched
100
            end
101
          end
102
          if alt == :unmatched
300,922✔
103
            # Update furthest failure for strings and terminals
104
            parser.update_furthest_failure(input.pos, input.lineno, prod) if prod.is_a?(String) || rule.terminal?
258,428✔
105
          else
106
            break
42,494✔
107
          end
108
        end
109
        alt
73,374✔
110
      when :diff
111
        # matches any string that matches A but does not match B.
112
        # (Note, this is only used for Terminal rules, non-terminals will use :not)
113
        raise "Diff used on non-terminal #{sym}" unless terminal?
8✔
114
        re1, re2 = Regexp.new(translate_codepoints(expr[1])), Regexp.new(translate_codepoints(expr[2]))
8✔
115
        matched = input.scan(re1)
8✔
116
        if !matched || re2.match?(matched)
8✔
117
          # Update furthest failure for terminals
118
          parser.update_furthest_failure(input.pos, input.lineno, sym)
6✔
119
          :unmatched
6✔
120
        else
121
          matched
2✔
122
        end
123
      when :hex
124
        # Matches the given hex character if expression matches the character whose number (code point) in ISO/IEC 10646 is N. The number of leading zeros in the #xN form is insignificant.
125
        input.scan(to_regexp) || begin
24✔
126
          # Update furthest failure for terminals
127
          parser.update_furthest_failure(input.pos, input.lineno, expr.last)
22✔
128
          :unmatched
22✔
129
        end
130
      when :not
131
        # matches any string that does not match B.
132
        res = case prod = expr[1]
8✔
133
        when Symbol
UNCOV
134
          rule = parser.find_rule(prod)
×
UNCOV
135
          raise "No rule found for #{prod}" unless rule
×
UNCOV
136
          rule.parse(input, **options)
×
137
        when String
138
          if terminal_also_matches(input, prod, string_regexp_opts)
8✔
UNCOV
139
            :unmatched
×
140
          else
141
            s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
8✔
142
            case start_options[:insensitive_strings]
8✔
UNCOV
143
            when :lower then s && s.downcase
×
UNCOV
144
            when :upper then s && s.upcase
×
145
            else s
8✔
146
            end || :unmatched
147
          end
148
        end
149
        if res != :unmatched
8✔
150
          # Update furthest failure for terminals
151
          parser.update_furthest_failure(input.pos, input.lineno, sym) if terminal?
4✔
152
          :unmatched
4✔
153
        else
154
          nil
4✔
155
        end
156
      when :opt
157
        # Result is the matched value or nil
158
        opt = rept(input, 0, 1, expr[1], string_regexp_opts, **start_options)
66,402✔
159

160
        # Update furthest failure for strings and terminals
161
        parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
66,402✔
162
        opt.first
66,402✔
163
      when :plus
164
        # Result is an array of all expressions while they match,
165
        # at least one must match
166
        plus = rept(input, 1, '*', expr[1], string_regexp_opts, **options)
25,428✔
167

168
        # Update furthest failure for strings and terminals
169
        parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
25,428✔
170
        plus.is_a?(Array) && terminal? ? plus.join("") : plus
25,428✔
171
      when :range, :istr
172
        # Matches the specified character range
173
        input.scan(to_regexp) || begin
2,264✔
174
          # Update furthest failure for strings and terminals
175
          parser.update_furthest_failure(input.pos, input.lineno, expr[1])
534✔
176
          :unmatched
534✔
177
        end
178
      when :rept
179
        # Result is an array of all expressions while they match,
180
        # an empty array of none match
181
        rept = rept(input, expr[1], expr[2], expr[3], string_regexp_opts, **options)
12✔
182

183
        # Update furthest failure for strings and terminals
184
        parser.update_furthest_failure(input.pos, input.lineno, expr[3]) if terminal?
12✔
185
        rept.is_a?(Array) && terminal? ? rept.join("") : rept
12✔
186
      when :seq
187
        # Evaluate each expression into an array of hashes where each hash contains a key from the associated production and the value is the parsed value of that production. Returns :unmatched if the input does not match the production. Value ordering is ensured by native Hash ordering.
188
        seq = expr[1..-1].each_with_object([]) do |prod, accumulator|
248,274✔
189
          eat_whitespace(input) unless accumulator.empty? || terminal?
377,948✔
190
          res = case prod
377,948✔
191
          when Symbol
192
            rule = parser.find_rule(prod)
276,834✔
193
            raise "No rule found for #{prod}" unless rule
276,834✔
194
            rule.parse(input, **options.merge(_rept_data: accumulator))
276,834✔
195
          when String
196
            if terminal_also_matches(input, prod, string_regexp_opts)
101,114✔
UNCOV
197
              :unmatched
×
198
            else
199
              s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
101,114✔
200
              case start_options[:insensitive_strings]
101,114✔
201
              when :lower then s && s.downcase
4✔
202
              when :upper then s && s.upcase
8✔
203
              else s
101,102✔
204
              end || :unmatched
205
            end
206
          end
207
          if res == :unmatched
377,948✔
208
            # Update furthest failure for strings and terminals
209
            parser.update_furthest_failure(input.pos, input.lineno, prod)
132,480✔
210
            break :unmatched 
132,480✔
211
          end
212
          accumulator << {prod.to_sym => res}
245,468✔
213
        end
214
        if seq == :unmatched
248,274✔
215
          :unmatched
132,480✔
216
        elsif terminal?
115,794✔
217
          seq.map(&:values).compact.join("") # Concat values for terminal production
88✔
218
        elsif start_options[:as_hash]
115,706✔
219
          seq.inject {|memo, h| memo.merge(h)}
168,906✔
220
        else
221
          seq
33,418✔
222
        end
223
      when :star
224
        # Result is an array of all expressions while they match,
225
        # an empty array if none match
226
        star = rept(input, 0, '*', expr[1], string_regexp_opts, **options)
30,300✔
227

228
        # Update furthest failure for strings and terminals
229
        parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
30,300✔
230
        star.is_a?(Array) && terminal? ? star.join("") : star
30,300✔
231
      else
UNCOV
232
        raise "attempt to parse unknown rule type: #{expr.first}"
×
233
      end
234

235
      if result == :unmatched
446,094✔
236
        # Rewind input to entry point if unmatched.
237
        input.pos, input.lineno = pos, lineno
169,704✔
238
      end
239

240
      result = parser.onFinish(result, **options)
446,094✔
241
      (parser.packrat[sym] ||= {})[pos] = {
446,094✔
242
        pos: input.pos,
243
        lineno: input.lineno,
244
        result: result
245
      }
246
      return parser.packrat[sym][pos][:result]
446,094✔
247
    end
248

249
    ##
250
    # Repetition, 0-1, 0-n, 1-n, ...
251
    #
252
    # Note, nil results are removed from the result, but count towards min/max calculations.
253
    # Saves temporary production data to prod_data stack.
254
    #
255
    # @param [Scanner] input
256
    # @param [Integer] min
257
    # @param [Integer] max
258
    #   If it is an integer, matching stops after `max` entries; `'*'` means no upper bound.
259
    # @param [Symbol, String] prod
260
    # @param [Integer] string_regexp_opts
261
    # @return [:unmatched, Array]
262
    def rept(input, min, max, prod, string_regexp_opts, **options)
2✔
263
      result = []
122,142✔
264

265
      case prod
122,142✔
266
      when Symbol
267
        rule = parser.find_rule(prod)
119,376✔
268
        raise "No rule found for #{prod}" unless rule
119,376✔
269
        while (max == '*' || result.length < max) && (res = rule.parse(input, **options.merge(_rept_data: result))) != :unmatched
318,528✔
270
          eat_whitespace(input) unless terminal?
79,776✔
271
          result << res
79,776✔
272
        end
273
      when String
274
        # FIXME: don't match a string, if input matches a terminal
275
        while (max == '*' || result.length < max) && (res = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts)))
5,638✔
276
          eat_whitespace(input) unless terminal?
106✔
277
          result << case options[:insensitive_strings]
106✔
UNCOV
278
          when :lower then res.downcase
×
UNCOV
279
          when :upper then res.upcase
×
280
          else res
106✔
281
          end
282
        end
283
      end
284

285
      result.length < min ? :unmatched : result.compact
122,142✔
286
    end
287

288
    ##
289
    # See if a terminal could have a longer match than a string
290
    def terminal_also_matches(input, prod, string_regexp_opts)
2✔
291
      str_regex = Regexp.new(Regexp.quote(prod), string_regexp_opts)
107,520✔
292
      input.match?(str_regex) && parser.class.terminal_regexps.any? do |sym, re|
107,520✔
293
        (match_len = input.match?(re)) && match_len > prod.length
166,298✔
294
      end
295
    end
296
      
297
    ##
298
    # Eat whitespace between non-terminal rules
299
    def eat_whitespace(input)
2✔
300
      if parser.whitespace.is_a?(Regexp)
651,296✔
301
        # Eat whitespace before a non-terminal
302
        input.skip(parser.whitespace)
499,842✔
303
      elsif parser.whitespace.is_a?(Rule)
151,454✔
UNCOV
304
        parser.whitespace.parse(input) # throw away result
×
305
      end
306
    end
307
  end
308
end
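
The memoization at the top of `parse` (the `parser.packrat[sym][pos]` lookups) can be illustrated in isolation. The sketch below assumes a much simpler setting than the module above: `MemoScanner` and its `calls` counter are hypothetical, each "rule" is just a regular expression, and line numbers are not tracked. It only shows the idea of recording, per rule name and start position, both the result and the position the scanner ended at.

```ruby
require 'strscan'

# Hypothetical memoizing wrapper for illustration; not the library's API.
class MemoScanner
  attr_reader :calls

  def initialize
    @packrat = {} # {rule_name => {start_pos => {pos:, result:}}}
    @calls = 0    # number of real (non-memoized) match attempts
  end

  # Match `regexp` for rule `name` at the scanner's current position,
  # replaying the stored outcome if this (name, position) pair was seen before.
  def parse(name, regexp, input)
    start = input.pos
    memo = (@packrat[name] ||= {})
    if (hit = memo[start])
      input.pos = hit[:pos] # restore where the earlier attempt ended
      return hit[:result]
    end

    @calls += 1
    result = input.scan(regexp) || :unmatched
    memo[start] = { pos: input.pos, result: result }
    result
  end
end
```

Failures are stored as well as successes, so a rule that already failed at a given position is not retried after backtracking.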