biojppm / rapidyaml / 18649960093

20 Oct 2025 11:02AM UTC coverage: 97.642% (-0.008%) from 97.65%
18649960093

Pull #503

Merge 779b983dc into 48acea949
Pull Request #503: Improve error model, callbacks

1823 of 1870 new or added lines in 32 files covered. (97.49%)

38 existing lines in 4 files now uncovered.

13623 of 13952 relevant lines covered (97.64%)

537812.42 hits per line

Source File

82.61
/src/c4/yml/parse_engine.hpp
1
#ifndef _C4_YML_PARSE_ENGINE_HPP_
2
#define _C4_YML_PARSE_ENGINE_HPP_
3

4
#ifndef _C4_YML_PARSER_STATE_HPP_
5
#include "c4/yml/parser_state.hpp"
6
#endif
7

8

9
#if defined(_MSC_VER)
10
#   pragma warning(push)
11
#   pragma warning(disable: 4251/*needs to have dll-interface to be used by clients of struct*/)
12
#endif
13

14
// NOLINTBEGIN(hicpp-signed-bitwise)
15

16
namespace c4 {
17
namespace yml {
18

19
/** @addtogroup doc_parse
20
 * @{ */
21

22
/** @defgroup doc_event_handlers Event Handlers
23
 *
24
 * @brief rapidyaml implements its parsing logic with a two-level
25
 * model, where a @ref ParseEngine object reads through the YAML
26
 * source, and dispatches events to an EventHandler bound to the @ref
27
 * ParseEngine. Because @ref ParseEngine is templated on the event
28
 * handler, the binding uses static polymorphism, without any virtual
29
 * functions. The actual handler object can be changed at run time
30
 * (but of course needs to be of the type of the template parameter).
31
 * This is thus a very efficient architecture, and further enables
32
 * users to provide their own custom handler if they wish to bypass the
33
 * rapidyaml @ref Tree.
34
 *
35
 * There are three handlers implemented in this project:
36
 *
37
 * - @ref EventHandlerTree is the handler responsible for creating the
38
 *   ryml @ref Tree
39
 *
40
 * - @ref extra::EventHandlerInts parses YAML into an integer array
41
 *   representation of the tree and scalars.
42
 *
43
 * - @ref extra::EventHandlerTestSuite is the handler responsible for emitting
44
 *   standardized [YAML test suite
45
 *   events](https://github.com/yaml/yaml-test-suite), used (only) in
46
 *   the CI of this project.
47
 *
48
 *
49
 * ### Event model
50
 *
51
 * The event model used by the parse engine and event handlers follows
52
 * very closely the event model in the [YAML test
53
 * suite](https://github.com/yaml/yaml-test-suite).
54
 *
55
 * Consider for example this YAML,
56
 * ```yaml
57
 * {foo: bar,foo2: bar2}
58
 * ```
59
 * which would produce these events in the test-suite parlance:
60
 * ```
61
 * +STR
62
 * +DOC
63
 * +MAP {}
64
 * =VAL :foo
65
 * =VAL :bar
66
 * =VAL :foo2
67
 * =VAL :bar2
68
 * -MAP
69
 * -DOC
70
 * -STR
71
 * ```
72
 *
73
 * For reference, the @ref ParseEngine object will produce this
74
 * sequence of calls to its bound EventHandler:
75
 * ```cpp
76
 * handler.begin_stream();
77
 * handler.begin_doc();
78
 * handler.begin_map_val_flow();
79
 * handler.set_key_scalar_plain("foo");
80
 * handler.set_val_scalar_plain("bar");
81
 * handler.add_sibling();
82
 * handler.set_key_scalar_plain("foo2");
83
 * handler.set_val_scalar_plain("bar2");
84
 * handler.end_map();
85
 * handler.end_doc();
86
 * handler.end_stream();
87
 * ```
88
 *
89
 * For many other examples of all areas of YAML and how ryml's parse
90
 * model corresponds to the YAML standard model, refer to the [unit
91
 * tests for the parse
92
 * engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).
93
 *
94
 *
95
 * ### Special events
96
 *
97
 * Most of the parsing events adopted by rapidyaml in its event model
98
 * are fairly obvious, but there are two less-obvious events requiring
99
 * some explanation.
100
 *
101
 * These events exist to make it easier to parse some special YAML
102
 * cases. They are called by the parser when a just-handled
103
 * value/container is actually the first key of a new map:
104
 *
105
 *   - `actually_val_is_first_key_of_new_map_flow()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerTree" / @ref EventHandlerTestSuite::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerTestSuite")
106
 *   - `actually_val_is_first_key_of_new_map_block()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerTree" / @ref EventHandlerTestSuite::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerTestSuite")
107
 *
108
 * For example, consider an implicit map inside a seq: `[a: b, c:
109
 * d]` which is parsed as `[{a: b}, {c: d}]`. The standard event
110
 * sequence for this YAML would be the following:
111
 * ```cpp
112
 * handler.begin_seq_val_flow();
113
 * handler.begin_map_val_flow();
114
 * handler.set_key_scalar_plain("a");
115
 * handler.set_val_scalar_plain("b");
116
 * handler.end_map();
117
 * handler.add_sibling();
118
 * handler.begin_map_val_flow();
119
 * handler.set_key_scalar_plain("c");
120
 * handler.set_val_scalar_plain("d");
121
 * handler.end_map();
122
 * handler.end_seq();
123
 * ```
124
 * The problem with this event sequence is that it forces the
125
 * parser to delay setting the val scalar (in this case "a" and
126
 * "c") until it knows whether the scalar is a key or a val. This
127
 * would require the parser to store the scalar until this
128
 * time. For instance, in the example above, the parser should
129
 * delay setting "a" and "c", because they are in fact keys and
130
 * not vals. Until then, the parser would have to store "a" and
131
 * "c" in its internal state. The downside is that this complexity
132
 * cost would apply even if there is no implicit map -- every val
133
 * in a seq would have to be delayed until one of the
134
 * disambiguating subsequent tokens `,-]:` is found.
135
 * By emitting one of the special events above, the parser avoids
136
 * this complexity: it preemptively sets the scalar as a val, and the
137
 * event then creates the map and rearranges that scalar as its first
138
 * key. Now the cost applies only once: when a seqimap (an implicit
139
 * map inside a seq) starts. So the following (easier and cheaper)
140
 * event sequence has the same effect as the event sequence above:
141
 * ```cpp
142
 * handler.begin_seq_val_flow();
143
 * handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
144
 * handler.actually_val_is_first_key_of_new_map_flow(); // create a map, move the "a" val as the key of the first child of the new map
145
 * handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
146
 * handler.end_map();
147
 * handler.add_sibling();
148
 * handler.set_val_scalar_plain("c"); // "c" also as val!
149
 * handler.actually_val_is_first_key_of_new_map_flow(); // likewise
150
 * handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
151
 * handler.end_map();
152
 * handler.end_seq();
153
 * ```
154
 * This also applies to container keys (although ryml's tree
155
 * cannot accommodate these): the parser can preemptively set a
156
 * container as a val, and call this event to turn that container
157
 * into a key. For example, consider this yaml:
158
 * ```yaml
159
 *   [aa, bb]: [cc, dd]
160
 * # ^       ^ ^
161
 * # |       | |
162
 * # (2)   (1) (3)     <- event sequence
163
 * ```
164
 * The standard event sequence for this YAML would be the
165
 * following:
166
 * ```cpp
167
 * handler.begin_map_val_block();       // (1)
168
 * handler.begin_seq_key_flow();        // (2)
169
 * handler.set_val_scalar_plain("aa");
170
 * handler.add_sibling();
171
 * handler.set_val_scalar_plain("bb");
172
 * handler.end_seq();
173
 * handler.begin_seq_val_flow();        // (3)
174
 * handler.set_val_scalar_plain("cc");
175
 * handler.add_sibling();
176
 * handler.set_val_scalar_plain("dd");
177
 * handler.end_seq();
178
 * handler.end_map();
179
 * ```
180
 * The problem with the sequence above is that, reading from
181
 * left-to-right, the parser can only detect the proper calls at
182
 * (1) and (2) once it reaches (1) in the YAML source. So, the
183
 * parser would have to buffer the entire event sequence starting
184
 * from the beginning until it reaches (1). Using this function,
185
 * the parser can do instead:
186
 * ```cpp
187
 * handler.begin_seq_val_flow();        // (2) -- preemptively as val!
188
 * handler.set_val_scalar_plain("aa");
189
 * handler.add_sibling();
190
 * handler.set_val_scalar_plain("bb");
191
 * handler.end_seq();
192
 * handler.actually_val_is_first_key_of_new_map_block(); // (1) -- adjust when finding that the prev val was actually a key.
193
 * handler.begin_seq_val_flow();        // (3) -- go on as before
194
 * handler.set_val_scalar_plain("cc");
195
 * handler.add_sibling();
196
 * handler.set_val_scalar_plain("dd");
197
 * handler.end_seq();
198
 * handler.end_map();
199
 * ```
200
 */
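
The special events described above can be illustrated with a minimal sketch. It does not use rapidyaml: `ToyEventRecorder` is a hypothetical handler that records test-suite-style event strings, and `emit_events_for_seqimap()` is a hypothetical stand-in for the parse engine, templated on the handler type so that every call is resolved at compile time. The handler method names follow the ones in the comment above. How a real handler rearranges its own data is an assumption here; this toy simply inserts the map-start event before the scalar it recorded last.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy handler (not ryml's EventHandlerTestSuite): it only
// records test-suite-style event strings in a vector.
struct ToyEventRecorder
{
    std::vector<std::string> events;

    void begin_seq_val_flow() { events.push_back("+SEQ []"); }
    void end_seq() { events.push_back("-SEQ"); }
    void end_map() { events.push_back("-MAP"); }
    void add_sibling() {} // nothing to record in this toy model
    void set_val_scalar_plain(std::string const& s) { events.push_back("=VAL :" + s); }

    // The scalar recorded last was really the first key of a new flow map:
    // open the map retroactively by inserting its start event before it.
    void actually_val_is_first_key_of_new_map_flow()
    {
        assert(!events.empty());
        events.insert(events.end() - 1, std::string("+MAP {}"));
    }
};

// Stand-in for the parse engine: templated on the handler type, so the
// calls below use static polymorphism (no virtual functions).
template<class Handler>
void emit_events_for_seqimap(Handler &handler)
{
    // cheaper event sequence for `[a: b, c: d]`
    handler.begin_seq_val_flow();
    handler.set_val_scalar_plain("a"); // preemptively a val
    handler.actually_val_is_first_key_of_new_map_flow();
    handler.set_val_scalar_plain("b");
    handler.end_map();
    handler.add_sibling();
    handler.set_val_scalar_plain("c"); // preemptively a val
    handler.actually_val_is_first_key_of_new_map_flow();
    handler.set_val_scalar_plain("d");
    handler.end_map();
    handler.end_seq();
}

// Convenience wrapper: run the sequence and return the recorded events.
std::vector<std::string> record_seqimap_events()
{
    ToyEventRecorder recorder;
    emit_events_for_seqimap(recorder);
    return recorder.events;
}
```

In this sketch the recorded events come out in the same order as the standard sequence for `[{a: b}, {c: d}]`, although the handler never buffered a scalar while waiting to learn whether it was a key.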
201

202
class Tree;
203
class NodeRef;
204
class ConstNodeRef;
205
struct FilterResult;
206
struct FilterResultExtending;
207

208

209
typedef enum BlockChomp_ {
210
    CHOMP_CLIP,    //!< single newline at end (default)
211
    CHOMP_STRIP,   //!< no newline at end     (-)
212
    CHOMP_KEEP     //!< all newlines from end (+)
213
} BlockChomp_e;
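
As a rough illustration of the three chomping modes, the sketch below applies them to the trailing newlines of a block scalar's text. The `Chomp` enum and `apply_chomp()` are hypothetical names invented for this example; they are not part of rapidyaml, and the parser's actual filtering code is more involved (it also handles indentation).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in mirroring the three chomping modes.
enum class Chomp { clip, strip, keep };

// Apply a chomping mode to the trailing newlines of block scalar text.
std::string apply_chomp(std::string text, Chomp mode)
{
    if(mode == Chomp::keep)
        return text; // (+) keep all trailing newlines
    // length of the content, excluding any trailing newlines
    std::size_t last = text.find_last_not_of('\n');
    std::size_t content_len = (last == std::string::npos) ? 0 : last + 1;
    bool had_newline = content_len < text.size();
    text.resize(content_len); // (-) strip: no newline at the end
    // clip (default): a single newline at the end, if there was content
    if(mode == Chomp::clip && had_newline && content_len > 0)
        text += '\n';
    return text;
}
```

For example, with the input `"text\n\n\n"`, clip leaves one trailing newline, strip leaves none, and keep leaves all three.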
214

215

216
//-----------------------------------------------------------------------------
217
//-----------------------------------------------------------------------------
218
//-----------------------------------------------------------------------------
219

220
/** Options to give to the parser to control its behavior. */
221
struct RYML_EXPORT ParserOptions
222
{
223
private:
224

225
    typedef enum : uint32_t {
226
        SCALAR_FILTERING = (1u << 0u),
227
        LOCATIONS = (1u << 1u),
228
        DEFAULTS = SCALAR_FILTERING,
229
    } Flags_e;
230

231
    uint32_t flags = DEFAULTS;
232

233
public:
234

235
    ParserOptions() = default;
3,150,854✔
236

237
public:
238

239
    /** @name source location tracking */
240
    /** @{ */
241

242
    /** enable/disable source location tracking */
243
    ParserOptions& locations(bool enabled) noexcept
150✔
244
    {
245
        if(enabled)
150✔
246
            flags |= LOCATIONS;
138✔
247
        else
248
            flags &= ~LOCATIONS;
12✔
249
        return *this;
150✔
250
    }
251
    /** query source location tracking status */
252
    C4_ALWAYS_INLINE bool locations() const noexcept { return (flags & LOCATIONS); }
740,656✔
253

254
    /** @} */
255

256
public:
257

258
    /** @name scalar filtering status (experimental; disable at your discretion) */
259
    /** @{ */
260

261
    /** enable/disable scalar filtering while parsing */
262
    ParserOptions& scalar_filtering(bool enabled) noexcept
36✔
263
    {
264
        if(enabled)
36✔
265
            flags |= SCALAR_FILTERING;
6✔
266
        else
267
            flags &= ~SCALAR_FILTERING;
30✔
268
        return *this;
36✔
269
    }
270
    /** query scalar filtering status */
271
    C4_ALWAYS_INLINE bool scalar_filtering() const noexcept { return (flags & SCALAR_FILTERING); }
404,454✔
272

273
    /** @} */
274
};
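
The option-setting pattern above (bit flags behind chainable setters, with same-named getters) can be tried in isolation with the sketch below. `ToyOptions` is a hypothetical stand-in written for this example so that it compiles without rapidyaml; only the flag names and defaults are taken from the listing above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the flag pattern used by ParserOptions.
struct ToyOptions
{
    enum : uint32_t {
        SCALAR_FILTERING = (1u << 0u),
        LOCATIONS = (1u << 1u),
        DEFAULTS = SCALAR_FILTERING,
    };

    uint32_t flags = DEFAULTS;

    // setters return *this so that calls can be chained
    ToyOptions& locations(bool enabled) noexcept
    {
        if(enabled)
            flags |= LOCATIONS;
        else
            flags &= ~static_cast<uint32_t>(LOCATIONS);
        return *this;
    }
    ToyOptions& scalar_filtering(bool enabled) noexcept
    {
        if(enabled)
            flags |= SCALAR_FILTERING;
        else
            flags &= ~static_cast<uint32_t>(SCALAR_FILTERING);
        return *this;
    }

    // getters share the setter names and are overloaded on arity
    bool locations() const noexcept { return (flags & LOCATIONS) != 0; }
    bool scalar_filtering() const noexcept { return (flags & SCALAR_FILTERING) != 0; }
};

// Example of chaining: enable locations, disable scalar filtering.
inline ToyOptions make_example_options()
{
    return ToyOptions{}.locations(true).scalar_filtering(false);
}
```

Returning a reference from each setter is what allows the one-line configuration in `make_example_options()`; a default-constructed object starts with scalar filtering on and location tracking off, matching `DEFAULTS`.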
275

276

277
//-----------------------------------------------------------------------------
278
//-----------------------------------------------------------------------------
279
//-----------------------------------------------------------------------------
280

281
/** This is the main driver of parsing logic: it scans the YAML or
282
 * JSON source for tokens, and emits the appropriate sequence of
283
 * parsing events to its event handler. The parse engine itself has no
284
 * special limitations, and *can* accommodate containers as keys; it is the
285
 * event handler that may introduce additional constraints.
286
 *
287
 * There are three implemented handlers (see @ref doc_event_handlers,
288
 * which has important notes about the event model):
289
 *
290
 * - @ref EventHandlerTree is the handler responsible for creating the
291
 *   ryml @ref Tree
292
 *
293
 * - @ref extra::EventHandlerTestSuite is a handler responsible for emitting
294
 *   standardized [YAML test suite
295
 *   events](https://github.com/yaml/yaml-test-suite), used (only) in
296
 *   the CI of this project. This is not part of the library and is
297
 *   not installed.
298
 *
299
 * - @ref extra::EventHandlerInts is the handler responsible for
300
 *   emitting integer-coded events. It is intended for implementing
301
 *   fully-conformant parsing in other programming languages
302
 *   (integration is currently in progress for
303
 *   [YamlScript](https://github.com/yaml/yamlscript) and
304
 *   [go-yaml](https://github.com/yaml/go-yaml/)). It is not part of
305
 *   the library and is not installed.
306
 *
307
 */
308
template<class EventHandler>
309
class ParseEngine
310
{
311
public:
312

313
    using handler_type = EventHandler;
314

315
public:
316

317
    /** @name construction and assignment */
318
    /** @{ */
319

320
    ParseEngine(EventHandler *evt_handler, ParserOptions opts={});
321
    ~ParseEngine();
322

323
    ParseEngine(ParseEngine &&) noexcept;
324
    ParseEngine(ParseEngine const&);
325
    ParseEngine& operator=(ParseEngine &&) noexcept;
326
    ParseEngine& operator=(ParseEngine const&);
327

328
    /** @} */
329

330
public:
331

332
    /** @name modifiers */
333
    /** @{ */
334

335
    /** Reserve a certain capacity for the parsing stack.
336
     * This should be larger than the expected depth of the parsed
337
     * YAML tree.
338
     *
339
     * The parsing stack is the only (potential) heap memory used
340
     * directly by the parser.
341
     *
342
     * If the requested capacity is below the default
343
     * stack size of 16, the memory is used directly in the parser
344
     * object; otherwise it will be allocated from the heap.
345
     *
346
     * @note this reserves memory only for the parser itself; all the
347
     * allocations for the parsed tree will go through the tree's
348
     * allocator (when different).
349
     *
350
     * @note for maximum efficiency, the tree and the arena can (and
351
     * should) also be reserved. */
352
    void reserve_stack(id_type capacity)
126✔
353
    {
354
        m_evt_handler->m_stack.reserve(capacity);
126✔
355
    }
126✔
356

357
    /** Reserve a certain capacity for the array used to track node
358
     * locations in the source buffer. */
359
    void reserve_locations(size_t num_source_lines)
84✔
360
    {
361
        _resize_locations(num_source_lines);
84✔
362
    }
84✔
363

364
    RYML_DEPRECATED("filter arena no longer needed")
365
    void reserve_filter_arena(size_t) {}
×
366

367
    /** @} */
368

369
public:
370

371
    /** @name getters */
372
    /** @{ */
373

374
    /** Get the options used to build this parser object. */
375
    ParserOptions const& options() const { return m_options; }
108✔
376

377
    /** Get the current callbacks in the parser. */
378
    Callbacks const& callbacks() const { _RYML_ASSERT_BASIC(m_evt_handler); return m_evt_handler->m_stack.m_callbacks; }
306,310✔
379

380
    /** Get the name of the latest file parsed by this object. */
381
    csubstr filename() const { return m_file; }
30✔
382

383
    /** Get the latest YAML buffer parsed by this object. */
384
    csubstr source() const { return m_buf; }
6,648✔
385

386
    /** Get the encoding of the latest YAML buffer parsed by this object.
387
     * If no encoding was specified, UTF8 is assumed as per the YAML standard. */
388
    Encoding_e encoding() const { return m_encoding != NOBOM ? m_encoding : UTF8; }
1,980✔
389

NEW
390
    id_type stack_capacity() const { _RYML_ASSERT_BASIC(m_evt_handler); return m_evt_handler->m_stack.capacity(); }
×
391
    size_t locations_capacity() const { return m_newline_offsets_capacity; }
×
392

393
    RYML_DEPRECATED("filter arena no longer needed")
394
    size_t filter_arena_capacity() const { return 0u; }
×
395

396
    /** @} */
397

398
public:
399

400
    /** @name parse methods */
401
    /** @{ */
402

403
    /** parse YAML in place, emitting events to the current handler */
404
    void parse_in_place_ev(csubstr filename, substr src);
405

406
    /** parse JSON in place, emitting events to the current handler */
407
    void parse_json_in_place_ev(csubstr filename, substr src);
408

409
    /** @} */
410

411
public:
412

413
    // deprecated parse methods
414

415
    /** @cond dev */
416
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t, size_t node_id);
417
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(                  substr yaml, Tree *t, size_t node_id);
418
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t                );
419
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(                  substr yaml, Tree *t                );
420
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, NodeRef node           );
421
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(                  substr yaml, NodeRef node           );
422
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place(csubstr filename, substr yaml                         );
423
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place(                  substr yaml                         );
424
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t, size_t node_id);
425
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  csubstr yaml, Tree *t, size_t node_id);
426
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t                );
427
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  csubstr yaml, Tree *t                );
428
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, NodeRef node           );
429
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  csubstr yaml, NodeRef node           );
430
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, csubstr yaml                         );
431
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(                  csubstr yaml                         );
432
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t, size_t node_id);
433
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  substr yaml, Tree *t, size_t node_id);
434
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t                );
435
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  substr yaml, Tree *t                );
436
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, NodeRef node           );
437
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  substr yaml, NodeRef node           );
438
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, substr yaml                         );
439
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(                  substr yaml                         );
440
    /** @endcond */
441

442
public:
443

444
    /** @name locations */
445
    /** @{ */
446

447
    /** Get the string starting at a particular location, to the end
448
     * of the parsed source buffer. */
449
    csubstr location_contents(Location const& loc) const;
450

451
    /** Given a pointer to a buffer position, get the location.
452
     * @param[in] val must be pointing to somewhere in the source
453
     * buffer that was last parsed by this object. */
454
    Location val_location(const char *val) const;
455

456
    /** @} */
457

458
public:
459

460
    /** @cond dev */
461
    template<class U>
462
    RYML_DEPRECATED("moved to Tree::location(Parser const&). deliberately undefined here.")
463
    auto location(Tree const&, id_type node) const -> typename std::enable_if<U::is_wtree, Location>::type;
464

465
    template<class U>
466
    RYML_DEPRECATED("moved to ConstNodeRef::location(Parser const&), deliberately undefined here.")
467
    auto location(ConstNodeRef const&) const -> typename std::enable_if<U::is_wtree, Location>::type;
468
    /** @endcond */
469

470
public:
471

472
    /** @name scalar filtering */
473
    /** @{*/
474

475
    /** filter a plain scalar */
476
    FilterResult filter_scalar_plain(csubstr scalar, substr dst, size_t indentation);
477
    /** filter a plain scalar in place */
478
    FilterResult filter_scalar_plain_in_place(substr scalar, size_t cap, size_t indentation);
479

480
    /** filter a single-quoted scalar */
481
    FilterResult filter_scalar_squoted(csubstr scalar, substr dst);
482
    /** filter a single-quoted scalar in place */
483
    FilterResult filter_scalar_squoted_in_place(substr scalar, size_t cap);
484

485
    /** filter a double-quoted scalar */
486
    FilterResult filter_scalar_dquoted(csubstr scalar, substr dst);
487
    /** filter a double-quoted scalar in place */
488
    FilterResultExtending filter_scalar_dquoted_in_place(substr scalar, size_t cap);
489

490
    /** filter a block-literal scalar */
491
    FilterResult filter_scalar_block_literal(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
492
    /** filter a block-literal scalar in place */
493
    FilterResult filter_scalar_block_literal_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);
494

495
    /** filter a block-folded scalar */
496
    FilterResult filter_scalar_block_folded(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
497
    /** filter a block-folded scalar in place */
498
    FilterResult filter_scalar_block_folded_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);
499

500
    /** @} */
501

502
private:
503

504
    struct ScannedScalar
505
    {
506
        substr scalar;
507
        bool needs_filter;
508
    };
509

510
    struct ScannedBlock
511
    {
512
        substr scalar;
513
        size_t indentation;
514
        BlockChomp_e chomp;
515
    };
516

517
private:
518

519
    bool    _is_doc_begin(csubstr s);
520
    bool    _is_doc_end(csubstr s);
521

522
    bool    _scan_scalar_plain_blck(ScannedScalar *C4_RESTRICT sc, size_t indentation);
523
    bool    _scan_scalar_plain_seq_flow(ScannedScalar *C4_RESTRICT sc);
524
    bool    _scan_scalar_plain_seq_blck(ScannedScalar *C4_RESTRICT sc);
525
    bool    _scan_scalar_plain_map_flow(ScannedScalar *C4_RESTRICT sc);
526
    bool    _scan_scalar_plain_map_blck(ScannedScalar *C4_RESTRICT sc);
527
    bool    _scan_scalar_map_json(ScannedScalar *C4_RESTRICT sc);
528
    bool    _scan_scalar_seq_json(ScannedScalar *C4_RESTRICT sc);
529
    bool    _scan_scalar_plain_unk(ScannedScalar *C4_RESTRICT sc);
530
    bool    _is_valid_start_scalar_plain_flow(csubstr s);
531

532
    ScannedScalar _scan_scalar_squot();
533
    ScannedScalar _scan_scalar_dquot();
534

535
    void    _scan_block(ScannedBlock *C4_RESTRICT sb, size_t indref);
536

537
    csubstr _scan_anchor();
538
    csubstr _scan_ref_seq();
539
    csubstr _scan_ref_map();
540
    csubstr _scan_tag();
541

542
public: // exposed for testing
543

544
    /** @cond dev */
545
    csubstr _filter_scalar_plain(substr s, size_t indentation);
546
    csubstr _filter_scalar_squot(substr s);
547
    csubstr _filter_scalar_dquot(substr s);
548
    csubstr _filter_scalar_literal(substr s, size_t indentation, BlockChomp_e chomp);
549
    csubstr _filter_scalar_folded(substr s, size_t indentation, BlockChomp_e chomp);
550
    csubstr _move_scalar_left_and_add_newline(substr s);
551

552
    csubstr _maybe_filter_key_scalar_plain(ScannedScalar const& sc, size_t indentation);
553
    csubstr _maybe_filter_val_scalar_plain(ScannedScalar const& sc, size_t indentation);
554
    csubstr _maybe_filter_key_scalar_squot(ScannedScalar const& sc);
555
    csubstr _maybe_filter_val_scalar_squot(ScannedScalar const& sc);
556
    csubstr _maybe_filter_key_scalar_dquot(ScannedScalar const& sc);
557
    csubstr _maybe_filter_val_scalar_dquot(ScannedScalar const& sc);
558
    csubstr _maybe_filter_key_scalar_literal(ScannedBlock const& sb);
559
    csubstr _maybe_filter_val_scalar_literal(ScannedBlock const& sb);
560
    csubstr _maybe_filter_key_scalar_folded(ScannedBlock const& sb);
561
    csubstr _maybe_filter_val_scalar_folded(ScannedBlock const& sb);
562
    /** @endcond */
563

564
private:
565

566
    void  _handle_map_block();
567
    void  _handle_seq_block();
568
    void  _handle_map_flow();
569
    void  _handle_seq_flow();
570
    void  _handle_seq_imap();
571
    void  _handle_map_json();
572
    void  _handle_seq_json();
573

574
    void  _handle_unk();
575
    void  _handle_unk_json();
576
    void  _handle_usty();
577

578
    void  _handle_flow_skip_whitespace();
579

580
    void  _end_map_blck();
581
    void  _end_seq_blck();
582
    void  _end2_map();
583
    void  _end2_seq();
584

585
    void  _begin2_doc();
586
    void  _begin2_doc_expl();
587
    void  _end2_doc();
588
    void  _end2_doc_expl();
589

590
    void  _maybe_begin_doc();
591
    void  _maybe_end_doc();
592

593
    void  _start_doc_suddenly();
594
    void  _end_doc_suddenly();
595
    void  _end_doc_suddenly__pop();
596
    void  _end_stream();
597

598
    void  _set_indentation(size_t indentation);
599
    void  _save_indentation();
600
    void  _handle_indentation_pop_from_block_seq();
601
    void  _handle_indentation_pop_from_block_map();
602
    void  _handle_indentation_pop(ParserState const* dst);
603

604
    void _maybe_skip_comment();
605
    void _skip_comment();
606
    void _maybe_skip_whitespace_tokens();
607
    void _maybe_skipchars(char c);
608
    #ifdef RYML_NO_COVERAGE__TO_BE_DELETED
609
    void _maybe_skipchars_up_to(char c, size_t max_to_skip);
610
    #endif
611
    template<size_t N>
612
    void _skipchars(const char (&chars)[N]);
613
    bool _maybe_scan_following_colon() noexcept;
614
    bool _maybe_scan_following_comma() noexcept;
615

616
public:
617

618
    /** @cond dev */
619
    template<class FilterProcessor> auto _filter_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation) -> decltype(proc.result());
620
    template<class FilterProcessor> auto _filter_squoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
621
    template<class FilterProcessor> auto _filter_dquoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
622
    template<class FilterProcessor> auto _filter_block_literal(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
623
    template<class FilterProcessor> auto _filter_block_folded(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
624
    /** @endcond */

public:

    /** @cond dev */
    template<class FilterProcessor> void   _filter_nl_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation);
    template<class FilterProcessor> void   _filter_nl_squoted(FilterProcessor &C4_RESTRICT proc);
    template<class FilterProcessor> void   _filter_nl_dquoted(FilterProcessor &C4_RESTRICT proc);

    template<class FilterProcessor> bool   _filter_ws_handle_to_first_non_space(FilterProcessor &C4_RESTRICT proc);
    template<class FilterProcessor> void   _filter_ws_copy_trailing(FilterProcessor &C4_RESTRICT proc);
    template<class FilterProcessor> void   _filter_ws_skip_trailing(FilterProcessor &C4_RESTRICT proc);

    template<class FilterProcessor> void   _filter_dquoted_backslash(FilterProcessor &C4_RESTRICT proc);

    template<class FilterProcessor> void   _filter_chomp(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp, size_t indentation);
    template<class FilterProcessor> size_t _handle_all_whitespace(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp);
    template<class FilterProcessor> size_t _extend_to_chomp(FilterProcessor &C4_RESTRICT proc, size_t contents_len);
    template<class FilterProcessor> void   _filter_block_indentation(FilterProcessor &C4_RESTRICT proc, size_t indentation);
    template<class FilterProcessor> void   _filter_block_folded_newlines(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
    template<class FilterProcessor> size_t _filter_block_folded_newlines_compress(FilterProcessor &C4_RESTRICT proc, size_t num_newl, size_t wpos_at_first_newl);
    template<class FilterProcessor> void   _filter_block_folded_newlines_leading(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
    template<class FilterProcessor> void   _filter_block_folded_indented_block(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len, size_t curr_indentation) noexcept;

    /** @endcond */

private:

    void _line_progressed(size_t ahead);
    void _line_ended();
    void _line_ended_undo();

    bool  _finished_file() const;
    bool  _finished_line() const;

    void   _scan_line();
    substr _peek_next_line(size_t pos=npos) const;

    bool _at_line_begin() const
    {
        return m_evt_handler->m_curr->line_contents.rem.begin() == m_evt_handler->m_curr->line_contents.full.begin();
    }

    void _relocate_arena(csubstr prev_arena, substr next_arena);
    static void _s_relocate_arena(void*, csubstr prev_arena, substr next_arena);

private:

    C4_ALWAYS_INLINE bool has_all(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == f; }
    C4_ALWAYS_INLINE bool has_any(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) != 0; }
    C4_ALWAYS_INLINE bool has_none(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == 0; }
    static C4_ALWAYS_INLINE bool has_all(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == f; }
    static C4_ALWAYS_INLINE bool has_any(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) != 0; }
    static C4_ALWAYS_INLINE bool has_none(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == 0; }

    #ifndef RYML_DBG
    C4_ALWAYS_INLINE static void add_flags(ParserFlag_t on, ParserState *C4_RESTRICT s) noexcept { s->flags |= on; }
    C4_ALWAYS_INLINE static void addrem_flags(ParserFlag_t on, ParserFlag_t off, ParserState *C4_RESTRICT s) noexcept { s->flags &= ~off; s->flags |= on; }
    C4_ALWAYS_INLINE static void rem_flags(ParserFlag_t off, ParserState *C4_RESTRICT s) noexcept { s->flags &= ~off; }
    C4_ALWAYS_INLINE void add_flags(ParserFlag_t on) noexcept { m_evt_handler->m_curr->flags |= on; }
    C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; m_evt_handler->m_curr->flags |= on; }
    C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; }
    #else
    static void add_flags(ParserFlag_t on, ParserState *C4_RESTRICT s);
    static void addrem_flags(ParserFlag_t on, ParserFlag_t off, ParserState *C4_RESTRICT s);
    static void rem_flags(ParserFlag_t off, ParserState *C4_RESTRICT s);
    C4_ALWAYS_INLINE void add_flags(ParserFlag_t on) noexcept { add_flags(on, m_evt_handler->m_curr); }
    C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off) noexcept { addrem_flags(on, off, m_evt_handler->m_curr); }
    C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off) noexcept { rem_flags(off, m_evt_handler->m_curr); }
    #endif

private:

    void _prepare_locations();
    void _resize_locations(size_t sz);
    bool _locations_dirty() const;

private:

    void _reset();
    void _free();
    void _clr();

    #ifdef RYML_DBG
    template<class ...Args> C4_NO_INLINE void _dbg(csubstr fmt, Args const& ...args) const;
    template<class DumpFn>  C4_NO_INLINE void _fmt_msg(DumpFn &&dumpfn) const;
    #endif
    template<class ...Args> C4_NORETURN C4_NO_INLINE void _err(Location const& cpploc, const char *fmt, Args const& ...args) const;
    template<class ...Args> C4_NORETURN C4_NO_INLINE void _err(Location const& cpploc, Location const& ymlloc, const char *fmt, Args const& ...args) const;


private:

    /** store pending tag or anchor/ref annotations */
    struct Annotation
    {
        struct Entry
        {
            csubstr str;
            size_t indentation;
            size_t line;
        };
        Entry annotations[2];
        size_t num_entries;
    };

    void _handle_colon();
    void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str, size_t indentation, size_t line);
    void _clear_annotations(Annotation *C4_RESTRICT dst);
    bool _has_pending_annotations() const { return m_pending_tags.num_entries || m_pending_anchors.num_entries; }
    #ifdef RYML_NO_COVERAGE__TO_BE_DELETED
    bool _handle_indentation_from_annotations();
    #endif
    bool _annotations_require_key_container() const;
    void _handle_annotations_before_blck_key_scalar();
    void _handle_annotations_before_blck_val_scalar();
    void _handle_annotations_before_start_mapblck(size_t current_line);
    void _handle_annotations_before_start_mapblck_as_key();
    void _handle_annotations_and_indentation_after_start_mapblck(size_t key_indentation, size_t key_line);
    size_t _select_indentation_from_annotations(size_t val_indentation, size_t val_line);
    void _handle_directive(csubstr rem);
    bool _handle_bom();
    void _handle_bom(Encoding_e enc);

    void _check_tag(csubstr tag);

private:

    ParserOptions m_options;

    csubstr m_file;
    substr  m_buf;

public:

    /** @cond dev */
    EventHandler *C4_RESTRICT m_evt_handler; // NOLINT
    /** @endcond */

private:

    Annotation m_pending_anchors;
    Annotation m_pending_tags;

    bool m_was_inside_qmrk;
    bool m_doc_empty = true;
    size_t m_prev_colon = npos;

    Encoding_e m_encoding = UTF8;

private:

    size_t *m_newline_offsets;
    size_t  m_newline_offsets_size;
    size_t  m_newline_offsets_capacity;
    csubstr m_newline_offsets_buf;

};


/** Quickly inspect the source to estimate the number of nodes the
 * resulting tree is likely to have. If a tree is empty before
 * parsing, considerable time will be spent growing it, so calling
 * this to reserve the tree size prior to parsing is likely to
 * result in a time gain. We encourage using this function before
 * parsing, but as always measure its impact on performance to
 * obtain a good trade-off.
 *
 * @note since this function is meant for optimizing performance, it
 * is approximate. The result may actually be smaller than the
 * resulting number of nodes, notably if the YAML uses implicit
 * maps as flow seq members as in `[these: are, individual:
 * maps]`. */
RYML_EXPORT id_type estimate_tree_capacity(csubstr src); // NOLINT(readability-redundant-declaration)

/** @} */

} // namespace yml
} // namespace c4

// NOLINTEND(hicpp-signed-bitwise)

#if defined(_MSC_VER)
#   pragma warning(pop)
#endif

#endif /* _C4_YML_PARSE_ENGINE_HPP_ */
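
The flag helpers in the header (`has_all`, `has_any`, `has_none`, `add_flags`, `rem_flags`, `addrem_flags`) are plain bitmask operations on the current parser state. The sketch below reproduces the same expressions in a standalone form so their semantics can be checked in isolation. The names `Flags`, `State`, `FLAG_A`, `FLAG_B` and `FLAG_C` are hypothetical stand-ins introduced only for this illustration; the real `ParserFlag_t` and `ParserState` come from `c4/yml/parser_state.hpp` and are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for ParserFlag_t and ParserState (illustration only).
using Flags = std::uint32_t;
constexpr Flags FLAG_A = 1u << 0;
constexpr Flags FLAG_B = 1u << 1;
constexpr Flags FLAG_C = 1u << 2;

struct State { Flags flags = 0; };

// True only if every bit of f is set.
inline bool has_all(Flags f, State const& s) noexcept { return (s.flags & f) == f; }
// True if at least one bit of f is set.
inline bool has_any(Flags f, State const& s) noexcept { return (s.flags & f) != 0; }
// True if no bit of f is set.
inline bool has_none(Flags f, State const& s) noexcept { return (s.flags & f) == 0; }

// Set the bits in `on`.
inline void add_flags(Flags on, State& s) noexcept { s.flags |= on; }
// Clear the bits in `off`.
inline void rem_flags(Flags off, State& s) noexcept { s.flags &= ~off; }
// Clear `off` first, then set `on`, in the same order as the header.
inline void addrem_flags(Flags on, Flags off, State& s) noexcept
{
    s.flags &= ~off;
    s.flags |= on;
}

// Small walk-through returning the final flag word.
inline Flags demo()
{
    State s;
    add_flags(FLAG_A | FLAG_B, s);   // A and B set
    addrem_flags(FLAG_C, FLAG_A, s); // A cleared, C set
    rem_flags(FLAG_B, s);            // B cleared
    return s.flags;                  // only C remains
}
```

Because `addrem_flags` clears before it sets, a bit that appears in both `on` and `off` ends up set. That ordering is taken directly from the header's inline definitions.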