biojppm / rapidyaml / 18640477834

20 Oct 2025 02:44AM UTC coverage: 97.759% (+0.1%) from 97.65%

web-flow
Merge 5a981cd8d into 48acea949
Pull Request #550: Implement FLOW_ML style

823 of 856 new or added lines in 12 files covered. (96.14%)

160 existing lines in 15 files now uncovered.

13830 of 14147 relevant lines covered (97.76%)

548857.81 hits per line

Source File
82.69
/src/c4/yml/parse_engine.hpp
#ifndef _C4_YML_PARSE_ENGINE_HPP_
#define _C4_YML_PARSE_ENGINE_HPP_

#ifndef _C4_YML_PARSER_STATE_HPP_
#include "c4/yml/parser_state.hpp"
#endif


#if defined(_MSC_VER)
#   pragma warning(push)
#   pragma warning(disable: 4251/*needs to have dll-interface to be used by clients of struct*/)
#endif

// NOLINTBEGIN(hicpp-signed-bitwise)

namespace c4 {
namespace yml {

/** @addtogroup doc_parse
 * @{ */

/** @defgroup doc_event_handlers Event Handlers
 *
 * @brief rapidyaml implements its parsing logic with a two-level
 * model, where a @ref ParseEngine object reads through the YAML
 * source, and dispatches events to an EventHandler bound to the @ref
 * ParseEngine. Because @ref ParseEngine is templated on the event
 * handler, the binding uses static polymorphism, without any virtual
 * functions. The actual handler object can be changed at run time
 * (but of course needs to be of the type given as the template
 * parameter). This is thus a very efficient architecture, and further
 * enables users to provide their own custom handler if they wish to
 * bypass the rapidyaml @ref Tree.
 *
 * There are three handlers implemented in this project:
 *
 * - @ref EventHandlerTree is the handler responsible for creating the
 *   ryml @ref Tree
 *
 * - @ref extra::EventHandlerInts parses YAML into an integer array
 *   representation of the tree and scalars.
 *
 * - @ref extra::EventHandlerTestSuite is the handler responsible for emitting
 *   standardized [YAML test suite
 *   events](https://github.com/yaml/yaml-test-suite), used (only) in
 *   the CI of this project.
 *
 *
 * ### Event model
 *
 * The event model used by the parse engine and event handlers follows
 * very closely the event model in the [YAML test
 * suite](https://github.com/yaml/yaml-test-suite).
 *
 * Consider for example this YAML,
 * ```yaml
 * {foo: bar,foo2: bar2}
 * ```
 * which would produce these events in the test-suite parlance:
 * ```
 * +STR
 * +DOC
 * +MAP {}
 * =VAL :foo
 * =VAL :bar
 * =VAL :foo2
 * =VAL :bar2
 * -MAP
 * -DOC
 * -STR
 * ```
 *
 * For reference, the @ref ParseEngine object will produce this
 * sequence of calls to its bound EventHandler:
 * ```cpp
 * handler.begin_stream();
 * handler.begin_doc();
 * handler.begin_map_val_flow();
 * handler.set_key_scalar_plain("foo");
 * handler.set_val_scalar_plain("bar");
 * handler.add_sibling();
 * handler.set_key_scalar_plain("foo2");
 * handler.set_val_scalar_plain("bar2");
 * handler.end_map();
 * handler.end_doc();
 * handler.end_stream();
 * ```
 *
 * For many other examples of all areas of YAML and how ryml's parse
 * model corresponds to the YAML standard model, refer to the [unit
 * tests for the parse
 * engine](https://github.com/biojppm/rapidyaml/tree/master/test/test_parse_engine.cpp).
 *
 *
 * ### Special events
 *
 * Most of the parsing events adopted by rapidyaml in its event model
 * are fairly obvious, but there are two less-obvious events requiring
 * some explanation.
 *
 * These events exist to make it easier to parse some special YAML
 * cases. They are called by the parser when a just-handled
 * value/container is actually the first key of a new map:
 *
 *   - `actually_val_is_first_key_of_new_map_flow()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerTree" / @ref EventHandlerTestSuite::actually_val_is_first_key_of_new_map_flow() "see implementation in EventHandlerTestSuite")
 *   - `actually_val_is_first_key_of_new_map_block()` (@ref EventHandlerTree::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerTree" / @ref EventHandlerTestSuite::actually_val_is_first_key_of_new_map_block() "see implementation in EventHandlerTestSuite")
 *
 * For example, consider an implicit map inside a seq: `[a: b, c:
 * d]` which is parsed as `[{a: b}, {c: d}]`. The standard event
 * sequence for this YAML would be the following:
 * ```cpp
 * handler.begin_seq_val_flow();
 * handler.begin_map_val_flow();
 * handler.set_key_scalar_plain("a");
 * handler.set_val_scalar_plain("b");
 * handler.end_map();
 * handler.add_sibling();
 * handler.begin_map_val_flow();
 * handler.set_key_scalar_plain("c");
 * handler.set_val_scalar_plain("d");
 * handler.end_map();
 * handler.end_seq();
 * ```
 * The problem with this event sequence is that it forces the
 * parser to delay setting each scalar (in this case "a" and "c")
 * until it knows whether the scalar is a key or a val, and to
 * store the scalar in its internal state until then. The downside
 * is that this complexity cost would apply even if there is no
 * implicit map -- every val in a seq would have to be delayed
 * until one of the disambiguating subsequent tokens `,-]:` is
 * found. By using the special events, the parser avoids this
 * complexity: it preemptively sets the scalar as a val, and if
 * the scalar turns out to be a key, a call to the special event
 * creates the map and rearranges the scalar as its first
 * key. Now the cost applies only once: when a seqimap (an
 * implicit map inside a seq) starts. So the following (easier
 * and cheaper) event sequence has the same effect as the event
 * sequence above:
 * ```cpp
 * handler.begin_seq_val_flow();
 * handler.set_val_scalar_plain("a"); // preemptively set "a" as val!
 * handler.actually_val_is_first_key_of_new_map_flow(); // create a map, move the "a" val as the key of the first child of the new map
 * handler.set_val_scalar_plain("b"); // now "a" is a key and "b" the val
 * handler.end_map();
 * handler.add_sibling();
 * handler.set_val_scalar_plain("c"); // "c" also as val!
 * handler.actually_val_is_first_key_of_new_map_flow(); // likewise
 * handler.set_val_scalar_plain("d"); // now "c" is a key and "d" the val
 * handler.end_map();
 * handler.end_seq();
 * ```
 * This also applies to container keys (although ryml's tree
 * cannot accommodate these): the parser can preemptively set a
 * container as a val, and call this event to turn that container
 * into a key. For example, consider this yaml:
 * ```yaml
 *   [aa, bb]: [cc, dd]
 * # ^       ^ ^
 * # |       | |
 * # (2)   (1) (3)     <- event sequence
 * ```
 * The standard event sequence for this YAML would be the
 * following:
 * ```cpp
 * handler.begin_map_val_block();       // (1)
 * handler.begin_seq_key_flow();        // (2)
 * handler.set_val_scalar_plain("aa");
 * handler.add_sibling();
 * handler.set_val_scalar_plain("bb");
 * handler.end_seq();
 * handler.begin_seq_val_flow();        // (3)
 * handler.set_val_scalar_plain("cc");
 * handler.add_sibling();
 * handler.set_val_scalar_plain("dd");
 * handler.end_seq();
 * handler.end_map();
 * ```
 * The problem with the sequence above is that, reading from
 * left to right, the parser can only detect the proper calls at
 * (1) and (2) once it reaches (1) in the YAML source. So, the
 * parser would have to buffer the entire event sequence starting
 * from the beginning until it reaches (1). Using the special
 * event, the parser can do instead:
 * ```cpp
 * handler.begin_seq_val_flow();        // (2) -- preemptively as val!
 * handler.set_val_scalar_plain("aa");
 * handler.add_sibling();
 * handler.set_val_scalar_plain("bb");
 * handler.end_seq();
 * handler.actually_val_is_first_key_of_new_map_block(); // (1) -- adjust when finding that the prev val was actually a key.
 * handler.begin_seq_val_flow();        // (3) -- go on as before
 * handler.set_val_scalar_plain("cc");
 * handler.add_sibling();
 * handler.set_val_scalar_plain("dd");
 * handler.end_seq();
 * handler.end_map();
 * ```
 */
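/*
The rearrangement performed by the special events can be sketched with a
toy handler. Everything below is a hypothetical illustration and not
rapidyaml code: `ToyHandler`, `emit_standard()` and `emit_preemptive()` are
made-up names, and the handler records test-suite-style event strings
instead of building a tree. The two drivers are templated on the handler
type, so the calls bind statically, in the spirit of the two-level model
described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy handler (not part of rapidyaml): it records events as
// test-suite-style strings instead of building a tree.
struct ToyHandler
{
    std::vector<std::string> events;
    std::vector<std::size_t> open; // start indices of the open containers
    std::size_t last_val = 0;      // index where the latest value starts

    void begin_container(const char *event)
    {
        last_val = events.size();
        open.push_back(last_val);
        events.push_back(event);
    }
    void end_container(const char *event)
    {
        last_val = open.back(); // a closed container is itself a value
        open.pop_back();
        events.push_back(event);
    }

    void begin_seq_val_flow() { begin_container("+SEQ []"); }
    void begin_map_val_flow() { begin_container("+MAP {}"); }
    void end_seq() { end_container("-SEQ"); }
    void end_map() { end_container("-MAP"); }
    void add_sibling() {}
    void set_key_scalar_plain(std::string const& s) { events.push_back("=VAL :" + s); }
    void set_val_scalar_plain(std::string const& s)
    {
        last_val = events.size();
        events.push_back("=VAL :" + s);
    }

    // The latest value turned out to be the first key of a new map:
    // open that map retroactively, right before the value.
    void actually_val_is_first_key_of_new_map_flow()
    {
        events.insert(events.begin() + static_cast<std::ptrdiff_t>(last_val),
                      std::string("+MAP {}"));
        open.push_back(last_val);
    }
};

// Standard sequence for `[a: b, c: d]`: keys must be known in advance.
template<class Handler>
void emit_standard(Handler &h)
{
    h.begin_seq_val_flow();
    h.begin_map_val_flow();
    h.set_key_scalar_plain("a");
    h.set_val_scalar_plain("b");
    h.end_map();
    h.add_sibling();
    h.begin_map_val_flow();
    h.set_key_scalar_plain("c");
    h.set_val_scalar_plain("d");
    h.end_map();
    h.end_seq();
}

// Preemptive sequence for the same YAML: every scalar is set as a val
// first, and promoted to a key only when the colon is found.
template<class Handler>
void emit_preemptive(Handler &h)
{
    h.begin_seq_val_flow();
    h.set_val_scalar_plain("a");
    h.actually_val_is_first_key_of_new_map_flow();
    h.set_val_scalar_plain("b");
    h.end_map();
    h.add_sibling();
    h.set_val_scalar_plain("c");
    h.actually_val_is_first_key_of_new_map_flow();
    h.set_val_scalar_plain("d");
    h.end_map();
    h.end_seq();
}
```

Feeding one `ToyHandler` through each driver should leave both with the
same recorded event list, which is the equivalence claimed in the
documentation above. Because a closed container also counts as the latest
value, the same trick would cover container keys in this toy.
*/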

class Tree;
class NodeRef;
class ConstNodeRef;
struct FilterResult;
struct FilterResultExtending;


typedef enum BlockChomp_ {
    CHOMP_CLIP,    //!< single newline at end (default)
    CHOMP_STRIP,   //!< no newline at end     (-)
    CHOMP_KEEP     //!< all newlines from end (+)
} BlockChomp_e;
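/*
As a rough illustration of the three chomping modes, here is a minimal
sketch that applies them to the trailing newlines of an already-dedented
block scalar. `apply_chomp()` is a hypothetical helper and not a rapidyaml
function; the enum is re-declared only so that the sketch compiles on its
own, and indentation handling and folding are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Re-declared here so that the sketch is self-contained.
typedef enum BlockChomp_ {
    CHOMP_CLIP,    // single newline at end (default)
    CHOMP_STRIP,   // no newline at end     (-)
    CHOMP_KEEP     // all newlines from end (+)
} BlockChomp_e;

// Hypothetical helper: returns `text` with its trailing newlines
// adjusted according to `chomp`.
inline std::string apply_chomp(std::string const& text, BlockChomp_e chomp)
{
    std::size_t last = text.find_last_not_of('\n');
    std::size_t len = (last == std::string::npos) ? 0 : last + 1;
    std::string body = text.substr(0, len); // contents without trailing newlines
    bool had_newline = len < text.size();
    switch(chomp)
    {
    case CHOMP_STRIP:
        return body;
    case CHOMP_CLIP:
        // keep a single newline, but only after non-empty contents
        return (had_newline && !body.empty()) ? body + "\n" : body;
    case CHOMP_KEEP:
        return text;
    }
    return text;
}
```

For the input `"abc\n\n\n"` this sketch yields `"abc\n"` when clipping,
`"abc"` when stripping, and the unchanged input when keeping.
*/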


//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------

/** Options to give to the parser to control its behavior. */
struct RYML_EXPORT ParserOptions
{
private:

    typedef enum : uint32_t {
        SCALAR_FILTERING = (1u << 0u),
        LOCATIONS = (1u << 1u),
        DETECT_FLOW_ML = (1u << 2u),
        DEFAULTS = SCALAR_FILTERING|DETECT_FLOW_ML,
    } Flags_e;

    uint32_t flags = DEFAULTS;

public:

    ParserOptions() = default;

public:

    /** @name source location tracking */
    /** @{ */

    /** enable/disable source location tracking */
    ParserOptions& locations(bool enabled) noexcept
    {
        if(enabled)
            flags |= LOCATIONS;
        else
            flags &= ~LOCATIONS;
        return *this;
    }
    /** query source location tracking status */
    C4_ALWAYS_INLINE bool locations() const noexcept { return (flags & LOCATIONS); }

    /** @} */

public:

    /** @name detection of @ref FLOW_ML container style */
    /** @{ */

    /** enable/disable detection of @ref FLOW_ML container style. When
     * enabled, the parser will set @ref FLOW_ML as the style of flow
     * containers which have the terminating bracket on a line
     * different from that of the opening bracket. */
    ParserOptions& detect_flow_ml(bool enabled) noexcept
    {
        if(enabled)
            flags |= DETECT_FLOW_ML;
        else
            flags &= ~DETECT_FLOW_ML;
        return *this;
    }
    /** query status of detection of @ref FLOW_ML container style. */
    C4_ALWAYS_INLINE bool detect_flow_ml() const noexcept { return (flags & DETECT_FLOW_ML); }

    /** @} */
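/*
The criterion stated above (terminating bracket on a different line than
the opening bracket) can be sketched as a small standalone check.
`closes_on_another_line()` is a hypothetical helper, not the parser's
actual detection code: it only tracks bracket nesting, and ignores quoted
scalars and comments, which a real scanner has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true when the flow container opening at `open_pos` is closed
// on a later line than the one where it was opened.
inline bool closes_on_another_line(std::string const& yaml, std::size_t open_pos)
{
    int depth = 0;
    bool saw_newline = false;
    for(std::size_t i = open_pos; i < yaml.size(); ++i)
    {
        char c = yaml[i];
        if(c == '[' || c == '{')
        {
            ++depth;
        }
        else if(c == ']' || c == '}')
        {
            if(--depth == 0)
                return saw_newline; // found the matching bracket
        }
        else if(c == '\n')
        {
            saw_newline = true;
        }
    }
    return false; // unterminated container
}
```

Note that the answer is per container: in `{a: [1, 2],\n b: 3}` the outer
map spans two lines while the inner seq does not.
*/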

public:

    /** @name scalar filtering status (experimental; disable at your discretion) */
    /** @{ */

    /** enable/disable scalar filtering while parsing */
    ParserOptions& scalar_filtering(bool enabled) noexcept
    {
        if(enabled)
            flags |= SCALAR_FILTERING;
        else
            flags &= ~SCALAR_FILTERING;
        return *this;
    }
    /** query scalar filtering status */
    C4_ALWAYS_INLINE bool scalar_filtering() const noexcept { return (flags & SCALAR_FILTERING); }

    /** @} */
};
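/*
Since every setter returns `*this`, the options can be configured in a
single chained expression. The sketch below uses a trimmed stand-in,
`ToyParserOptions` (a hypothetical name), so that it compiles without the
rapidyaml headers; it mirrors the flag values and defaults of the struct
above, but routes all setters through one private helper, which is a
simplification of this sketch and not how the struct above is written.

```cpp
#include <cassert>
#include <cstdint>

// Trimmed stand-in for ParserOptions; not the library type.
struct ToyParserOptions
{
    enum : std::uint32_t {
        SCALAR_FILTERING = (1u << 0u),
        LOCATIONS = (1u << 1u),
        DETECT_FLOW_ML = (1u << 2u),
        DEFAULTS = SCALAR_FILTERING|DETECT_FLOW_ML
    };

    std::uint32_t flags = DEFAULTS;

    ToyParserOptions& locations(bool enabled) noexcept { return set(LOCATIONS, enabled); }
    ToyParserOptions& detect_flow_ml(bool enabled) noexcept { return set(DETECT_FLOW_ML, enabled); }
    ToyParserOptions& scalar_filtering(bool enabled) noexcept { return set(SCALAR_FILTERING, enabled); }

    bool locations() const noexcept { return (flags & LOCATIONS) != 0u; }
    bool detect_flow_ml() const noexcept { return (flags & DETECT_FLOW_ML) != 0u; }
    bool scalar_filtering() const noexcept { return (flags & SCALAR_FILTERING) != 0u; }

private:

    // set or clear a single flag, leaving the others untouched
    ToyParserOptions& set(std::uint32_t flag, bool enabled) noexcept
    {
        flags = enabled ? (flags | flag) : (flags & ~flag);
        return *this;
    }
};
```

With these defaults, a fresh object has scalar filtering and FLOW_ML
detection enabled and location tracking disabled; a call such as
`opts.locations(true).detect_flow_ml(false)` flips two of them and leaves
scalar filtering as it was.
*/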


//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------

/** This is the main driver of parsing logic: it scans the YAML or
 * JSON source for tokens, and emits the appropriate sequence of
 * parsing events to its event handler. The parse engine itself has no
 * special limitations, and *can* accommodate containers as keys; it is the
 * event handler that may introduce additional constraints.
 *
 * There are three implemented handlers (see @ref doc_event_handlers,
 * which has important notes about the event model):
 *
 * - @ref EventHandlerTree is the handler responsible for creating the
 *   ryml @ref Tree
 *
 * - @ref extra::EventHandlerTestSuite is a handler responsible for emitting
 *   standardized [YAML test suite
 *   events](https://github.com/yaml/yaml-test-suite), used (only) in
 *   the CI of this project. This is not part of the library and is
 *   not installed.
 *
 * - @ref extra::EventHandlerInts is the handler responsible for
 *   emitting integer-coded events. It is intended for implementing
 *   fully-conformant parsing in other programming languages
 *   (integration is currently in progress for
 *   [YamlScript](https://github.com/yaml/yamlscript) and
 *   [go-yaml](https://github.com/yaml/go-yaml/)). It is not part of
 *   the library and is not installed.
 *
 */
template<class EventHandler>
class ParseEngine
{
public:

    using handler_type = EventHandler;

public:

    /** @name construction and assignment */
    /** @{ */

    ParseEngine(EventHandler *evt_handler, ParserOptions opts={});
    ~ParseEngine();

    ParseEngine(ParseEngine &&) noexcept;
    ParseEngine(ParseEngine const&);
    ParseEngine& operator=(ParseEngine &&) noexcept;
    ParseEngine& operator=(ParseEngine const&);

    /** @} */

public:

    /** @name modifiers */
    /** @{ */

    /** Reserve a certain capacity for the parsing stack.
     * This should be larger than the expected depth of the parsed
     * YAML tree.
     *
     * The parsing stack is the only (potential) heap memory used
     * directly by the parser.
     *
     * If the requested capacity is below the default
     * stack size of 16, the memory is used directly in the parser
     * object; otherwise it will be allocated from the heap.
     *
     * @note this reserves memory only for the parser itself; all the
     * allocations for the parsed tree will go through the tree's
     * allocator (when different).
     *
     * @note for maximum efficiency, the tree and the arena can (and
     * should) also be reserved. */
    void reserve_stack(id_type capacity)
    {
        m_evt_handler->m_stack.reserve(capacity);
    }
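/*
The inline-versus-heap behaviour described for the parsing stack follows
the usual small-buffer pattern, which can be sketched as below.
`SmallStack` is a hypothetical class written for this note, not the stack
type used by the parser; it assumes a default-constructible, copyable
element type, and it keeps capacities up to and including N inline, which
may differ from the exact threshold used by rapidyaml.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical small-buffer stack: up to N elements live inside the
// object; larger capacities are moved to the heap.
template<class T, std::size_t N = 16>
class SmallStack
{
    T m_inline[N];               // storage embedded in the object itself
    std::unique_ptr<T[]> m_heap; // set only once the capacity exceeds N
    std::size_t m_size = 0;
    std::size_t m_capacity = N;

    T* data() { return m_heap ? m_heap.get() : m_inline; }
    T const* data() const { return m_heap ? m_heap.get() : m_inline; }

public:

    bool on_heap() const { return static_cast<bool>(m_heap); }
    std::size_t capacity() const { return m_capacity; }
    std::size_t size() const { return m_size; }
    T const& top() const { return data()[m_size - 1]; }

    void reserve(std::size_t capacity)
    {
        if(capacity <= m_capacity)
            return; // the current storage is already large enough
        std::unique_ptr<T[]> next(new T[capacity]);
        T const* src = data();
        for(std::size_t i = 0; i < m_size; ++i)
            next[i] = src[i];
        m_heap = std::move(next);
        m_capacity = capacity;
    }

    void push(T const& v)
    {
        if(m_size == m_capacity)
            reserve(2 * m_capacity); // grow geometrically
        data()[m_size++] = v;
    }
};
```

A `SmallStack<int>` starts with capacity 16 and no heap allocation;
pushing a 17th element, or reserving more than 16 up front, is what moves
the storage to the heap in this sketch.
*/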

    /** Reserve a certain capacity for the array used to track node
     * locations in the source buffer. */
    void reserve_locations(size_t num_source_lines)
    {
        _resize_locations(num_source_lines);
    }

    RYML_DEPRECATED("filter arena no longer needed")
    void reserve_filter_arena(size_t) {}

    /** @} */

public:

    /** @name getters */
    /** @{ */

    /** Get the options used to build this parser object. */
    ParserOptions const& options() const { return m_options; }

    /** Get the current callbacks in the parser. */
    Callbacks const& callbacks() const { _RYML_ASSERT_BASIC(m_evt_handler); return m_evt_handler->m_stack.m_callbacks; }

    /** Get the name of the latest file parsed by this object. */
    csubstr filename() const { return m_file; }

    /** Get the latest YAML buffer parsed by this object. */
    csubstr source() const { return m_buf; }

    /** Get the encoding of the latest YAML buffer parsed by this object.
     * If no encoding was specified, UTF8 is assumed as per the YAML standard. */
    Encoding_e encoding() const { return m_encoding != NOBOM ? m_encoding : UTF8; }

    id_type stack_capacity() const { _RYML_ASSERT_BASIC(m_evt_handler); return m_evt_handler->m_stack.capacity(); }
    size_t locations_capacity() const { return m_newline_offsets_capacity; }

    RYML_DEPRECATED("filter arena no longer needed")
    size_t filter_arena_capacity() const { return 0u; }

    /** @} */

public:

    /** @name parse methods */
    /** @{ */

    /** parse YAML in place, emitting events to the current handler */
    void parse_in_place_ev(csubstr filename, substr src);

    /** parse JSON in place, emitting events to the current handler */
    void parse_json_in_place_ev(csubstr filename, substr src);

    /** @} */

public:

    // deprecated parse methods

    /** @cond dev */
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(                  substr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, Tree *t                );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(                  substr yaml, Tree *t                );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(csubstr filename, substr yaml, NodeRef node           );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_place(                  substr yaml, NodeRef node           );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place(csubstr filename, substr yaml                         );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_place(                  substr yaml                         );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  csubstr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, Tree *t                );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  csubstr yaml, Tree *t                );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, csubstr yaml, NodeRef node           );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  csubstr yaml, NodeRef node           );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, csubstr yaml                         );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the function in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(                  csubstr yaml                         );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  substr yaml, Tree *t, size_t node_id);
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, Tree *t                );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  substr yaml, Tree *t                );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(csubstr filename, substr yaml, NodeRef node           );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, void>::type parse_in_arena(                  substr yaml, NodeRef node           );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(csubstr filename, substr yaml                         );
    template<class U=EventHandler> RYML_DEPRECATED("removed, deliberately undefined. use the csubstr version in parse.hpp.") typename std::enable_if<U::is_wtree, Tree>::type parse_in_arena(                  substr yaml                         );
    /** @endcond */

public:

    /** @name locations */
    /** @{ */

    /** Get the string starting at a particular location, to the end
     * of the parsed source buffer. */
    csubstr location_contents(Location const& loc) const;

    /** Given a pointer to a buffer position, get the location.
     * @param[in] val must be pointing to somewhere in the source
     * buffer that was last parsed by this object. */
    Location val_location(const char *val) const;

    /** @} */
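/*
One plausible way to map a pointer into the source buffer back to a line
and column is a binary search over the sorted offsets of the newline
characters. The sketch below shows that idea only; it is an assumption and
not the implementation of `val_location()`, which is not visible in this
header. `ToyLocation`, `newline_offsets()` and `locate()` are hypothetical
names, and lines and columns are zero-based here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical location record: byte offset plus zero-based line/column.
struct ToyLocation
{
    std::size_t offset;
    std::size_t line;
    std::size_t col;
};

// Collect the offset of every newline in the source, in increasing order.
inline std::vector<std::size_t> newline_offsets(std::string const& src)
{
    std::vector<std::size_t> offs;
    for(std::size_t i = 0; i < src.size(); ++i)
        if(src[i] == '\n')
            offs.push_back(i);
    return offs;
}

// `val` must point into `src`. The number of newlines strictly before
// `val` is its zero-based line index.
inline ToyLocation locate(std::string const& src,
                          std::vector<std::size_t> const& offs,
                          const char *val)
{
    std::size_t offset = static_cast<std::size_t>(val - src.data());
    auto it = std::lower_bound(offs.begin(), offs.end(), offset);
    std::size_t line = static_cast<std::size_t>(it - offs.begin());
    std::size_t line_start = (line == 0) ? 0 : offs[line - 1] + 1;
    return ToyLocation{offset, line, offset - line_start};
}
```

Computing the offsets once per parsed buffer makes each later lookup
logarithmic in the number of lines, which is the reason for searching a
precomputed array rather than rescanning the buffer.
*/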

public:

    /** @cond dev */
    template<class U>
    RYML_DEPRECATED("moved to Tree::location(Parser const&). deliberately undefined here.")
    auto location(Tree const&, id_type node) const -> typename std::enable_if<U::is_wtree, Location>::type;

    template<class U>
    RYML_DEPRECATED("moved to ConstNodeRef::location(Parser const&), deliberately undefined here.")
    auto location(ConstNodeRef const&) const -> typename std::enable_if<U::is_wtree, Location>::type;
    /** @endcond */

public:

    /** @name scalar filtering */
    /** @{*/

    /** filter a plain scalar */
    FilterResult filter_scalar_plain(csubstr scalar, substr dst, size_t indentation);
    /** filter a plain scalar in place */
    FilterResult filter_scalar_plain_in_place(substr scalar, size_t cap, size_t indentation);

    /** filter a single-quoted scalar */
    FilterResult filter_scalar_squoted(csubstr scalar, substr dst);
    /** filter a single-quoted scalar in place */
    FilterResult filter_scalar_squoted_in_place(substr scalar, size_t cap);

    /** filter a double-quoted scalar */
    FilterResult filter_scalar_dquoted(csubstr scalar, substr dst);
    /** filter a double-quoted scalar in place */
    FilterResultExtending filter_scalar_dquoted_in_place(substr scalar, size_t cap);

    /** filter a block-literal scalar */
    FilterResult filter_scalar_block_literal(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
    /** filter a block-literal scalar in place */
    FilterResult filter_scalar_block_literal_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);

    /** filter a block-folded scalar */
    FilterResult filter_scalar_block_folded(csubstr scalar, substr dst, size_t indentation, BlockChomp_e chomp);
    /** filter a block-folded scalar in place */
    FilterResult filter_scalar_block_folded_in_place(substr scalar, size_t cap, size_t indentation, BlockChomp_e chomp);

    /** @} */

private:

    struct ScannedScalar
    {
        substr scalar;
        bool needs_filter;
    };

    struct ScannedBlock
    {
        substr scalar;
        size_t indentation;
        BlockChomp_e chomp;
    };

private:

    bool    _is_doc_begin(csubstr s);
    bool    _is_doc_end(csubstr s);

    bool    _scan_scalar_plain_blck(ScannedScalar *C4_RESTRICT sc, size_t indentation);
    bool    _scan_scalar_plain_seq_flow(ScannedScalar *C4_RESTRICT sc);
    bool    _scan_scalar_plain_seq_blck(ScannedScalar *C4_RESTRICT sc);
    bool    _scan_scalar_plain_map_flow(ScannedScalar *C4_RESTRICT sc);
    bool    _scan_scalar_plain_map_blck(ScannedScalar *C4_RESTRICT sc);
    bool    _scan_scalar_map_json(ScannedScalar *C4_RESTRICT sc);
    bool    _scan_scalar_seq_json(ScannedScalar *C4_RESTRICT sc);
    bool    _scan_scalar_plain_unk(ScannedScalar *C4_RESTRICT sc);
    bool    _is_valid_start_scalar_plain_flow(csubstr s);

    ScannedScalar _scan_scalar_squot();
    ScannedScalar _scan_scalar_dquot();

    void    _scan_block(ScannedBlock *C4_RESTRICT sb, size_t indref);

    csubstr _scan_anchor();
    csubstr _scan_ref_seq();
    csubstr _scan_ref_map();
    csubstr _scan_tag();

public: // exposed for testing

    /** @cond dev */
    csubstr _filter_scalar_plain(substr s, size_t indentation);
    csubstr _filter_scalar_squot(substr s);
    csubstr _filter_scalar_dquot(substr s);
    csubstr _filter_scalar_literal(substr s, size_t indentation, BlockChomp_e chomp);
    csubstr _filter_scalar_folded(substr s, size_t indentation, BlockChomp_e chomp);
    csubstr _move_scalar_left_and_add_newline(substr s);

    csubstr _maybe_filter_key_scalar_plain(ScannedScalar const& sc, size_t indentation);
    csubstr _maybe_filter_val_scalar_plain(ScannedScalar const& sc, size_t indentation);
    csubstr _maybe_filter_key_scalar_squot(ScannedScalar const& sc);
    csubstr _maybe_filter_val_scalar_squot(ScannedScalar const& sc);
    csubstr _maybe_filter_key_scalar_dquot(ScannedScalar const& sc);
    csubstr _maybe_filter_val_scalar_dquot(ScannedScalar const& sc);
    csubstr _maybe_filter_key_scalar_literal(ScannedBlock const& sb);
    csubstr _maybe_filter_val_scalar_literal(ScannedBlock const& sb);
    csubstr _maybe_filter_key_scalar_folded(ScannedBlock const& sb);
    csubstr _maybe_filter_val_scalar_folded(ScannedBlock const& sb);
    /** @endcond */

private:

    void  _handle_map_block();
    void  _handle_seq_block();
    void  _handle_map_flow();
    void  _handle_seq_flow();
    void  _handle_seq_imap();
    void  _handle_map_json();
    void  _handle_seq_json();

    void  _handle_unk();
    void  _handle_unk_json();
    void  _handle_usty();

    void  _handle_flow_skip_whitespace();

    void  _end_map_flow();
    void  _end_seq_flow();
    void  _end_map_blck();
    void  _end_seq_blck();
    void  _end2_map();
    void  _end2_seq();

    void  _begin2_doc();
    void  _begin2_doc_expl();
    void  _end2_doc();
    void  _end2_doc_expl();

    void  _maybe_begin_doc();
    void  _maybe_end_doc();

    void  _start_doc_suddenly();
    void  _end_doc_suddenly();
    void  _end_doc_suddenly__pop();
    void  _end_stream();

    void  _set_indentation(size_t indentation);
    void  _save_indentation();
    void  _handle_indentation_pop_from_block_seq();
    void  _handle_indentation_pop_from_block_map();
    void  _handle_indentation_pop(ParserState const* dst);

    void _maybe_skip_comment();
    void _skip_comment();
    void _maybe_skip_whitespace_tokens();
    void _maybe_skipchars(char c);
    #ifdef RYML_NO_COVERAGE__TO_BE_DELETED
    void _maybe_skipchars_up_to(char c, size_t max_to_skip);
    #endif
    template<size_t N>
    void _skipchars(const char (&chars)[N]);
    bool _maybe_scan_following_colon() noexcept;
    bool _maybe_scan_following_comma() noexcept;

public:

    /** @cond dev */
    template<class FilterProcessor> auto _filter_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation) -> decltype(proc.result());
    template<class FilterProcessor> auto _filter_squoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
    template<class FilterProcessor> auto _filter_dquoted(FilterProcessor &C4_RESTRICT proc) -> decltype(proc.result());
    template<class FilterProcessor> auto _filter_block_literal(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
    template<class FilterProcessor> auto _filter_block_folded(FilterProcessor &C4_RESTRICT proc, size_t indentation, BlockChomp_e chomp) -> decltype(proc.result());
    /** @endcond */

public:

    /** @cond dev */
    template<class FilterProcessor> void   _filter_nl_plain(FilterProcessor &C4_RESTRICT proc, size_t indentation);
    template<class FilterProcessor> void   _filter_nl_squoted(FilterProcessor &C4_RESTRICT proc);
    template<class FilterProcessor> void   _filter_nl_dquoted(FilterProcessor &C4_RESTRICT proc);

    template<class FilterProcessor> bool   _filter_ws_handle_to_first_non_space(FilterProcessor &C4_RESTRICT proc);
    template<class FilterProcessor> void   _filter_ws_copy_trailing(FilterProcessor &C4_RESTRICT proc);
    template<class FilterProcessor> void   _filter_ws_skip_trailing(FilterProcessor &C4_RESTRICT proc);

    template<class FilterProcessor> void   _filter_dquoted_backslash(FilterProcessor &C4_RESTRICT proc);

    template<class FilterProcessor> void   _filter_chomp(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp, size_t indentation);
    template<class FilterProcessor> size_t _handle_all_whitespace(FilterProcessor &C4_RESTRICT proc, BlockChomp_e chomp);
    template<class FilterProcessor> size_t _extend_to_chomp(FilterProcessor &C4_RESTRICT proc, size_t contents_len);
    template<class FilterProcessor> void   _filter_block_indentation(FilterProcessor &C4_RESTRICT proc, size_t indentation);
    template<class FilterProcessor> void   _filter_block_folded_newlines(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
    template<class FilterProcessor> size_t _filter_block_folded_newlines_compress(FilterProcessor &C4_RESTRICT proc, size_t num_newl, size_t wpos_at_first_newl);
    template<class FilterProcessor> void   _filter_block_folded_newlines_leading(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len);
    template<class FilterProcessor> void   _filter_block_folded_indented_block(FilterProcessor &C4_RESTRICT proc, size_t indentation, size_t len, size_t curr_indentation) noexcept;

    /** @endcond */
674

675
private:
676

677
    void _line_progressed(size_t ahead);
678
    void _line_ended();
679
    void _line_ended_undo();
680

681
    bool  _finished_file() const;
682
    bool  _finished_line() const;
683

684
    void   _scan_line();
    substr _peek_next_line(size_t pos=npos) const;

    bool _at_line_begin() const
    {
        return m_evt_handler->m_curr->line_contents.rem.begin() == m_evt_handler->m_curr->line_contents.full.begin();
    }

    void _relocate_arena(csubstr prev_arena, substr next_arena);
    static void _s_relocate_arena(void*, csubstr prev_arena, substr next_arena);

private:

    C4_ALWAYS_INLINE bool has_all(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == f; }
    C4_ALWAYS_INLINE bool has_any(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) != 0; }
    C4_ALWAYS_INLINE bool has_none(ParserFlag_t f) const noexcept { return (m_evt_handler->m_curr->flags & f) == 0; }
    static C4_ALWAYS_INLINE bool has_all(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == f; }
    static C4_ALWAYS_INLINE bool has_any(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) != 0; }
    static C4_ALWAYS_INLINE bool has_none(ParserFlag_t f, ParserState const* C4_RESTRICT s) noexcept { return (s->flags & f) == 0; }

    #ifndef RYML_DBG
    C4_ALWAYS_INLINE static void add_flags(ParserFlag_t on, ParserState *C4_RESTRICT s) noexcept { s->flags |= on; }
    C4_ALWAYS_INLINE static void addrem_flags(ParserFlag_t on, ParserFlag_t off, ParserState *C4_RESTRICT s) noexcept { s->flags &= ~off; s->flags |= on; }
    C4_ALWAYS_INLINE static void rem_flags(ParserFlag_t off, ParserState *C4_RESTRICT s) noexcept { s->flags &= ~off; }
    C4_ALWAYS_INLINE void add_flags(ParserFlag_t on) noexcept { m_evt_handler->m_curr->flags |= on; }
    C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; m_evt_handler->m_curr->flags |= on; }
    C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off) noexcept { m_evt_handler->m_curr->flags &= ~off; }
    #else
    static void add_flags(ParserFlag_t on, ParserState *C4_RESTRICT s);
    static void addrem_flags(ParserFlag_t on, ParserFlag_t off, ParserState *C4_RESTRICT s);
    static void rem_flags(ParserFlag_t off, ParserState *C4_RESTRICT s);
    C4_ALWAYS_INLINE void add_flags(ParserFlag_t on) noexcept { add_flags(on, m_evt_handler->m_curr); }
    C4_ALWAYS_INLINE void addrem_flags(ParserFlag_t on, ParserFlag_t off) noexcept { addrem_flags(on, off, m_evt_handler->m_curr); }
    C4_ALWAYS_INLINE void rem_flags(ParserFlag_t off) noexcept { rem_flags(off, m_evt_handler->m_curr); }
    #endif

private:

    void _prepare_locations();
    void _resize_locations(size_t sz);
    bool _locations_dirty() const;

private:

    void _reset();
    void _free();
    void _clr();

    #ifdef RYML_DBG
    template<class ...Args> C4_NO_INLINE void _dbg(csubstr fmt, Args const& ...args) const;
    template<class DumpFn>  C4_NO_INLINE void _fmt_msg(DumpFn &&dumpfn) const;
    #endif
    template<class ...Args> C4_NORETURN C4_NO_INLINE void _err(Location const& cpploc, const char *fmt, Args const& ...args) const;
    template<class ...Args> C4_NORETURN C4_NO_INLINE void _err(Location const& cpploc, Location const& ymlloc, const char *fmt, Args const& ...args) const;


private:

    /** store pending tag or anchor/ref annotations */
    struct Annotation
    {
        struct Entry
        {
            csubstr str;
            size_t indentation;
            size_t line;
        };
        Entry annotations[2];
        size_t num_entries;
    };

    void _handle_colon();
    void _add_annotation(Annotation *C4_RESTRICT dst, csubstr str, size_t indentation, size_t line);
    void _clear_annotations(Annotation *C4_RESTRICT dst);
    bool _has_pending_annotations() const { return m_pending_tags.num_entries || m_pending_anchors.num_entries; }
    #ifdef RYML_NO_COVERAGE__TO_BE_DELETED
    bool _handle_indentation_from_annotations();
    #endif
    bool _annotations_require_key_container() const;
    void _handle_annotations_before_blck_key_scalar();
    void _handle_annotations_before_blck_val_scalar();
    void _handle_annotations_before_start_mapblck(size_t current_line);
    void _handle_annotations_before_start_mapblck_as_key();
    void _handle_annotations_and_indentation_after_start_mapblck(size_t key_indentation, size_t key_line);
    size_t _select_indentation_from_annotations(size_t val_indentation, size_t val_line);
    void _handle_directive(csubstr rem);
    bool _handle_bom();
    void _handle_bom(Encoding_e enc);

    void _check_tag(csubstr tag);

private:

    ParserOptions m_options;

    csubstr m_file;
    substr  m_buf;

public:

    /** @cond dev */
    EventHandler *C4_RESTRICT m_evt_handler; // NOLINT
    /** @endcond */

private:

    Annotation m_pending_anchors;
    Annotation m_pending_tags;

    bool m_was_inside_qmrk;
    bool m_doc_empty = true;
    size_t m_prev_colon = npos;

    Encoding_e m_encoding = UTF8;

private:

    size_t *m_newline_offsets;
    size_t  m_newline_offsets_size;
    size_t  m_newline_offsets_capacity;
    csubstr m_newline_offsets_buf;

};


/** Quickly inspect the source to estimate the number of nodes the
 * resulting tree is likely to have. If a tree is empty before
 * parsing, considerable time will be spent growing it, so calling
 * this to reserve the tree size prior to parsing is likely to
 * result in a time gain. We encourage using this method before
 * parsing, but as always measure its impact on performance to
 * obtain a good trade-off.
 *
 * @note since this method is meant for optimizing performance, it
 * is approximate. The result may actually be smaller than the
 * resulting number of nodes, notably if the YAML uses implicit
 * maps as flow seq members as in `[these: are, individual:
 * maps]`. */
RYML_EXPORT id_type estimate_tree_capacity(csubstr src); // NOLINT(readability-redundant-declaration)

/** @} */

} // namespace yml
} // namespace c4

// NOLINTEND(hicpp-signed-bitwise)

#if defined(_MSC_VER)
#   pragma warning(pop)
#endif

#endif /* _C4_YML_PARSE_ENGINE_HPP_ */
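
The doc comment on `estimate_tree_capacity()` describes a cheap, approximate pre-pass over the source whose result is used to reserve tree storage before parsing. The header does not show how the estimate is computed, so the following is only a minimal sketch of one plausible heuristic, not rapidyaml's implementation: it counts characters that commonly introduce a new node. The name `estimate_capacity_sketch` and the choice of counted characters are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in, NOT rapidyaml's implementation: guess the node
// count by counting characters that commonly introduce a new node.
inline std::size_t estimate_capacity_sketch(std::string_view src) noexcept
{
    std::size_t num_nodes = 1; // always account for the root node
    for(char c : src)
    {
        // assumption: each line break, flow separator or flow opener
        // corresponds to roughly one node
        if(c == '\n' || c == ',' || c == '[' || c == '{')
            ++num_nodes;
    }
    return num_nodes;
}
```

Under this heuristic the flow sequence `[these: are, individual: maps]` yields an estimate of 3, while the parsed tree would, by my count, hold more nodes than that (a root sequence, two implicit maps, and one key-value child in each). This is the kind of underestimate the `@note` warns about. With the real library, the intended pattern would presumably be to pass the result of `estimate_tree_capacity(src)` to the tree's reserve call before parsing, and to measure whether that helps for the workload at hand.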