/crates/qasm2/src/lex.rs
// This code is part of Qiskit.
//
// (C) Copyright IBM 2023
//
// This code is licensed under the Apache License, Version 2.0. You may
// obtain a copy of this license in the LICENSE.txt file in the root directory
// of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
//
// Any modifications or derivative works of this code must retain this
// copyright notice, and modified files need to carry a notice indicating
// that they have been altered from the originals.

//! The lexing logic for OpenQASM 2, responsible for turning a sequence of bytes into a
//! lexed [TokenStream] for consumption by the parsing machinery.  The general strategy here is
//! quite simple; for all the symbol-like tokens, the lexer can use a very simple single-byte
//! lookahead to determine what token it needs to emit.  For keywords and identifiers, we just read
//! the identifier in completely, then produce the right token once we see the end of the
//! identifier characters.
//!
//! We effectively use a custom lexing mode to handle the version information after the `OPENQASM`
//! keyword; the spec technically says that any real number is valid, but in reality that leads to
//! weirdness like `200.0e-2` being a valid version specifier.  We do things with a custom
//! context-dependent match after seeing an `OPENQASM` token, to avoid clashes with the general
//! real-number tokenisation.
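//!
//! As a minimal sketch of how the pieces below fit together (a hedged example:
//! these types are internal to the crate, and the `?` operator assumes a
//! calling function that returns a [PyResult]):
//!
//! ```ignore
//! let mut context = TokenContext::new();
//! let mut stream = TokenStream::from_string("OPENQASM 2.0;".to_owned(), false);
//! while let Some(token) = stream.next(&mut context)? {
//!     println!("{:?}: {}", token.ttype, token.text(&context));
//! }
//! ```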

use hashbrown::HashMap;
use pyo3::prelude::PyResult;

use std::path::Path;

use crate::error::{message_generic, Position, QASM2ParseError};

/// Tokenised version information data.  This is more structured than the real number suggested by
/// the specification.
#[derive(Clone, Debug)]
pub struct Version {
    pub major: usize,
    pub minor: Option<usize>,
}

/// The context that is necessary to fully extract the information from a [Token].  This owns, for
/// example, the text of each token (where a token does not have a static text representation),
/// from which the other properties can later be derived.  This struct is effectively entirely
/// opaque outside this module; the associated functions on [Token] take this context object,
/// however, and extract the information from it.
#[derive(Clone, Debug)]
pub struct TokenContext {
    text: Vec<String>,
    lookup: HashMap<Vec<u8>, usize>,
}

impl TokenContext {
    /// Create a new context for tokens.  Nothing is heap-allocated until required.
    pub fn new() -> Self {
        TokenContext {
            text: vec![],
            lookup: HashMap::new(),
        }
    }

    /// Intern the given `ascii_text` of a [Token], and return an index into the [TokenContext].
    /// This will not store strings that are already present in the context; instead, the previous
    /// index is transparently returned.
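    ///
    /// A minimal illustrative sketch of the interning behaviour (not a
    /// doctest, since this method is private to the module):
    ///
    /// ```ignore
    /// let mut context = TokenContext::new();
    /// let first = context.index(b"qreg");
    /// let second = context.index(b"qreg");
    /// assert_eq!(first, second); // repeated text reuses the stored index
    /// ```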
    fn index(&mut self, ascii_text: &[u8]) -> usize {
        match self.lookup.get(ascii_text) {
            Some(index) => *index,
            None => {
                let index = self.text.len();
                self.lookup.insert(ascii_text.to_vec(), index);
                self.text
                    .push(std::str::from_utf8(ascii_text).unwrap().to_owned());
                index
            }
        }
    }
}

// Clippy complains without this.
impl Default for TokenContext {
    fn default() -> Self {
        Self::new()
    }
}

/// An enumeration of the different types of [Token] that can be created during lexing.  This is
/// deliberately not a data enum, to make various abstract `expect` (and so on) methods more
/// ergonomic to use; one does not need to completely define the pattern match each time, but can
/// simply pass the type identifier.  This also saves memory, since the static variants do not need
/// to be aligned to include the space necessary for text pointers that would be in the non-static
/// forms, and allows strings to be shared between many tokens (using the [TokenContext] store).
#[derive(PartialEq, Eq, Clone, Copy, Debug)]
pub enum TokenType {
    // Keywords
    OpenQASM,
    Barrier,
    Cos,
    Creg,
    Exp,
    Gate,
    If,
    Include,
    Ln,
    Measure,
    Opaque,
    Qreg,
    Reset,
    Sin,
    Sqrt,
    Tan,
    Pi,
    // Symbols
    Plus,
    Minus,
    Arrow,
    Asterisk,
    Equals,
    Slash,
    Caret,
    Semicolon,
    Comma,
    LParen,
    RParen,
    LBracket,
    RBracket,
    LBrace,
    RBrace,
    // Content
    Id,
    Real,
    Integer,
    Filename,
    Version,
}

impl TokenType {
    pub fn variable_text(&self) -> bool {
        match self {
            TokenType::OpenQASM
            | TokenType::Barrier
            | TokenType::Cos
            | TokenType::Creg
            | TokenType::Exp
            | TokenType::Gate
            | TokenType::If
            | TokenType::Include
            | TokenType::Ln
            | TokenType::Measure
            | TokenType::Opaque
            | TokenType::Qreg
            | TokenType::Reset
            | TokenType::Sin
            | TokenType::Sqrt
            | TokenType::Tan
            | TokenType::Pi
            | TokenType::Plus
            | TokenType::Minus
            | TokenType::Arrow
            | TokenType::Asterisk
            | TokenType::Equals
            | TokenType::Slash
            | TokenType::Caret
            | TokenType::Semicolon
            | TokenType::Comma
            | TokenType::LParen
            | TokenType::RParen
            | TokenType::LBracket
            | TokenType::RBracket
            | TokenType::LBrace
            | TokenType::RBrace => false,
            TokenType::Id
            | TokenType::Real
            | TokenType::Integer
            | TokenType::Filename
            | TokenType::Version => true,
        }
    }

    /// Get a static description of the token type.  This is useful for producing messages when the
    /// full token context isn't available, or isn't important.
    pub fn describe(&self) -> &'static str {
        match self {
            TokenType::OpenQASM => "OPENQASM",
            TokenType::Barrier => "barrier",
            TokenType::Cos => "cos",
            TokenType::Creg => "creg",
            TokenType::Exp => "exp",
            TokenType::Gate => "gate",
            TokenType::If => "if",
            TokenType::Include => "include",
            TokenType::Ln => "ln",
            TokenType::Measure => "measure",
            TokenType::Opaque => "opaque",
            TokenType::Qreg => "qreg",
            TokenType::Reset => "reset",
            TokenType::Sin => "sin",
            TokenType::Sqrt => "sqrt",
            TokenType::Tan => "tan",
            TokenType::Pi => "pi",
            TokenType::Plus => "+",
            TokenType::Minus => "-",
            TokenType::Arrow => "->",
            TokenType::Asterisk => "*",
            TokenType::Equals => "==",
            TokenType::Slash => "/",
            TokenType::Caret => "^",
            TokenType::Semicolon => ";",
            TokenType::Comma => ",",
            TokenType::LParen => "(",
            TokenType::RParen => ")",
            TokenType::LBracket => "[",
            TokenType::RBracket => "]",
            TokenType::LBrace => "{",
            TokenType::RBrace => "}",
            TokenType::Id => "an identifier",
            TokenType::Real => "a real number",
            TokenType::Integer => "an integer",
            TokenType::Filename => "a filename string",
            TokenType::Version => "a '<major>.<minor>' version",
        }
    }
}

/// A representation of a token, including its type, span information and pointer to where its text
/// is stored in the context object.  These are relatively lightweight objects (though of course
/// not as light as the single type information).
#[derive(Clone, Copy, Debug)]
pub struct Token {
    pub ttype: TokenType,
    // The `line` and `col` refer only to the start of the token.  There are no tokens that span
    // more than one line (we don't tokenise comments), but the ending column offset can be
    // calculated by asking the associated `TokenContext` for the text associated with this token,
    // and inspecting the length of the returned value.
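    // For example (illustrative only): the ending column offset would be
    // `token.col + token.text(&context).len()`.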
    pub line: usize,
    pub col: usize,
    // Index into the TokenContext object, to retrieve the text that makes up the token.  We don't
    // resolve this into a value during lexing; that comes with annoying typing issues or storage
    // wastage.  Instead, we only convert the text into a value type when asked to by calling a
    // relevant method on the token.
    index: usize,
}

impl Token {
    /// Get a reference to the string that was seen to generate this token.
    pub fn text<'a>(&self, context: &'a TokenContext) -> &'a str {
        match self.ttype {
            TokenType::Id
            | TokenType::Real
            | TokenType::Integer
            | TokenType::Filename
            | TokenType::Version => &context.text[self.index],
            _ => self.ttype.describe(),
        }
    }

    /// If the token is an identifier, this method can be called to get an owned string containing
    /// the text of the identifier.  Panics if the token is not an identifier.
    pub fn id(&self, context: &TokenContext) -> String {
        if self.ttype != TokenType::Id {
            panic!()
        }
        (&context.text[self.index]).into()
    }

    /// If the token is a real number, this method can be called to evaluate its value.  Panics if
    /// the token is not a float or an integer.
    pub fn real(&self, context: &TokenContext) -> f64 {
        if !(self.ttype == TokenType::Real || self.ttype == TokenType::Integer) {
            panic!()
        }
        context.text[self.index].parse().unwrap()
    }

    /// If the token is an integer (by type, not just by value), this method can be called to
    /// evaluate its value.  Panics if the token is not an integer type.
    pub fn int(&self, context: &TokenContext) -> usize {
        if self.ttype != TokenType::Integer {
            panic!()
        }
        context.text[self.index].parse().unwrap()
    }

    /// If the token is a filename path, this method can be called to get a (regular) string
    /// representing it.  Panics if the token type was not a filename.
    pub fn filename(&self, context: &TokenContext) -> String {
        if self.ttype != TokenType::Filename {
            panic!()
        }
        let out = &context.text[self.index];
        // String slicing is fine to assume bytes here, because the characters we're slicing out
        // must both be the ASCII '"', which is a single-byte UTF-8 character.
        out[1..out.len() - 1].into()
    }

    /// If the token is a version-information token, this method can be called to evaluate the
    /// version information.  Panics if the token was not of the correct type.
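    ///
    /// An illustrative sketch of the two shapes this returns (assuming
    /// `token_2_0` and `token_2` are hypothetical tokens lexed from the texts
    /// `2.0` and `2` respectively):
    ///
    /// ```ignore
    /// let version = token_2_0.version(&context);
    /// assert_eq!((version.major, version.minor), (2, Some(0)));
    /// assert_eq!(token_2.version(&context).minor, None);
    /// ```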
    pub fn version(&self, context: &TokenContext) -> Version {
        if self.ttype != TokenType::Version {
            panic!()
        }
        // Everything in the version token is a valid ASCII character, so must be a one-byte token.
        let text = &context.text[self.index];
        match text.chars().position(|c| c == '.') {
            Some(pos) => Version {
                major: text[0..pos].parse().unwrap(),
                minor: Some(text[pos + 1..text.len()].parse().unwrap()),
            },
            None => Version {
                major: text.parse().unwrap(),
                minor: None,
            },
        }
    }
}

/// The workhorse struct of the lexer.  This represents a peekable iterable object that is abstract
/// over some buffered reader.  The struct itself essentially represents the mutable state of the
/// lexer, with its main public associated functions being the iterable method [Self::next()] and
/// the [std::iter::Peekable]-like function [Self::peek()].
///
/// The stream exposes one public attribute directly: the [filename] that this stream comes from
/// (set to some placeholder value for streams that do not have a backing file).  The associated
/// `TokenContext` object is managed separately to the stream and is passed in each call to `next`;
/// this allows for multiple streams to operate on the same context, such as when a new stream
/// begins in order to handle an `include` statement.
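///
/// As a rough sketch of that sharing (a hedged example: `program` is a
/// hypothetical source `String`, `"other.qasm"` a hypothetical path, and the
/// `include`-handling plumbing actually lives in the parser):
///
/// ```ignore
/// let mut context = TokenContext::new();
/// let mut outer = TokenStream::from_string(program, false);
/// // ...on lexing `include "other.qasm";`, a second stream is opened...
/// let mut inner = TokenStream::from_path("other.qasm", false)?;
/// // Both streams intern their token text into the same `context`.
/// ```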
pub struct TokenStream {
    /// The filename from which this stream is derived.  May be a placeholder if there is no
    /// backing file or other named resource.
    pub filename: std::ffi::OsString,
    strict: bool,
    source: Box<dyn std::io::BufRead + Send>,
    line_buffer: Vec<u8>,
    done: bool,
    line: usize,
    col: usize,
    try_version: bool,
    // This is a manual peekable structure (rather than using the `peekable` method of `Iterator`)
    // because we still want to be able to access the other members of the struct at the same time.
    peeked: Option<Option<Token>>,
}

impl TokenStream {
    /// Create and initialise a generic [TokenStream], given a source that implements
    /// [std::io::BufRead] and a filename (or resource path) that describes its source.
    fn new(
        source: Box<dyn std::io::BufRead + Send>,
        filename: std::ffi::OsString,
        strict: bool,
    ) -> Self {
        TokenStream {
            filename,
            strict,
            source,
            line_buffer: Vec::with_capacity(80),
            done: false,
            // The first line is numbered "1", and the first column is "0".  The counts are
            // initialised like this so the first call to `next_byte` can easily detect that it
            // needs to extract the next line.
            line: 0,
            col: 0,
            try_version: false,
            peeked: None,
        }
    }

    /// Create a [TokenStream] from a string containing the OpenQASM 2 program.
    pub fn from_string(string: String, strict: bool) -> Self {
        TokenStream::new(
            Box::new(std::io::Cursor::new(string)),
            "<input>".into(),
            strict,
        )
    }

    /// Create a [TokenStream] from a path containing the OpenQASM 2 program.
    pub fn from_path<P: AsRef<Path>>(path: P, strict: bool) -> Result<Self, std::io::Error> {
        let file = std::fs::File::open(path.as_ref())?;
        Ok(TokenStream::new(
            Box::new(std::io::BufReader::new(file)),
            Path::file_name(path.as_ref()).unwrap().into(),
            strict,
        ))
    }

    /// Read the next line into the managed buffer in the struct, updating the tracking information
    /// of the position, and the `done` state of the iterator.
    fn advance_line(&mut self) -> PyResult<usize> {
        if self.done {
            Ok(0)
        } else {
            self.line += 1;
            self.col = 0;
            self.line_buffer.clear();
            // We can assume that nobody's running this on ancient Mac software that uses only '\r'
            // as its linebreak character.
            match self.source.read_until(b'\n', &mut self.line_buffer) {
                Ok(count) => {
                    if count == 0 || self.line_buffer[count - 1] != b'\n' {
                        self.done = true;
                    }
                    Ok(count)
                }
                Err(err) => {
                    self.done = true;
                    Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, self.col)),
                        &format!("lexer failed to read stream: {}", err),
                    )))
                }
            }
        }
    }

    /// Get the next character in the stream.  This updates the line and column information for the
    /// current byte as well.
    fn next_byte(&mut self) -> PyResult<Option<u8>> {
        if self.col >= self.line_buffer.len() && self.advance_line()? == 0 {
            return Ok(None);
        }
        let out = self.line_buffer[self.col];
        self.col += 1;
        match out {
            b @ 0x80..=0xff => {
                self.done = true;
                Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, self.col)),
                    &format!("encountered a non-ASCII byte: {:02X?}", b),
                )))
            }
            b => Ok(Some(b)),
        }
    }

    /// Peek at the next byte in the stream without consuming it.  This still returns an error if
    /// the next byte isn't in the valid range for OpenQASM 2, or if the file/stream has failed to
    /// read into the buffer for some reason.
    fn peek_byte(&mut self) -> PyResult<Option<u8>> {
        if self.col >= self.line_buffer.len() && self.advance_line()? == 0 {
            return Ok(None);
        }
        match self.line_buffer[self.col] {
            b @ 0x80..=0xff => {
                self.done = true;
                Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, self.col)),
                    &format!("encountered a non-ASCII byte: {:02X?}", b),
                )))
            }
            b => Ok(Some(b)),
        }
    }

    /// Expect that the next byte is not a word continuation, providing a suitable error message if
    /// it is.
    fn expect_word_boundary(&mut self, after: &str, start_col: usize) -> PyResult<()> {
        match self.peek_byte()? {
            Some(c @ (b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_')) => {
                Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, start_col)),
                    &format!(
                        "expected a word boundary after {}, but saw '{}'",
                        after, c as char
                    ),
                )))
            }
            _ => Ok(()),
        }
    }

    /// Complete the lexing of a floating-point value from the position of maybe accepting an
    /// exponent.  The previous part of the token must be a valid stand-alone float, or the next
    /// byte must already have been peeked and known to be `b'e' | b'E'`.
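    ///
    /// A sketch of what this accepts from here (illustrative only):
    ///
    /// ```ignore
    /// // "e10", "E10", "e+10", "e-10" -> complete a TokenType::Real
    /// // "e" with no following digits -> error: exponent needs an integer
    /// ```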
    fn lex_float_exponent(&mut self, start_col: usize) -> PyResult<TokenType> {
        if !matches!(self.peek_byte()?, Some(b'e' | b'E')) {
            self.expect_word_boundary("a float", start_col)?;
            return Ok(TokenType::Real);
        }
        // Consume the rest of the exponent.
        self.next_byte()?;
        if let Some(b'+' | b'-') = self.peek_byte()? {
            self.next_byte()?;
        }
        // Exponents must have at least one digit in them.
        if !matches!(self.peek_byte()?, Some(b'0'..=b'9')) {
            return Err(QASM2ParseError::new_err(message_generic(
                Some(&Position::new(&self.filename, self.line, start_col)),
                "needed to see an integer exponent for this float",
            )));
        }
        while let Some(b'0'..=b'9') = self.peek_byte()? {
            self.next_byte()?;
        }
        self.expect_word_boundary("a float", start_col)?;
        Ok(TokenType::Real)
    }

    /// Lex a numeric token completely.  This can return a successful integer or a real number; the
    /// function distinguishes based on what it sees.  If `self.try_version`, this can also be a
    /// version identifier (will take precedence over either other type, if possible).
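    ///
    /// A sketch of the classifications this performs (illustrative only):
    ///
    /// ```ignore
    /// // "123"  -> TokenType::Integer
    /// // "1.5"  -> TokenType::Real
    /// // "5e-5" -> TokenType::Real (rejected in strict mode)
    /// // "2.0"  -> TokenType::Version, when directly after `OPENQASM`
    /// // "007"  -> error: integers cannot have leading zeroes
    /// ```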
    fn lex_numeric(&mut self, start_col: usize) -> PyResult<TokenType> {
        let first = self.line_buffer[start_col];
        if first == b'.' {
            return match self.next_byte()? {
                // In the case of a float that begins with '.', we require at least one digit, so
                // just force consume it and then loop over the rest.
                Some(b'0'..=b'9') => {
                    while let Some(b'0'..=b'9') = self.peek_byte()? {
                        self.next_byte()?;
                    }
                    self.lex_float_exponent(start_col)
                }
                _ => Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, start_col)),
                    "expected a numeric fractional part after the bare decimal point",
                ))),
            };
        }
        while let Some(b'0'..=b'9') = self.peek_byte()? {
            self.next_byte()?;
        }
        match self.peek_byte()? {
            Some(b'.') => {
                self.next_byte()?;
                let mut has_fractional = false;
                while let Some(b'0'..=b'9') = self.peek_byte()? {
                    has_fractional = true;
                    self.next_byte()?;
                }
                if self.try_version
                    && has_fractional
                    && !matches!(self.peek_byte()?, Some(b'e' | b'E'))
                {
                    self.expect_word_boundary("a version identifier", start_col)?;
                    return Ok(TokenType::Version);
                }
                return self.lex_float_exponent(start_col);
            }
            // In this situation, what we've lexed so far is an integer (maybe with leading
            // zeroes), but it can still be a float if it's followed by an exponent.  This
            // particular path is not technically within the spec (so should be subject to `strict`
            // mode), but pragmatically that's more just a nuisance for OQ2 generators, since many
            // languages will happily spit out something like `5e-5` when formatting floats.
            Some(b'e' | b'E') => {
                return if self.strict {
                    Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "[strict] all floats must include a decimal point",
                    )))
                } else {
                    self.lex_float_exponent(start_col)
                }
            }
            _ => (),
        }
        if first == b'0' && self.col - start_col > 1 {
            // Integers can't start with a leading zero unless they are only the single '0', but we
            // didn't see a decimal point.
            Err(QASM2ParseError::new_err(message_generic(
                Some(&Position::new(&self.filename, self.line, start_col)),
                "integers cannot have leading zeroes",
            )))
        } else if self.try_version {
            self.expect_word_boundary("a version identifier", start_col)?;
            Ok(TokenType::Version)
        } else {
            self.expect_word_boundary("an integer", start_col)?;
            Ok(TokenType::Integer)
        }
    }

    /// Lex a text-like token into a complete token.  This can return any of the keyword-like
    /// tokens (e.g. [TokenType::Pi]), or a [TokenType::Id] if the token is not a built-in keyword.
    fn lex_textlike(&mut self, start_col: usize) -> PyResult<TokenType> {
        let first = self.line_buffer[start_col];
        while let Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_') = self.peek_byte()? {
            self.next_byte()?;
        }
        // No need to expect the word boundary after this, because it's the same check as above.
        let text = &self.line_buffer[start_col..self.col];
        if let b'A'..=b'Z' = first {
            match text {
                b"OPENQASM" => Ok(TokenType::OpenQASM),
                b"U" | b"CX" => Ok(TokenType::Id),
                _ => Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "identifiers cannot start with capital letters except for the builtins 'U' and 'CX'"))),
            }
        } else {
            match text {
                b"barrier" => Ok(TokenType::Barrier),
                b"cos" => Ok(TokenType::Cos),
                b"creg" => Ok(TokenType::Creg),
                b"exp" => Ok(TokenType::Exp),
                b"gate" => Ok(TokenType::Gate),
                b"if" => Ok(TokenType::If),
                b"include" => Ok(TokenType::Include),
                b"ln" => Ok(TokenType::Ln),
                b"measure" => Ok(TokenType::Measure),
                b"opaque" => Ok(TokenType::Opaque),
                b"qreg" => Ok(TokenType::Qreg),
                b"reset" => Ok(TokenType::Reset),
                b"sin" => Ok(TokenType::Sin),
                b"sqrt" => Ok(TokenType::Sqrt),
                b"tan" => Ok(TokenType::Tan),
                b"pi" => Ok(TokenType::Pi),
                _ => Ok(TokenType::Id),
            }
        }
    }

    /// Lex a filename token completely.  This is triggered by seeing either quote byte (`b'"'` or
    /// `b'\''`) in the input stream; the opening byte is passed in as the `terminator`.
    fn lex_filename(&mut self, terminator: u8, start_col: usize) -> PyResult<TokenType> {
        loop {
            match self.next_byte()? {
                None => {
                    return Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "unexpected end-of-file while lexing string literal",
                    )))
                }
                Some(b'\n' | b'\r') => {
                    return Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "unexpected line break while lexing string literal",
                    )))
                }
                Some(c) if c == terminator => {
                    return Ok(TokenType::Filename);
                }
                Some(_) => (),
            }
        }
    }

    /// The actual core of the iterator.  Read from the stream (ignoring preceding whitespace)
    /// until a complete [Token] has been constructed, or the end of the iterator is reached.  This
    /// returns `Some` for every successfully lexed token, returns `None` once there are no more
    /// tokens left to take, and raises an error for any lexing failure.
    fn next_inner(&mut self, context: &mut TokenContext) -> PyResult<Option<Token>> {
        // Consume preceding whitespace.  Beware that this can still exhaust the underlying stream,
        // or scan through an invalid token in the encoding.
        loop {
            match self.peek_byte()? {
                Some(b' ' | b'\t' | b'\r' | b'\n') => {
                    self.next_byte()?;
                }
                None => return Ok(None),
                _ => break,
            }
        }
        let start_col = self.col;
        // The whitespace loop has already peeked the next byte, so we know it's going to be the
        // `Some` variant.
        let ttype = match self.next_byte()?.unwrap() {
            b'+' => TokenType::Plus,
            b'*' => TokenType::Asterisk,
            b'^' => TokenType::Caret,
            b';' => TokenType::Semicolon,
            b',' => TokenType::Comma,
            b'(' => TokenType::LParen,
            b')' => TokenType::RParen,
            b'[' => TokenType::LBracket,
            b']' => TokenType::RBracket,
            b'{' => TokenType::LBrace,
            b'}' => TokenType::RBrace,
            b'/' => {
                if let Some(b'/') = self.peek_byte()? {
                    return if self.advance_line()? == 0 {
                        Ok(None)
                    } else {
                        self.next(context)
                    };
                } else {
                    TokenType::Slash
                }
            }
            b'-' => {
                if let Ok(Some(b'>')) = self.peek_byte() {
                    self.col += 1;
                    TokenType::Arrow
                } else {
                    TokenType::Minus
                }
            }
            b'=' => {
                if let Ok(Some(b'=')) = self.peek_byte() {
                    self.col += 1;
                    TokenType::Equals
                } else {
                    return Err(QASM2ParseError::new_err(
                        "single equals '=' is never valid".to_owned(),
                    ));
                }
            }
            b'0'..=b'9' | b'.' => self.lex_numeric(start_col)?,
            b'a'..=b'z' | b'A'..=b'Z' => self.lex_textlike(start_col)?,
            c @ (b'"' | b'\'') => {
                if self.strict && c != b'"' {
                    return Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "[strict] paths must be in double quotes (\"\")",
                    )));
                } else {
                    self.lex_filename(c, start_col)?
                }
            }
            c => {
                return Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, start_col)),
                    &format!(
                        "encountered '{}', which doesn't match any valid tokens",
                        // Non-ASCII bytes should already have been rejected by `next_byte()`.
                        c as char,
                    ),
                )));
            }
        };
        self.try_version = ttype == TokenType::OpenQASM;
        Ok(Some(Token {
            ttype,
            line: self.line,
            col: start_col,
            index: if ttype.variable_text() {
                context.index(&self.line_buffer[start_col..self.col])
            } else {
                usize::MAX
            },
        }))
    }

    /// Get an optional reference to the next token in the iterator stream without consuming it.
    /// This is a direct analogue of the same method on the [std::iter::Peekable] struct, except it
    /// is manually defined here to avoid hiding the rest of the public fields of the [TokenStream]
    /// struct itself.
    pub fn peek(&mut self, context: &mut TokenContext) -> PyResult<Option<&Token>> {
        if self.peeked.is_none() {
            self.peeked = Some(self.next_inner(context)?);
        }
        Ok(self.peeked.as_ref().unwrap().as_ref())
    }

    pub fn next(&mut self, context: &mut TokenContext) -> PyResult<Option<Token>> {
        match self.peeked.take() {
            Some(token) => Ok(token),
            None => self.next_inner(context),
        }
    }
}