
Qiskit / qiskit / 10168162794

30 Jul 2024 06:24PM UTC coverage: 89.861% (-0.09%) from 89.954%
Expose Sabre heuristic configuration to Python (#12171) (#12856)

* Expose Sabre heuristic configuration to Python

This exposes the entirety of the configuration of the Sabre heuristic to
Python space, making it modifiable without recompilation.  This includes
some additional configuration options that were not previously easily
modifiable, even with recompilation:

- the base weight of the "basic" component can be adjusted
- the weight of the "basic" and "lookahead" components can be adjusted
  to _either_ use a constant weight (previously not a thing) or use a
  weight that scales with the size of the set (previously the only
  option).
- the "decay" component is now entirely separated from the "lookahead"
  component, so in theory you can now have a decay without a lookahead.

This introduces a tracking `Vec` that stores the scores of _all_ the
swaps encountered, rather than just dynamically keeping hold of the best
swaps.  This has a couple of benefits:

- with the new dynamic structure for heuristics, this is rather more
  efficient because each heuristic component can be calculated in
  separate loops over the swaps, and we don't have to branch within the
  innermost loop.
- it makes it possible in the future to try things like assigning
  probabilities to each swap and randomly choosing from _all_ of them,
  not just the best swaps.  This is something I've actively wanted to
  try for quite some time.

The default heuristics in the transpiler-pass creators for the `basic`,
`lookahead` and `decay` strings are set to represent the same heuristics
as before, and this commit is entirely RNG compatible with its
predecessor (_technically_ for huge problems there's a possibility that
pulling out some divisions into multiplications by reciprocals will
affect the floating-point maths enough to modify the swap selection).

* Update for PyO3 0.21

* Increase documentation of heuristic components

(cherry picked from commit 43d8372ef7350a348897afa9c7dbd51c... (continued)
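As a rough illustration of the swap-scoring structure the commit message describes (this is an invented sketch, not Qiskit's actual implementation; the function name, score formulas, and weights are all hypothetical), keeping a score slot for every candidate swap lets each heuristic component run in its own tight loop, with selection deferred to the end:

```rust
// Hypothetical sketch: score all candidate swaps, component by component,
// then choose.  The "basic" and "lookahead" formulas here are stand-ins.
fn choose_swap(swaps: &[(usize, usize)]) -> Option<(usize, usize)> {
    // One score slot per candidate swap, rather than tracking only the best.
    let mut scores = vec![0.0_f64; swaps.len()];
    // Each component is a separate pass, so the innermost loop never
    // branches on which heuristic component is being applied.
    for (score, &(a, b)) in scores.iter_mut().zip(swaps) {
        *score += (a + b) as f64; // stand-in for the "basic" component
    }
    for (score, &(a, b)) in scores.iter_mut().zip(swaps) {
        *score += 0.5 * a.abs_diff(b) as f64; // stand-in "lookahead" term
    }
    // Retaining every score also enables e.g. probabilistic selection over
    // all swaps later; here we simply take the minimum-score swap.
    scores
        .iter()
        .zip(swaps)
        .min_by(|x, y| x.0.partial_cmp(y.0).unwrap())
        .map(|(_, &sw)| sw)
}
```

Deferring selection like this is what makes the future "assign probabilities to each swap" experiment possible: the full score vector is available at the point of choice.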

135 of 240 new or added lines in 7 files covered. (56.25%)

8 existing lines in 3 files now uncovered.

66427 of 73922 relevant lines covered (89.86%)

233878.67 hits per line
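The headline percentages above are simple covered-over-relevant ratios; as a sanity-check sketch (the `coverage_percent` helper is invented for illustration):

```rust
// Coverage percentage as reported: covered lines over relevant lines.
fn coverage_percent(covered: u64, relevant: u64) -> f64 {
    100.0 * covered as f64 / relevant as f64
}
```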

Source File: /crates/qasm2/src/lex.rs (93.23% covered)

// This code is part of Qiskit.
//
// (C) Copyright IBM 2023
//
// This code is licensed under the Apache License, Version 2.0. You may
// obtain a copy of this license in the LICENSE.txt file in the root directory
// of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
//
// Any modifications or derivative works of this code must retain this
// copyright notice, and modified files need to carry a notice indicating
// that they have been altered from the originals.

//! The lexing logic for OpenQASM 2, responsible for turning a sequence of bytes into a
//! lexed [TokenStream] for consumption by the parsing machinery.  The general strategy here is
//! quite simple; for all the symbol-like tokens, the lexer can use a very simple single-byte
//! lookahead to determine what token it needs to emit.  For keywords and identifiers, we just read
//! the identifier in completely, then produce the right token once we see the end of the
//! identifier characters.
//!
//! We effectively use a custom lexing mode to handle the version information after the `OPENQASM`
//! keyword; the spec technically says that any real number is valid, but in reality that leads to
//! weirdness like `200.0e-2` being a valid version specifier.  We do things with a custom
//! context-dependent match after seeing an `OPENQASM` token, to avoid clashes with the general
//! real-number tokenization.

use hashbrown::HashMap;
use num_bigint::BigUint;
use pyo3::prelude::PyResult;

use std::path::Path;

use crate::error::{message_generic, Position, QASM2ParseError};

/// Tokenized version information data.  This is more structured than the real number suggested by
/// the specification.
#[derive(Clone, Debug)]
pub struct Version {
    pub major: usize,
    pub minor: Option<usize>,
}

/// The context that is necessary to fully extract the information from a [Token].  This owns, for
/// example, the text of each token (where a token does not have a static text representation),
/// from which the other properties can later be derived.  This struct is effectively entirely
/// opaque outside this module; the associated functions on [Token] take this context object,
/// however, and extract the information from it.
#[derive(Clone, Debug)]
pub struct TokenContext {
    text: Vec<String>,
    lookup: HashMap<Vec<u8>, usize>,
}

impl TokenContext {
    /// Create a new context for tokens.  Nothing is heap-allocated until required.
    pub fn new() -> Self {
        TokenContext {
            text: vec![],
            lookup: HashMap::new(),
        }
    }

    /// Intern the given `ascii_text` of a [Token], and return an index into the [TokenContext].
    /// This will not store strings that are already present in the context; instead, the previous
    /// index is transparently returned.
    fn index(&mut self, ascii_text: &[u8]) -> usize {
        match self.lookup.get(ascii_text) {
            Some(index) => *index,
            None => {
                let index = self.text.len();
                self.lookup.insert(ascii_text.to_vec(), index);
                self.text
                    .push(std::str::from_utf8(ascii_text).unwrap().to_owned());
                index
            }
        }
    }
}

// Clippy complains without this.
impl Default for TokenContext {
    fn default() -> Self {
        Self::new()
    }
}

/// An enumeration of the different types of [Token] that can be created during lexing.  This is
/// deliberately not a data enum, to make various abstract `expect` (and so on) methods more
/// ergonomic to use; one does not need to completely define the pattern match each time, but can
/// simply pass the type identifier.  This also saves memory, since the static variants do not need
/// to be aligned to include the space necessary for text pointers that would be in the non-static
/// forms, and allows strings to be shared between many tokens (using the [TokenContext] store).
#[derive(PartialEq, Eq, Clone, Copy, Debug)]
pub enum TokenType {
    // Keywords
    OpenQASM,
    Barrier,
    Cos,
    Creg,
    Exp,
    Gate,
    If,
    Include,
    Ln,
    Measure,
    Opaque,
    Qreg,
    Reset,
    Sin,
    Sqrt,
    Tan,
    Pi,
    // Symbols
    Plus,
    Minus,
    Arrow,
    Asterisk,
    Equals,
    Slash,
    Caret,
    Semicolon,
    Comma,
    LParen,
    RParen,
    LBracket,
    RBracket,
    LBrace,
    RBrace,
    // Content
    Id,
    Real,
    Integer,
    Filename,
    Version,
}

impl TokenType {
    pub fn variable_text(&self) -> bool {
        match self {
            TokenType::OpenQASM
            | TokenType::Barrier
            | TokenType::Cos
            | TokenType::Creg
            | TokenType::Exp
            | TokenType::Gate
            | TokenType::If
            | TokenType::Include
            | TokenType::Ln
            | TokenType::Measure
            | TokenType::Opaque
            | TokenType::Qreg
            | TokenType::Reset
            | TokenType::Sin
            | TokenType::Sqrt
            | TokenType::Tan
            | TokenType::Pi
            | TokenType::Plus
            | TokenType::Minus
            | TokenType::Arrow
            | TokenType::Asterisk
            | TokenType::Equals
            | TokenType::Slash
            | TokenType::Caret
            | TokenType::Semicolon
            | TokenType::Comma
            | TokenType::LParen
            | TokenType::RParen
            | TokenType::LBracket
            | TokenType::RBracket
            | TokenType::LBrace
            | TokenType::RBrace => false,
            TokenType::Id
            | TokenType::Real
            | TokenType::Integer
            | TokenType::Filename
            | TokenType::Version => true,
        }
    }

    /// Get a static description of the token type.  This is useful for producing messages when the
    /// full token context isn't available, or isn't important.
    pub fn describe(&self) -> &'static str {
        match self {
            TokenType::OpenQASM => "OPENQASM",
            TokenType::Barrier => "barrier",
            TokenType::Cos => "cos",
            TokenType::Creg => "creg",
            TokenType::Exp => "exp",
            TokenType::Gate => "gate",
            TokenType::If => "if",
            TokenType::Include => "include",
            TokenType::Ln => "ln",
            TokenType::Measure => "measure",
            TokenType::Opaque => "opaque",
            TokenType::Qreg => "qreg",
            TokenType::Reset => "reset",
            TokenType::Sin => "sin",
            TokenType::Sqrt => "sqrt",
            TokenType::Tan => "tan",
            TokenType::Pi => "pi",
            TokenType::Plus => "+",
            TokenType::Minus => "-",
            TokenType::Arrow => "->",
            TokenType::Asterisk => "*",
            TokenType::Equals => "==",
            TokenType::Slash => "/",
            TokenType::Caret => "^",
            TokenType::Semicolon => ";",
            TokenType::Comma => ",",
            TokenType::LParen => "(",
            TokenType::RParen => ")",
            TokenType::LBracket => "[",
            TokenType::RBracket => "]",
            TokenType::LBrace => "{",
            TokenType::RBrace => "}",
            TokenType::Id => "an identifier",
            TokenType::Real => "a real number",
            TokenType::Integer => "an integer",
            TokenType::Filename => "a filename string",
            TokenType::Version => "a '<major>.<minor>' version",
        }
    }
}

/// A representation of a token, including its type, span information and pointer to where its text
/// is stored in the context object.  These are relatively lightweight objects (though of course
/// not as light as the single type information).
#[derive(Clone, Copy, Debug)]
pub struct Token {
    pub ttype: TokenType,
    // The `line` and `col` refer only to the start of the token.  There are no tokens that span
    // more than one line (we don't tokenise comments), but the ending column offset can be
    // calculated by asking the associated `TokenContext` for the text associated with this token,
    // and inspecting the length of the returned value.
    pub line: usize,
    pub col: usize,
    // Index into the TokenContext object, to retrieve the text that makes up the token.  We don't
    // resolve this into a value during lexing; that comes with annoying typing issues or storage
    // wastage.  Instead, we only convert the text into a value type when asked to by calling a
    // relevant method on the token.
    index: usize,
}

impl Token {
    /// Get a reference to the string that was seen to generate this token.
    pub fn text<'a>(&self, context: &'a TokenContext) -> &'a str {
        match self.ttype {
            TokenType::Id
            | TokenType::Real
            | TokenType::Integer
            | TokenType::Filename
            | TokenType::Version => &context.text[self.index],
            _ => self.ttype.describe(),
        }
    }

    /// If the token is an identifier, this method can be called to get an owned string containing
    /// the text of the identifier.  Panics if the token is not an identifier.
    pub fn id(&self, context: &TokenContext) -> String {
        if self.ttype != TokenType::Id {
            panic!()
        }
        (&context.text[self.index]).into()
    }

    /// If the token is a real number, this method can be called to evaluate its value.  Panics if
    /// the token is not a float or an integer.
    pub fn real(&self, context: &TokenContext) -> f64 {
        if !(self.ttype == TokenType::Real || self.ttype == TokenType::Integer) {
            panic!()
        }
        context.text[self.index].parse().unwrap()
    }

    /// If the token is an integer (by type, not just by value), this method can be called to
    /// evaluate its value.  Panics if the token is not an integer type.
    pub fn int(&self, context: &TokenContext) -> usize {
        if self.ttype != TokenType::Integer {
            panic!()
        }
        context.text[self.index].parse().unwrap()
    }

    /// If the token is an integer (by type, not just by value), this method can be called to
    /// evaluate its value as a big integer.  Panics if the token is not an integer type.
    pub fn bigint(&self, context: &TokenContext) -> BigUint {
        if self.ttype != TokenType::Integer {
            panic!()
        }
        context.text[self.index].parse().unwrap()
    }

    /// If the token is a filename path, this method can be called to get a (regular) string
    /// representing it.  Panics if the token type was not a filename.
    pub fn filename(&self, context: &TokenContext) -> String {
        if self.ttype != TokenType::Filename {
            panic!()
        }
        let out = &context.text[self.index];
        // String slicing is fine to assume bytes here, because the characters we're slicing out
        // must both be the ASCII '"', which is a single-byte UTF-8 character.
        out[1..out.len() - 1].into()
    }

    /// If the token is a version-information token, this method can be called to evaluate the
    /// version information.  Panics if the token was not of the correct type.
    pub fn version(&self, context: &TokenContext) -> Version {
        if self.ttype != TokenType::Version {
            panic!()
        }
        // Everything in the version token is a valid ASCII character, so must be a one-byte token.
        let text = &context.text[self.index];
        match text.chars().position(|c| c == '.') {
            Some(pos) => Version {
                major: text[0..pos].parse().unwrap(),
                minor: Some(text[pos + 1..text.len()].parse().unwrap()),
            },
            None => Version {
                major: text.parse().unwrap(),
                minor: None,
            },
        }
    }
}

/// The workhorse struct of the lexer.  This represents a peekable iterable object that is abstract
/// over some buffered reader.  The struct itself essentially represents the mutable state of the
/// lexer, with its main public associated functions being the iterable method [Self::next()] and
/// the [std::iter::Peekable]-like function [Self::peek()].
///
/// The stream exposes one public attribute directly: the [filename] that this stream comes from
/// (set to some placeholder value for streams that do not have a backing file).  The associated
/// `TokenContext` object is managed separately to the stream and is passed in each call to `next`;
/// this allows for multiple streams to operate on the same context, such as when a new stream
/// begins in order to handle an `include` statement.
pub struct TokenStream {
    /// The filename from which this stream is derived.  May be a placeholder if there is no
    /// backing file or other named resource.
    pub filename: std::ffi::OsString,
    strict: bool,
    source: Box<dyn std::io::BufRead + Send>,
    line_buffer: Vec<u8>,
    done: bool,
    line: usize,
    col: usize,
    try_version: bool,
    // This is a manual peekable structure (rather than using the `peekable` method of `Iterator`)
    // because we still want to be able to access the other members of the struct at the same time.
    peeked: Option<Option<Token>>,
}

impl TokenStream {
    /// Create and initialise a generic [TokenStream], given a source that implements
    /// [std::io::BufRead] and a filename (or resource path) that describes its source.
    fn new(
        source: Box<dyn std::io::BufRead + Send>,
        filename: std::ffi::OsString,
        strict: bool,
    ) -> Self {
        TokenStream {
            filename,
            strict,
            source,
            line_buffer: Vec::with_capacity(80),
            done: false,
            // The first line is numbered "1", and the first column is "0".  The counts are
            // initialized like this so the first call to `next_byte` can easily detect that it
            // needs to extract the next line.
            line: 0,
            col: 0,
            try_version: false,
            peeked: None,
        }
    }

    /// Create a [TokenStream] from a string containing the OpenQASM 2 program.
    pub fn from_string(string: String, strict: bool) -> Self {
        TokenStream::new(
            Box::new(std::io::Cursor::new(string)),
            "<input>".into(),
            strict,
        )
    }

    /// Create a [TokenStream] from a path containing the OpenQASM 2 program.
    pub fn from_path<P: AsRef<Path>>(path: P, strict: bool) -> Result<Self, std::io::Error> {
        let file = std::fs::File::open(path.as_ref())?;
        Ok(TokenStream::new(
            Box::new(std::io::BufReader::new(file)),
            Path::file_name(path.as_ref()).unwrap().into(),
            strict,
        ))
    }

    /// Read the next line into the managed buffer in the struct, updating the tracking information
    /// of the position, and the `done` state of the iterator.
    fn advance_line(&mut self) -> PyResult<usize> {
        if self.done {
            Ok(0)
        } else {
            self.line += 1;
            self.col = 0;
            self.line_buffer.clear();
            // We can assume that nobody's running this on ancient Mac software that uses only '\r'
            // as its linebreak character.
            match self.source.read_until(b'\n', &mut self.line_buffer) {
                Ok(count) => {
                    if count == 0 || self.line_buffer[count - 1] != b'\n' {
                        self.done = true;
                    }
                    Ok(count)
                }
                Err(err) => {
                    self.done = true;
                    Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, self.col)),
                        &format!("lexer failed to read stream: {}", err),
                    )))
                }
            }
        }
    }

    /// Get the next character in the stream.  This updates the line and column information for the
    /// current byte as well.
    fn next_byte(&mut self) -> PyResult<Option<u8>> {
        if self.col >= self.line_buffer.len() && self.advance_line()? == 0 {
            return Ok(None);
        }
        let out = self.line_buffer[self.col];
        self.col += 1;
        match out {
            b @ 0x80..=0xff => {
                self.done = true;
                Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, self.col)),
                    &format!("encountered a non-ASCII byte: {:02X?}", b),
                )))
            }
            b => Ok(Some(b)),
        }
    }

    /// Peek at the next byte in the stream without consuming it.  This still returns an error if
    /// the next byte isn't in the valid range for OpenQASM 2, or if the file/stream has failed to
    /// read into the buffer for some reason.
    fn peek_byte(&mut self) -> PyResult<Option<u8>> {
        if self.col >= self.line_buffer.len() && self.advance_line()? == 0 {
            return Ok(None);
        }
        match self.line_buffer[self.col] {
            b @ 0x80..=0xff => {
                self.done = true;
                Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, self.col)),
                    &format!("encountered a non-ASCII byte: {:02X?}", b),
                )))
            }
            b => Ok(Some(b)),
        }
    }

    /// Expect that the next byte is not a word continuation, providing a suitable error message if
    /// it is.
    fn expect_word_boundary(&mut self, after: &str, start_col: usize) -> PyResult<()> {
        match self.peek_byte()? {
            Some(c @ (b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_')) => {
                Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, start_col)),
                    &format!(
                        "expected a word boundary after {}, but saw '{}'",
                        after, c as char
                    ),
                )))
            }
            _ => Ok(()),
        }
    }

    /// Complete the lexing of a floating-point value from the position of maybe accepting an
    /// exponent.  The previous part of the token must be a valid stand-alone float, or the next
    /// byte must already have been peeked and known to be `b'e' | b'E'`.
    fn lex_float_exponent(&mut self, start_col: usize) -> PyResult<TokenType> {
        if !matches!(self.peek_byte()?, Some(b'e' | b'E')) {
            self.expect_word_boundary("a float", start_col)?;
            return Ok(TokenType::Real);
        }
        // Consume the rest of the exponent.
        self.next_byte()?;
        if let Some(b'+' | b'-') = self.peek_byte()? {
            self.next_byte()?;
        }
        // Exponents must have at least one digit in them.
        if !matches!(self.peek_byte()?, Some(b'0'..=b'9')) {
            return Err(QASM2ParseError::new_err(message_generic(
                Some(&Position::new(&self.filename, self.line, start_col)),
                "needed to see an integer exponent for this float",
            )));
        }
        while let Some(b'0'..=b'9') = self.peek_byte()? {
            self.next_byte()?;
        }
        self.expect_word_boundary("a float", start_col)?;
        Ok(TokenType::Real)
    }

    /// Lex a numeric token completely.  This can return a successful integer or a real number; the
    /// function distinguishes based on what it sees.  If `self.try_version`, this can also be a
    /// version identifier (will take precedence over either other type, if possible).
    fn lex_numeric(&mut self, start_col: usize) -> PyResult<TokenType> {
        let first = self.line_buffer[start_col];
        if first == b'.' {
            return match self.next_byte()? {
                // In the case of a float that begins with '.', we require at least one digit, so
                // just force consume it and then loop over the rest.
                Some(b'0'..=b'9') => {
                    while let Some(b'0'..=b'9') = self.peek_byte()? {
                        self.next_byte()?;
                    }
                    self.lex_float_exponent(start_col)
                }
                _ => Err(QASM2ParseError::new_err(message_generic(
                    Some(&Position::new(&self.filename, self.line, start_col)),
                    "expected a numeric fractional part after the bare decimal point",
                ))),
            };
        }
        while let Some(b'0'..=b'9') = self.peek_byte()? {
            self.next_byte()?;
        }
        match self.peek_byte()? {
            Some(b'.') => {
                self.next_byte()?;
                let mut has_fractional = false;
                while let Some(b'0'..=b'9') = self.peek_byte()? {
                    has_fractional = true;
                    self.next_byte()?;
                }
                if self.try_version
                    && has_fractional
                    && !matches!(self.peek_byte()?, Some(b'e' | b'E'))
                {
                    self.expect_word_boundary("a version identifier", start_col)?;
                    return Ok(TokenType::Version);
                }
                return self.lex_float_exponent(start_col);
            }
            // In this situation, what we've lexed so far is an integer (maybe with leading
            // zeroes), but it can still be a float if it's followed by an exponent.  This
            // particular path is not technically within the spec (so should be subject to `strict`
            // mode), but pragmatically that's more just a nuisance for OQ2 generators, since many
            // languages will happily spit out something like `5e-5` when formatting floats.
            Some(b'e' | b'E') => {
                return if self.strict {
                    Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "[strict] all floats must include a decimal point",
                    )))
                } else {
                    self.lex_float_exponent(start_col)
                }
            }
            _ => (),
        }
        if first == b'0' && self.col - start_col > 1 {
            // Integers can't start with a leading zero unless they are only the single '0', but we
            // didn't see a decimal point.
            Err(QASM2ParseError::new_err(message_generic(
                Some(&Position::new(&self.filename, self.line, start_col)),
                "integers cannot have leading zeroes",
            )))
        } else if self.try_version {
            self.expect_word_boundary("a version identifier", start_col)?;
            Ok(TokenType::Version)
        } else {
            self.expect_word_boundary("an integer", start_col)?;
            Ok(TokenType::Integer)
        }
    }

    /// Lex a text-like token into a complete token.  This can return any of the keyword-like
    /// tokens (e.g. [TokenType::Pi]), or a [TokenType::Id] if the token is not a built-in keyword.
    fn lex_textlike(&mut self, start_col: usize) -> PyResult<TokenType> {
        let first = self.line_buffer[start_col];
        while let Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_') = self.peek_byte()? {
            self.next_byte()?;
        }
        // No need to expect the word boundary after this, because it's the same check as above.
        let text = &self.line_buffer[start_col..self.col];
        if let b'A'..=b'Z' = first {
            match text {
                b"OPENQASM" => Ok(TokenType::OpenQASM),
                b"U" | b"CX" => Ok(TokenType::Id),
                _ => Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "identifiers cannot start with capital letters except for the builtins 'U' and 'CX'"))),
            }
        } else {
            match text {
                b"barrier" => Ok(TokenType::Barrier),
                b"cos" => Ok(TokenType::Cos),
                b"creg" => Ok(TokenType::Creg),
                b"exp" => Ok(TokenType::Exp),
                b"gate" => Ok(TokenType::Gate),
                b"if" => Ok(TokenType::If),
                b"include" => Ok(TokenType::Include),
                b"ln" => Ok(TokenType::Ln),
                b"measure" => Ok(TokenType::Measure),
                b"opaque" => Ok(TokenType::Opaque),
                b"qreg" => Ok(TokenType::Qreg),
                b"reset" => Ok(TokenType::Reset),
                b"sin" => Ok(TokenType::Sin),
                b"sqrt" => Ok(TokenType::Sqrt),
                b"tan" => Ok(TokenType::Tan),
                b"pi" => Ok(TokenType::Pi),
                _ => Ok(TokenType::Id),
            }
        }
    }

    /// Lex a filename token completely.  This is always triggered by seeing a `b'"'` byte in the
    /// input stream.
    fn lex_filename(&mut self, terminator: u8, start_col: usize) -> PyResult<TokenType> {
        loop {
            match self.next_byte()? {
                None => {
                    return Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "unexpected end-of-file while lexing string literal",
                    )))
                }
                Some(b'\n' | b'\r') => {
                    return Err(QASM2ParseError::new_err(message_generic(
                        Some(&Position::new(&self.filename, self.line, start_col)),
                        "unexpected line break while lexing string literal",
                    )))
                }
                Some(c) if c == terminator => {
                    return Ok(TokenType::Filename);
                }
                Some(_) => (),
            }
        }
    }

    /// The actual core of the iterator.  Read from the stream (ignoring preceding whitespace)
    /// until a complete [Token] has been constructed, or the end of the iterator is reached.  This
    /// returns `Some` for all tokens, including the error token, and only returns `None` if there
    /// are no more tokens left to take.
    fn next_inner(&mut self, context: &mut TokenContext) -> PyResult<Option<Token>> {
        // Consume preceding whitespace.  Beware that this can still exhaust the underlying stream,
        // or scan through an invalid token in the encoding.
        loop {
            match self.peek_byte()? {
                Some(b' ' | b'\t' | b'\r' | b'\n') => {
                    self.next_byte()?;
                }
                None => return Ok(None),
                _ => break,
            }
        }
        let start_col = self.col;
        // The whitespace loop (or [Self::try_lex_version]) has already peeked the next token, so
        // we know it's going to be the `Some` variant.
        let ttype = match self.next_byte()?.unwrap() {
            b'+' => TokenType::Plus,
102✔
666
            b'*' => TokenType::Asterisk,
70✔
667
            b'^' => TokenType::Caret,
30✔
668
            b';' => TokenType::Semicolon,
6,416✔
669
            b',' => TokenType::Comma,
2,840✔
670
            b'(' => TokenType::LParen,
1,636✔
671
            b')' => TokenType::RParen,
1,460✔
672
            b'[' => TokenType::LBracket,
6,440✔
673
            b']' => TokenType::RBracket,
6,366✔
674
            b'{' => TokenType::LBrace,
272✔
675
            b'}' => TokenType::RBrace,
196✔
676
            b'/' => {
677
                if let Some(b'/') = self.peek_byte()? {
340✔
678
                    return if self.advance_line()? == 0 {
150✔
679
                        Ok(None)
12✔
680
                    } else {
681
                        self.next(context)
138✔
682
                    };
683
                } else {
684
                    TokenType::Slash
190✔
685
                }
686
            }
687
            b'-' => {
688
                if let Ok(Some(b'>')) = self.peek_byte() {
644✔
689
                    self.col += 1;
426✔
690
                    TokenType::Arrow
426✔
691
                } else {
692
                    TokenType::Minus
218✔
693
                }
694
            }
695
            b'=' => {
696
                if let Ok(Some(b'=')) = self.peek_byte() {
272✔
697
                    self.col += 1;
270✔
698
                    TokenType::Equals
270✔
699
                } else {
700
                    return Err(QASM2ParseError::new_err(
2✔
701
                        "single equals '=' is never valid".to_owned(),
2✔
702
                    ));
2✔
703
                }
704
            }
705
            b'0'..=b'9' | b'.' => self.lex_numeric(start_col)?,
26,106✔
706
            b'a'..=b'z' | b'A'..=b'Z' => self.lex_textlike(start_col)?,
17,168✔
707
            c @ (b'"' | b'\'') => {
374✔
708
                if self.strict && c != b'"' {
374✔
709
                    return Err(QASM2ParseError::new_err(message_generic(
2✔
710
                        Some(&Position::new(&self.filename, self.line, start_col)),
2✔
711
                        "[strict] paths must be in double quotes (\"\")",
2✔
712
                    )));
2✔
713
                } else {
714
                    self.lex_filename(c, start_col)?
372✔
715
                }
716
            }
717
            c => {
2✔
718
                return Err(QASM2ParseError::new_err(message_generic(
2✔
719
                    Some(&Position::new(&self.filename, self.line, start_col)),
2✔
720
                    &format!(
2✔
721
                        "encountered '{}', which doesn't match any valid tokens",
2✔
722
                        // Non-ASCII bytes should already have been rejected by `next_byte()`.
2✔
723
                        c as char,
2✔
724
                    ),
2✔
725
                )));
2✔
726
            }
727
        };
728
        self.try_version = ttype == TokenType::OpenQASM;
53,384✔
729
        Ok(Some(Token {
53,384✔
730
            ttype,
53,384✔
731
            line: self.line,
53,384✔
732
            col: start_col,
53,384✔
733
            index: if ttype.variable_text() {
53,384✔
734
                context.index(&self.line_buffer[start_col..self.col])
21,232✔
735
            } else {
736
                usize::MAX
32,152✔
737
            },
738
        }))
739
    }
54,244✔
740

741
    /// Get an optional reference to the next token in the iterator stream without consuming it.
742
    /// This is a direct analogue of the same method on the [std::iter::Peekable] struct, except it
743
    /// is manually defined here to avoid hiding the rest of the public fields of the [TokenStream]
744
    /// struct itself.
745
    pub fn peek(&mut self, context: &mut TokenContext) -> PyResult<Option<&Token>> {
34,600✔
746
        if self.peeked.is_none() {
34,600✔
747
            self.peeked = Some(self.next_inner(context)?);
28,796✔
748
        }
5,804✔
749
        Ok(self.peeked.as_ref().unwrap().as_ref())
34,572✔
750
    }
34,600✔
751

752
    pub fn next(&mut self, context: &mut TokenContext) -> PyResult<Option<Token>> {
53,602✔
753
        match self.peeked.take() {
53,602✔
754
            Some(token) => Ok(token),
28,154✔
755
            None => self.next_inner(context),
25,448✔
756
        }
757
    }
53,602✔
758
}
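
// The `peek`/`next` pair above implements a single-slot lookahead cache: `peek`
// lazily fills `self.peeked` from `next_inner`, and `next` drains that slot
// before falling back to the underlying stream.  The sketch below isolates that
// pattern with a hypothetical `MiniStream` over integers; the names and types
// are illustrative stand-ins, not the real `TokenStream`/`TokenContext` API.
//
// ```rust
// /// Minimal sketch of a single-slot peek cache (hypothetical type).
// struct MiniStream {
//     items: Vec<u32>,
//     pos: usize,
//     // `Some(None)` means "already peeked, and the stream was exhausted".
//     peeked: Option<Option<u32>>,
// }
//
// impl MiniStream {
//     fn new(items: Vec<u32>) -> Self {
//         MiniStream { items, pos: 0, peeked: None }
//     }
//
//     /// The underlying producer, analogous to `next_inner`.
//     fn next_inner(&mut self) -> Option<u32> {
//         let out = self.items.get(self.pos).copied();
//         if out.is_some() {
//             self.pos += 1;
//         }
//         out
//     }
//
//     /// Fill the cache slot if empty, then hand out a reference to it.
//     fn peek(&mut self) -> Option<&u32> {
//         if self.peeked.is_none() {
//             self.peeked = Some(self.next_inner());
//         }
//         self.peeked.as_ref().unwrap().as_ref()
//     }
//
//     /// Drain the cache slot first; only pull from the stream if it is empty.
//     fn next(&mut self) -> Option<u32> {
//         match self.peeked.take() {
//             Some(item) => item,
//             None => self.next_inner(),
//         }
//     }
// }
//
// fn main() {
//     let mut s = MiniStream::new(vec![1, 2]);
//     assert_eq!(s.peek(), Some(&1)); // peek caches without consuming
//     assert_eq!(s.next(), Some(1)); // next drains the cache first
//     assert_eq!(s.next(), Some(2)); // then reads the stream directly
//     assert_eq!(s.next(), None);
// }
// ```
//
// Caching `Option<Option<T>>` rather than `Option<T>` is what lets `peek`
// remember that the stream ended, so repeated peeks at end-of-input do not
// re-query the producer.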