localstack / localstack / 17902858349

19 Sep 2025 12:17PM UTC coverage: 86.864% (+0.02%) from 86.844%
ASF/CloudWatch: add support for multi-protocols (#13161)

34 of 37 new or added lines in 7 files covered. (91.89%)

136 existing lines in 10 files now uncovered.

67691 of 77928 relevant lines covered (86.86%)

0.87 hits per line

Source File
90.85
/localstack-core/localstack/aws/protocol/parser.py
"""
Request parsers for the different AWS service protocols.

The module contains classes that take an HTTP request to a service, and
given an operation model, parse the HTTP request according to the
specified input shape.

It can be seen as the counterpart to the ``serialize`` module in
``botocore`` (which serializes the request before sending it to this
parser). It has a lot of similarities with the ``parse`` module in
``botocore``, but serves a different purpose (parsing requests
instead of responses).

The different protocols have many similarities. The class hierarchy is
designed such that the parsers share as much logic as possible.
The class hierarchy looks as follows:
::
                          ┌─────────────┐
                          │RequestParser│
                          └─────────────┘
                             ▲   ▲   ▲
           ┌─────────────────┘   │   └────────────────────┬───────────────────────┬───────────────────────┐
  ┌────────┴─────────┐ ┌─────────┴───────────┐ ┌──────────┴──────────┐ ┌──────────┴──────────┐ ┌──────────┴───────────┐
  │QueryRequestParser│ │BaseRestRequestParser│ │BaseJSONRequestParser│ │BaseCBORRequestParser│ │BaseRpcV2RequestParser│
  └──────────────────┘ └─────────────────────┘ └─────────────────────┘ └─────────────────────┘ └──────────────────────┘
          ▲                    ▲            ▲   ▲           ▲             ▲             ▲             ▲
  ┌───────┴────────┐ ┌─────────┴──────────┐ │   │  ┌────────┴────────┐    │         ┌───┴─────────────┴────┐
  │EC2RequestParser│ │RestXMLRequestParser│ │   │  │JSONRequestParser│    │         │RpcV2CBORRequestParser│
  └────────────────┘ └────────────────────┘ │   │  └─────────────────┘    │         └──────────────────────┘
                           ┌────────────────┴───┴┐                 ▲      │
                           │RestJSONRequestParser│             ┌───┴──────┴──────┐
                           └─────────────────────┘             │CBORRequestParser│
                                                               └─────────────────┘
::

The ``RequestParser`` contains the logic that is used among all the
different protocols (``query``, ``json``, ``rest-json``, ``rest-xml``,
``cbor`` and ``ec2``).
The relation between the different protocols is described in the
``serializer``.

The classes are structured as follows:

* The ``RequestParser`` contains all the basic logic for the parsing
  which is shared among all different protocols.
* The ``BaseRestRequestParser`` contains the logic for the REST
  protocol specifics (i.e. specific HTTP metadata parsing).
* The ``BaseRpcV2RequestParser`` contains the logic for the RPC v2
  protocol specifics (special path routing, no logic about body decoding).
* The ``BaseJSONRequestParser`` contains the logic for the JSON body
  parsing.
* The ``BaseCBORRequestParser`` contains the logic for the CBOR body
  parsing.
* The ``RestJSONRequestParser`` inherits the REST-specific logic from
  the ``BaseRestRequestParser`` and the JSON body parsing from the
  ``BaseJSONRequestParser``.
* The ``CBORRequestParser`` inherits the ``json``-protocol specific
  logic from the ``JSONRequestParser`` and the CBOR body parsing
  from the ``BaseCBORRequestParser``.
* The ``QueryRequestParser``, ``RestXMLRequestParser``,
  ``RpcV2CBORRequestParser`` and ``JSONRequestParser`` have a
  conventional inheritance structure.

The services and their protocols are defined by using AWS's Smithy
(a language to define services in a - somewhat - protocol-agnostic
way). The "peculiarities" in this parser code usually correspond
to certain so-called "traits" in Smithy.

The result of the parser methods is the operation model of the
service's action which the request was aiming for, as well as the
parsed parameters for the service's function invocation.
"""
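# Illustrative sketch (not part of this module): the module docstring describes how
# ``RestJSONRequestParser`` combines a REST base class with a JSON base class. The toy
# classes below mimic that layering with heavily simplified, hypothetical stand-ins
# (all ``Sketch*`` names and ``_parse_body_as_json`` are assumptions made for this example).

```python
import json


class SketchRequestParser:
    """Stand-in for the shared base: defines the public entry point."""

    def parse(self, body: bytes) -> dict:
        raise NotImplementedError


class SketchBaseRestParser(SketchRequestParser):
    """REST specifics only: wraps whatever the body decoder returns."""

    def parse(self, body: bytes) -> dict:
        return {"family": "rest", "body": self._initial_body_parse(body)}

    def _initial_body_parse(self, body: bytes):
        raise NotImplementedError("_initial_body_parse")


class SketchBaseJSONParser(SketchRequestParser):
    """JSON specifics only: knows how to decode a body, nothing about routing."""

    def _parse_body_as_json(self, body: bytes) -> dict:
        return json.loads(body) if body else {}


class SketchRestJSONParser(SketchBaseRestParser, SketchBaseJSONParser):
    """Combines both bases; the only glue is wiring the JSON decoder into the REST flow."""

    def _initial_body_parse(self, body: bytes) -> dict:
        return self._parse_body_as_json(body)


parsed = SketchRestJSONParser().parse(b'{"TableName": "demo"}')
```

# The combined class adds no parsing logic of its own, which is the point of the
# hierarchy: each base class stays responsible for exactly one concern.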

import abc
import base64
import datetime
import functools
import io
import os
import re
import struct
from abc import ABC
from collections.abc import Mapping
from email.utils import parsedate_to_datetime
from typing import IO, Any
from xml.etree import ElementTree as ETree

import dateutil.parser
from botocore.model import (
    ListShape,
    MapShape,
    OperationModel,
    OperationNotFoundError,
    ServiceModel,
    Shape,
    StructureShape,
)

# cbor2: explicitly load from private _decoder module to avoid using the (non-patched) C-version
from cbor2._decoder import loads as cbor2_loads
from werkzeug.exceptions import BadRequest, NotFound

from localstack.aws.protocol.op_router import RestServiceOperationRouter
from localstack.aws.spec import ProtocolName
from localstack.http import Request


def _text_content(func):
    """
    This decorator hides the difference between an XML node with text or a plain string.
    It's used to ensure that scalar processing operates only on text strings, which
    allows the same scalar handlers to be used for XML nodes from the body, HTTP headers,
    and across different protocols.

    :param func: function which should be wrapped
    :return: wrapper function which can be called with a node or a string, where the
             wrapped function is always called with a string
    """

    def _get_text_content(
        self,
        request: Request,
        shape: Shape,
        node_or_string: ETree.Element | str,
        uri_params: Mapping[str, Any] = None,
    ):
        if hasattr(node_or_string, "text"):
            text = node_or_string.text
            if text is None:
                # If an XML node is empty <foo></foo>, we want to parse that as an empty string,
                # not as a null/None value.
                text = ""
        else:
            text = node_or_string
        return func(self, request, shape, text, uri_params)

    return _get_text_content
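# Illustrative sketch (not part of this module): a stripped-down version of the
# ``_text_content`` idea, showing that the same scalar handler can be fed either an XML
# element or a plain string. The names ``text_content``, ``parse_integer`` and
# ``parse_string`` are hypothetical; the real decorator also passes ``self``, the
# request, the shape and the URI params through.

```python
from xml.etree import ElementTree as ETree


def text_content(func):
    def wrapper(node_or_string):
        if hasattr(node_or_string, "text"):
            # XML element: use its text, treating an empty element as "" (not None)
            text = node_or_string.text
            if text is None:
                text = ""
        else:
            # plain string (e.g. a header or query value)
            text = node_or_string
        return func(text)

    return wrapper


@text_content
def parse_integer(text: str) -> int:
    return int(text)


@text_content
def parse_string(text: str) -> str:
    return text


from_xml = parse_integer(ETree.fromstring("<MaxKeys>42</MaxKeys>"))
from_header = parse_integer("42")
empty_node = parse_string(ETree.fromstring("<Prefix></Prefix>"))
```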


class RequestParserError(Exception):
    """
    Error which is thrown if the request parsing fails.
    Super class of all exceptions raised by the parser.
    """

    pass


class UnknownParserError(RequestParserError):
    """
    Error which indicates that the raised exception of the parser could be caused by invalid data or by any other
    (unknown) issue. Errors like this should be reported and indicate an issue in the parser itself.
    """

    pass


class ProtocolParserError(RequestParserError):
    """
    Error which indicates that the given data is not compliant with the service's specification and cannot be parsed.
    This usually results in a response with an HTTP 4xx status code (client error).
    """

    pass


class OperationNotFoundParserError(ProtocolParserError):
    """
    Error which indicates that the given data cannot be matched to a specific operation.
    The request is likely _not_ meant to be handled by the ASF service provider itself.
    """

    pass


def _handle_exceptions(func):
    """
    Decorator which handles the exceptions raised by the parser. It ensures that all exceptions raised by the public
    methods of the parser are instances of RequestParserError.
    :param func: to wrap in order to add the exception handling
    :return: wrapped function
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except RequestParserError:
            raise
        except Exception as e:
            raise UnknownParserError(
                "An unknown error occurred when trying to parse the request."
            ) from e

    return wrapper
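# Illustrative sketch (not part of this module): how a decorator like
# ``_handle_exceptions`` funnels arbitrary exceptions into one error hierarchy while
# letting already-classified errors pass through untouched. The exception classes are
# redefined locally and ``parse_port`` is a hypothetical function used only for the demo.

```python
import functools


class RequestParserError(Exception):
    pass


class UnknownParserError(RequestParserError):
    pass


class ProtocolParserError(RequestParserError):
    pass


def handle_exceptions(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except RequestParserError:
            # already part of the hierarchy: re-raise unchanged
            raise
        except Exception as e:
            # anything else is wrapped, keeping the original as __cause__
            raise UnknownParserError("An unknown error occurred.") from e

    return wrapper


@handle_exceptions
def parse_port(value: str) -> int:
    if not value:
        raise ProtocolParserError("Missing port.")
    return int(value)  # raises ValueError for non-numeric input


def error_type(value: str) -> str:
    try:
        parse_port(value)
    except RequestParserError as e:
        return type(e).__name__
    return "ok"
```

# Callers therefore only need a single ``except RequestParserError`` clause, and can still
# distinguish client errors (``ProtocolParserError``) from parser bugs (``UnknownParserError``).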


class RequestParser(abc.ABC):
    """
    The request parser is responsible for parsing an incoming HTTP request.
    It determines which operation the request was aiming for and parses the incoming request such that the resulting
    dictionary can be used to invoke the service's function implementation.
    It is the base class for all parsers and therefore contains the basic logic which is used among all of them.
    """

    service: ServiceModel
    DEFAULT_ENCODING = "utf-8"
    # The default timestamp format is ISO8601, but this can be overwritten by subclasses.
    TIMESTAMP_FORMAT = "iso8601"
    # The default timestamp format for header fields
    HEADER_TIMESTAMP_FORMAT = "rfc822"
    # The default timestamp format for query fields
    QUERY_TIMESTAMP_FORMAT = "iso8601"

    def __init__(self, service: ServiceModel) -> None:
        super().__init__()
        self.service = service

    @_handle_exceptions
    def parse(self, request: Request) -> tuple[OperationModel, Any]:
        """
        Determines which operation the request was aiming for and parses the incoming request such that the resulting
        dictionary can be used to invoke the service's function implementation.

        :param request: to parse
        :return: a tuple with the operation model (defining the action / operation which the request aims for),
                 and the parsed service parameters
        :raises: RequestParserError (either a ProtocolParserError or an UnknownParserError)
        """
        raise NotImplementedError

    def _parse_shape(
        self, request: Request, shape: Shape, node: Any, uri_params: Mapping[str, Any] = None
    ) -> Any:
        """
        Main parsing method which dynamically calls the parsing function for the specific shape.

        :param request: the complete Request
        :param shape: of the node
        :param node: the single part of the HTTP request to parse
        :param uri_params: the extracted URI path params
        :return: result of the parsing operation, the type depends on the shape
        """
        if shape is None:
            return None
        location = shape.serialization.get("location")
        if location is not None:
            if location == "header":
                header_name = shape.serialization.get("name")
                if shape.type_name == "list":
                    # headers may contain a comma separated list of values (e.g., the ObjectAttributes member in
                    # s3.GetObjectAttributes), so we prepare it here for the handler, which will be `_parse_list`.
                    # Header lists can contain optional whitespace, so we strip it
                    # https://www.rfc-editor.org/rfc/rfc9110.html#name-lists-rule-abnf-extension
                    # It can also directly contain a list of headers
                    # See https://datatracker.ietf.org/doc/html/rfc2616
                    payload = request.headers.getlist(header_name) or None
                    if payload:
                        headers = ",".join(payload)
                        payload = [value.strip() for value in headers.split(",")]

                else:
                    payload = request.headers.get(header_name)

            elif location == "headers":
                payload = self._parse_header_map(shape, request.headers)
                # shapes with the location trait "headers" only contain strings and are not further processed
                return payload
            elif location == "querystring":
                query_name = shape.serialization.get("name")
                parsed_query = request.args
                if shape.type_name == "list":
                    payload = parsed_query.getlist(query_name)
                else:
                    payload = parsed_query.get(query_name)
            elif location == "uri":
                uri_param_name = shape.serialization.get("name")
                if uri_param_name in uri_params:
                    payload = uri_params[uri_param_name]
            else:
                raise UnknownParserError(f"Unknown shape location '{location}'.")
        else:
            # If we don't have to use a specific location, we use the node
            payload = node

        fn_name = f"_parse_{shape.type_name}"
        handler = getattr(self, fn_name, self._noop_parser)
        try:
            return handler(request, shape, payload, uri_params) if payload is not None else None
        except (TypeError, ValueError, AttributeError) as e:
            raise ProtocolParserError(
                f"Invalid type when parsing {shape.name}: '{payload}' cannot be parsed to {shape.type_name}."
            ) from e
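
# Illustrative sketch (not part of this module): the last step of ``_parse_shape`` looks
# up a handler by name (``_parse_<type_name>``), falls back to a no-op handler, and turns
# conversion errors into a protocol error. ``TinyParser`` and ``parse_value`` are
# hypothetical names; handlers here take only the payload instead of the full
# ``(request, shape, payload, uri_params)`` signature.

```python
class ProtocolParserError(Exception):
    pass


class TinyParser:
    def parse_value(self, type_name: str, payload):
        # dynamic dispatch: "_parse_integer", "_parse_boolean", ... or the no-op fallback
        handler = getattr(self, f"_parse_{type_name}", self._noop_parser)
        try:
            return handler(payload) if payload is not None else None
        except (TypeError, ValueError, AttributeError) as e:
            raise ProtocolParserError(
                f"'{payload}' cannot be parsed to {type_name}."
            ) from e

    def _parse_integer(self, payload: str) -> int:
        return int(payload)

    def _parse_boolean(self, payload: str) -> bool:
        value = payload.lower()
        if value == "true":
            return True
        if value == "false":
            return False
        raise ValueError(f"cannot parse boolean value {payload}")

    def _noop_parser(self, payload):
        return payload


parser = TinyParser()
values = [
    parser.parse_value("integer", "7"),
    parser.parse_value("boolean", "TRUE"),
    parser.parse_value("string", "unchanged"),
    parser.parse_value("integer", None),
]
```

# A subclass can support a new shape type by adding a ``_parse_<type_name>`` method; the
# dispatcher itself does not need to change.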

    # The parsing functions for primitive types, lists, and timestamps are shared among subclasses.

    def _parse_list(
        self,
        request: Request,
        shape: ListShape,
        node: list,
        uri_params: Mapping[str, Any] = None,
    ):
        parsed = []
        member_shape = shape.member
        for item in node:
            parsed.append(self._parse_shape(request, member_shape, item, uri_params))
        return parsed

    @_text_content
    def _parse_integer(self, _, __, node: str, ___) -> int:
        return int(node)

    @_text_content
    def _parse_float(self, _, __, node: str, ___) -> float:
        return float(node)

    @_text_content
    def _parse_blob(self, _, __, node: str, ___) -> bytes:
        return base64.b64decode(node)

    @_text_content
    def _parse_timestamp(self, _, shape: Shape, node: str, ___) -> datetime.datetime:
        timestamp_format = shape.serialization.get("timestampFormat")
        if not timestamp_format and shape.serialization.get("location") == "header":
            timestamp_format = self.HEADER_TIMESTAMP_FORMAT
        elif not timestamp_format and shape.serialization.get("location") == "querystring":
            timestamp_format = self.QUERY_TIMESTAMP_FORMAT
        return self._convert_str_to_timestamp(node, timestamp_format)

    @_text_content
    def _parse_boolean(self, _, __, node: str, ___) -> bool:
        value = node.lower()
        if value == "true":
            return True
        if value == "false":
            return False
        raise ValueError(f"cannot parse boolean value {node}")

    @_text_content
    def _noop_parser(self, _, __, node: Any, ___):
        return node

    _parse_character = _parse_string = _noop_parser
    _parse_double = _parse_float
    _parse_long = _parse_integer

    def _convert_str_to_timestamp(self, value: str, timestamp_format=None) -> datetime.datetime:
        if timestamp_format is None:
            timestamp_format = self.TIMESTAMP_FORMAT
        timestamp_format = timestamp_format.lower()
        converter = getattr(self, f"_timestamp_{timestamp_format}")
        final_value = converter(value)
        return final_value

    @staticmethod
    def _timestamp_iso8601(date_string: str) -> datetime.datetime:
        return dateutil.parser.isoparse(date_string)

    @staticmethod
    def _timestamp_unixtimestamp(timestamp_string: str) -> datetime.datetime:
        return datetime.datetime.fromtimestamp(int(timestamp_string), tz=datetime.UTC)

    @staticmethod
    def _timestamp_unixtimestampmillis(timestamp_string: str) -> datetime.datetime:
        return datetime.datetime.fromtimestamp(float(timestamp_string) / 1000, tz=datetime.UTC)

    @staticmethod
    def _timestamp_rfc822(datetime_string: str) -> datetime.datetime:
        return parsedate_to_datetime(datetime_string)
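# Illustrative sketch (not part of this module): the four timestamp formats handled above,
# selected by a lower-cased format name. To stay standard-library only, this sketch swaps
# in ``datetime.datetime.fromisoformat`` for ``dateutil.parser.isoparse`` (which accepts
# more ISO 8601 variants) and ``datetime.timezone.utc`` for ``datetime.UTC``.
# ``convert_timestamp`` is a hypothetical helper name.

```python
import datetime
from email.utils import parsedate_to_datetime

UTC = datetime.timezone.utc


def convert_timestamp(value: str, timestamp_format: str = "iso8601") -> datetime.datetime:
    converters = {
        # stand-in for dateutil.parser.isoparse
        "iso8601": datetime.datetime.fromisoformat,
        # whole seconds since the epoch
        "unixtimestamp": lambda s: datetime.datetime.fromtimestamp(int(s), tz=UTC),
        # milliseconds since the epoch
        "unixtimestampmillis": lambda s: datetime.datetime.fromtimestamp(float(s) / 1000, tz=UTC),
        # HTTP-date style values as used in headers
        "rfc822": parsedate_to_datetime,
    }
    return converters[timestamp_format.lower()](value)


expected = datetime.datetime(2025, 9, 19, 12, 17, tzinfo=UTC)
epoch_seconds = int(expected.timestamp())

from_iso = convert_timestamp("2025-09-19T12:17:00+00:00")
from_unix = convert_timestamp(str(epoch_seconds), "unixTimestamp")
from_millis = convert_timestamp(str(epoch_seconds * 1000), "unixTimestampMillis")
from_rfc822 = convert_timestamp("Fri, 19 Sep 2025 12:17:00 GMT", "rfc822")
```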

    @staticmethod
    def _parse_header_map(shape: Shape, headers: dict) -> dict:
        # Note that headers are case insensitive, so we .lower() all header names and header prefixes.
        parsed = {}
        prefix = shape.serialization.get("name", "").lower()
        for header_name, header_value in headers.items():
            if header_name.lower().startswith(prefix):
                # The key name inserted into the parsed hash strips off the prefix.
                name = header_name[len(prefix) :]
                parsed[name] = header_value
        return parsed
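# Illustrative sketch (not part of this module): the prefix-based header map extraction
# above, reduced to a plain function that takes the prefix directly instead of a shape.
# ``parse_header_map`` and the sample headers are assumptions made for this example.

```python
def parse_header_map(prefix: str, headers: dict) -> dict:
    parsed = {}
    prefix = prefix.lower()
    for header_name, header_value in headers.items():
        # the match is case-insensitive, but the remainder keeps its original casing
        if header_name.lower().startswith(prefix):
            parsed[header_name[len(prefix):]] = header_value
    return parsed


metadata = parse_header_map(
    "x-amz-meta-",
    {
        "X-Amz-Meta-Owner": "alice",
        "x-amz-meta-team": "core",
        "Content-Type": "text/plain",
    },
)
```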


class QueryRequestParser(RequestParser):
    """
    The ``QueryRequestParser`` is responsible for parsing incoming requests for services which use the ``query``
    protocol. The requests for these services encode the majority of their parameters in the URL query string.
    """

    @_handle_exceptions
    def parse(self, request: Request) -> tuple[OperationModel, Any]:
        instance = request.values
        if "Action" not in instance:
            raise ProtocolParserError(
                f"Operation detection failed. "
                f"Missing Action in request for query-protocol service {self.service}."
            )
        action = instance["Action"]
        try:
            operation: OperationModel = self.service.operation_model(action)
        except OperationNotFoundError as e:
            raise OperationNotFoundParserError(
                f"Operation detection failed. "
                f"Operation {action} could not be found for service {self.service}."
            ) from e
        # There are no uri params in the query protocol (all ops are POST on "/")
        uri_params = {}
        input_shape: StructureShape = operation.input_shape
        parsed = self._parse_shape(request, input_shape, instance, uri_params)
        if parsed is None:
            return operation, {}
        return operation, parsed

    def _process_member(
        self,
        request: Request,
        member_name: str,
        member_shape: Shape,
        node: dict,
        uri_params: Mapping[str, Any] = None,
    ):
        if isinstance(member_shape, (MapShape, ListShape, StructureShape)):
            # If we have a complex type, we filter the node and change its keys to craft a new "context" for the
            # new hierarchy level
            sub_node = self._filter_node(member_name, node)
        else:
            # If it is a primitive type we just get the value from the dict
            sub_node = node.get(member_name)
        # The filtered node is processed and returned (or None if the sub_node is None)
        return (
            self._parse_shape(request, member_shape, sub_node, uri_params)
            if sub_node is not None
            else None
        )

    def _parse_structure(
        self,
        request: Request,
        shape: StructureShape,
        node: dict,
        uri_params: Mapping[str, Any] = None,
    ) -> dict:
        result = {}

        for member, member_shape in shape.members.items():
            # The key in the node is either the serialization config "name" of the shape, or the name of the member
            member_name = self._get_serialized_name(member_shape, member, node)
            # BUT, if it's flattened and a list, the name is defined by the list's member's name
            if member_shape.serialization.get("flattened"):
                if isinstance(member_shape, ListShape):
                    member_name = self._get_serialized_name(member_shape.member, member, node)
            value = self._process_member(request, member_name, member_shape, node, uri_params)
            if value is not None or member in shape.required_members:
                # If the member is required, but not existing, we explicitly set None
                result[member] = value

        return result if len(result) > 0 else None

    def _parse_map(
        self, request: Request, shape: MapShape, node: dict, uri_params: Mapping[str, Any]
    ) -> dict:
        """
        This is what the node looks like for a flattened map::
        ::
          {
              "Attribute.1.Name": "MyKey",
              "Attribute.1.Value": "MyValue",
              "Attribute.2.Name": ...,
              ...
          }
        ::
        This function expects an already filtered / pre-processed node. The node dict would therefore look like:
        ::
          {
              "1.Name": "MyKey",
              "1.Value": "MyValue",
              "2.Name": ...
          }
        ::
        """
        key_prefix = ""
        # Non-flattened maps have an additional hierarchy level named "entry"
        # https://awslabs.github.io/smithy/1.0/spec/core/xml-traits.html#xmlflattened-trait
        if not shape.serialization.get("flattened"):
            key_prefix += "entry."
        result = {}

        i = 0
        while True:
            i += 1
            # The key and value can be renamed (with their serialization config's "name").
            # By default they are called "key" and "value".
            key_name = f"{key_prefix}{i}.{self._get_serialized_name(shape.key, 'key', node)}"
            value_name = f"{key_prefix}{i}.{self._get_serialized_name(shape.value, 'value', node)}"

            # We process the key and value individually
            k = self._process_member(request, key_name, shape.key, node)
            v = self._process_member(request, value_name, shape.value, node)
            if k is None or v is None:
                # technically, if one exists but not the other, then that would be an invalid request
                break
            result[k] = v

        return result if len(result) > 0 else None
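# Illustrative sketch (not part of this module): the two steps described in the
# ``_parse_map`` docstring, using the same sample data. First the request values are
# filtered down to one member ("Attribute"), then the numbered key/value pairs are
# collected until the first missing index. ``filter_node`` and ``parse_query_map`` are
# simplified, hypothetical stand-ins that work on plain strings instead of shapes.

```python
def filter_node(name: str, node: dict):
    # keep entries starting with the member name and strip "<name>." from their keys
    filtered = {k[len(name) + 1:]: v for k, v in node.items() if k.startswith(name)}
    return filtered or None


def parse_query_map(node: dict, key_name="key", value_name="value", flattened=True):
    # non-flattened maps carry an extra "entry." hierarchy level
    prefix = "" if flattened else "entry."
    result = {}
    i = 0
    while True:
        i += 1
        k = node.get(f"{prefix}{i}.{key_name}")
        v = node.get(f"{prefix}{i}.{value_name}")
        if k is None or v is None:
            break
        result[k] = v
    return result or None


request_values = {
    "Action": "SetAttributes",
    "Attribute.1.Name": "MyKey",
    "Attribute.1.Value": "MyValue",
    "Attribute.2.Name": "OtherKey",
    "Attribute.2.Value": "OtherValue",
}
node = filter_node("Attribute", request_values)
attributes = parse_query_map(node, key_name="Name", value_name="Value")
```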

    def _parse_list(
        self,
        request: Request,
        shape: ListShape,
        node: dict,
        uri_params: Mapping[str, Any] = None,
    ) -> list:
        """
        Some actions take lists of parameters. These lists are specified using the param.[member.]n notation.
        The "member" is used if the list is not flattened.
        Values of n are integers starting from 1.
        For example, a list with two elements looks like this:
        - Flattened: &AttributeName.1=first&AttributeName.2=second
        - Non-flattened: &AttributeName.member.1=first&AttributeName.member.2=second
        This function expects an already filtered / processed node. The node dict would therefore look like:
        ::
          {
              "1": "first",
              "2": "second",
              "3": ...
          }
        ::
        """
        # The keys might be prefixed (e.g., for non-flattened lists)
        key_prefix = self._get_list_key_prefix(shape, node)

        # We collect the list value as well as the integer indicating the list position so we can
        # later sort the list by the position, in case the attribute values are unordered
        result: list[tuple[int, Any]] = []

        i = 0
        while True:
            i += 1
            key_name = f"{key_prefix}{i}"
            value = self._process_member(request, key_name, shape.member, node)
            if value is None:
                break
            result.append((i, value))

        return [r[1] for r in sorted(result)] if len(result) > 0 else None
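# Illustrative sketch (not part of this module): decoding the ``param.[member.]n`` list
# notation from the docstring above, starting from a raw query string. The helper
# ``parse_query_list`` is hypothetical and handles only lists of strings; the real
# method resolves names and member types through the botocore shapes.

```python
from urllib.parse import parse_qsl


def parse_query_list(query_string: str, name: str, flattened: bool = False, member_name: str = "member"):
    values = dict(parse_qsl(query_string))
    # step 1: narrow the values down to this parameter and strip "<name>."
    node = {k[len(name) + 1:]: v for k, v in values.items() if k.startswith(name)}
    # step 2: non-flattened lists have an extra "<member_name>." level
    prefix = "" if flattened else f"{member_name}."
    result = []
    i = 0
    while True:
        i += 1
        value = node.get(f"{prefix}{i}")
        if value is None:
            # the first missing index ends the list
            break
        result.append(value)
    return result or None


non_flattened = parse_query_list(
    "Action=Demo&AttributeName.member.2=second&AttributeName.member.1=first", "AttributeName"
)
flattened = parse_query_list(
    "Action=Demo&AttributeName.1=first&AttributeName.2=second", "AttributeName", flattened=True
)
with_gap = parse_query_list("AttributeName.member.1=a&AttributeName.member.3=c", "AttributeName")
```

# Because the loop probes indices 1, 2, 3, ... in order, the position in the query string
# does not matter, but an index gap truncates the list at that point.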

    @staticmethod
    def _filter_node(name: str, node: dict) -> dict:
        """Filters the node dict for entries where the key starts with the given name."""
        filtered = {k[len(name) + 1 :]: v for k, v in node.items() if k.startswith(name)}
        return filtered if len(filtered) > 0 else None

    def _get_serialized_name(self, shape: Shape, default_name: str, node: dict) -> str:
        """
        Returns the serialized name for the shape if it exists.
        Otherwise, it will return the given default_name.
        """
        return shape.serialization.get("name", default_name)

    def _get_list_key_prefix(self, shape: ListShape, node: dict):
        key_prefix = ""
        # Non-flattened lists have an additional hierarchy level:
        # https://awslabs.github.io/smithy/1.0/spec/core/xml-traits.html#xmlflattened-trait
        # The hierarchy level's name is the serialization name of its member or (by default) "member".
        if not shape.serialization.get("flattened"):
            key_prefix += f"{self._get_serialized_name(shape.member, 'member', node)}."
        return key_prefix


class BaseRestRequestParser(RequestParser):
    """
    The ``BaseRestRequestParser`` is the base class for all "resty" AWS service protocols.
    The operation which should be invoked is determined based on the HTTP method and the path suffix.
    The body decoding is done in the respective subclasses.
    """

    def __init__(self, service: ServiceModel) -> None:
        super().__init__(service)
        self.ignore_get_body_errors = False
        self._operation_router = RestServiceOperationRouter(service)

    @_handle_exceptions
    def parse(self, request: Request) -> tuple[OperationModel, Any]:
        try:
            operation, uri_params = self._operation_router.match(request)
        except NotFound as e:
            raise OperationNotFoundParserError(
                f"Unable to find operation for request to service "
                f"{self.service.service_name}: {request.method} {request.path}"
            ) from e

        shape: StructureShape = operation.input_shape
        final_parsed = {}
        if shape is not None:
            self._parse_payload(request, shape, shape.members, uri_params, final_parsed)
        return operation, final_parsed

    def _parse_payload(
        self,
        request: Request,
        shape: Shape,
        member_shapes: dict[str, Shape],
        uri_params: Mapping[str, Any],
        final_parsed: dict,
    ) -> None:
        """Parses all attributes which are located in the payload / body of the incoming request."""
        payload_parsed = {}
        non_payload_parsed = {}
        if "payload" in shape.serialization:
            # If a payload is specified in the input shape, then only that shape is used for the body payload.
            payload_member_name = shape.serialization["payload"]
            body_shape = member_shapes[payload_member_name]
            if body_shape.serialization.get("eventstream"):
                body = self._create_event_stream(request, body_shape)
                payload_parsed[payload_member_name] = body
            elif body_shape.type_name == "string":
                # Only set the value if it's not empty (the request's data is an empty binary by default)
                if request.data:
                    body = request.data
                    if isinstance(body, bytes):
                        body = body.decode(self.DEFAULT_ENCODING)
                    payload_parsed[payload_member_name] = body
            elif body_shape.type_name == "blob":
                # This control path is equivalent to operation.has_streaming_input (shape has a payload which is a blob)
                # in which case we assume essentially an IO[bytes] to be passed. Since the payload can be optional, we
                # skip the parameter only if content_length == 0, which indicates an empty request. If the content
                # length is not set, it could be a streaming request.
                if request.content_length != 0:
                    payload_parsed[payload_member_name] = self.create_input_stream(request)
            else:
                original_parsed = self._initial_body_parse(request)
                payload_parsed[payload_member_name] = self._parse_shape(
                    request, body_shape, original_parsed, uri_params
                )
        else:
            # The payload covers the whole body. We only parse the body if it hasn't been handled by the payload logic.
            try:
                non_payload_parsed = self._initial_body_parse(request)
            except ProtocolParserError:
                # GET requests should ignore the body, so we just let them pass
                if not (request.method in ["GET", "HEAD"] and self.ignore_get_body_errors):
                    raise

        # even if the payload has been parsed, the rest of the shape needs to be processed as well
        # (for members which are located outside of the body, like uri or header)
        non_payload_parsed = self._parse_shape(request, shape, non_payload_parsed, uri_params)
        # update the final result with the parsed body and the parsed payload (where the payload has precedence)
        final_parsed.update(non_payload_parsed)
        final_parsed.update(payload_parsed)
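# Illustrative sketch (not part of this module): the final merge step of
# ``_parse_payload``. Because the payload result is applied last, it overrides any
# member of the same name that came out of the general shape parsing. ``merge_parsed``
# and the sample member names are assumptions made for this example.

```python
def merge_parsed(non_payload_parsed: dict, payload_parsed: dict) -> dict:
    final_parsed = {}
    # members from headers, URI, query string, or the plain body
    final_parsed.update(non_payload_parsed)
    # the explicit payload member is applied last and therefore wins on conflicts
    final_parsed.update(payload_parsed)
    return final_parsed


merged = merge_parsed(
    {"Bucket": "demo-bucket", "Key": "notes.txt", "Body": "from-shape"},
    {"Body": "from-payload"},
)
```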

    def _initial_body_parse(self, request: Request) -> Any:
        """
        This method executes the initial parsing of the body (XML, JSON, or CBOR).
        The parsed body will afterwards still be walked through and the nodes will be converted to the appropriate
        types, but this method does the first round of parsing.

        :param request: of which the body should be parsed
        :return: depending on the actual implementation
        """
        raise NotImplementedError("_initial_body_parse")

    def _create_event_stream(self, request: Request, shape: Shape) -> Any:
        # TODO handle event streams
        raise NotImplementedError("_create_event_stream")

    def create_input_stream(self, request: Request) -> IO[bytes]:
        """
        Returns an IO object that makes the payload of the Request available for streaming.

        :param request: the http request
        :return: the input stream that allows services to consume the request payload
        """
        # for now _get_stream_for_parsing seems to be a good compromise. it can be used even after `request.data` was
        # previously called. however the reverse doesn't work. once the stream has been consumed, `request.data` will
        # return b''
        return request._get_stream_for_parsing()
678

679

680
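# Illustration only (not part of the original module): a minimal sketch of the merge order used at the end of
# ``_parse_payload`` in the REST base parser above. ``merge_parsed`` is a hypothetical helper name chosen for
# this example. It shows why a payload member wins over a same-named member parsed from the rest of the shape.

```python
def merge_parsed(non_payload_parsed: dict, payload_parsed: dict) -> dict:
    """Merge both dicts; entries applied later overwrite earlier ones."""
    final_parsed = {}
    # first the members located outside of the payload (uri, header, querystring, ...)
    final_parsed.update(non_payload_parsed)
    # then the payload, which therefore takes precedence on key collisions
    final_parsed.update(payload_parsed)
    return final_parsed


merged = merge_parsed(
    {"Bucket": "my-bucket", "Body": None},
    {"Body": b"payload-bytes"},
)
```

# The second ``update`` call is what gives the payload precedence; swapping the two calls would invert it.

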
class RestXMLRequestParser(BaseRestRequestParser):
    """
    The ``RestXMLRequestParser`` is responsible for parsing incoming requests for services which use the ``rest-xml``
    protocol. The requests for these services encode the majority of their parameters as XML in the request body.
    """

    def __init__(self, service_model: ServiceModel):
        super().__init__(service_model)
        self.ignore_get_body_errors = True
        self._namespace_re = re.compile("{.*}")

    def _initial_body_parse(self, request: Request) -> ETree.Element:
        body = request.data
        if not body:
            return ETree.Element("")
        return self._parse_xml_string_to_dom(body)

    def _parse_structure(
        self,
        request: Request,
        shape: StructureShape,
        node: ETree.Element,
        uri_params: Mapping[str, Any] = None,
    ) -> dict:
        parsed = {}
        xml_dict = self._build_name_to_xml_node(node)
        for member_name, member_shape in shape.members.items():
            xml_name = self._member_key_name(member_shape, member_name)
            member_node = xml_dict.get(xml_name)
            # If a shape defines a location trait, the node might be None (since these are extracted from the request's
            # metadata like headers or the URI)
            if (
                member_node is not None
                or "location" in member_shape.serialization
                or member_shape.serialization.get("eventheader")
            ):
                parsed[member_name] = self._parse_shape(
                    request, member_shape, member_node, uri_params
                )
            elif member_shape.serialization.get("xmlAttribute"):
                attributes = {}
                location_name = member_shape.serialization["name"]
                for key, value in node.attrib.items():
                    new_key = self._namespace_re.sub(location_name.split(":")[0] + ":", key)
                    attributes[new_key] = value
                if location_name in attributes:
                    parsed[member_name] = attributes[location_name]
            elif member_name in shape.required_members:
                # If the member is required, but not existing, we explicitly set None
                parsed[member_name] = None
        return parsed

    def _parse_map(
        self,
        request: Request,
        shape: MapShape,
        node: dict,
        uri_params: Mapping[str, Any] = None,
    ) -> dict:
        parsed = {}
        key_shape = shape.key
        value_shape = shape.value
        key_location_name = key_shape.serialization.get("name", "key")
        value_location_name = value_shape.serialization.get("name", "value")
        if shape.serialization.get("flattened") and not isinstance(node, list):
            node = [node]
        for keyval_node in node:
            key_name = val_name = None
            for single_pair in keyval_node:
                # Within each <entry> there's a <key> and a <value>
                tag_name = self._node_tag(single_pair)
                if tag_name == key_location_name:
                    key_name = self._parse_shape(request, key_shape, single_pair, uri_params)
                elif tag_name == value_location_name:
                    val_name = self._parse_shape(request, value_shape, single_pair, uri_params)
                else:
                    raise ProtocolParserError(f"Unknown tag: {tag_name}")
            parsed[key_name] = val_name
        return parsed

    def _parse_list(
        self,
        request: Request,
        shape: ListShape,
        node: dict,
        uri_params: Mapping[str, Any] = None,
    ) -> list:
        # When we use _build_name_to_xml_node, repeated elements are aggregated
        # into a list. However, we can't tell the difference between a scalar
        # value and a single element flattened list. So before calling the
        # real _parse_list, we know that "node" should actually be a list if
        # it's flattened, and if it's not, then we make it a one element list.
        if shape.serialization.get("flattened") and not isinstance(node, list):
            node = [node]
        return super()._parse_list(request, shape, node, uri_params)

    def _node_tag(self, node: ETree.Element) -> str:
        return self._namespace_re.sub("", node.tag)

    @staticmethod
    def _member_key_name(shape: Shape, member_name: str) -> str:
        # This method is needed because we have to special case flattened list
        # with a serialization name.  If this is the case we use the
        # locationName from the list's member shape as the key name for the
        # surrounding structure.
        if isinstance(shape, ListShape) and shape.serialization.get("flattened"):
            list_member_serialized_name = shape.member.serialization.get("name")
            if list_member_serialized_name is not None:
                return list_member_serialized_name
        serialized_name = shape.serialization.get("name")
        if serialized_name is not None:
            return serialized_name
        return member_name

    @staticmethod
    def _parse_xml_string_to_dom(xml_string: str) -> ETree.Element:
        try:
            parser = ETree.XMLParser(target=ETree.TreeBuilder())
            parser.feed(xml_string)
            root = parser.close()
        except ETree.ParseError as e:
            raise ProtocolParserError(
                f"Unable to parse request ({e}), invalid XML received:\n{xml_string}"
            ) from e
        return root

    def _build_name_to_xml_node(self, parent_node: list | ETree.Element) -> dict:
        # If the parent node is actually a list, we should not be trying
        # to serialize it to a dictionary. Instead, return the first element
        # in the list.
        if isinstance(parent_node, list):
            return self._build_name_to_xml_node(parent_node[0])
        xml_dict = {}
        for item in parent_node:
            key = self._node_tag(item)
            if key in xml_dict:
                # If the key already exists, the most natural
                # way to handle this is to aggregate repeated
                # keys into a single list.
                # <foo>1</foo><foo>2</foo> -> {'foo': [Node(1), Node(2)]}
                if isinstance(xml_dict[key], list):
                    xml_dict[key].append(item)
                else:
                    # Convert from a scalar to a list.
                    xml_dict[key] = [xml_dict[key], item]
            else:
                xml_dict[key] = item
        return xml_dict

    def _create_event_stream(self, request: Request, shape: Shape) -> Any:
        # TODO handle event streams
        raise NotImplementedError("_create_event_stream")


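# Illustration only (not part of the original module): a standalone sketch of how the REST-XML parser above
# strips namespaces from tags and aggregates repeated child elements into a list. The function names
# ``strip_namespace`` and ``build_name_to_xml_node`` are hypothetical stand-ins for ``_node_tag`` and
# ``_build_name_to_xml_node``; only the standard library is used and the sample XML is made up.

```python
import re
import xml.etree.ElementTree as ETree

# same pattern as in the parser: removes a leading "{namespace-uri}" from a tag
_NAMESPACE_RE = re.compile("{.*}")


def strip_namespace(tag: str) -> str:
    return _NAMESPACE_RE.sub("", tag)


def build_name_to_xml_node(parent: ETree.Element) -> dict:
    xml_dict = {}
    for item in parent:
        key = strip_namespace(item.tag)
        if key in xml_dict:
            if isinstance(xml_dict[key], list):
                # third and later occurrences: extend the existing list
                xml_dict[key].append(item)
            else:
                # second occurrence: convert the single node into a list
                xml_dict[key] = [xml_dict[key], item]
        else:
            xml_dict[key] = item
    return xml_dict


root = ETree.fromstring(
    '<root xmlns="http://example.com/ns"><foo>1</foo><foo>2</foo><bar>3</bar></root>'
)
nodes = build_name_to_xml_node(root)
foo_texts = [node.text for node in nodes["foo"]]
bar_text = nodes["bar"].text
```

# A key that occurs once maps to a single element, so a flattened list with exactly one entry is
# indistinguishable from a scalar at this stage. That is the reason ``_parse_list`` wraps such a node in a list.

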
class BaseJSONRequestParser(RequestParser, ABC):
    """
    The ``BaseJSONRequestParser`` is the base class for all JSON-based AWS service protocols.
    This base-class handles parsing the payload / body as JSON.
    """

    # default timestamp format for JSON requests
    TIMESTAMP_FORMAT = "unixtimestamp"
    # timestamp format for requests with CBOR content type
    CBOR_TIMESTAMP_FORMAT = "unixtimestampmillis"

    def _parse_structure(
        self,
        request: Request,
        shape: StructureShape,
        value: dict | None,
        uri_params: Mapping[str, Any] = None,
    ) -> dict | None:
        if shape.is_document_type:
            final_parsed = value
        else:
            if value is None:
                # If the value comes across the wire as "null" (None in python),
                # we should be returning this unchanged, instead of as an
                # empty dict.
                return None
            final_parsed = {}
            for member_name, member_shape in shape.members.items():
                json_name = member_shape.serialization.get("name", member_name)
                raw_value = value.get(json_name)
                parsed = self._parse_shape(request, member_shape, raw_value, uri_params)
                if parsed is not None or member_name in shape.required_members:
                    # If the member is required, but not existing, we set it to None anyway
                    final_parsed[member_name] = parsed
        return final_parsed

    def _parse_map(
        self,
        request: Request,
        shape: MapShape,
        value: dict | None,
        uri_params: Mapping[str, Any] = None,
    ) -> dict | None:
        if value is None:
            return None
        parsed = {}
        key_shape = shape.key
        value_shape = shape.value
        for key, val in value.items():
            actual_key = self._parse_shape(request, key_shape, key, uri_params)
            actual_value = self._parse_shape(request, value_shape, val, uri_params)
            parsed[actual_key] = actual_value
        return parsed

    def _parse_body_as_json(self, request: Request) -> dict:
        body_contents = request.data
        if not body_contents:
            return {}
        if request.mimetype.startswith("application/x-amz-cbor"):
            try:
                return cbor2_loads(body_contents)
            except ValueError as e:
                raise ProtocolParserError("HTTP body could not be parsed as CBOR.") from e
        else:
            try:
                return request.get_json(force=True)
            except BadRequest as e:
                raise ProtocolParserError("HTTP body could not be parsed as JSON.") from e

    def _parse_boolean(
        self, request: Request, shape: Shape, node: bool, uri_params: Mapping[str, Any] = None
    ) -> bool:
        return super()._noop_parser(request, shape, node, uri_params)

    def _parse_timestamp(
        self, request: Request, shape: Shape, node: str, uri_params: Mapping[str, Any] = None
    ) -> datetime.datetime:
        if not shape.serialization.get("timestampFormat") and request.mimetype.startswith(
            "application/x-amz-cbor"
        ):
            # cbor2 has native support for timestamp decoding, so this node could already have the right type
            if isinstance(node, datetime.datetime):
                return node
            # otherwise parse the timestamp using the AWS CBOR timestamp format
            # (does not conform to the CBOR standard: uses millis instead of floating-point-millis)
            return self._convert_str_to_timestamp(node, self.CBOR_TIMESTAMP_FORMAT)
        return super()._parse_timestamp(request, shape, node, uri_params)

    def _parse_blob(
        self, request: Request, shape: Shape, node: bool, uri_params: Mapping[str, Any] = None
    ) -> bytes:
        if isinstance(node, bytes) and request.mimetype.startswith("application/x-amz-cbor"):
            # CBOR does not base64 encode binary data
            return bytes(node)
        else:
            return super()._parse_blob(request, shape, node, uri_params)


class JSONRequestParser(BaseJSONRequestParser):
    """
    The ``JSONRequestParser`` is responsible for parsing incoming requests for services which use the ``json``
    protocol.
    The requests for these services encode the majority of their parameters as JSON in the request body.
    The operation is defined in an HTTP header field.
    """

    @_handle_exceptions
    def parse(self, request: Request) -> tuple[OperationModel, Any]:
        target = request.headers["X-Amz-Target"]
        # assuming that the last part of the target string (e.g., "x.y.z.MyAction") contains the operation name
        operation_name = target.rpartition(".")[2]
        operation = self.service.operation_model(operation_name)
        shape = operation.input_shape
        # There are no uri params in the json protocol
        uri_params = {}
        final_parsed = self._do_parse(request, shape, uri_params)
        return operation, final_parsed

    def _do_parse(
        self, request: Request, shape: Shape, uri_params: Mapping[str, Any] = None
    ) -> dict:
        parsed = {}
        if shape is not None:
            event_name = shape.event_stream_name
            if event_name:
                parsed = self._handle_event_stream(request, shape, event_name)
            else:
                parsed = self._handle_json_body(request, shape, uri_params)
        return parsed

    def _handle_event_stream(self, request: Request, shape: Shape, event_name: str):
        # TODO handle event streams
        raise NotImplementedError

    def _handle_json_body(
        self, request: Request, shape: Shape, uri_params: Mapping[str, Any] = None
    ) -> Any:
        # The json.loads() gives us the primitive JSON types, but we need to traverse the parsed JSON data to convert
        # to richer types (blobs, timestamps, etc.)
        parsed_json = self._parse_body_as_json(request)
        return self._parse_shape(request, shape, parsed_json, uri_params)


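# Illustration only (not part of the original module): a small sketch of how ``JSONRequestParser.parse`` above
# derives the operation name from the ``X-Amz-Target`` header. ``operation_name_from_target`` is a hypothetical
# helper, and the sample header values are made up for this example.

```python
def operation_name_from_target(target: str) -> str:
    # rpartition splits at the LAST dot; index 2 is everything after it.
    # If there is no dot at all, index 2 is the whole string.
    return target.rpartition(".")[2]


with_prefix = operation_name_from_target("Kinesis_20131202.PutRecord")
nested_prefix = operation_name_from_target("x.y.z.MyAction")
without_prefix = operation_name_from_target("PutRecord")
```

# Using ``rpartition`` instead of ``split(".")[-1]`` avoids building a list of all segments, but both
# yield the same result here.

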
class RestJSONRequestParser(BaseRestRequestParser, BaseJSONRequestParser):
    """
    The ``RestJSONRequestParser`` is responsible for parsing incoming requests for services which use the ``rest-json``
    protocol.
    The requests for these services encode the majority of their parameters as JSON in the request body.
    The operation is defined by the HTTP method and the path suffix.
    """

    def _initial_body_parse(self, request: Request) -> dict:
        return self._parse_body_as_json(request)

    def _create_event_stream(self, request: Request, shape: Shape) -> Any:
        raise NotImplementedError


class BaseCBORRequestParser(RequestParser, ABC):
    """
    The ``BaseCBORRequestParser`` is the base class for all CBOR-based AWS service protocols.
    This base-class handles parsing the payload / body as CBOR.
    """

    INDEFINITE_ITEM_ADDITIONAL_INFO = 31
    BREAK_CODE = 0xFF
    # timestamp format for requests with CBOR content type
    TIMESTAMP_FORMAT = "unixtimestamp"

    @functools.cached_property
    def major_type_to_parsing_method_map(self):
        return {
            0: self._parse_type_unsigned_integer,
            1: self._parse_type_negative_integer,
            2: self._parse_type_byte_string,
            3: self._parse_type_text_string,
            4: self._parse_type_array,
            5: self._parse_type_map,
            6: self._parse_type_tag,
            7: self._parse_type_simple_and_float,
        }

    @staticmethod
    def get_peekable_stream_from_bytes(_bytes: bytes) -> io.BufferedReader:
        return io.BufferedReader(io.BytesIO(_bytes))

    def parse_data_item(self, stream: io.BufferedReader) -> Any:
        # CBOR data is divided into "data items", and each data item starts
        # with an initial byte that describes how the following bytes should be parsed
        initial_byte = self._read_bytes_as_int(stream, 1)
        # The highest order three bits of the initial byte describe the CBOR major type
        major_type = initial_byte >> 5
        # The lowest order 5 bits of the initial byte carry additional information
        # about how the following bytes should be parsed
        additional_info: int = initial_byte & 0b00011111

        if major_type in self.major_type_to_parsing_method_map:
            method = self.major_type_to_parsing_method_map[major_type]
            return method(stream, additional_info)
        else:
            raise ProtocolParserError(
                f"Unsupported initial byte found for data item- "
                f"Major type:{major_type}, Additional info: "
                f"{additional_info}"
            )

    # Major type 0 - unsigned integers
    def _parse_type_unsigned_integer(self, stream: io.BufferedReader, additional_info: int) -> int:
        additional_info_to_num_bytes = {
            24: 1,
            25: 2,
            26: 4,
            27: 8,
        }
        # Values under 24 don't need a full byte to be stored; their values are
        # instead stored as the "additional info" in the initial byte
        if additional_info < 24:
            return additional_info
        elif additional_info in additional_info_to_num_bytes:
            num_bytes = additional_info_to_num_bytes[additional_info]
            return self._read_bytes_as_int(stream, num_bytes)
        else:
            raise ProtocolParserError(
                "Invalid CBOR integer returned from the service; unparsable "
                f"additional info found for major type 0 or 1: {additional_info}"
            )

    # Major type 1 - negative integers
    def _parse_type_negative_integer(self, stream: io.BufferedReader, additional_info: int) -> int:
        return -1 - self._parse_type_unsigned_integer(stream, additional_info)

    # Major type 2 - byte string
    def _parse_type_byte_string(self, stream: io.BufferedReader, additional_info: int) -> bytes:
        if additional_info != self.INDEFINITE_ITEM_ADDITIONAL_INFO:
            length = self._parse_type_unsigned_integer(stream, additional_info)
            return self._read_from_stream(stream, length)
        else:
            chunks = []
            while True:
                if self._handle_break_code(stream):
                    break
                initial_byte = self._read_bytes_as_int(stream, 1)
                additional_info = initial_byte & 0b00011111
                length = self._parse_type_unsigned_integer(stream, additional_info)
                chunks.append(self._read_from_stream(stream, length))
            return b"".join(chunks)

    # Major type 3 - text string
    def _parse_type_text_string(self, stream: io.BufferedReader, additional_info: int) -> str:
        return self._parse_type_byte_string(stream, additional_info).decode("utf-8")

    # Major type 4 - lists
    def _parse_type_array(self, stream: io.BufferedReader, additional_info: int) -> list:
        if additional_info != self.INDEFINITE_ITEM_ADDITIONAL_INFO:
            length = self._parse_type_unsigned_integer(stream, additional_info)
            return [self.parse_data_item(stream) for _ in range(length)]
        else:
            items = []
            while not self._handle_break_code(stream):
                items.append(self.parse_data_item(stream))
            return items

    # Major type 5 - maps
    def _parse_type_map(self, stream: io.BufferedReader, additional_info: int) -> dict:
        items = {}
        if additional_info != self.INDEFINITE_ITEM_ADDITIONAL_INFO:
            length = self._parse_type_unsigned_integer(stream, additional_info)
            for _ in range(length):
                self._parse_type_key_value_pair(stream, items)
            return items

        else:
            while not self._handle_break_code(stream):
                self._parse_type_key_value_pair(stream, items)
            return items

    def _parse_type_key_value_pair(self, stream: io.BufferedReader, items: dict) -> None:
        key = self.parse_data_item(stream)
        value = self.parse_data_item(stream)
        if value is not None:
            items[key] = value

    # Major type 6 is tags.  The only tag we currently support is tag 1 for unix
    # timestamps
    def _parse_type_tag(self, stream: io.BufferedReader, additional_info: int):
        tag = self._parse_type_unsigned_integer(stream, additional_info)
        value = self.parse_data_item(stream)
        if tag == 1:  # Epoch-based date/time in milliseconds
            return self._parse_type_datetime(value)
        else:
            raise ProtocolParserError(f"Found CBOR tag not supported by botocore: {tag}")

    def _parse_type_datetime(self, value: int | float) -> datetime.datetime:
        if isinstance(value, (int, float)):
            return self._convert_str_to_timestamp(str(value))
        else:
            raise ProtocolParserError(f"Unable to parse datetime value: {value}")

    # Major type 7 includes floats and "simple" types.  Supported simple types are
    # currently boolean values, CBOR's null, and CBOR's undefined type.  All other
    # values are either floats or invalid.
    def _parse_type_simple_and_float(
        self, stream: io.BufferedReader, additional_info: int
    ) -> bool | float | None:
        # For major type 7, values 20-23 correspond to CBOR "simple" values
        additional_info_simple_values = {
            20: False,  # CBOR false
            21: True,  # CBOR true
            22: None,  # CBOR null
            23: None,  # CBOR undefined
        }
        # First we check if the additional info corresponds to a supported simple value
        if additional_info in additional_info_simple_values:
            return additional_info_simple_values[additional_info]

        # If it's not a simple value, we need to parse it into the correct format and
        # number of bytes
        float_formats = {
            25: (">e", 2),
            26: (">f", 4),
            27: (">d", 8),
        }

        if additional_info in float_formats:
            float_format, num_bytes = float_formats[additional_info]
            return struct.unpack(float_format, self._read_from_stream(stream, num_bytes))[0]
        raise ProtocolParserError(
            f"Invalid additional info found for major type 7: {additional_info}.  "
            f"This indicates an unsupported simple type or an indefinite float value"
        )

    @_text_content
    def _parse_blob(self, _, __, node: bytes, ___) -> bytes:
        return node

    @_text_content
    def _parse_timestamp(
        self, _, shape: Shape, node: datetime.datetime | str, ___
    ) -> datetime.datetime:
        if isinstance(node, datetime.datetime):
            return node
        return super()._parse_timestamp(_, shape, node, ___)

    # This helper method is intended for use when parsing indefinite length items.
    # It does nothing if the next byte is not the break code.  If the next byte is
    # the break code, it advances past that byte and returns True so the calling
    # method knows to stop parsing that data item.
    def _handle_break_code(self, stream: io.BufferedReader) -> bool | None:
        if int.from_bytes(stream.peek(1)[:1], "big") == self.BREAK_CODE:
            stream.seek(1, os.SEEK_CUR)
            return True

    def _read_bytes_as_int(self, stream: IO[bytes], num_bytes: int) -> int:
        byte = self._read_from_stream(stream, num_bytes)
        return int.from_bytes(byte, "big")

    @staticmethod
    def _read_from_stream(stream: IO[bytes], num_bytes: int) -> bytes:
        value = stream.read(num_bytes)
        if len(value) != num_bytes:
            raise ProtocolParserError(
                "End of stream reached; this indicates a "
                "malformed CBOR response from the server or an "
                "issue in botocore"
            )
        return value


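# Illustration only (not part of the original module): a reduced, standalone sketch of the decoding scheme used
# by ``BaseCBORRequestParser`` above. It covers only major types 0, 1, 3 and definite-length maps (5); tags,
# floats, arrays and indefinite-length items are left out on purpose. All function names are hypothetical, and
# the sample bytes were encoded by hand for this example.

```python
import io


def split_initial_byte(initial_byte: int) -> tuple[int, int]:
    # upper 3 bits: major type, lower 5 bits: additional info
    return initial_byte >> 5, initial_byte & 0b00011111


def read_exact(stream: io.BytesIO, num_bytes: int) -> bytes:
    data = stream.read(num_bytes)
    if len(data) != num_bytes:
        raise ValueError("unexpected end of CBOR stream")
    return data


def read_uint(stream: io.BytesIO, additional_info: int) -> int:
    # values below 24 are stored directly in the additional info
    if additional_info < 24:
        return additional_info
    num_bytes = {24: 1, 25: 2, 26: 4, 27: 8}.get(additional_info)
    if num_bytes is None:
        raise ValueError(f"unsupported additional info: {additional_info}")
    return int.from_bytes(read_exact(stream, num_bytes), "big")


def decode_item(stream: io.BytesIO):
    major_type, additional_info = split_initial_byte(read_exact(stream, 1)[0])
    if major_type == 0:  # unsigned integer
        return read_uint(stream, additional_info)
    if major_type == 1:  # negative integer, encoded as -1 - n
        return -1 - read_uint(stream, additional_info)
    if major_type == 3:  # definite-length text string
        length = read_uint(stream, additional_info)
        return read_exact(stream, length).decode("utf-8")
    if major_type == 5:  # definite-length map
        result = {}
        for _ in range(read_uint(stream, additional_info)):
            key = decode_item(stream)
            result[key] = decode_item(stream)
        return result
    raise ValueError(f"major type {major_type} is not handled by this sketch")


# 0xA1: map with one pair, 0x61 0x61: text "a", 0x19 0x03 0xE8: unsigned integer 1000
decoded_map = decode_item(io.BytesIO(b"\xa1\x61\x61\x19\x03\xe8"))
# 0x29: major type 1 with additional info 9, i.e. -1 - 9
decoded_negative = decode_item(io.BytesIO(b"\x29"))
```

# Unlike the parser above, this sketch does not need a peekable stream, because peeking is only required to
# detect the break code of indefinite-length items.

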
class CBORRequestParser(BaseCBORRequestParser, JSONRequestParser):
    """
    The ``CBORRequestParser`` is responsible for parsing incoming requests for services which use the ``cbor``
    protocol.
    The requests for these services encode the majority of their parameters as CBOR in the request body.
    The operation is defined in an HTTP header field.
    This protocol is not properly defined in the specs, but it is derived from the ``json`` protocol. Only Kinesis uses
    it for now.
    """

    # timestamp format is different from traditional CBOR, and is encoded as a milliseconds integer
    TIMESTAMP_FORMAT = "unixtimestampmillis"

    def _do_parse(
        self, request: Request, shape: Shape, uri_params: Mapping[str, Any] = None
    ) -> dict:
        parsed = {}
        if shape is not None:
            event_name = shape.event_stream_name
            if event_name:
                parsed = self._handle_event_stream(request, shape, event_name)
            else:
                self._parse_payload(request, shape, parsed, uri_params)
        return parsed

    def _handle_event_stream(self, request: Request, shape: Shape, event_name: str):
        # TODO handle event streams
        raise NotImplementedError

    def _parse_payload(
        self,
        request: Request,
        shape: Shape,
        final_parsed: dict,
        uri_params: Mapping[str, Any] = None,
    ) -> None:
        original_parsed = self._initial_body_parse(request)
        body_parsed = self._parse_shape(request, shape, original_parsed, uri_params)
        final_parsed.update(body_parsed)

    def _initial_body_parse(self, request: Request) -> Any:
        body_contents = request.data
        if body_contents == b"":
            return body_contents
        body_contents_stream = self.get_peekable_stream_from_bytes(body_contents)
        return self.parse_data_item(body_contents_stream)

    def _parse_timestamp(
        self, request: Request, shape: Shape, node: str, uri_params: Mapping[str, Any] = None
    ) -> datetime.datetime:
        # TODO: remove once CBOR support has been removed from `JSONRequestParser`
        return super()._parse_timestamp(request, shape, node, uri_params)


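# Illustration only (not part of the original module): a sketch of what a milliseconds-based timestamp format
# such as ``unixtimestampmillis`` implies for conversion. How ``_convert_str_to_timestamp`` implements this is
# not shown in this section, so ``millis_to_datetime`` is a hypothetical helper, and returning a timezone-aware
# UTC value is an assumption of this example.

```python
import datetime


def millis_to_datetime(millis: int) -> datetime.datetime:
    # the wire value counts milliseconds since the Unix epoch, so divide by 1000 to get seconds
    return datetime.datetime.fromtimestamp(millis / 1000, tz=datetime.timezone.utc)


parsed_timestamp = millis_to_datetime(1_700_000_000_000)
with_fraction = millis_to_datetime(1_500)
```

# Interpreting the same integer as seconds would move the result tens of thousands of years into the
# future, which is why the format has to be known before converting.

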
class BaseRpcV2RequestParser(RequestParser):
    """
    The ``BaseRpcV2RequestParser`` is the base class for all RPC V2-based AWS service protocols.
    This base class handles the routing of the request, which is specific based on the path.
    The body decoding is done in the respective subclasses.
    """

    @_handle_exceptions
    def parse(self, request: Request) -> tuple[OperationModel, Any]:
        # see https://smithy.io/2.0/additional-specs/protocols/smithy-rpc-v2.html
        if request.method != "POST":
            raise ProtocolParserError("RPC v2 only accepts POST requests.")

        headers = request.headers
        if "X-Amz-Target" in headers or "X-Amzn-Target" in headers:
            raise ProtocolParserError(
                "RPC v2 does not accept 'X-Amz-Target' or 'X-Amzn-Target'. "
                "Such requests are rejected for security reasons."
            )
        # The Smithy RPCv2 CBOR protocol will only use the last four segments of the URL when routing requests.
        rpc_v2_params = request.path.lstrip("/").split("/")
        if len(rpc_v2_params) < 4 or not (
            operation := self.service.operation_model(rpc_v2_params[-1])
        ):
            raise OperationNotFoundParserError(
                f"Unable to find operation for request to service "
                f"{self.service.service_name}: {request.method} {request.path}"
            )

        # there are no URI params in RPC v2
        uri_params = {}
        shape: StructureShape = operation.input_shape
        final_parsed = self._do_parse(request, shape, uri_params)
        return operation, final_parsed

    @_handle_exceptions
    def _do_parse(
        self, request: Request, shape: Shape, uri_params: Mapping[str, Any] = None
    ) -> dict[str, Any]:
        parsed = {}
        if shape is not None:
            event_stream_name = shape.event_stream_name
            if event_stream_name:
                parsed = self._handle_event_stream(request, shape, event_stream_name)
            else:
                parsed = {}
                self._parse_payload(request, shape, parsed, uri_params)

        return parsed

    def _handle_event_stream(self, request: Request, shape: Shape, event_name: str):
        # TODO handle event streams
        raise NotImplementedError

    def _parse_structure(
1✔
1311
        self,
1312
        request: Request,
1313
        shape: StructureShape,
1314
        node: dict | None,
1315
        uri_params: Mapping[str, Any] = None,
1316
    ):
1317
        if shape.is_document_type:
1✔
UNCOV
1318
            final_parsed = node
×
1319
        else:
1320
            if node is None:
1✔
1321
                # If the comes across the wire as "null" (None in python),
1322
                # we should be returning this unchanged, instead of as an
1323
                # empty dict.
UNCOV
1324
                return None
×
1325
            final_parsed = {}
1✔
1326
            members = shape.members
1✔
1327
            if shape.is_tagged_union:
1✔
1328
                cleaned_value = node.copy()
1✔
1329
                cleaned_value.pop("__type", None)
1✔
1330
                cleaned_value = {k: v for k, v in cleaned_value.items() if v is not None}
1✔
1331
                if len(cleaned_value) != 1:
1✔
UNCOV
1332
                    raise ProtocolParserError(
×
1333
                        f"Invalid service response: {shape.name} must have one and only one member set."
1334
                    )
1335

1336
            for member_name, member_shape in members.items():
1✔
1337
                member_value = node.get(member_name)
1✔
1338
                if member_value is not None:
1✔
1339
                    final_parsed[member_name] = self._parse_shape(
1✔
1340
                        request, member_shape, member_value, uri_params
1341
                    )
1342

1343
        return final_parsed
1✔
1344

    def _parse_payload(
        self,
        request: Request,
        shape: Shape,
        final_parsed: dict,
        uri_params: Mapping[str, Any] = None,
    ) -> None:
        original_parsed = self._initial_body_parse(request)
        body_parsed = self._parse_shape(request, shape, original_parsed, uri_params)
        final_parsed.update(body_parsed)

    def _initial_body_parse(self, request: Request):
        # This method should do the initial parsing of the
        # body.  We still need to walk the parsed body in order
        # to convert types, but this method will do the first round
        # of parsing.
        raise NotImplementedError("_initial_body_parse")


class RpcV2CBORRequestParser(BaseRpcV2RequestParser, BaseCBORRequestParser):
    """
    The ``RpcV2CBORRequestParser`` is responsible for parsing incoming requests for services which use the
    ``rpc-v2-cbor`` protocol. The requests for these services encode all of their parameters as CBOR in the
    request body.
    """

    # TODO: investigate datetime format for RpcV2CBOR protocol, which might be different than Kinesis CBOR
    def _initial_body_parse(self, request: Request):
        body_contents = request.data
        if body_contents == b"":
            return body_contents
        body_contents_stream = self.get_peekable_stream_from_bytes(body_contents)
        return self.parse_data_item(body_contents_stream)


class EC2RequestParser(QueryRequestParser):
    """
    The ``EC2RequestParser`` is responsible for parsing incoming requests for services which use the ``ec2``
    protocol (which is only EC2). The protocol is quite similar to the ``query`` protocol, with some small
    differences.
    """

    def _get_serialized_name(self, shape: Shape, default_name: str, node: dict) -> str:
        # Returns the serialized name for the shape if it exists.
        # Otherwise it will return the passed in default_name.
        if "queryName" in shape.serialization:
            return shape.serialization["queryName"]
        elif "name" in shape.serialization:
            # A locationName is always capitalized on input for the ec2 protocol.
            name = shape.serialization["name"]
            return name[0].upper() + name[1:]
        else:
            return default_name

    def _get_list_key_prefix(self, shape: ListShape, node: dict):
        # The EC2 protocol does not use a prefix notation for flattened lists
        return ""


class S3RequestParser(RestXMLRequestParser):
    class VirtualHostRewriter:
        """
        Context manager which rewrites the request object parameters such that - within the context - it looks like a
        normal S3 request.
        FIXME: this is not optimal because it mutates the Request object. Once we have a better utility to create/copy
        a request instead of EnvironBuilder, we should copy it before parsing (except the stream).
        """

        def __init__(self, request: Request):
            self.request = request
            self.old_host = None
            self.old_path = None

        def __enter__(self):
            # only modify the request if it uses the virtual host addressing
            if bucket_name := self._is_vhost_address_get_bucket(self.request):
                # save the original path and host for restoring on context exit
                self.old_path = self.request.path
                self.old_host = self.request.host
                self.old_raw_uri = self.request.environ.get("RAW_URI")

                # remove the bucket name from the host part of the request
                new_host = self.old_host.removeprefix(f"{bucket_name}.")

                # put the bucket name at the front; the parentheses make the "/" fallback apply to an empty path
                # (without them, `+` binds tighter than `or` and the fallback can never trigger)
                new_path = "/" + bucket_name + (self.old_path or "/")

                # create a new RAW_URI for the WSGI environment, this is necessary because of our `get_raw_path` utility
                if self.old_raw_uri:
                    new_raw_uri = "/" + bucket_name + (self.old_raw_uri or "/")
                    if qs := self.request.query_string:
                        new_raw_uri += "?" + qs.decode("utf-8")
                else:
                    new_raw_uri = None

                # set the new path and host
                self._set_request_props(self.request, new_path, new_host, new_raw_uri)
            return self.request

        def __exit__(self, exc_type, exc_value, exc_traceback):
            # reset the original request properties on exit of the context
            if self.old_host or self.old_path:
                self._set_request_props(
                    self.request, self.old_path, self.old_host, self.old_raw_uri
                )

        @staticmethod
        def _set_request_props(request: Request, path: str, host: str, raw_uri: str | None = None):
            """Sets the HTTP request's path and host and clears the cache in the request object."""
            request.path = path
            request.headers["Host"] = host
            if raw_uri:
                request.environ["RAW_URI"] = raw_uri

            try:
                # delete the werkzeug request property cache that depends on path, but make sure all of them are
                # initialized first, otherwise `del` will raise a key error
                request.host = None  # noqa
                request.url = None  # noqa
                request.base_url = None  # noqa
                request.full_path = None  # noqa
                request.host_url = None  # noqa
                request.root_url = None  # noqa
                del request.host  # noqa
                del request.url  # noqa
                del request.base_url  # noqa
                del request.full_path  # noqa
                del request.host_url  # noqa
                del request.root_url  # noqa
            except AttributeError:
                pass

        @staticmethod
        def _is_vhost_address_get_bucket(request: Request) -> str | None:
            from localstack.services.s3.utils import uses_host_addressing

            return uses_host_addressing(request.headers)

    @_handle_exceptions
    def parse(self, request: Request) -> tuple[OperationModel, Any]:
        """Handle virtual-host-addressing for S3."""
        with self.VirtualHostRewriter(request):
            return super().parse(request)

    def _parse_shape(
        self, request: Request, shape: Shape, node: Any, uri_params: Mapping[str, Any] = None
    ) -> Any:
        """
        Special handling of parsing the shape for S3 object names (= key):
        Trailing '/' characters are valid and need to be preserved, but the URL matcher removes them from the key.
        We need special logic to compare the parsed Key parameter against the path and add back the missing slashes.
        """
        if (
            shape is not None
            and uri_params is not None
            and shape.serialization.get("location") == "uri"
            and shape.serialization.get("name") == "Key"
            and (
                (trailing_slashes := request.path.rpartition(uri_params["Key"])[2])
                and all(char == "/" for char in trailing_slashes)
            )
        ):
            uri_params = dict(uri_params)
            uri_params["Key"] = uri_params["Key"] + trailing_slashes
        return super()._parse_shape(request, shape, node, uri_params)

    @_text_content
    def _parse_integer(self, _, shape, node: str, ___) -> int | None:
        # S3 accepts empty query string parameters where an integer is expected.
        # To avoid breaking other cases, only accept the empty value if the shape is located in the query string.
        if node == "" and shape.serialization.get("location") == "querystring":
            return None
        return int(node)


class SQSQueryRequestParser(QueryRequestParser):
    def _get_serialized_name(self, shape: Shape, default_name: str, node: dict) -> str:
        """
        SQS accepts both the proper serialized name of a map and the member name as the name for maps.
        For example, the first two variants below both work for the TagQueue operation:
        - Using the proper serialized name "Tag": Tag.1.Key=key&Tag.1.Value=value
        - Using the member name "Tags" in the parent structure: Tags.1.Key=key&Tags.1.Value=value
        - Using "Name" to represent the Key for a nested dict: MessageAttributes.1.Name=key&MessageAttributes.1.Value.StringValue=value
            resulting in {MessageAttributes: {key : {StringValue: value}}}
        The Java SDK implements the second variant: https://github.com/aws/aws-sdk-java-v2/issues/2524
        This has been confirmed to be a bug and against the spec, but since the client has a lot of users, and AWS SQS
        supports both, we need to handle it here.
        """
        # ask the super implementation for the proper serialized name
        primary_name = super()._get_serialized_name(shape, default_name, node)

        # determine potential suffixes for the name of the member in the node
        suffixes = []
        if shape.type_name == "map":
            if not shape.serialization.get("flattened"):
                suffixes = [".entry.1.Key", ".entry.1.Name"]
            else:
                suffixes = [".1.Key", ".1.Name"]
        if shape.type_name == "list":
            if not shape.serialization.get("flattened"):
                suffixes = [".member.1"]
            else:
                suffixes = [".1"]

        # if the primary name is _not_ available in the node, but the default name is, we use the default name
        if not any(f"{primary_name}{suffix}" in node for suffix in suffixes) and any(
            f"{default_name}{suffix}" in node for suffix in suffixes
        ):
            return default_name
        # otherwise we use the primary name
        return primary_name


@functools.cache
def create_parser(service: ServiceModel, protocol: ProtocolName | None = None) -> RequestParser:
    """
    Creates the right parser for the given service model.

    :param service: to create the parser for
    :param protocol: the protocol for the parser. If not provided, falls back to the service's default protocol
    :return: RequestParser which can handle the protocol of the service
    """
    # Unfortunately, some services show subtle differences in their parsing or operation detection behavior, even though
    # their specification states they implement the same protocol.
    # In order to avoid bundling the whole complexity in the specific protocols, or even have service-distinctions
    # within the parser implementations, the service-specific parser implementations (basically the implicit /
    # informally more specific protocol implementations) have precedence over the more general protocol-specific
    # parsers.
    service_specific_parsers = {
        "s3": {"rest-xml": S3RequestParser},
        "sqs": {"query": SQSQueryRequestParser},
    }
    protocol_specific_parsers = {
        "query": QueryRequestParser,
        "json": JSONRequestParser,
        "rest-json": RestJSONRequestParser,
        "rest-xml": RestXMLRequestParser,
        "ec2": EC2RequestParser,
        "smithy-rpc-v2-cbor": RpcV2CBORRequestParser,
        # TODO: implement multi-protocol support for Kinesis, so that it can use the `cbor` protocol and remove
        #  CBOR handling from JSONRequestParser
        # this is not an "official" protocol defined from the spec, but is derived from ``json``
    }

    service_protocol = protocol or service.protocol

    # Try to select a service- and protocol-specific parser implementation
    if (
        service.service_name in service_specific_parsers
        and service_protocol in service_specific_parsers[service.service_name]
    ):
        return service_specific_parsers[service.service_name][service_protocol](service)
    else:
        # Otherwise, pick the protocol-specific parser for the protocol of the service
        return protocol_specific_parsers[service_protocol](service)
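The two-level lookup in `create_parser` (service-specific parser first, protocol-specific parser as the fallback) can be illustrated in isolation. The following is only a minimal sketch under stated assumptions: the classes `GenericParser`, `QueryParser`, `RestXmlParser` and `S3Parser`, the two lookup tables, and the function `pick_parser` are hypothetical stand-ins, not part of the module above, and the function takes plain strings instead of a `ServiceModel`.

```python
from functools import cache
from typing import Optional


class GenericParser:
    """Hypothetical stand-in for a protocol-level request parser."""

    def __init__(self, service_name: str):
        self.service_name = service_name


class QueryParser(GenericParser):
    pass


class RestXmlParser(GenericParser):
    pass


class S3Parser(RestXmlParser):
    """Hypothetical service-specific refinement of the rest-xml parser."""


# service name -> protocol -> parser class (takes precedence when present)
SERVICE_SPECIFIC = {"s3": {"rest-xml": S3Parser}}
# protocol -> parser class (generic fallback)
PROTOCOL_SPECIFIC = {"query": QueryParser, "rest-xml": RestXmlParser}


@cache
def pick_parser(
    service_name: str, default_protocol: str, protocol: Optional[str] = None
) -> GenericParser:
    # An explicitly requested protocol wins over the service's default protocol.
    chosen = protocol or default_protocol
    # Prefer the service-specific class; otherwise use the generic one for the protocol.
    parser_cls = SERVICE_SPECIFIC.get(service_name, {}).get(chosen) or PROTOCOL_SPECIFIC[chosen]
    return parser_cls(service_name)
```

In this sketch, `pick_parser("s3", "rest-xml")` yields an `S3Parser`, while `pick_parser("s3", "rest-xml", "query")` falls through to the generic `QueryParser` because no S3-specific entry exists for `query`. An unknown protocol raises `KeyError`, and because of `functools.cache`, repeated calls with the same arguments return the same parser instance.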