Ouranosinc / xclim / 14888847601

07 May 2025 04:51PM UTC coverage: 92.367% (+0.1%) from 92.268%

push

github

web-flow
Removed deprecated functions, deprecate `testing.open_dataset` (#2139)

### What kind of change does this PR introduce?

* Removes the deprecated `sfcwind_2_uas_vas` and `uas_vas_2_sfcwind`
converter functions.
* Adds a deprecation notice to `xclim.testing.open_dataset`, which is to be
removed in a future version.
* Changes the signature of `xclim.testing.helpers.generate_atmos` to
only accept a `nimbus` object.
* Fixes a bug that was causing cache directories to store generated
files in a directory that duplicated the version string (e.g.
`XDG_CACHE_DIR/xclim-testdata/v2077.1.1/v2077.1.1`).
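
The deprecation notice mentioned in the list above could be emitted along the following lines. This is a minimal sketch, not the actual xclim implementation: `deprecated_open_dataset` and `_fetch_path` are hypothetical stand-ins, and the warning category and message text are assumptions.

```python
import warnings


def _fetch_path(name: str) -> str:
    """Hypothetical stand-in for resolving a testing-data file name to a local path."""
    return f"/tmp/xclim-testdata/{name}"


def deprecated_open_dataset(name: str) -> str:
    """Hypothetical wrapper that warns callers before delegating to the new mechanism."""
    warnings.warn(
        "`open_dataset` is deprecated and will be removed in a future version. "
        "Fetch the file path first, then open it with `xarray.open_dataset`.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return _fetch_path(name)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    path = deprecated_open_dataset("example.nc")

print(path)
print(caught[0].category.__name__)
```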

### Does this PR introduce a breaking change?

Yes. Functions that were previously deprecated have been removed.
Suggestions have been made to use the `nimbus` class to manage fetching
testing data.

When opening either an absolute path location on disk or an OPeNDAP
link, developers are expected to use the typical
`xarray.open_{mf}dataset()` function. Paths provided by `nimbus.fetch()`
will always be absolute paths.

### Other information

This change moves most of the xclim-testdata fetching mechanism to rely
on `pooch.Pooch` as that object is much better at handling caching and
versioning.

Never call `nimbus.fetch()` with a `pathlib.Path` object. The `registry`
of a `Pooch` object behaves like a dictionary keyed by strings, and some
of the issues that needed to be dealt with stemmed from that misuse.

An example of what this does on Windows:
```python
from pathlib import Path

import xarray as xr

# `nimbus` is assumed to be a `pooch.Pooch` instance for the xclim testing data.
# Actual file is "NRCANdaily/nrcan_canada_daily_tasmax_1990.nc" in registry

file = Path("NRCANdaily", "nrcan_canada_daily_tasmax_1990.nc")
xr.open_dataset(nimbus.fetch(file))

# Raises:
# ValueError: File 'NRCANdaily\nrcan_canada_daily_tasmax_1990.nc' is not in the registry.
```
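
The mismatch can be reproduced without Windows or `pooch`. The sketch below uses `pathlib.PureWindowsPath` to mimic how a `Path` is rendered on Windows and a plain dictionary as a stand-in for the registry. The hash value is a placeholder, and using `as_posix()` is one possible way to build a matching key, not necessarily what xclim does.

```python
from pathlib import PureWindowsPath

# Stand-in for a Pooch registry: keys are POSIX-style strings (placeholder hash).
registry = {"NRCANdaily/nrcan_canada_daily_tasmax_1990.nc": "placeholder-hash"}

# PureWindowsPath renders with backslashes on any operating system.
file = PureWindowsPath("NRCANdaily", "nrcan_canada_daily_tasmax_1990.nc")

windows_key = str(file)      # uses "\" as the separator
posix_key = file.as_posix()  # uses "/" as the separator

print(windows_key in registry)  # False
print(posix_key in registry)    # True
```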

### Tiniest caveat

In order to support `no external sockets` mode, I needed to slightly modify
the internal `_fetch` method of `nimbus`:

```python
try:
    return _nimbus.fetch_diversion(*args, **kwargs)
except SocketBlockedError as err:
    raise File... (continued)
```

11 of 14 new or added lines in 2 files covered. (78.57%)

3 existing lines in 1 file now uncovered.

7539 of 8162 relevant lines covered (92.37%)

8.09 hits per line

Source File

79.84
/src/xclim/testing/utils.py
1
"""
2
Testing and Tutorial Utilities' Module
3
======================================
4
"""
5

6
from __future__ import annotations
9✔
7

8
import importlib.resources as ilr
9✔
9
import logging
9✔
10
import os
9✔
11
import platform
9✔
12
import re
9✔
13
import sys
9✔
14
import time
9✔
15
import warnings
9✔
16
from collections.abc import Callable, Sequence
9✔
17
from datetime import datetime as dt
9✔
18
from functools import wraps
9✔
19
from importlib import import_module
9✔
20
from io import StringIO
9✔
21
from pathlib import Path
9✔
22
from shutil import copytree
9✔
23
from typing import IO, Any, TextIO
9✔
24
from urllib.error import HTTPError, URLError
9✔
25
from urllib.parse import urljoin, urlparse
9✔
26
from urllib.request import urlretrieve
9✔
27

28
from filelock import FileLock
9✔
29
from packaging.version import Version
9✔
30
from xarray import Dataset
9✔
31
from xarray import open_dataset as _open_dataset
9✔
32

33
import xclim
9✔
34
from xclim import __version__ as __xclim_version__
9✔
35

36
try:
9✔
37
    import pytest
9✔
38
    from pytest_socket import SocketBlockedError
9✔
39
except ImportError:
×
40
    pytest = None
×
41
    SocketBlockedError = None
×
42

43
try:
9✔
44
    import pooch
9✔
45
except ImportError:
×
46
    warnings.warn("The `pooch` library is not installed. The default cache directory for testing data will not be set.")
×
47
    pooch = None
×
48

49

50
logger = logging.getLogger("xclim")
9✔
51

52

53
__all__ = [
9✔
54
    "TESTDATA_BRANCH",
55
    "TESTDATA_CACHE_DIR",
56
    "TESTDATA_REPO_URL",
57
    "audit_url",
58
    "default_testdata_cache",
59
    "default_testdata_repo_url",
60
    "default_testdata_version",
61
    "gather_testing_data",
62
    "list_input_variables",
63
    "nimbus",
64
    "open_dataset",
65
    "populate_testing_data",
66
    "publish_release_notes",
67
    "run_doctests",
68
    "show_versions",
69
    "testing_setup_warnings",
70
]
71

72
default_testdata_version = "v2025.4.29"
9✔
73
"""Default version of the testing data to use when fetching datasets."""
9✔
74

75
default_testdata_repo_url = "https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/"
9✔
76
"""Default URL of the testing data repository to use when fetching datasets."""
9✔
77

78
try:
9✔
79
    default_testdata_cache = Path(pooch.os_cache("xclim-testdata"))
9✔
80
    """Default location for the testing data cache."""
9✔
81
except AttributeError:
×
82
    default_testdata_cache = None
×
83

84
TESTDATA_REPO_URL = str(os.getenv("XCLIM_TESTDATA_REPO_URL", default_testdata_repo_url))
9✔
85
"""
9✔
86
Sets the URL of the testing data repository to use when fetching datasets.
87

88
Notes
89
-----
90
When running tests locally, this can be set for both `pytest` and `tox` by exporting the variable:
91

92
.. code-block:: console
93

94
    $ export XCLIM_TESTDATA_REPO_URL="https://github.com/my_username/xclim-testdata"
95

96
or setting the variable at runtime:
97

98
.. code-block:: console
99

100
    $ env XCLIM_TESTDATA_REPO_URL="https://github.com/my_username/xclim-testdata" pytest
101
"""
102

103
TESTDATA_BRANCH = str(os.getenv("XCLIM_TESTDATA_BRANCH", default_testdata_version))
9✔
104
"""
9✔
105
Sets the branch of the testing data repository to use when fetching datasets.
106

107
Notes
108
-----
109
When running tests locally, this can be set for both `pytest` and `tox` by exporting the variable:
110

111
.. code-block:: console
112

113
    $ export XCLIM_TESTDATA_BRANCH="my_testing_branch"
114

115
or setting the variable at runtime:
116

117
.. code-block:: console
118

119
    $ env XCLIM_TESTDATA_BRANCH="my_testing_branch" pytest
120
"""
121

122
TESTDATA_CACHE_DIR = os.getenv("XCLIM_TESTDATA_CACHE_DIR", default_testdata_cache)
9✔
123
"""
9✔
124
Sets the directory to store the testing datasets.
125

126
If not set, the default location will be used (based on ``platformdirs``, see :func:`pooch.os_cache`).
127

128
Notes
129
-----
130
When running tests locally, this can be set for both `pytest` and `tox` by exporting the variable:
131

132
.. code-block:: console
133

134
    $ export XCLIM_TESTDATA_CACHE_DIR="/path/to/my/data"
135

136
or setting the variable at runtime:
137

138
.. code-block:: console
139

140
    $ env XCLIM_TESTDATA_CACHE_DIR="/path/to/my/data" pytest
141
"""
142

143

144
def list_input_variables(submodules: Sequence[str] | None = None, realms: Sequence[str] | None = None) -> dict:
9✔
145
    """
146
    List all possible variable names used in xclim's indicators.
147

148
    Made for development purposes. Parses all indicator parameters with the
149
    :py:attr:`xclim.core.utils.InputKind.VARIABLE` or `OPTIONAL_VARIABLE` kinds.
150

151
    Parameters
152
    ----------
153
    submodules : Sequence of str, optional
154
        Restrict the output to indicators of a list of submodules only. Default None, which parses all indicators.
155
    realms : Sequence of str, optional
156
        Restrict the output to indicators of a list of realms only. Default None, which parses all indicators.
157

158
    Returns
159
    -------
160
    dict
161
        A mapping from variable name to indicator class.
162
    """
163
    from collections import defaultdict  # pylint: disable=import-outside-toplevel
9✔
164

165
    from xclim import indicators  # pylint: disable=import-outside-toplevel
9✔
166
    from xclim.core.indicator import registry  # pylint: disable=import-outside-toplevel
9✔
167
    from xclim.core.utils import InputKind  # pylint: disable=import-outside-toplevel
9✔
168

169
    submodules = submodules or [sub for sub in dir(indicators) if not sub.startswith("__")]
9✔
170
    realms = realms or ["atmos", "ocean", "land", "seaIce"]
9✔
171

172
    variables = defaultdict(list)
9✔
173
    for name, ind in registry.items():
9✔
174
        if "." in name:
9✔
175
            # external submodule, submodule name is prepended to registry key
176
            if name.split(".")[0] not in submodules:
9✔
177
                continue
9✔
178
        elif ind.realm not in submodules:
9✔
179
            # official indicator : realm == submodule
180
            continue
×
181
        if ind.realm not in realms:
9✔
182
            continue
9✔
183

184
        # ok we want this one.
185
        for varname, meta in ind._all_parameters.items():
9✔
186
            if meta.kind in [
9✔
187
                InputKind.VARIABLE,
188
                InputKind.OPTIONAL_VARIABLE,
189
            ]:
190
                var = meta.default or varname
9✔
191
                variables[var].append(ind)
9✔
192

193
    return variables
9✔
194

195

196
# Publishing Tools ###
197

198

199
def publish_release_notes(
9✔
200
    style: str = "md",
201
    file: os.PathLike[str] | StringIO | TextIO | None = None,
202
    changes: str | os.PathLike[str] | None = None,
203
) -> str | None:
204
    """
205
    Format release notes in Markdown or ReStructuredText.
206

207
    Parameters
208
    ----------
209
    style : {"rst", "md"}
210
        Use ReStructuredText formatting or Markdown. Default: Markdown.
211
    file : {os.PathLike, StringIO, TextIO}, optional
212
        If provided, prints to the given file-like object. Otherwise, returns a string.
213
    changes : str or os.PathLike[str], optional
214
        If provided, manually points to the file where the changelog can be found.
215
        Assumes a relative path otherwise.
216

217
    Returns
218
    -------
219
    str, optional
220
        If `file` not provided, the formatted release notes.
221

222
    Notes
223
    -----
224
    This function is used solely for development and packaging purposes.
225
    """
226
    if isinstance(changes, str | Path):
9✔
227
        changes_file = Path(changes).absolute()
9✔
228
    else:
229
        changes_file = Path(__file__).absolute().parents[3].joinpath("CHANGELOG.rst")
×
230

231
    if not changes_file.exists():
9✔
232
        raise FileNotFoundError("Changelog file not found in xclim folder tree.")
9✔
233

234
    with open(changes_file, encoding="utf-8") as hf:
9✔
235
        changes = hf.read()
9✔
236

237
    if style == "rst":
9✔
238
        hyperlink_replacements = {
9✔
239
            r":issue:`([0-9]+)`": r"`GH/\1 <https://github.com/Ouranosinc/xclim/issues/\1>`_",
240
            r":pull:`([0-9]+)`": r"`PR/\1 <https://github.com/Ouranosinc/xclim/pull/\1>`_",
241
            r":user:`([a-zA-Z0-9_.-]+)`": r"`@\1 <https://github.com/\1>`_",
242
        }
243
    elif style == "md":
9✔
244
        hyperlink_replacements = {
9✔
245
            r":issue:`([0-9]+)`": r"[GH/\1](https://github.com/Ouranosinc/xclim/issues/\1)",
246
            r":pull:`([0-9]+)`": r"[PR/\1](https://github.com/Ouranosinc/xclim/pull/\1)",
247
            r":user:`([a-zA-Z0-9_.-]+)`": r"[@\1](https://github.com/\1)",
248
        }
249
    else:
250
        msg = f"Formatting style not supported: {style}"
9✔
251
        raise NotImplementedError(msg)
9✔
252

253
    for search, replacement in hyperlink_replacements.items():
9✔
254
        changes = re.sub(search, replacement, changes)
9✔
255

256
    if style == "md":
9✔
257
        changes = changes.replace("=========\nChangelog\n=========", "# Changelog")
9✔
258

259
        titles = {r"\n(.*?)\n([\-]{1,})": "-", r"\n(.*?)\n([\^]{1,})": "^"}
9✔
260
        for title_expression, level in titles.items():
9✔
261
            found = re.findall(title_expression, changes)
9✔
262
            for grouping in found:
9✔
263
                fixed_grouping = str(grouping[0]).replace("(", r"\(").replace(")", r"\)")
9✔
264
                search = rf"({fixed_grouping})\n([\{level}]{'{' + str(len(grouping[1])) + '}'})"
9✔
265
                replacement = f"{'##' if level == '-' else '###'} {grouping[0]}"
9✔
266
                changes = re.sub(search, replacement, changes)
9✔
267

268
        link_expressions = r"[\`]{1}([\w\s]+)\s<(.+)>`\_"
9✔
269
        found = re.findall(link_expressions, changes)
9✔
270
        for grouping in found:
9✔
271
            search = rf"`{grouping[0]} <.+>`\_"
9✔
272
            replacement = f"[{str(grouping[0]).strip()}]({grouping[1]})"
9✔
273
            changes = re.sub(search, replacement, changes)
9✔
274

275
    if not file:
9✔
276
        return changes
9✔
277
    if isinstance(file, Path | os.PathLike):
9✔
278
        with open(file, "w", encoding="utf-8") as f:
9✔
279
            print(changes, file=f)
9✔
280
    else:
281
        print(changes, file=file)
×
282
    return None
9✔
283

284

285
_xclim_deps = [
9✔
286
    "xclim",
287
    "xarray",
288
    "statsmodels",
289
    "sklearn",
290
    "scipy",
291
    "pint",
292
    "pandas",
293
    "numpy",
294
    "numba",
295
    "lmoments3",
296
    "jsonpickle",
297
    "flox",
298
    "dask",
299
    "cf_xarray",
300
    "cftime",
301
    "clisops",
302
    "click",
303
    "bottleneck",
304
    "boltons",
305
]
306

307

308
def show_versions(
9✔
309
    file: os.PathLike | StringIO | TextIO | None = None,
310
    deps: list[str] | None = None,
311
) -> str | None:
312
    """
313
    Print the versions of xclim and its dependencies.
314

315
    Parameters
316
    ----------
317
    file : {os.PathLike, StringIO, TextIO}, optional
318
        If provided, prints to the given file-like object. Otherwise, returns a string.
319
    deps : list of str, optional
320
        A list of dependencies to gather and print version information from.
321
        Otherwise, prints `xclim` dependencies.
322

323
    Returns
324
    -------
325
    str or None
326
        If `file` not provided, the versions of xclim and its dependencies.
327
    """
328
    dependencies: list[str]
329
    if deps is None:
9✔
330
        dependencies = _xclim_deps
9✔
331
    else:
332
        dependencies = deps
×
333

334
    dependency_versions = [(d, lambda mod: mod.__version__) for d in dependencies]
9✔
335

336
    deps_blob: list[tuple[str, str | None]] = []
9✔
337
    for modname, ver_f in dependency_versions:
9✔
338
        try:
9✔
339
            if modname in sys.modules:
9✔
340
                mod = sys.modules[modname]
9✔
341
            else:
342
                mod = import_module(modname)
9✔
343
        except (KeyError, ModuleNotFoundError):
9✔
344
            deps_blob.append((modname, None))
9✔
345
        else:
346
            try:
9✔
347
                ver = ver_f(mod)
9✔
348
                deps_blob.append((modname, ver))
9✔
349
            except AttributeError:
9✔
350
                deps_blob.append((modname, "installed"))
9✔
351

352
    modules_versions = "\n".join([f"{k}: {stat}" for k, stat in sorted(deps_blob)])
9✔
353

354
    installed_versions = [
9✔
355
        "INSTALLED VERSIONS",
356
        "------------------",
357
        f"python: {platform.python_version()}",
358
        f"{modules_versions}",
359
        f"Anaconda-based environment: {'yes' if Path(sys.base_prefix).joinpath('conda-meta').exists() else 'no'}",
360
    ]
361

362
    message = "\n".join(installed_versions)
9✔
363

364
    if not file:
9✔
365
        return message
9✔
366
    if isinstance(file, Path | os.PathLike):
9✔
367
        with open(file, "w", encoding="utf-8") as f:
9✔
368
            print(message, file=f)
9✔
369
    else:
370
        print(message, file=file)
×
371
    return None
9✔
372

373

374
# Test Data Utilities ###
375

376

377
def run_doctests():
9✔
378
    """Run the doctests for the module."""
379
    if pytest is None:
×
380
        raise ImportError(
×
381
            "The `pytest` package is required to run the doctests. "
382
            "You can install it with `pip install pytest` or `pip install xclim[dev]`."
383
        )
384

385
    cmd = [
×
386
        f"--rootdir={Path(__file__).absolute().parent}",
387
        "--numprocesses=0",
388
        "--xdoctest",
389
        f"{Path(__file__).absolute().parents[1]}",
390
    ]
391

392
    sys.exit(pytest.main(cmd))
×
393

394

395
def testing_setup_warnings():
9✔
396
    """Warn users about potential incompatibilities between xclim and xclim-testdata versions."""
397
    if re.match(r"^\d+\.\d+\.\d+$", __xclim_version__) and TESTDATA_BRANCH != default_testdata_version:
9✔
398
        # This does not need to be emitted on GitHub Workflows and ReadTheDocs
399
        if not os.getenv("CI") and not os.getenv("READTHEDOCS"):
×
400
            warnings.warn(
×
401
                f"`xclim` stable ({__xclim_version__}) is running tests against a non-default "
402
                f"branch of the testing data. It is possible that changes to the testing data may "
403
                f"be incompatible with some assertions in this version. "
404
                f"Please be sure to check {TESTDATA_REPO_URL} for more information.",
405
            )
406

407
    if re.match(r"^v\d+\.\d+\.\d+", TESTDATA_BRANCH):
9✔
408
        # Find the date of last modification of xclim source files to generate a calendar version
409
        install_date = dt.strptime(
9✔
410
            time.ctime(os.path.getmtime(xclim.__file__)),
411
            "%a %b %d %H:%M:%S %Y",
412
        )
413
        install_calendar_version = f"{install_date.year}.{install_date.month}.{install_date.day}"
9✔
414

415
        if Version(TESTDATA_BRANCH) > Version(install_calendar_version):
9✔
416
            warnings.warn(
×
417
                f"The installation date of `xclim` ({install_date.ctime()}) "
418
                f"predates the last release of testing data ({TESTDATA_BRANCH}). "
419
                "It is very likely that the testing data is incompatible with this build of `xclim`.",
420
            )
421

422

423
def load_registry(branch: str = TESTDATA_BRANCH, repo: str = TESTDATA_REPO_URL) -> dict[str, str]:
9✔
424
    """
425
    Load the registry file for the test data.
426

427
    Parameters
428
    ----------
429
    branch : str
430
        Branch of the repository to use when fetching testing datasets.
431
    repo : str
432
        URL of the repository to use when fetching testing datasets.
433

434
    Returns
435
    -------
436
    dict
437
        Dictionary of filenames and hashes.
438
    """
439
    if not repo.endswith("/"):
9✔
440
        repo = f"{repo}/"
×
441
    remote_registry = audit_url(
9✔
442
        urljoin(
443
            urljoin(repo, branch if branch.endswith("/") else f"{branch}/"),
444
            "data/registry.txt",
445
        )
446
    )
447

448
    if repo != default_testdata_repo_url:
9✔
449
        external_repo_name = urlparse(repo).path.split("/")[-2]
×
450
        external_branch_name = branch.split("/")[-1]
×
451
        registry_file = Path(
×
452
            str(ilr.files("xclim").joinpath(f"testing/registry.{external_repo_name}.{external_branch_name}.txt"))
453
        )
454
        urlretrieve(remote_registry, registry_file)  # noqa: S310
×
455

456
    elif branch != default_testdata_version:
9✔
457
        custom_registry_folder = Path(str(ilr.files("xclim").joinpath(f"testing/{branch}")))
×
458
        custom_registry_folder.mkdir(parents=True, exist_ok=True)
×
459
        registry_file = custom_registry_folder.joinpath("registry.txt")
×
460
        urlretrieve(remote_registry, registry_file)  # noqa: S310
×
461

462
    else:
463
        registry_file = Path(str(ilr.files("xclim").joinpath("testing/registry.txt")))
9✔
464

465
    if not registry_file.exists():
9✔
466
        raise FileNotFoundError(f"Registry file not found: {registry_file}")
×
467

468
    # Load the registry file
469
    with registry_file.open(encoding="utf-8") as f:
9✔
470
        registry = {line.split()[0]: line.split()[1] for line in f}
9✔
471
    return registry
9✔
472

473

474
def nimbus(
9✔
475
    repo: str = TESTDATA_REPO_URL,
476
    branch: str = TESTDATA_BRANCH,
477
    cache_dir: str | Path = TESTDATA_CACHE_DIR,
478
    allow_updates: bool = True,
479
):
480
    """
481
    Pooch registry instance for xclim test data.
482

483
    Parameters
484
    ----------
485
    repo : str
486
        URL of the repository to use when fetching testing datasets.
487
    branch : str
488
        Branch of repository to use when fetching testing datasets.
489
    cache_dir : str or Path
490
        The path to the directory where the data files are stored.
491
    allow_updates : bool
492
        If True, allow updates to the data files. Default is True.
493

494
    Returns
495
    -------
496
    pooch.Pooch
497
        The Pooch instance for accessing the xclim testing data.
498

499
    Notes
500
    -----
501
    There are three environment variables that can be used to control the behaviour of this registry:
502
        - ``XCLIM_TESTDATA_CACHE_DIR``: If this environment variable is set, it will be used as the
503
          base directory to store the data files.
504
          The directory should be an absolute path (e.g., on POSIX systems, it should start with ``/``).
505
          Otherwise, the default location will be used (based on ``platformdirs``, see :py:func:`pooch.os_cache`).
506
        - ``XCLIM_TESTDATA_REPO_URL``: If this environment variable is set, it will be used as the URL of
507
          the repository to use when fetching datasets. Otherwise, the default repository will be used.
508
        - ``XCLIM_TESTDATA_BRANCH``: If this environment variable is set, it will be used as the branch of
509
          the repository to use when fetching datasets. Otherwise, the default branch will be used.
510

511
    Examples
512
    --------
513
    Using the registry to download a file:
514

515
    .. code-block:: python
516

517
        import xarray as xr
518
        from xclim.testing.utils import nimbus
519

520
        example_file = nimbus().fetch("example.nc")
521
        data = xr.open_dataset(example_file)
522
    """
523
    if pooch is None:
9✔
524
        raise ImportError(
×
525
            "The `pooch` package is required to fetch the xclim testing data. "
526
            "You can install it with `pip install pooch` or `pip install xclim[dev]`."
527
        )
528
    if not repo.endswith("/"):
9✔
529
        repo = f"{repo}/"
×
530
    remote = audit_url(urljoin(urljoin(repo, branch if branch.endswith("/") else f"{branch}/"), "data"))
9✔
531

532
    _nimbus = pooch.create(
9✔
533
        path=cache_dir,
534
        base_url=remote,
535
        version=default_testdata_version,
536
        version_dev=branch,
537
        allow_updates=allow_updates,
538
        registry=load_registry(branch=branch, repo=repo),
539
    )
540

541
    # Add a custom fetch method to the Pooch instance
542
    # Needed to address: https://github.com/readthedocs/readthedocs.org/issues/11763
543
    # Fix inspired by @bjlittle (https://github.com/bjlittle/geovista/pull/1202)
544
    _nimbus.fetch_diversion = _nimbus.fetch
9✔
545

546
    # Overload the fetch method to add user-agent headers
547
    @wraps(_nimbus.fetch_diversion)
9✔
548
    def _fetch(*args, **kwargs: bool | Callable) -> str:  # numpydoc ignore=GL08  # *args: str
9✔
549
        def _downloader(
9✔
550
            url: str,
551
            output_file: str | IO,
552
            poocher: pooch.Pooch,
553
            check_only: bool | None = False,
554
        ) -> None:
555
            """Download the file from the URL and save it to `output_file`."""
556
            headers = {"User-Agent": f"xclim ({__xclim_version__})"}
7✔
557
            downloader = pooch.HTTPDownloader(headers=headers)
7✔
558
            return downloader(url, output_file, poocher, check_only=check_only)
7✔
559

560
        # default to our http/s downloader with user-agent headers
561
        kwargs.setdefault("downloader", _downloader)
9✔
562
        try:
9✔
563
            return _nimbus.fetch_diversion(*args, **kwargs)
9✔
NEW
564
        except SocketBlockedError as err:
×
NEW
565
            raise FileNotFoundError(
×
566
                "File was not found in the testing data cache and remote socket connections are disabled. "
567
                "You may need to download the testing data using `xclim prefetch_testing_data`."
568
            ) from err
569

570
    # Replace the fetch method with the custom fetch method
571
    _nimbus.fetch = _fetch
9✔
572

573
    return _nimbus
9✔
574

575

576
def open_dataset(name: str, nimbus_kwargs: dict[str, Path | str | bool] | None = None, **xr_kwargs: Any) -> Dataset:
9✔
577
    r"""
578
    Convenience function to open a dataset from the xclim testing data using the `nimbus` class.
579

580
    This is a thin wrapper around the `nimbus` class to make it easier to open xclim testing datasets.
581

582
    Parameters
583
    ----------
584
    name : str
585
        Name of the file containing the dataset.
586
    nimbus_kwargs : dict
587
        Keyword arguments passed to the nimbus function.
588
    **xr_kwargs : Any
589
        Keyword arguments passed to xarray.open_dataset.
590

591
    Returns
592
    -------
593
    xarray.Dataset
594
        The dataset.
595

596
    See Also
597
    --------
598
    xarray.open_dataset : Open and read a dataset from a file or file-like object.
599
    nimbus : Pooch wrapper for accessing the xclim testing data.
600

601
    Notes
602
    -----
603
    As of `xclim` v0.57.0, this function no longer supports the `dap_url` parameter. For OPeNDAP datasets, use
604
    `xarray.open_dataset` directly using the OPeNDAP URL with an appropriate backend installed (netCDF4, pydap, etc.).
605
    """
606
    if nimbus_kwargs is None:
9✔
NEW
607
        nimbus_kwargs = {}
×
608
    return _open_dataset(nimbus(**nimbus_kwargs).fetch(name), **xr_kwargs)
9✔
609

610

611
def populate_testing_data(
9✔
612
    temp_folder: Path | None = None,
613
    repo: str = TESTDATA_REPO_URL,
614
    branch: str = TESTDATA_BRANCH,
615
    local_cache: Path = TESTDATA_CACHE_DIR,
616
) -> None:
617
    """
618
    Populate the local cache with the testing data.
619

620
    Parameters
621
    ----------
622
    temp_folder : Path, optional
623
        Path to a temporary folder to use as the local cache. If not provided, the default location will be used.
624
    repo : str, optional
625
        URL of the repository to use when fetching testing datasets.
626
    branch : str, optional
627
        Branch of xclim-testdata to use when fetching testing datasets.
628
    local_cache : Path
629
        The path to the local cache. Defaults to the location set by the platformdirs library.
630
        The testing data will be downloaded to this local cache.
631
    """
632
    # Create the Pooch instance
633
    n = nimbus(repo=repo, branch=branch, cache_dir=temp_folder or local_cache)
8✔
634

635
    # Download the files
636
    errored_files = []
8✔
637
    for file in load_registry(branch=branch, repo=repo):
8✔
638
        try:
8✔
639
            n.fetch(file)
8✔
640
        except HTTPError:
×
641
            msg = f"File `{file}` not accessible in remote repository."
×
642
            logging.error(msg)
×
643
            errored_files.append(file)
×
644
        except SocketBlockedError as err:  # noqa
×
645
            msg = (
×
646
                "Unable to access registry file online. Testing suite is being run with `--disable-socket`. "
647
                "If you intend to run tests with this option enabled, please download the file beforehand with the "
648
                "following console command: `$ xclim prefetch_testing_data`."
649
            )
650
            raise SocketBlockedError(msg) from err
×
651
        else:
652
            logging.info("Files were downloaded successfully.")
8✔
653

654
    if errored_files:
8✔
655
        logging.error(
×
656
            "The following files were unable to be downloaded: %s",
657
            errored_files,
658
        )
659

660

661
def gather_testing_data(
9✔
662
    worker_cache_dir: str | os.PathLike[str] | Path,
663
    worker_id: str,
664
    _cache_dir: str | os.PathLike[str] | None = TESTDATA_CACHE_DIR,
665
) -> None:
666
    """
667
    Gather testing data across workers.
668

669
    Parameters
670
    ----------
671
    worker_cache_dir : str or Path
672
        The worker-specific directory that the testing data is copied into.
673
    worker_id : str
674
        The worker ID.
675
    _cache_dir : str or Path, optional
676
        The shared cache directory holding the testing data. Default is ``TESTDATA_CACHE_DIR``.
677

678
    Raises
679
    ------
680
    ValueError
681
        If the cache directory is not set.
682
    FileNotFoundError
683
        If the testing data is not found.
684
    """
685
    if _cache_dir is None:
9✔
686
        raise ValueError(
×
687
            "The cache directory must be set. "
688
            "Please set the `_cache_dir` parameter or the `XCLIM_TESTDATA_CACHE_DIR` environment variable."
689
        )
690
    cache_dir = Path(_cache_dir)
9✔
691

692
    if worker_id == "master":
9✔
693
        populate_testing_data(branch=TESTDATA_BRANCH)
×
694
    else:
695
        if platform.system() == "Windows":
9✔
696
            if not cache_dir.joinpath(default_testdata_version).exists():
1✔
697
                raise FileNotFoundError(
×
698
                    "Testing data not found and UNIX-style file-locking is not supported on Windows. "
699
                    "Consider running `$ xclim prefetch_testing_data` to download testing data beforehand."
700
                )
701
        else:
702
            cache_dir.mkdir(exist_ok=True, parents=True)
8✔
703
            lockfile = cache_dir.joinpath(".lock")
8✔
704
            test_data_being_written = FileLock(lockfile)
8✔
705
            with test_data_being_written:
8✔
706
                # This flag prevents multiple calls from re-attempting to download testing data in the same pytest run
707
                populate_testing_data(branch=TESTDATA_BRANCH)
8✔
708
                cache_dir.joinpath(".data_written").touch()
8✔
709
            with test_data_being_written.acquire():
8✔
710
                if lockfile.exists():
8✔
711
                    lockfile.unlink()
8✔
712
        copytree(cache_dir.joinpath(default_testdata_version), worker_cache_dir)
9✔
713

714

715
# Testing Utilities ###
716

717

718
def audit_url(url: str, context: str | None = None) -> str:
9✔
719
    """
720
    Check if the URL is well-formed.
721

722
    Parameters
723
    ----------
724
    url : str
725
        The URL to check.
726
    context : str, optional
727
        Additional context to include in the error message. Default is None.
728

729
    Returns
730
    -------
731
    str
732
        The URL if it is well-formed.
733

734
    Raises
735
    ------
736
    URLError
737
        If the URL is not well-formed.
738
    """
739
    msg = ""
9✔
740
    result = urlparse(url)
9✔
741
    if result.scheme == "http":
9✔
742
        msg = f"{context if context else ''} URL is not using secure HTTP: '{url}'".strip()
×
743
    if not all([result.scheme, result.netloc]):
9✔
744
        msg = f"{context if context else ''} URL is not well-formed: '{url}'".strip()
×
745

746
    if msg:
9✔
747
        logger.error(msg)
×
748
        raise URLError(msg)
×
749
    return url
9✔
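
The trailing-slash handling in `load_registry` and `nimbus` matters because `urllib.parse.urljoin` replaces the last path segment of a base URL that does not end in `/`. The following standalone sketch illustrates this with the default repository URL and version from this module. `build_registry_url` is a hypothetical helper written for illustration and is not part of xclim.

```python
from urllib.parse import urljoin, urlparse


def build_registry_url(repo: str, branch: str) -> str:
    """Hypothetical helper mirroring the slash normalization used above."""
    if not repo.endswith("/"):
        repo = f"{repo}/"
    if not branch.endswith("/"):
        branch = f"{branch}/"
    return urljoin(urljoin(repo, branch), "data/registry.txt")


repo = "https://raw.githubusercontent.com/Ouranosinc/xclim-testdata"
branch = "v2025.4.29"

good = build_registry_url(repo, branch)
# Without normalization, each urljoin call drops the last path segment of its base.
bad = urljoin(urljoin(repo, branch), "data/registry.txt")

print(good)
print(bad)

# Same well-formedness check as `audit_url`: both a scheme and a host must be present.
parsed = urlparse(good)
print(all([parsed.scheme, parsed.netloc]))
```

Comparing the two printed URLs shows that the unnormalized variant silently loses both the repository name and the branch, which is why the normalization happens before the URL is audited.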