IBM / unitxt / 12853539742

19 Jan 2025 12:30PM UTC coverage: 79.475%. Remained the same
12853539742

Pull #1521

github

web-flow
Merge be39365d9 into a0da7a8be
Pull Request #1521: Add documentation for Settings and Constants management

1394 of 1741 branches covered (80.07%)

Branch coverage included in aggregate %.

8778 of 11058 relevant lines covered (79.38%)

0.79 hits per line

Source File
95.91
src/unitxt/settings_utils.py
"""Library Settings and Constants.

This module provides a mechanism for managing application-wide configuration and immutable constants. It includes the `Settings` and `Constants` classes, which are implemented as singletons so that a single shared instance is used across the application. It also defines helper functions for accessing these objects.

### Key Components:

1. **Settings Class**:
   - A singleton class for managing mutable configuration settings.
   - Enforces types for settings registered as `int`, `float`, or `bool`, casting assigned values accordingly.
   - Allows temporary modification of settings through a context manager.
   - Reads overrides from environment variables named `UNITXT_<SETTING_NAME>`, which take precedence over values set in code.

   #### Available Settings:
   - `allow_unverified_code` (bool, default: False): Whether to allow unverified code execution.
   - `use_only_local_catalogs` (bool, default: False): Restrict operations to local catalogs only.
   - `global_loader_limit` (int, default: None): Limit for global data loaders.
   - `num_resamples_for_instance_metrics` (int, default: 1000): Number of resamples for instance-level metrics.
   - `num_resamples_for_global_metrics` (int, default: 100): Number of resamples for global metrics.
   - `max_log_message_size` (int, default: 100000): Maximum size of log messages.
   - `catalogs` (default: None): List of catalog configurations.
   - `artifactories` (default: None): Artifact storage configurations.
   - `default_recipe` (str, default: "dataset_recipe"): Default recipe for dataset operations.
   - `default_verbosity` (str, default: "info"): Default verbosity level for logging.
   - `use_eager_execution` (bool, default: False): Enable eager execution for tasks.
   - `remote_metrics` (list, default: []): List of remote metrics configurations.
   - `test_card_disable` (bool, default: False): Disable test cards if set to True.
   - `test_metric_disable` (bool, default: False): Disable test metrics if set to True.
   - `metrics_master_key_token` (default: None): Master token for metrics.
   - `seed` (int, default: 42): Default seed for random operations.
   - `skip_artifacts_prepare_and_verify` (bool, default: False): Skip artifact preparation and verification.
   - `data_classification_policy` (default: None): Policy for data classification.
   - `mock_inference_mode` (bool, default: False): Enable mock inference mode.
   - `disable_hf_datasets_cache` (bool, default: True): Disable caching for Hugging Face datasets.
   - `loader_cache_size` (int, default: 1): Cache size for data loaders.
   - `task_data_as_text` (bool, default: True): Represent task data as text.
   - `default_provider` (str, default: "watsonx"): Default service provider.
   - `default_format` (default: None): Default format for data processing.

   #### Usage:
   - Access settings using the `get_settings()` function.
   - Modify settings temporarily using the `context` method:
     ```python
     settings = get_settings()
     with settings.context(default_verbosity="debug"):
         # Code within this block uses "debug" verbosity.
         ...
     ```

2. **Constants Class**:
   - A singleton class for managing immutable constants used across the application.
   - Constants cannot be reassigned once set; assigning to an existing constant raises `ValueError`.
   - Provides centralized access to paths, URLs, and other fixed application parameters.

   #### Available Constants:
   - `dataset_file`: Path to the dataset file.
   - `metric_file`: Path to the metric file.
   - `local_catalog_path`: Path to the local catalog directory.
   - `package_dir`: Directory of the installed package (set only when the package can be located).
   - `default_catalog_path`: Default catalog directory path.
   - `catalog_dir`: Catalog directory; initialized to `local_catalog_path`.
   - `dataset_url`: URL for dataset resources.
   - `metric_url`: URL for metric resources.
   - `version`: Current version of the application.
   - `catalog_hierarchy_sep`: Separator for catalog hierarchy levels.
   - `env_local_catalogs_paths_sep`: Separator for local catalog paths in environment variables.
   - `non_registered_files`: List of files excluded from registration.
   - `codebase_url`: URL of the codebase repository.
   - `website_url`: Official website URL.
   - `inference_stream`: Name of the inference stream constant.
   - `instance_stream`: Name of the instance stream constant.
   - `image_tag`: Default image tag for operations.
   - `demos_pool_field`: Field name for demos pool.

   #### Usage:
   - Access constants using the `get_constants()` function:
     ```python
     constants = get_constants()
     print(constants.dataset_file)
     ```

3. **Helper Functions**:
   - `get_settings()`: Returns the singleton `Settings` instance.
   - `get_constants()`: Returns the singleton `Constants` instance.
"""
import importlib.metadata
import importlib.util
import os
from contextlib import contextmanager

from .version import version


def cast_to_type(value, value_type):
    if value_type is bool:
        if value not in ["True", "False", True, False]:
            raise ValueError(
                f"Value must be in ['True', 'False', True, False] got {value}"
            )
        if value == "True":
            return True
        if value == "False":
            return False
        return value
    if value_type is int:
        return int(value)
    if value_type is float:
        return float(value)

    raise ValueError("Unsupported type.")  # not covered by tests
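
The casting rules in `cast_to_type` can be tried in isolation. The sketch below restates them in a standalone function so that it runs without unitxt installed; the name `cast_to_type_sketch` is hypothetical and exists only for this illustration. It shows how string values, such as those read from environment variables, are converted.

```python
def cast_to_type_sketch(value, value_type):
    # Standalone restatement of the casting rules, for illustration only.
    if value_type is bool:
        if value not in ["True", "False", True, False]:
            raise ValueError(
                f"Value must be in ['True', 'False', True, False] got {value}"
            )
        if value == "True":
            return True
        if value == "False":
            return False
        return value
    if value_type in (int, float):
        return value_type(value)
    raise ValueError("Unsupported type.")


print(cast_to_type_sketch("True", bool))   # string form of a boolean
print(cast_to_type_sketch("42", int))      # numeric string to int
print(cast_to_type_sketch("0.5", float))   # numeric string to float

try:
    cast_to_type_sketch("yes", bool)       # only "True"/"False" strings pass
except ValueError as err:
    print(f"rejected: {err}")
```

Only the exact strings `"True"` and `"False"` are accepted for booleans, so values such as `"true"` or `"yes"` raise `ValueError` rather than being guessed at.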


class Settings:
    _instance = None
    _settings = {}
    _types = {}
    _logger = None

    @classmethod
    def is_uninitilized(cls):
        return cls._instance is None

    def __new__(cls):
        if cls.is_uninitilized():
            cls._instance = super().__new__(cls)
        return cls._instance

    def __setattr__(self, key, value):
        if key.endswith("_key") or key in {"_instance", "_settings"}:
            raise AttributeError(f"Modifying '{key}' is not allowed.")

        if isinstance(value, tuple) and len(value) == 2:
            value_type, value = value
            if value_type not in [int, float, bool]:
                raise ValueError(
                    f"Setting settings with tuple requires the first element to be either [int, float, bool], got {value_type}"
                )
            self._types[key] = value_type

        if key in self._types and value is not None:
            value_type = self._types[key]
            value = cast_to_type(value, value_type)

        if key in self._settings:
            if self._logger is not None:
                self._logger.info(  # not covered by tests
                    f"unitxt.settings.{key} changed: {self._settings[key]} -> {value}"
                )
        self._settings[key] = value

    def __getattr__(self, key):
        if key.endswith("_key"):
            actual_key = key[:-4]  # Remove the "_key" suffix
            return self.environment_variable_key_name(actual_key)

        key_name = self.environment_variable_key_name(key)
        env_value = os.getenv(key_name)

        if env_value is not None:
            if key in self._types:
                env_value = cast_to_type(env_value, self._types[key])
            return env_value

        if key in self._settings:
            return self._settings[key]

        raise AttributeError(f"'{key}' not found")

    def environment_variable_key_name(self, key):
        return "UNITXT_" + key.upper()

    def get_all_environment_variables(self):
        return [  # not covered by tests
            self.environment_variable_key_name(key) for key in self._settings.keys()
        ]

    @contextmanager
    def context(self, **kwargs):
        old_values = {key: self._settings.get(key, None) for key in kwargs}
        try:
            for key, value in kwargs.items():
                self.__setattr__(key, value)
            yield
        finally:
            for key, value in old_values.items():
                self.__setattr__(key, value)
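
The lookup order in `Settings.__getattr__` (environment variable first, stored value second) is easy to miss when reading the class. The following is a minimal sketch of that precedence, assuming a hypothetical `MiniSettings` class with a single `seed` entry; it is not the unitxt class and uses an explicit `get` method instead of attribute access to stay short.

```python
import os


class MiniSettings:
    """Hypothetical stand-in for illustration; not the unitxt Settings class."""

    def __init__(self):
        self._values = {"seed": 42}
        self._types = {"seed": int}

    def get(self, key):
        # An environment variable named UNITXT_<KEY> takes precedence.
        env_value = os.getenv("UNITXT_" + key.upper())
        if env_value is not None:
            value_type = self._types.get(key)
            return value_type(env_value) if value_type else env_value
        return self._values[key]


demo = MiniSettings()
os.environ.pop("UNITXT_SEED", None)
print(demo.get("seed"))   # stored default

os.environ["UNITXT_SEED"] = "7"
print(demo.get("seed"))   # environment override, cast to int

del os.environ["UNITXT_SEED"]   # clean up after the demonstration
```

One consequence, judging from the code above: while such a variable is set, values assigned in code or through `context` are still stored, but attribute access keeps returning the environment value.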


class Constants:
    _instance = None
    _constants = {}

    @classmethod
    def is_uninitilized(cls):
        return cls._instance is None

    def __new__(cls):
        if cls.is_uninitilized():
            cls._instance = super().__new__(cls)
        return cls._instance

    def __setattr__(self, key, value):
        if key.endswith("_key") or key in {"_instance", "_constants"}:
            raise AttributeError(f"Modifying '{key}' is not allowed.")  # not covered by tests
        if key in self._constants:
            raise ValueError("Cannot override constants.")  # not covered by tests
        self._constants[key] = value

    def __getattr__(self, key):
        if key in self._constants:
            return self._constants[key]

        raise AttributeError(f"'{key}' not found")  # not covered by tests
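
The write-once behavior of `Constants` comes from the pairing of `__setattr__` and `__getattr__`. As a rough sketch of the same idea, the hypothetical `WriteOnce` class below (not part of unitxt) keeps values in a class-level dictionary and refuses a second assignment to the same name.

```python
class WriteOnce:
    """Hypothetical write-once attribute store, for illustration only."""

    _values = {}

    def __setattr__(self, key, value):
        if key in self._values:
            raise ValueError("Cannot override constants.")
        self._values[key] = value

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        if key in self._values:
            return self._values[key]
        raise AttributeError(f"'{key}' not found")


consts = WriteOnce()
consts.image_tag = "unitxt-img"
print(consts.image_tag)

try:
    consts.image_tag = "other"   # a second assignment is refused
except ValueError as err:
    print(f"rejected: {err}")
```

The guard applies to rebinding a name only. A mutable value, such as a list, can still be changed in place after it has been stored, so this pattern protects the binding rather than the object.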


if Settings.is_uninitilized():
    settings = Settings()
    settings.allow_unverified_code = (bool, False)
    settings.use_only_local_catalogs = (bool, False)
    settings.global_loader_limit = (int, None)
    settings.num_resamples_for_instance_metrics = (int, 1000)
    settings.num_resamples_for_global_metrics = (int, 100)
    settings.max_log_message_size = (int, 100000)
    settings.catalogs = None
    settings.artifactories = None
    settings.default_recipe = "dataset_recipe"
    settings.default_verbosity = "info"
    settings.use_eager_execution = False
    settings.remote_metrics = []
    settings.test_card_disable = (bool, False)
    settings.test_metric_disable = (bool, False)
    settings.metrics_master_key_token = None
    settings.seed = (int, 42)
    settings.skip_artifacts_prepare_and_verify = (bool, False)
    settings.data_classification_policy = None
    settings.mock_inference_mode = (bool, False)
    settings.disable_hf_datasets_cache = (bool, True)
    settings.loader_cache_size = (int, 1)
    settings.task_data_as_text = (bool, True)
    settings.default_provider = "watsonx"
    settings.default_format = None
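
Assignments such as `settings.seed = (int, 42)` use a `(type, value)` tuple to register a type together with the initial value. The sketch below illustrates that convention with a hypothetical `TypedStore` class; it is a simplification that casts by calling the registered type directly, whereas the module routes casting through `cast_to_type` so that boolean strings are handled correctly.

```python
class TypedStore:
    """Hypothetical miniature of the (type, value) convention; illustration only."""

    def __init__(self):
        # Bypass the custom __setattr__ for the internal dictionaries.
        object.__setattr__(self, "_values", {})
        object.__setattr__(self, "_types", {})

    def __setattr__(self, key, value):
        if isinstance(value, tuple) and len(value) == 2:
            value_type, value = value        # register the type with the value
            self._types[key] = value_type
        if key in self._types and value is not None:
            # Simplified cast; suitable for int and float only.
            value = self._types[key](value)
        self._values[key] = value

    def __getattr__(self, key):
        try:
            return self._values[key]
        except KeyError:
            raise AttributeError(key) from None


store = TypedStore()
store.seed = (int, 42)      # registers seed as an int setting
store.seed = "7"            # a later string assignment is cast to int
print(store.seed, type(store.seed).__name__)

store.limit = (int, None)   # None is stored as-is, without casting
print(store.limit)
```

A side effect of this convention in the module is that any two-element tuple assigned to a setting is read as `(type, value)`, so a setting whose value is itself meant to be a pair cannot be assigned directly.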

if Constants.is_uninitilized():
    constants = Constants()
    constants.dataset_file = os.path.join(os.path.dirname(__file__), "dataset.py")
    constants.metric_file = os.path.join(os.path.dirname(__file__), "metric.py")
    constants.local_catalog_path = os.path.join(os.path.dirname(__file__), "catalog")
    unitxt_pkg = importlib.util.find_spec("unitxt")
    if unitxt_pkg and unitxt_pkg.origin:
        constants.package_dir = os.path.dirname(unitxt_pkg.origin)
        constants.default_catalog_path = os.path.join(constants.package_dir, "catalog")
    else:
        constants.default_catalog_path = constants.local_catalog_path  # not covered by tests
    constants.catalog_dir = constants.local_catalog_path
    constants.dataset_url = "unitxt/data"
    constants.metric_url = "unitxt/metric"
    constants.version = version
    constants.catalog_hierarchy_sep = "."
    constants.env_local_catalogs_paths_sep = ":"
    constants.non_registered_files = [
        "__init__.py",
        "artifact.py",
        "utils.py",
        "register.py",
        "metric.py",
        "dataset.py",
        "blocks.py",
    ]
    constants.codebase_url = "https://github.com/IBM/unitxt"
    constants.website_url = "https://www.unitxt.org"
    constants.inference_stream = "__INFERENCE_STREAM__"
    constants.instance_stream = "__INSTANCE_STREAM__"
    constants.image_tag = "unitxt-img"
    constants.demos_pool_field = "_demos_pool_"


def get_settings() -> Settings:
    return Settings()


def get_constants():
    return Constants()
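
Both accessor functions rely on `__new__` returning a cached instance, so calling the class again never creates a second object. A minimal sketch of that pattern follows; `Registry` and `get_registry` are hypothetical names used only for this illustration.

```python
class Registry:
    """Hypothetical singleton, for illustration only."""

    _instance = None

    def __new__(cls):
        # Create the instance on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


def get_registry() -> Registry:
    return Registry()


first = get_registry()
second = get_registry()
print(first is second)
```

Because every call yields the same object, a change made through one reference is visible through all others, which is what lets `get_settings()` be called from anywhere in a code base without passing the settings object around.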