winter-telescope / winterdrp / 3688932405

Close #197, use cache to massively reduce RAM (#231)

78 of 78 new or added lines in 6 files covered. (100.0%)

4569 of 6122 relevant lines covered (74.63%)

0.75 hits per line

91.18
/winterdrp/data/image_data.py
"""
Module to specify the input data classes for
:class:`winterdrp.processors.base_processor.ImageHandler`

The basic idea of the code is to pass
:class:`~winterdrp.data.base_data.DataBlock` objects
through a series of :class:`~winterdrp.processors.BaseProcessor` objects.
Since a given image can easily be ~10-100 MB, and there may be several hundred
raw images from a typical survey in a given night, the total data volume for
these processors could be several tens of GB or more. Storing these all in RAM
would be very inefficient/slow for a typical laptop or many larger processing
machines.

To mitigate this, the code can be operated in **cache mode**. In that case,
after raw images are loaded, only the header data is stored in memory.
The actual image data itself is stored temporarily as a npy file
in a dedicated cache directory, and only loaded into memory when needed.
When the data is updated, the npy file is overwritten.
The file name is a hash of the image name and the time at which it was read,
so multiple copies of an image can be read and modified independently.

In cache mode, all of the image data is temporarily stored in a cache,
and this cache can therefore reach a size of tens of GB.
The cache is located in the configurable **output data directory**.
Left unchecked, it would grow linearly with successive code executions.
To mitigate that, and to avoid cleaning the cache by hand,
the code tries to automatically delete cache files as needed.

Python calls an object's `__del__()` method to handle clean up when the object
is deleted. Images automatically delete their cache in this method. However,
Python has a somewhat complicated method of 'garbage collection' (see
`the official description <https://devguide.python.org/internals/garbage-collector>`_
for more info), and it is not guaranteed that Image objects will
clean up after themselves.

As a fallback, we provide the helper function
:func:`~winterdrp.data.image_data.clean_cache` to delete all cache files created
during a session. When you run the code from the command line (and therefore call
``__main__``), we automatically run the cleanup before exiting,
even if the code crashes/raises errors. This is also true for the unit tests,
as provided by the base test class. **If you try to interact with the code in
any other way, please be mindful of this behaviour, and ensure that you clean your
cache in a responsible way!**

If you don't like this feature, you don't need to use it. Cache mode is entirely
optional, and can be disabled by setting the ``USE_WINTER_CACHE`` environment
variable to false.

.. literalinclude:: ../../winterdrp/paths.py
    :lines: 29

You can set this variable in your shell.

.. code-block:: bash

    export USE_WINTER_CACHE=false

See :doc:`usage` for more information about selecting cache mode,
and setting the output data directory.
"""
import hashlib
import logging
from pathlib import Path

import numpy as np
from astropy.io.fits import Header
from astropy.time import Time

from winterdrp.data.base_data import DataBatch, DataBlock
from winterdrp.paths import CACHE_DIR, USE_CACHE

logger = logging.getLogger(__name__)


class Image(DataBlock):
    """
    A subclass of :class:`~winterdrp.data.base_data.DataBlock`,
    containing an image and header.

    This class serves as input for
    :class:`~winterdrp.processors.base_processor.BaseImageProcessor` and
    :class:`~winterdrp.processors.base_processor.BaseCandidateGenerator` processors.
    """

    cache_files = []

    def __init__(self, data: np.ndarray, header: Header):
        self._data = None
        self.header = header
        super().__init__()
        self.cache_path = self.get_cache_path()
        if USE_CACHE:
            self.cache_files.append(self.cache_path)
        self.set_data(data=data)

    def get_cache_path(self) -> Path:
        """
        Get a unique cache path for the image (.npy file).
        This is a hash of the current time and the image name,
        so it should be unique even when rerunning on the same image.

        :return: unique cache file path
        """
        base = "".join([str(Time.now()), self.get_name()])
        name = f"{hashlib.sha1(base.encode()).hexdigest()}.npy"
        return CACHE_DIR.joinpath(name)

    def __str__(self):
        return f"<An {self.__class__.__name__} object, built from {self.get_name()}>"

    def set_data(self, data: np.ndarray):
        """
        Set the data, in the cache if cache mode is enabled and otherwise in RAM

        :param data: Updated image data
        :return: None
        """
        if USE_CACHE:
            self.set_cache_data(data)
        else:
            self.set_ram_data(data)

    def set_cache_data(self, data: np.ndarray):
        """
        Save the data to the image's cache file

        :param data: Updated image data
        :return: None
        """
        np.save(self.cache_path.as_posix(), data)

    def set_ram_data(self, data: np.ndarray):
        """
        Set the data in RAM

        :param data: Updated image data
        :return: None
        """
        self._data = data

    def get_data(self) -> np.ndarray:
        """
        Get the image data, from the cache if cache mode is enabled
        and otherwise from RAM

        :return: image data (numpy array)
        """
        if USE_CACHE:
            return self.get_cache_data()

        return self.get_ram_data()

    def get_cache_data(self) -> np.ndarray:
        """
        Get the image data from cache

        :return: image data (numpy array)
        """
        return np.load(self.cache_path.as_posix())

    def get_ram_data(self) -> np.ndarray:
        """
        Get the image data from RAM

        :return: image data (numpy array)
        """
        return self._data

    def get_header(self) -> Header:
        """
        Get the image header

        :return: astropy Header
        """
        return self.header

    def set_header(self, header: Header):
        """
        Update the header

        :param header: updated header
        :return: None
        """
        self.header = header

    def __getitem__(self, item):
        return self.header.__getitem__(item)

    def __setitem__(self, key, value):
        self.header.__setitem__(key, value)

    def keys(self):
        """
        Get the header keys

        :return: Keys of header
        """
        return self.header.keys()

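    def __del__(self):
        # Delete the cache file, then deregister it only if it is still tracked
        # (it is not if caching is disabled or clean_cache() has already run)
        self.cache_path.unlink(missing_ok=True)
        if self.cache_path in self.cache_files:
            self.cache_files.remove(self.cache_path)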


class ImageBatch(DataBatch):
    """
    A subclass of :class:`~winterdrp.data.base_data.DataBatch`,
    which contains :class:`~winterdrp.data.image_data.Image` objects
    """

    data_type = Image

    def __init__(self, batch: list[Image] | Image | None = None):
        super().__init__(batch=batch)

    def append(self, item: Image):
        self._append(item)

    def __str__(self):
        return (
            f"<An {self.__class__.__name__} object, "
            f"containing {[x.get_name() for x in self.get_batch()]}>"
        )

    def get_batch(self) -> list[Image]:
        """Returns the :class:`~winterdrp.data.image_data.Image`
        items within the batch

        :return: list of :class:`~winterdrp.data.image_data.Image` objects
        """
        return self.get_data_list()


def clean_cache():
    """Function to clear all created cache files

    :return: None
    """
    for path in Image.cache_files:
        path.unlink(missing_ok=True)
    Image.cache_files = []
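
The module docstring asks anyone driving the code outside the command line or the unit tests to clean the cache themselves. A minimal sketch of that pattern follows, assuming nothing about winterdrp beyond what is shown above. It does not import winterdrp: `CachedBlock`, `SKETCH_CACHE_DIR`, `clean_sketch_cache` and `run_session` are hypothetical stand-ins for `Image`, `CACHE_DIR`, `clean_cache` and a user script, and `pickle` replaces `np.save`/`np.load` so that only the standard library is needed.

```python
import hashlib
import pickle
import tempfile
import time
from pathlib import Path

# Hypothetical stand-in for winterdrp.paths.CACHE_DIR
SKETCH_CACHE_DIR = Path(tempfile.mkdtemp(prefix="cache_sketch_"))


class CachedBlock:
    """Toy analogue of Image: the data lives on disk, not in RAM."""

    # Class-level registry of every cache file created in this session
    cache_files: list[Path] = []

    def __init__(self, name: str, data: list[float]):
        self.name = name
        # Hash of creation time + name, mirroring Image.get_cache_path()
        base = f"{time.time_ns()}{name}"
        digest = hashlib.sha1(base.encode()).hexdigest()
        self.cache_path = SKETCH_CACHE_DIR / f"{digest}.pkl"
        CachedBlock.cache_files.append(self.cache_path)
        self.set_data(data)

    def set_data(self, data: list[float]) -> None:
        # Overwrite the cache file with the updated data
        self.cache_path.write_bytes(pickle.dumps(data))

    def get_data(self) -> list[float]:
        # Load from disk only when the data is actually requested
        return pickle.loads(self.cache_path.read_bytes())


def clean_sketch_cache() -> None:
    """Delete every tracked cache file and reset the registry."""
    for path in CachedBlock.cache_files:
        path.unlink(missing_ok=True)
    CachedBlock.cache_files = []


def run_session() -> tuple[list[float], int]:
    """Process one block, cleaning the cache even if an error is raised."""
    try:
        block = CachedBlock("example.fits", [1.0, 2.0, 3.0])
        block.set_data([x * 2 for x in block.get_data()])
        return block.get_data(), len(CachedBlock.cache_files)
    finally:
        clean_sketch_cache()
```

The `try`/`finally` wrapper is the relevant part: the cleanup runs whether the processing succeeds or raises, so it does not depend on when the garbage collector calls `__del__`. With the real package, the equivalent would presumably be to call `clean_cache()` from this module in the `finally` branch.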