
mcuntz / hesseflux / build 9537697060 (push, via github, by mcuntz)

16 Jun 2024 05:21PM UTC coverage: 3.779% (+0.03%) from 3.746%

Commit: "Removed all warnings with new pandas and numpy versions"

1 of 161 new or added lines in 4 files covered (0.62%).
9 existing lines in 2 files now uncovered.
39 of 1032 relevant lines covered (3.78%); 0.42 hits per line.

Source File: /src/hesseflux/gapfill.py
#!/usr/bin/env python
"""
Fill gaps of flux data from Eddy covariance measurements or estimate flux
uncertainties

This module was written by Matthias Cuntz while at Department of
Computational Hydrosystems, Helmholtz Centre for Environmental
Research - UFZ, Leipzig, Germany, and continued while at Institut
National de Recherche pour l'Agriculture, l'Alimentation et
l'Environnement (INRAE), Nancy, France.

:copyright: Copyright 2012-2022 Matthias Cuntz, see AUTHORS.rst for details.
:license: MIT License, see LICENSE for details.

.. moduleauthor:: Matthias Cuntz

The following functions are provided

.. autosummary::
   gapfill

History
    * Written Mar 2012 by Matthias Cuntz - mc (at) macu (dot) de
    * Ported to Python 3, Feb 2013, Matthias Cuntz
    * Input data can be ND-array, Apr 2014, Matthias Cuntz
    * Bug in longestmarginalgap: was only working at time series edges, rename
      it to longgap, Apr 2014, Matthias Cuntz
    * Keyword fullday, Apr 2014, Matthias Cuntz
    * Input can be pandas Dataframe or numpy array(s), Apr 2020, Matthias Cuntz
    * Using numpy docstring format, May 2020, Matthias Cuntz
    * Error estimates are undef by default, Jun 2021, Matthias Cuntz
    * Mean of values for error estimates, Jun 2021, Matthias Cuntz
    * Improved flake8 and numpy docstring, Oct 2021, Matthias Cuntz

"""
from __future__ import division, absolute_import, print_function
import numpy as np
import pandas as pd


__all__ = ['gapfill']


def gapfill(dfin, flag=None, date=None, timeformat='%Y-%m-%d %H:%M:%S',
            colhead=None,
            sw_dev=50., ta_dev=2.5, vpd_dev=5.,
            longgap=60, fullday=False, undef=-9999, ddof=1,
            err=False, errmean=False, verbose=0):
    """
    Fill gaps of flux data from Eddy covariance measurements
    or estimate flux uncertainties

    Fills gaps in flux data from Eddy covariance measurements with
    Marginal Distribution Sampling (MDS) according to Reichstein et al.
    (Global Change Biology, 2005).

    This means, if there is a gap in the data, look for similar meteorological
    conditions (defined as maximum possible deviations) in a certain time
    window and fill with the average of these 'similar' values.

    The routine can also do the same search for similar meteorological
    conditions for every data point and calculate its standard deviation as a
    measure of uncertainty after Lasslop et al. (Biogeosciences, 2008).

    Parameters
    ----------
    dfin : pandas.Dataframe or numpy.array
        time series of fluxes to fill as well as meteorological variables:
        incoming short-wave radiation, air temperature, and air vapour pressure
        deficit. *dfin* can be a pandas.Dataframe with the columns

           'SW_IN' (or starting with 'SW_IN_') for incoming short-wave
           radiation [W m-2]

           'TA'    (or starting with 'TA_') for air temperature [deg C]

           'VPD'   (or starting with 'VPD_') for air vapour pressure
           deficit [hPa]

           as well as columns with ecosystem fluxes with missing values
           (gaps).

        The index is taken as date variable.

        *dfin* can also be a numpy array with the same columns. In this case
        *colhead*, *date*, and possibly *dateformat* must be given.
    flag : pandas.Dataframe or numpy.array, optional
        Dataframe or array has the same shape as *dfin*.
        Non-zero values in *flag* will be treated as missing values in *dfin*.
        *flag* must follow the same rules as *dfin* if pandas.Dataframe.
        If *flag* is a numpy array, *df.columns.values* will be used as column
        heads and the index of *dfin* will be copied to *flag*.
    date : array_like of string, optional
        1D-array_like of calendar dates in format given in *timeformat*.
        *date* must be given if *dfin* is numpy array.
    timeformat : str, optional
        Format of dates in *date*, if given (default: '%Y-%m-%d %H:%M:%S').
        See strftime documentation of Python's datetime module:
        https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
    colhead : array_like of str, optional
        column names if *dfin* is numpy array. See *dfin* for mandatory
        column names.
    sw_dev : float, optional
        threshold for maximum deviation of global radiation (default: 50)
    ta_dev : float, optional
        threshold for maximum deviation of air temperature (default: 2.5)
    vpd_dev : float, optional
        threshold for maximum deviation of vpd (default: 5)
    longgap : int, optional
        avoid extrapolation into a gap longer than *longgap* days (default: 60)
    fullday : bool, optional
        True: move beginning of large gap to start of next day and move end of
        large gap to end of last day (default: False)
    undef : float, optional
        values having *undef* value are treated as missing values in *dfin*
        (default: -9999).
        np.nan is not allowed as *undef* (not working).
    ddof : int, optional
        Delta Degrees of Freedom. The divisor used in calculation of standard
        deviation for error estimates (`err=True`) is `N-ddof`, where *N*
        represents the number of elements (default: 1).
    err : bool, optional
        True: fill every data point with standard deviation instead of mean,
        i.e. used for error generation as in Lasslop et al. (Biogeosci 2008)
        (default: False)
    errmean : bool, optional
        True: also return mean value of values for error estimates
        `if err == True` (default: False)
    shape : bool or tuple, optional
        True: output has the same shape as input data if *dfin* is a
        numpy array; if a tuple is given, then this tuple is used to reshape.

        False: outputs are 1D arrays if *dfin* is numpy array (default: False).
    verbose : int, optional
        Verbosity level 0-3 (default: 0). 0 is no output; 3 is very verbose.

    Returns
    -------
    pandas.Dataframe(s) or numpy array(s)
        `if not err:` filled_data, quality_class

        `if err and not errmean:` err_estimate

        `if err and errmean:` err_estimate, mean_estimate

        pandas.Dataframe(s) will be returned if *dfin* was Dataframe.

        numpy array(s) will be returned if *dfin* was numpy array.

    Notes
    -----
    If *err*, there is no error estimate if there are no meteorological
    conditions in the vicinity of the data point (first cycle of
    Reichstein et al. GCB 2005).

    Routine does not work with `undef=np.nan`.

    Reichstein et al. (2005)
        On the separation of net ecosystem exchange into assimilation and
        ecosystem respiration: review and improved algorithm
        Global Change Biology 11, 1424-1439

    Lasslop et al. (2008)
        Influences of observation errors in eddy flux data on inverse model
        parameter estimation
        Biogeosciences, 5, 1311-1324

    Examples
    --------
    >>> import numpy as np
    >>> from fread import fread
    >>> from date2dec import date2dec
    >>> from dec2date import dec2date

    data

    >>> ifile = 'test_gapfill.csv'  # Tharandt 1998 = Online tool example file
    >>> undef = -9999.
    >>> dat   = fread(ifile, skip=2, transpose=True)
    >>> ndat  = dat.shape[1]
    >>> head  = fread(ifile, skip=2, header=True)
    >>> head1 = head[0]
    >>> idx   = []
    >>> for i in head1:
    ...     if i in ['NEE', 'LE', 'H', 'Rg', 'Tair', 'VPD']:
    ...         idx.append(head1.index(i))
    >>> colhead = ['FC', 'LE', 'H', 'SW_IN', 'TA', 'VPD']
    >>> dfin = dat[idx, :]

    flag

    >>> flag = np.where(dfin == undef, 2, 0)
    >>> flag[0, :] = dat[head1.index('qcNEE'), :].astype(int)
    >>> flag[1, :] = dat[head1.index('qcLE'), :].astype(int)
    >>> flag[2, :] = dat[head1.index('qcH'), :].astype(int)
    >>> flag[np.where(flag == 1)] = 0

    date

    >>> day_id  = head1.index('Day')
    >>> hour_id = head1.index('Hour')
    >>> ntime   = dat.shape[1]
    >>> year  = np.ones(ntime, dtype=int) * 1998
    >>> hh    = dat[hour_id, :].astype(int)
    >>> mn    = np.rint((dat[hour_id, :] - hh) * 60.).astype(int)
    >>> y0    = date2dec(yr=year[0], mo=1, dy=1, hr=hh, mi=mn)
    >>> jdate = y0 + dat[day_id, :]
    >>> adate = dec2date(jdate, eng=True)

    fill missing data

    >>> dat_f, flag_f = gapfill(dfin, flag=flag, date=adate, colhead=colhead,
    ...                         undef=undef, verbose=0)
    >>> print('{:d} {:d} {:d} {:d} {:d} {:d}'.format(*flag_f[0, 11006:11012]))
    1 1 1 2 2 2
    >>> print('{:.2f} {:.2f} {:.2f} {:.2f} {:.2f} {:.2f}'.format(
    ...       *dat_f[0, 11006:11012]))
    -18.68 -15.63 -19.61 -15.54 -12.40 -15.33

    1D err

    >>> dat_std = gapfill(dfin, flag=flag, date=adate, colhead=colhead,
    ...                   undef=undef, verbose=0, err=True)
    >>> print('{:.3f} {:.3f} {:.3f} {:.3f} {:.3f} {:.3f}'.format(
    ...       *dat_std[0, 11006:11012]))
    5.372 13.118 6.477 -9999.000 -9999.000 -9999.000

    >>> dat_err = np.ones(ndat, dtype=int) * (-1)
    >>> kk      = np.where((dat_std[0, :] != undef) & (dat_f[0, :] != 0.))[0]
    >>> dat_err[kk] = np.abs(dat_std[0, kk] / dat_f[0, kk] * 100.).astype(int)
    >>> print('{:d} {:d} {:d} {:d} {:d} {:d}'.format(*dat_err[11006:11012]))
    28 83 33 -1 -1 -1

    1D err + mean

    >>> dat_std, dat_mean = gapfill(dfin, flag=flag, date=adate,
    ...                             colhead=colhead, undef=undef, verbose=0,
    ...                             err=True, errmean=True)
    >>> print('{:.3f} {:.3f} {:.3f} {:.3f} {:.3f} {:.3f}'.format(
    ...       *dat_std[0, 11006:11012]))
    5.372 13.118 6.477 -9999.000 -9999.000 -9999.000
    >>> print('{:.3f} {:.3f} {:.3f} {:.3f} {:.3f} {:.3f}'.format(
    ...       *dat_mean[0, 11006:11012]))
    -18.677 -15.633 -19.610 -9999.000 -9999.000 -9999.000

    """
    # Check input
    # numpy or pandas
    if isinstance(dfin, (np.ndarray, np.ma.MaskedArray)):
        isnumpy = True
        istrans = False
        astr = 'colhead must be given if input is numpy.ndarray.'
        assert colhead is not None, astr
        if dfin.shape[0] == len(colhead):
            istrans = True
            df = pd.DataFrame(dfin.T, columns=colhead)
        elif dfin.shape[1] == len(colhead):
            df = pd.DataFrame(dfin, columns=colhead)
        else:
            estr = 'Length of colhead must be number of columns in input'
            estr = estr + ' array. len(colhead)=' + str(len(colhead))
            estr = estr + ' shape(input)=(' + str(dfin.shape[0])
            estr = estr + ',' + str(dfin.shape[1]) + ').'
            raise ValueError(estr)
        assert date is not None, 'date must be given if input is numpy array.'
        df['Datetime'] = pd.to_datetime(date, format=timeformat)
        df.set_index('Datetime', drop=True, inplace=True)
    else:
        isnumpy = False
        istrans = False
        astr = 'Input must be either numpy.ndarray or pandas.DataFrame.'
        assert isinstance(dfin, pd.core.frame.DataFrame), astr
        df = dfin.copy(deep=True)

    # Incoming flags
    if flag is not None:
        if isinstance(flag, (np.ndarray, np.ma.MaskedArray)):
            fisnumpy = True
            fistrans = False
            if flag.shape[0] == len(df):
                ff = pd.DataFrame(flag, columns=df.columns.values)
            elif flag.shape[1] == len(df):
                fistrans = True
                ff = pd.DataFrame(flag.T, columns=df.columns.values)
            else:
                estr = 'flag must have same shape as data array. data:'
                estr += ' ({:d},{:d}); flag: ({:d},{:d})'.format(
                    dfin.shape[0], dfin.shape[1], flag.shape[0], flag.shape[1])
                raise ValueError(estr)
            ff = ff.set_index(df.index)
        else:
            fisnumpy = False
            fistrans = False
            astr = 'Flag must be either numpy.ndarray or pandas.DataFrame.'
            assert isinstance(flag, pd.core.frame.DataFrame), astr
            ff = flag.copy(deep=True)
    else:
        fisnumpy = isnumpy
        fistrans = istrans
        # flags: 0: good; 1: input flagged; 2: output flagged
        ff = df.copy(deep=True).astype(int)
        ff[:] = 0
        ff[df == undef] = 1
        ff[df.isna()] = 1

    # Data and flags
    sw_id = ''
    for cc in df.columns:
        if cc.startswith('SW_IN_') or (cc == 'SW_IN'):
            sw_id = cc
            break
    ta_id = ''
    for cc in df.columns:
        if cc.startswith('TA_') or (cc == 'TA'):
            ta_id = cc
            break
    vpd_id = ''
    for cc in df.columns:
        if cc.startswith('VPD_') or (cc == 'VPD'):
            vpd_id = cc
            break
    astr = 'Global radiation with name SW_IN or starting with SW_IN_'
    astr = astr + ' must be in input.'
    assert sw_id, astr
    astr = 'Air temperature with name TA or starting with TA_'
    astr = astr + ' must be in input.'
    assert ta_id, astr
    astr = 'Vapour pressure deficit with name VPD or starting'
    astr = astr + ' with VPD_ must be in input.'
    assert vpd_id, astr

    sw      = df[sw_id].to_numpy()
    sw_flg  = ff[sw_id].to_numpy()
    ta      = df[ta_id].to_numpy()
    ta_flg  = ff[ta_id].to_numpy()
    vpd     = df[vpd_id].to_numpy()
    vpd_flg = ff[vpd_id].to_numpy()

    # dfill is filled data
    # ffill is fill flag if not err else error estimate
    dfill = df.copy(deep=True)
    if err:
        ffill = df.copy(deep=True)
    else:
        ffill = ff.copy(deep=True)
        ffill[:] = 0

    # Times
    # number of data points per week; basic factor of the time window
    week    = pd.Timedelta('1 W') / (df.index[1] - df.index[0])
    nperday = week // 7
    hour    = df.index.hour + df.index.minute / 60.
    day     = (df.index.to_julian_date() - 0.5).astype(int)

    # Filling variables
    ndata = len(df)
    for hcol in df.columns:

        if hcol.startswith('SW_IN_') or (hcol == 'SW_IN'):
            continue
        if hcol.startswith('TA_') or (hcol == 'TA'):
            continue
        if hcol.startswith('VPD_') or (hcol == 'VPD'):
            continue

        if verbose > 0:
            if err:
                print('  Error estimate ', str(hcol))
            else:
                print('  Filling ', str(hcol))

        data  = df[hcol].to_numpy()
        dflag = ff[hcol].to_numpy()

        data_f  = dfill[hcol].to_numpy()
        dflag_f = ffill[hcol].to_numpy()

        if err:
            data_f[:]  = undef
            dflag_f[:] = undef

        # Large margins

        # Check for large margins at beginning and end
        largegap   = np.zeros(ndata, dtype=bool)
        firstvalid = np.amin(np.where(dflag == 0)[0])
        lastvalid  = np.amax(np.where(dflag == 0)[0])
        nn         = int(nperday * longgap)
        if firstvalid > nn:
            if verbose > 1:
                print('    Large margin at beginning: ', firstvalid)
            largegap[0:(firstvalid - nn)] = True
        if lastvalid < (ndata - nn):
            if verbose > 1:
                print('    Large margin at end: ', lastvalid - nn)
            largegap[(lastvalid + nn):] = True

        # Large gaps

        # search largegap - code from maskgroup.py
        index  = []
        length = []
        count  = 0
        for i in range(ndata):
            if i == 0:
                if dflag[i] != 0:
                    index += [i]
                    count  = 1
            if i > 0:
                if (dflag[i] != 0) and (dflag[i - 1] == 0):
                    index += [i]
                    count  = 1
                elif dflag[i] != 0:
                    count += 1
                elif (dflag[i] == 0) and (dflag[i - 1] != 0):
                    length += [count]
                    count = 0
                else:
                    pass
        if count > 0:
            length += [count]

        # set largegap
        for i in range(len(index)):
            if length[i] > nn:
                if verbose > 1:
                    print('    Large gap: ', index[i], ':',
                          index[i] + length[i])
                largegap[index[i]:index[i] + length[i]] = True

        # set or unset rest of days in large gaps
        if fullday:
            for i in range(ndata - 1):
                # end of large margin
                if largegap[i] and not largegap[i + 1]:
                    largegap[np.where(day == day[i])[0]] = False
                # beginning of large margin
                elif not largegap[i] and largegap[i + 1]:
                    largegap[np.where(day == day[i])[0]] = False
                else:
                    continue

        # Gap filling

        # flag for all meteorological conditions
        meteo_flg = (ta_flg == 0) & (vpd_flg == 0) & (sw_flg == 0)
        # flag for all meteorological conditions and data
        total_flg = meteo_flg & (dflag == 0)

        # Fill loop over all data points
        for j in range(ndata):
            if not err:
                # no reason to go further, no gap -> continue
                if (dflag[j] == 0) | largegap[j]:
                    continue
            # 3 methods
            #   1. ta, vpd and global radiation sw;
            #   2. just global radiation sw;
            #   3. no meteorological conditions: take the mean of +- hour

            # for better overview: dynamic calculation of radiation threshold
            # minimum 20; maximum 50 [Wm-2] according to private correspondence
            # with Markus Reichstein
            sw_devmax = np.maximum(20., np.minimum(sw[j], sw_dev))

            # Method 1: all met conditions
            if meteo_flg[j]:
                # search for values around the met-conditions
                # in a window of time
                # (one week in the first iteration, two weeks in the second)
                j1  = j - np.arange(1, week + 1, dtype=int) + 1
                j2  = j + np.arange(1, week, dtype=int)
                jj  = np.append(j1, j2)
                win = np.unique(np.sort(np.clip(jj, 0, ndata - 1)))
                # get boolean array where meteo-conditions are in a given width
                conditions = ( (np.abs(sw[win] - sw[j])   < sw_devmax) &
                               (np.abs(ta[win] - ta[j])   < ta_dev) &
                               (np.abs(vpd[win] - vpd[j]) < vpd_dev) &
                               total_flg[win] )
                num4avg = np.sum(conditions)
                # we need at least two samples with similar conditions
                if num4avg >= 2:
                    dat = np.ma.array(data[win], mask=~conditions)
                    if verbose > 2:
                        print('    m1.1: ', j, win.size, dat.mean(),
                              dat.std(ddof=ddof))
                    data_f[j] = dat.mean()
                    if err:
                        dflag_f[j] = dat.std(ddof=ddof)
                    else:
                        # assign also quality category of gap filling
                        dflag_f[j] = 1
                    continue
                else:  # --> extend time window to two weeks
                    j1  = j - np.arange(1, 2 * week + 1, dtype=int) + 1
                    j2  = j + np.arange(1, 2 * week, dtype=int)
                    jj  = np.append(j1, j2)
                    win = np.unique(np.sort(np.clip(jj, 0, ndata - 1)))
                    conditions = ( (np.abs(sw[win]  - sw[j])  < sw_devmax) &
                                   (np.abs(ta[win]  - ta[j])  < ta_dev) &
                                   (np.abs(vpd[win] - vpd[j]) < vpd_dev) &
                                   total_flg[win] )
                    num4avg = np.sum(conditions)
                    if num4avg >= 2:
                        dat = np.ma.array(data[win], mask=~conditions)
                        if verbose > 2:
                            print('    m1.2: ', j, win.size, dat.mean(),
                                  dat.std(ddof=ddof))
                        data_f[j] = dat.mean()
                        if err:
                            dflag_f[j] = dat.std(ddof=ddof)
                        else:
                            # assign also quality category of gap filling
                            dflag_f[j] = 1
                        continue

            if err:
                continue
            # if you come here, gap-filling rather than error estimate

            # If nothing is found under similar meteo within two weeks,
            # look for global radiation within one week ->

            # Method 2: just global radiation available
            if sw_flg[j] == 0:
                j1  = j - np.arange(1, week + 1, dtype=int) + 1
                j2  = j + np.arange(1, week, dtype=int)
                jj  = np.append(j1, j2)
                win = np.unique(np.sort(np.clip(jj, 0, ndata - 1)))
                # get boolean array where meteo-conditions are in a given width
                conditions = ( (np.abs(sw[win] - sw[j]) < sw_devmax) &
                               total_flg[win] )
                num4avg = np.sum(conditions)
                # we need at least two samples with similar conditions
                if num4avg >= 2:
                    dat = np.ma.array(data[win], mask=~conditions)
                    if verbose > 2:
                        print('    m2: ', j, win.size, dat.mean(),
                              dat.std(ddof=ddof))
                    data_f[j]  = dat.mean()
                    dflag_f[j] = 1
                    continue

            # If still nothing is found under similar sw within one week,
            # take the same hour within 1-7 days

            # Method 3: same hour
            enough = False
            for i in range(2):
                t_win = (nperday * (2 * i + 1)) // 2
                j1  = j - np.arange(1, t_win + 1, dtype=int) + 1
                j2  = j + np.arange(1, t_win, dtype=int)
                jj  = np.append(j1, j2)
                win = np.unique(np.sort(np.clip(jj, 0, ndata - 1)))
                conditions = ( (np.abs(hour[win] - hour[j]) < 1.1)
                               & (dflag[win] == 0) )
                num4avg = np.sum(conditions)
                if num4avg >= 2:
                    dat = np.ma.array(data[win], mask=~conditions)
                    if verbose > 2:
                        print('    m3.{:d}: '.format(i), j, win.size,
                              dat.mean(), dat.std(ddof=ddof))
                    data_f[j] = dat.mean()
                    if i == 0:
                        dflag_f[j] = 1
                    else:
                        dflag_f[j] = 2
                    break

            # sanity check
            if dflag_f[j] > 0:
                continue

            # If still nothing is found, start a new cycle
            # with increased window size
            # Method 4: same as 1 but for 3-12 weeks
            if meteo_flg[j]:
                for multi in range(3, 12):
                    j1  = j - np.arange(1, multi * week + 1, dtype=int) + 1
                    j2  = j + np.arange(1, multi * week, dtype=int)
                    jj  = np.append(j1, j2)
                    win = np.unique(np.sort(np.clip(jj, 0, ndata - 1)))
                    conditions = ( (np.abs(sw[win]  - sw[j])  < sw_devmax) &
                                   (np.abs(ta[win]  - ta[j])  < ta_dev) &
                                   (np.abs(vpd[win] - vpd[j]) < vpd_dev) &
                                   total_flg[win] )
                    num4avg = np.sum(conditions)
                    # we need at least two samples with similar conditions
                    if num4avg >= 2:
                        dat = np.ma.array(data[win], mask=~conditions)
                        if verbose > 2:
                            print('    m4.{:d}: '.format(multi), j, win.size,
                                  dat.mean(), dat.std(ddof=ddof))
                        data_f[j] = dat.mean()
                        # assign also quality category of gap filling
                        if multi <= 2:
                            dflag_f[j] = 1
                        elif multi > 4:
                            dflag_f[j] = 3
                        else:
                            dflag_f[j] = 2
                        break

                # Check because continue does not support
                # to jump out of two loops
                if dflag_f[j] > 0:
                    continue

            # Method 5: same as 2 but for 2-12 weeks
            if sw_flg[j] == 0:
                for multi in range(2, 12):
                    j1  = j - np.arange(1, multi * week + 1, dtype=int) + 1
                    j2  = j + np.arange(1, multi * week, dtype=int)
                    jj  = np.append(j1, j2)
                    win = np.unique(np.sort(np.clip(jj, 0, ndata - 1)))
                    # get boolean array where meteo-conditions are
                    # in a given width
                    conditions = ( (np.abs(sw[win] - sw[j]) < sw_devmax) &
                                   total_flg[win] )
                    num4avg = np.sum(conditions)
                    # we need at least two samples with similar conditions
                    if num4avg >= 2:
                        dat = np.ma.array(data[win], mask=~conditions)
                        if verbose > 2:
                            print('    m5.{:d}: '.format(multi), j, win.size,
                                  dat.mean(), dat.std(ddof=ddof))
                        data_f[j] = dat.mean()
                        if multi == 0:
                            dflag_f[j] = 1
                        elif multi <= 2:
                            dflag_f[j] = 2
                        else:
                            dflag_f[j] = 3
                        break

                if dflag_f[j] > 0:
                    continue

            # Method 6: same as 3 but for 3-120 days
            for i in range(3, 120):
                t_win = nperday * (2 * i + 1) / 2
                j1  = j - np.arange(1, t_win + 1, dtype=int) + 1
                j2  = j + np.arange(1, t_win, dtype=int)
                jj  = np.append(j1, j2)
                win = np.unique(np.sort(np.clip(jj, 0, ndata - 1)))
                conditions = ( (np.abs(hour[win] - hour[j]) < 1.1)
                               & (dflag[win] == 0) )
                num4avg = np.sum(conditions)
                if num4avg >= 2:
                    dat = np.ma.array(data[win], mask=~conditions)
                    if verbose > 2:
                        print('    m6.{:d}: '.format(i), j, win.size,
                              dat.mean(), dat.std(ddof=ddof))
                    data_f[j]  = dat.mean()
                    dflag_f[j] = 3
                    break

        dfill[hcol] = data_f
        ffill[hcol] = dflag_f

    # Finish

    if isnumpy:
        if istrans:
            dfout = dfill.to_numpy().T
        else:
            dfout = dfill.to_numpy()
    else:
        dfout = dfill

    if fisnumpy:
        if fistrans:
            ffout = ffill.to_numpy().T
        else:
            ffout = ffill.to_numpy()
    else:
        ffout = ffill

    if err:
        if errmean:
            return ffout, dfout
        else:
            return ffout
    else:
        return dfout, ffout


if __name__ == '__main__':
    import doctest
    doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)

    # import numpy as np
    # from fread import fread
    # from date2dec import date2dec
    # from dec2date import dec2date
    # from autostring import astr
    # ifile = 'test_gapfill.csv'  # Tharandt 1998 = Online tool example file
    # undef = -9999.
    # # Day Hour NEE         qcNEE  LE    qcLE  H       qcH  Rg    Tair  Tsoil  rH     VPD  Ustar
    # # --  --   umolm-2s-1  --     Wm-2  --    Wm-2    --   Wm-2  degC  degC   %      hPa  ms-1
    # # 1   0.5  -1.21       1      1.49  1     -11.77  1    0     7.4   4.19   55.27  4.6  0.72
    # dat   = fread(ifile, skip=2, transpose=True)
    # # dat = dat[:,:1000]
    # ndat  = dat.shape[1]
    # head  = fread(ifile, skip=2, header=True)
    # head1 = head[0]
    # # colhead
    # idx   = []
    # for i in head1:
    #     if i in ['NEE', 'LE', 'H', 'Rg', 'Tair', 'VPD']: idx.append(head1.index(i))
    # colhead = ['FC', 'LE', 'H', 'SW_IN', 'TA', 'VPD']
    # # data
    # dfin = dat[idx,:]
    # # flag
    # flag = np.where(dfin == undef, 2, 0)
    # flag[0,:] = dat[head1.index('qcNEE'),:].astype(int)
    # flag[1,:] = dat[head1.index('qcLE'),:].astype(int)
    # flag[2,:] = dat[head1.index('qcH'),:].astype(int)
    # flag[np.where(flag==1)] = 0
    # # date
    # day_id  = head1.index('Day')
    # hour_id = head1.index('Hour')
    # ntime   = dat.shape[1]
    # year  = np.ones(ntime, dtype=int) * 1998
    # hh    = dat[hour_id,:].astype(int)
    # mn    = np.rint((dat[hour_id,:]-hh)*60.).astype(int)
    # y0    = date2dec(yr=year[0], mo=1, dy=1, hr=hh, mi=mn)
    # jdate = y0 + dat[day_id,:]
    # adate = dec2date(jdate, eng=True)
    # # fill
    # dat_f, flag_f = gapfill(dfin, flag=flag, date=adate, colhead=colhead, undef=undef, verbose=0)
    # print(astr(flag_f[0,11006:11012],0,pp=True))
    # # ['1' '1' '1' '2' '2' '2']
    # print(astr(dat_f[0,11006:11012],3,pp=True))
    # # ['-18.678' '-15.633' '-19.610' '-15.536' '-12.402' '-15.329']

    # # 1D err
    # dat_std = gapfill(dfin, flag=flag, date=adate, colhead=colhead, undef=undef, verbose=0, err=True)
    # print(astr(dat_std[0,11006:11012],3,pp=True))
    # # ['    5.372' '   13.118' '    6.477' '-9999.000' '-9999.000' '-9999.000']

    # dat_err     = np.ones(ndat, dtype=int)*(-1)
    # kk          = np.where((dat_std[0,:] != undef) & (dat_f[0,:] != 0.))[0]
    # dat_err[kk] = np.abs(dat_std[0,kk]/dat_f[0,kk]*100.).astype(int)
    # print(astr(dat_err[11006:11012],pp=True))
    # # [' 28' ' 83' ' 33' ' -1' ' -1' ' -1']
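The core of the MDS scheme implemented above is one look-up: around a gap, take a symmetric index window clipped at the series edges, select the valid points whose meteorology deviates by less than the given thresholds, and fill with their mean (requiring at least two similar samples). The following standalone sketch illustrates just that step on synthetic data; it is not part of the hesseflux API, the function and variable names are illustrative, and it omits the dynamic radiation threshold and the quality classes of the full routine.

```python
# Minimal sketch of one MDS look-up (cf. Method 1 above): fill the gap at
# index j with the mean of valid values within +-window steps whose
# meteorology is 'similar'. Synthetic data; names are illustrative.
import numpy as np


def mds_fill_one(j, data, valid, sw, ta, vpd,
                 window, sw_dev=50., ta_dev=2.5, vpd_dev=5.):
    """Return (mean, n) of similar valid points around j, or (None, n)."""
    ndata = data.size
    # symmetric index window, clipped at the series edges
    win = np.unique(np.clip(np.arange(j - window, j + window + 1),
                            0, ndata - 1))
    similar = ( (np.abs(sw[win] - sw[j]) < sw_dev) &
                (np.abs(ta[win] - ta[j]) < ta_dev) &
                (np.abs(vpd[win] - vpd[j]) < vpd_dev) &
                valid[win] )
    n = int(similar.sum())
    if n < 2:  # need at least two similar samples, as in gapfill
        return None, n
    return float(data[win][similar].mean()), n


# synthetic series with one gap at index 5 (undef = -9999)
data  = np.array([1., 2., 3., 4., 5., -9999., 5., 4., 3., 2., 1.])
valid = data != -9999.
sw    = np.full(data.size, 100.)  # constant meteorology: all points similar
ta    = np.full(data.size, 10.)
vpd   = np.full(data.size, 8.)

fill, n = mds_fill_one(5, data, valid, sw, ta, vpd, window=5)
# the gap is filled with the mean of the ten valid neighbours: 3.0
```

In the full routine this search is repeated with growing windows (one week, two weeks, then multiples), falling back to radiation-only similarity and finally to the same hour of day, which is what the quality classes 1-3 encode.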