• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

realm / realm-core / michael.wilkersonbarker_971

23 Feb 2024 04:24AM UTC coverage: 91.835% (-0.03%) from 91.865%
michael.wilkersonbarker_971

Pull #7365

Evergreen

michael-wb
First round of updates from review
Pull Request #7365: RCORE-1982: Opening realm with cached user while offline results in fatal error and session does not retry connection

93112 of 171580 branches covered (54.27%)

209 of 226 new or added lines in 9 files covered. (92.48%)

126 existing lines in 18 files now uncovered.

235370 of 256297 relevant lines covered (91.83%)

6232257.12 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

82.86
/src/realm/index_string.hpp
1
/*************************************************************************
2
 *
3
 * Copyright 2016 Realm Inc.
4
 *
5
 * Licensed under the Apache License, Version 2.0 (the "License");
6
 * you may not use this file except in compliance with the License.
7
 * You may obtain a copy of the License at
8
 *
9
 * http://www.apache.org/licenses/LICENSE-2.0
10
 *
11
 * Unless required by applicable law or agreed to in writing, software
12
 * distributed under the License is distributed on an "AS IS" BASIS,
13
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
 * See the License for the specific language governing permissions and
15
 * limitations under the License.
16
 *
17
 **************************************************************************/
18

19
#ifndef REALM_INDEX_STRING_HPP
20
#define REALM_INDEX_STRING_HPP
21

22
#include <cstring>
23
#include <memory>
24
#include <array>
25
#include <set>
26

27
#include <realm/array.hpp>
28
#include <realm/cluster_tree.hpp>
29

30
/*
31
The StringIndex class is used for both type_String and all integral types, such as type_Bool, type_Timestamp and
32
type_Int. When used for integral types, the 64-bit integer is simply casted to a string of 8 bytes through a
33
pretty simple "wrapper layer" in all public methods.
34

35
The StringIndex data structure is like an "inversed" B+ tree where the leafs contain row indexes and the non-leafs
36
contain 4-byte chunks of payload. Imagine a table with following strings:
37

38
       hello, kitty, kitten, foobar, kitty, foobar
39

40
The topmost level of the index tree contains prefixes of the payload strings of length <= 4. The next level contains
41
prefixes of the remaining parts of the strings. Unnecessary levels of the tree are optimized away; the prefix "foob"
42
is shared only by rows that are identical ("foobar"), so "ar" is not needed to be stored in the tree.
43

44
       hell   kitt      foob
45
        |      /\        |
46
        0     en  y    {3, 5}
47
              |    \
48
           {1, 4}   2
49

50
Each non-leafs consists of two integer arrays of the same length, one containing payload and the other containing
51
references to the sublevel nodes.
52

53
The leafs can be either a single value or a Column. If the reference in its parent node has its least significant
54
bit set, then the remaining upper bits specify the row index at which the string is stored. If the bit is clear,
55
it must be interpreted as a reference to a Column that stores the row indexes at which the string is stored.
56

57
If a Column is used, then all row indexes are guaranteed to be sorted increasingly, which means you an search in it
58
using our binary search functions such as upper_bound() and lower_bound(). Each duplicate value will be stored in
59
the same Column, but Columns may contain more than just duplicates if the depth of the tree exceeds the value
60
`s_max_offset` This is to avoid stack overflow problems with many of our recursive functions if we have two very
61
long strings that have a long common prefix but differ in the last couple bytes. If a Column stores more than just
62
duplicates, then the list is kept sorted in ascending order by string value and within the groups of common
63
strings, the rows are sorted in ascending order.
64
*/
65

66
namespace realm {
67

68
class Spec;
69
class Timestamp;
70
class ClusterColumn;
71

72
template <class T>
73
class BPlusTree;
74

75
/// Each StringIndex node contains an array of this type
76
class IndexArray : public Array {
77
public:
78
    IndexArray(Allocator& allocator)
79
        : Array(allocator)
80
    {
7,040,547✔
81
    }
7,040,547✔
82

83
    ObjKey index_string_find_first(Mixed value, const ClusterColumn& column) const;
84
    void index_string_find_all(std::vector<ObjKey>& result, Mixed value, const ClusterColumn& column,
85
                               bool case_insensitive = false) const;
86
    void index_string_find_all_prefix(std::set<int64_t>& result, StringData str) const
87
    {
66✔
88
        _index_string_find_all_prefix(result, str, NodeHeader::get_header_from_data(m_data));
66✔
89
    }
66✔
90

91
    FindRes index_string_find_all_no_copy(Mixed value, const ClusterColumn& column, InternalFindResult& result) const;
92
    size_t index_string_count(Mixed value, const ClusterColumn& column) const;
93

94
private:
95
    template <IndexMethod>
96
    int64_t from_list(Mixed value, InternalFindResult& result_ref, const IntegerColumn& key_values,
97
                      const ClusterColumn& column) const;
98

99
    void from_list_all(Mixed value, std::vector<ObjKey>& result, const IntegerColumn& rows,
100
                       const ClusterColumn& column) const;
101

102
    void from_list_all_ins(StringData value, std::vector<ObjKey>& result, const IntegerColumn& rows,
103
                           const ClusterColumn& column) const;
104

105
    template <IndexMethod method>
106
    int64_t index_string(Mixed value, InternalFindResult& result_ref, const ClusterColumn& column) const;
107

108
    void index_string_all(Mixed value, std::vector<ObjKey>& result, const ClusterColumn& column) const;
109

110
    void index_string_all_ins(StringData value, std::vector<ObjKey>& result, const ClusterColumn& column) const;
111
    void _index_string_find_all_prefix(std::set<int64_t>& result, StringData str, const char* header) const;
112
};
113

114
// 16 is the biggest element size of any non-string/binary Realm type
115
constexpr size_t string_conversion_buffer_size = 16;
116
using StringConversionBuffer = std::array<char, string_conversion_buffer_size>;
117
static_assert(sizeof(UUID::UUIDBytes) <= string_conversion_buffer_size,
118
              "if you change the size of a UUID then also change the string index buffer space");
119

120
// The purpose of this class is to get easy access to fields in a specific column in the
121
// cluster. When you have an object like this, you can get a string version of the relevant
122
// field based on the key for the object.
123
class ClusterColumn {
124
public:
125
    ClusterColumn(const ClusterTree* cluster_tree, ColKey column_key, IndexType type)
126
        : m_cluster_tree(cluster_tree)
127
        , m_column_key(column_key)
128
        , m_type(type)
129
    {
960,474✔
130
    }
960,474✔
131
    size_t size() const
132
    {
1,320✔
133
        return m_cluster_tree->size();
1,320✔
134
    }
1,320✔
135
    ClusterTree::Iterator begin() const
136
    {
×
137
        return ClusterTree::Iterator(*m_cluster_tree, 0);
×
138
    }
×
139

140
    ClusterTree::Iterator end() const
141
    {
×
142
        return ClusterTree::Iterator(*m_cluster_tree, size());
×
143
    }
×
144

145

146
    DataType get_data_type() const;
147
    ColKey get_column_key() const
148
    {
×
149
        return m_column_key;
×
150
    }
×
151
    bool is_nullable() const
152
    {
×
153
        return m_column_key.is_nullable();
×
154
    }
×
155
    bool is_fulltext() const
156
    {
6,686,160✔
157
        return m_type == IndexType::Fulltext;
6,686,160✔
158
    }
6,686,160✔
159
    Mixed get_value(ObjKey key) const;
160
    std::vector<ObjKey> get_all_keys() const;
161

162
private:
163
    const ClusterTree* m_cluster_tree;
164
    ColKey m_column_key;
165
    IndexType m_type;
166
};
167

168
class StringIndex {
169
public:
170
    StringIndex(const ClusterColumn& target_column, Allocator&);
171
    StringIndex(ref_type, ArrayParent*, size_t ndx_in_parent, const ClusterColumn& target_column, Allocator&);
172
    ~StringIndex() noexcept
173
    {
7,039,350✔
174
    }
7,039,350✔
175

176
    ColKey get_column_key() const
177
    {
×
178
        return m_target_column.get_column_key();
×
179
    }
×
180

181
    static bool type_supported(realm::DataType type)
182
    {
96,711✔
183
        return (type == type_Int || type == type_String || type == type_Bool || type == type_Timestamp ||
96,711✔
184
                type == type_ObjectId || type == type_Mixed || type == type_UUID);
68,964✔
185
    }
96,711✔
186

187
    void set_target(const ClusterColumn& target_column) noexcept;
188

189
    // Accessor concept:
190
    Allocator& get_alloc() const noexcept;
191
    void destroy() noexcept;
192
    void detach();
193
    bool is_attached() const noexcept;
194
    void set_parent(ArrayParent* parent, size_t ndx_in_parent) noexcept;
195
    size_t get_ndx_in_parent() const noexcept;
196
    void set_ndx_in_parent(size_t ndx_in_parent) noexcept;
197
    void update_from_parent() noexcept;
198
    void refresh_accessor_tree(const ClusterColumn& target_column);
199
    ref_type get_ref() const noexcept;
200

201
    // StringIndex interface:
202

203
    bool is_empty() const;
204
    bool is_fulltext_index() const
205
    {
933,087✔
206
        return this->m_target_column.is_fulltext();
933,087✔
207
    }
933,087✔
208

209
    template <class T>
210
    void insert(ObjKey key, T value);
211
    template <class T>
212
    void insert(ObjKey key, util::Optional<T> value);
213

214
    template <class T>
215
    void set(ObjKey key, T new_value);
216
    template <class T>
217
    void set(ObjKey key, util::Optional<T> new_value);
218

219
    void erase(ObjKey key);
220

221
    template <class T>
222
    ObjKey find_first(T value) const;
223
    template <class T>
224
    void find_all(std::vector<ObjKey>& result, T value, bool case_insensitive = false) const;
225
    template <class T>
226
    FindRes find_all_no_copy(T value, InternalFindResult& result) const;
227
    template <class T>
228
    size_t count(T value) const;
229

230
    void find_all_fulltext(std::vector<ObjKey>& result, StringData value) const;
231

232
    void clear();
233

234
    bool has_duplicate_values() const noexcept;
235

236
    void verify() const;
237
#ifdef REALM_DEBUG
238
    template <class T>
239
    void verify_entries(const ClusterColumn& column) const;
240
    void do_dump_node_structure(std::ostream&, int) const;
241
#endif
242

243
    typedef int32_t key_type;
244

245
    // s_max_offset specifies the number of levels of recursive string indexes
246
    // allowed before storing everything in lists. This is to avoid nesting
247
    // to too deep of a level. Since every SubStringIndex stores 4 bytes, this
248
    // means that a StringIndex is helpful for strings of a common prefix up to
249
    // 4 times this limit (200 bytes shared). Lists are stored in sorted order,
250
    // so strings sharing a common prefix of more than this limit will use a
251
    // binary search of approximate complexity log2(n) from `std::lower_bound`.
252
    static const size_t s_max_offset = 200; // max depth * s_index_key_length
253
    static const size_t s_index_key_length = 4;
254
    static key_type create_key(StringData) noexcept;
255
    static key_type create_key(StringData, size_t) noexcept;
256

257
private:
258
    // m_array is a compact representation for storing the children of this StringIndex.
259
    // Children can be:
260
    // 1) a row number
261
    // 2) a reference to a list which stores row numbers (for duplicate strings).
262
    // 3) a reference to a sub-index
263
    // m_array[0] is always a reference to a values array which stores the 4 byte chunk
264
    // of payload data for quick string chunk comparisons. The array stored
265
    // at m_array[0] lines up with the indices of values in m_array[1] so for example
266
    // starting with an empty StringIndex:
267
    // StringColumn::insert(target_row_ndx=42, value="test_string") would result with
268
    // get_array_from_ref(m_array[0])[0] == create_key("test") and
269
    // m_array[1] == 42
270
    // In this way, m_array which stores one child has a size of two.
271
    // Children are type 1 (row number) if the LSB of the value is set.
272
    // To get the actual row value, shift value down by one.
273
    // If the LSB of the value is 0 then the value is a reference and can be either
274
    // type 2, or type 3 (no shifting in either case).
275
    // References point to a list if the context header flag is NOT set.
276
    // If the header flag is set, references point to a sub-StringIndex (nesting).
277
    std::unique_ptr<IndexArray> m_array;
278
    ClusterColumn m_target_column;
279

280
    struct inner_node_tag {
281
    };
282
    StringIndex(inner_node_tag, Allocator&);
283

284
    static std::unique_ptr<IndexArray> create_node(Allocator&, bool is_leaf);
285

286
    void insert_with_offset(ObjKey key, StringData index_data, const Mixed& value, size_t offset);
287
    void insert_row_list(size_t ref, size_t offset, StringData value);
288
    void insert_to_existing_list(ObjKey key, Mixed value, IntegerColumn& list);
289
    void insert_to_existing_list_at_lower(ObjKey key, Mixed value, IntegerColumn& list,
290
                                          const IntegerColumnIterator& lower);
291
    key_type get_last_key() const;
292

293
    struct NodeChange {
294
        size_t ref1;
295
        size_t ref2;
296
        enum ChangeType { change_None, change_InsertBefore, change_InsertAfter, change_Split } type;
297
        NodeChange(ChangeType t, size_t r1 = 0, size_t r2 = 0)
298
            : ref1(r1)
299
            , ref2(r2)
300
            , type(t)
301
        {
9,184,680✔
302
        }
9,184,680✔
303
        NodeChange()
304
            : ref1(0)
305
            , ref2(0)
306
            , type(change_None)
307
        {
×
308
        }
×
309
    };
310

311
    // B-Tree functions
312
    void TreeInsert(ObjKey obj_key, key_type, size_t offset, StringData index_data, const Mixed& value);
313
    NodeChange do_insert(ObjKey, key_type, size_t offset, StringData index_data, const Mixed& value);
314
    /// Returns true if there is room or it can join existing entries
315
    bool leaf_insert(ObjKey obj_key, key_type, size_t offset, StringData index_data, const Mixed& value,
316
                     bool noextend = false);
317
    void node_insert_split(size_t ndx, size_t new_ref);
318
    void node_insert(size_t ndx, size_t ref);
319
    // Erase without getting value from parent column (useful when string stored
320
    // does not directly match string in parent, like with full-text indexing)
321
    void erase_string(ObjKey key, StringData value);
322
    void do_delete(ObjKey key, StringData, size_t offset);
323

324
    Mixed get(ObjKey key) const;
325

326
    void node_add_key(ref_type ref);
327

328
#ifdef REALM_DEBUG
329
    static void dump_node_structure(const Array& node, std::ostream&, int level);
330
#endif
331
};
332

333
class SortedListComparator {
334
public:
335
    SortedListComparator(const ClusterTree* cluster_tree, ColKey column_key, IndexType type)
336
        : m_column(cluster_tree, column_key, type)
337
    {
×
338
    }
×
339
    SortedListComparator(const ClusterColumn& column)
340
        : m_column(column)
341
    {
4,206,549✔
342
    }
4,206,549✔
343

344
    bool operator()(int64_t key_value, Mixed needle);
345
    bool operator()(Mixed needle, int64_t key_value);
346

347
private:
348
    const ClusterColumn m_column;
349
};
350

351

352
// Implementation:
353
inline StringIndex::StringIndex(const ClusterColumn& target_column, Allocator& alloc)
354
    : m_array(create_node(alloc, true)) // Throws
355
    , m_target_column(target_column)
356
{
144,651✔
357
}
144,651✔
358

359
inline StringIndex::StringIndex(ref_type ref, ArrayParent* parent, size_t ndx_in_parent,
360
                                const ClusterColumn& target_column, Allocator& alloc)
361
    : m_array(new IndexArray(alloc))
362
    , m_target_column(target_column)
363
{
6,895,941✔
364
    REALM_ASSERT_EX(Array::get_context_flag_from_header(alloc.translate(ref)), ref, size_t(alloc.translate(ref)));
6,895,941✔
365
    m_array->init_from_ref(ref);
6,895,941✔
366
    set_parent(parent, ndx_in_parent);
6,895,941✔
367
}
6,895,941✔
368

369
inline StringIndex::StringIndex(inner_node_tag, Allocator& alloc)
370
    : m_array(create_node(alloc, false)) // Throws
371
    , m_target_column(ClusterColumn(nullptr, {}, IndexType::General))
372
{
72✔
373
}
72✔
374

375
// Byte order of the key is *reversed*, so that for the integer index, the least significant
376
// byte comes first, so that it fits little-endian machines. That way we can perform fast
377
// range-lookups and iterate in order, etc, as future features. This, however, makes the same
378
// features slower for string indexes. Todo, we should reverse the order conditionally, depending
379
// on the column type.
380
inline StringIndex::key_type StringIndex::create_key(StringData str) noexcept
381
{
12,433,467✔
382
    key_type key = 0;
12,433,467✔
383

6,193,911✔
384
    if (str.size() >= 4)
12,433,467✔
385
        goto four;
11,712,621✔
386
    if (str.size() < 2) {
720,846✔
387
        if (str.size() == 0)
495,693✔
388
            goto none;
×
389
        goto one;
495,693✔
390
    }
495,693✔
391
    if (str.size() == 2)
225,153✔
392
        goto two;
193,167✔
393
    goto three;
31,986✔
394

15,210✔
395
// Create 4 byte index key
15,210✔
396
// (encoded like this to allow literal comparisons
15,210✔
397
// independently of endianness)
15,210✔
398
four:
11,712,837✔
399
    key |= (key_type(static_cast<unsigned char>(str[3])) << 0);
11,712,837✔
400
three:
11,745,006✔
401
    key |= (key_type(static_cast<unsigned char>(str[2])) << 8);
11,745,006✔
402
two:
11,935,956✔
403
    key |= (key_type(static_cast<unsigned char>(str[1])) << 16);
11,935,956✔
404
one:
12,429,423✔
405
    key |= (key_type(static_cast<unsigned char>(str[0])) << 24);
12,429,423✔
406
none:
12,431,097✔
407
    return key;
12,431,097✔
408
}
12,429,423✔
409

410
// Index works as follows: All non-NULL values are stored as if they had appended an 'X' character at the end. So
411
// "foo" is stored as if it was "fooX", and "" (empty string) is stored as "X". And NULLs are stored as empty strings.
412
inline StringIndex::key_type StringIndex::create_key(StringData str, size_t offset) noexcept
413
{
12,997,512✔
414
    if (str.is_null())
12,997,512✔
415
        return 0;
573,441✔
416

6,187,548✔
417
    if (offset > str.size())
12,424,071✔
UNCOV
418
        return 0;
×
419

6,187,548✔
420
    // for very short strings
6,187,548✔
421
    size_t tail = str.size() - offset;
12,424,071✔
422
    if (tail <= sizeof(key_type) - 1) {
12,424,071✔
423
        char buf[sizeof(key_type)];
762,684✔
424
        memset(buf, 0, sizeof(key_type));
762,684✔
425
        buf[tail] = 'X';
762,684✔
426
        memcpy(buf, str.data() + offset, tail);
762,684✔
427
        return create_key(StringData(buf, tail + 1));
762,684✔
428
    }
762,684✔
429
    // else fallback
5,806,842✔
430
    return create_key(str.substr(offset));
11,661,387✔
431
}
11,661,387✔
432

433
template <class T>
434
void StringIndex::insert(ObjKey key, T value)
435
{
1,752,558✔
436
    StringConversionBuffer buffer;
1,752,558✔
437
    Mixed m(value);
1,752,558✔
438
    size_t offset = 0;                                      // First key from beginning of string
1,752,558✔
439
    insert_with_offset(key, m.get_index_data(buffer), m, offset); // Throws
1,752,558✔
440
}
1,752,558✔
441

442
template <>
443
void StringIndex::insert<StringData>(ObjKey key, StringData value);
444

445
template <class T>
446
void StringIndex::insert(ObjKey key, util::Optional<T> value)
447
{
184,695✔
448
    if (value) {
184,695✔
449
        insert(key, *value);
153,432✔
450
    }
153,432✔
451
    else {
31,263✔
452
        insert(key, null{});
31,263✔
453
    }
31,263✔
454
}
184,695✔
455

456
template <class T>
457
void StringIndex::set(ObjKey key, T new_value)
458
{
221,499✔
459
    Mixed old_value = get(key);
221,499✔
460
    Mixed new_value2 = Mixed(new_value);
221,499✔
461

110,637✔
462
    // Note that insert_with_offset() throws UniqueConstraintViolation.
110,637✔
463

110,637✔
464
    if (REALM_LIKELY(new_value2 != old_value)) {
221,499!
465
        // We must erase this row first because erase uses find_first which
100,638✔
466
        // might find the duplicate if we insert before erasing.
100,638✔
467
        erase(key); // Throws
200,976✔
468

100,638✔
469
        StringConversionBuffer buffer;
200,976✔
470
        size_t offset = 0;                               // First key from beginning of string
200,976✔
471
        auto index_data = new_value2.get_index_data(buffer);
200,976✔
472
        insert_with_offset(key, index_data, new_value2, offset); // Throws
200,976✔
473
    }
200,976✔
474
}
221,499✔
475

476
template <>
477
void StringIndex::set<StringData>(ObjKey key, StringData new_value);
478

479
template <class T>
480
void StringIndex::set(ObjKey key, util::Optional<T> new_value)
481
{
482
    if (new_value) {
483
        set(key, *new_value);
484
    }
485
    else {
486
        set(key, null{});
487
    }
488
}
489

490
template <class T>
491
ObjKey StringIndex::find_first(T value) const
492
{
1,230,840✔
493
    // Use direct access method
605,412✔
494
    return m_array->index_string_find_first(Mixed(value), m_target_column);
1,230,840✔
495
}
1,230,840✔
496

497
template <class T>
498
void StringIndex::find_all(std::vector<ObjKey>& result, T value, bool case_insensitive) const
499
{
34,826✔
500
    // Use direct access method
17,225✔
501
    return m_array->index_string_find_all(result, Mixed(value), m_target_column, case_insensitive);
34,826✔
502
}
34,826✔
503

504
template <class T>
505
FindRes StringIndex::find_all_no_copy(T value, InternalFindResult& result) const
506
{
657,014✔
507
    return m_array->index_string_find_all_no_copy(Mixed(value), m_target_column, result);
657,014✔
508
}
657,014✔
509

510
template <class T>
511
size_t StringIndex::count(T value) const
512
{
12,864✔
513
    // Use direct access method
6,432✔
514
    return m_array->index_string_count(Mixed(value), m_target_column);
12,864✔
515
}
12,864✔
516

517
inline Allocator& StringIndex::get_alloc() const noexcept
518
{
1,536✔
519
    return m_array->get_alloc();
1,536✔
520
}
1,536✔
521

522
inline void StringIndex::destroy() noexcept
523
{
14,673✔
524
    return m_array->destroy_deep();
14,673✔
525
}
14,673✔
526

527
inline bool StringIndex::is_attached() const noexcept
528
{
×
529
    return m_array->is_attached();
×
530
}
×
531

532
inline void StringIndex::refresh_accessor_tree(const ClusterColumn& target_column)
533
{
354,654✔
534
    m_array->init_from_parent();
354,654✔
535
    m_target_column = target_column;
354,654✔
536
}
354,654✔
537

538
inline ref_type StringIndex::get_ref() const noexcept
539
{
144,819✔
540
    return m_array->get_ref();
144,819✔
541
}
144,819✔
542

543
inline void StringIndex::set_parent(ArrayParent* parent, size_t ndx_in_parent) noexcept
544
{
6,989,013✔
545
    m_array->set_parent(parent, ndx_in_parent);
6,989,013✔
546
}
6,989,013✔
547

548
inline size_t StringIndex::get_ndx_in_parent() const noexcept
549
{
×
550
    return m_array->get_ndx_in_parent();
×
551
}
×
552

553
inline void StringIndex::set_ndx_in_parent(size_t ndx_in_parent) noexcept
554
{
×
555
    m_array->set_ndx_in_parent(ndx_in_parent);
×
556
}
×
557

558
inline void StringIndex::update_from_parent() noexcept
559
{
278,115✔
560
    m_array->update_from_parent();
278,115✔
561
}
278,115✔
562

563
} // namespace realm
564

565
#endif // REALM_INDEX_STRING_HPP
STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2025 Coveralls, Inc