supabase / etl / 19900107178

03 Dec 2025 03:55PM UTC coverage: 82.008% (-0.4%) from 82.382%

Pull #487 · github · web-flow
Merge 5c2ab4c83 into eeef10c29
Pull Request #487: ref(allocator): Try to use jemalloc for etl-api and etl-replicator

0 of 91 new or added lines in 2 files covered. (0.0%)

2 existing lines in 1 file now uncovered.

16455 of 20065 relevant lines covered (82.01%)

181.74 hits per line

Source File

/etl-replicator/src/jemalloc_metrics.rs (0.0% covered)
//! Jemalloc allocator metrics for Prometheus monitoring.
//!
//! Exposes jemalloc statistics as Prometheus gauges, enabling monitoring of memory
//! allocation patterns, fragmentation, and overall allocator health. Uses MIB-based
//! access for efficient repeated polling.
//!
//! # Interpreting the Metrics
//!
//! - **Healthy state**: `allocated` close to `active`, `active` close to `resident`.
//! - **Fragmentation**: A large gap between `allocated` and `resident` indicates
//!   overhead from page alignment, dirty pages, or metadata.
//! - **Memory pressure**: If `resident` approaches container limits while `allocated`
//!   is much lower, consider tuning decay settings or investigating allocation patterns.
//! - **Retained memory**: High `retained` is normal on 64-bit Linux. It represents
//!   virtual address space only, with no physical memory cost.
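The "Interpreting the Metrics" rules above can be turned into a simple check. The sketch below is a hypothetical helper, not part of this module: the `Diagnosis` enum, the `diagnose` function, and both thresholds (resident at 90% or more of the limit counts as "approaching", allocated below 50% of resident counts as "much lower") are assumptions chosen only to make the wording concrete.

```rust
/// Hypothetical classification of one jemalloc snapshot, following the
/// "Interpreting the Metrics" rules. Thresholds are illustrative assumptions.
#[derive(Debug, PartialEq)]
pub enum Diagnosis {
    /// Resident memory is comfortably below the container limit.
    Ok,
    /// Resident is near the limit, but live allocations account for most of it.
    NearLimit,
    /// Resident is near the limit while `allocated` is much lower: the gap is
    /// likely allocator overhead (dirty pages, fragmentation) rather than live data.
    AllocatorOverhead,
}

/// Assumed meaning of "approaches the limit": resident >= 90% of the limit.
const NEAR_LIMIT_FRACTION: f64 = 0.9;
/// Assumed meaning of "much lower": allocated < 50% of resident.
const LOW_ALLOCATED_FRACTION: f64 = 0.5;

/// `limit == 0` is treated as "no known limit" and always reports `Ok`.
pub fn diagnose(allocated: u64, resident: u64, limit: u64) -> Diagnosis {
    if limit == 0 || (resident as f64) < NEAR_LIMIT_FRACTION * limit as f64 {
        return Diagnosis::Ok;
    }
    if (allocated as f64) < LOW_ALLOCATED_FRACTION * resident as f64 {
        Diagnosis::AllocatorOverhead
    } else {
        Diagnosis::NearLimit
    }
}
```

For example, `diagnose(300, 950, 1000)` returns `Diagnosis::AllocatorOverhead`: resident is above 90% of the limit, and allocated is below half of resident.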

use std::time::Duration;

use metrics::{Unit, describe_gauge, gauge};
use tikv_jemalloc_ctl::{epoch, stats};
use tracing::{debug, warn};

/// Total bytes allocated by the application and currently in use.
///
/// This is the sum of all active allocations made via `malloc`, `Box::new`, `Vec`, etc.
/// It represents the actual memory your application has requested and is using.
///
/// This is the most accurate measure of your application's memory footprint from the
/// application's perspective. Compare with `resident` to understand overhead.
const JEMALLOC_ALLOCATED_BYTES: &str = "jemalloc_allocated_bytes";

/// Total bytes in active pages allocated by the application.
///
/// Active pages are memory pages that jemalloc has dedicated to serving application
/// allocations. This is a multiple of the page size and includes both currently
/// allocated bytes and freed-but-not-returned bytes within those pages.
///
/// `active >= allocated` always (per jemalloc docs). The gap represents:
/// - Page alignment overhead (allocations rounded up to page boundaries).
/// - Internal fragmentation within pages still in use.
///
/// A large gap suggests many small allocations or size patterns that don't fit
/// jemalloc's size classes efficiently.
const JEMALLOC_ACTIVE_BYTES: &str = "jemalloc_active_bytes";

/// Maximum bytes in physically resident data pages mapped by the allocator.
///
/// Per jemalloc docs, this comprises all pages dedicated to:
/// - Allocator metadata.
/// - Pages backing active allocations.
/// - Unused dirty pages (freed but not yet returned to the OS).
///
/// This is jemalloc's share of the physical RAM footprint that counts against
/// container memory limits. The process as a whole also uses memory that jemalloc
/// does not track, such as thread stacks and the executable's code.
/// Note: This is a "maximum" because pages may not actually be resident if they
/// correspond to demand-zeroed virtual memory not yet touched.
///
/// **Important**: There is no guaranteed ordering between `resident` and `mapped`.
/// Resident can exceed mapped (due to dirty pages) or be less (unmaterialized pages).
///
/// Monitor this metric against your container memory limits to avoid being OOMKilled.
const JEMALLOC_RESIDENT_BYTES: &str = "jemalloc_resident_bytes";
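Comparing `resident` with the container limit requires knowing the limit. A minimal sketch, assuming the container runs under cgroup v2, where the limit is exposed in a `memory.max` file containing either a byte count or the literal `max` (no limit). The function names and the `/sys/fs/cgroup/memory.max` path are assumptions for illustration; this module does not read the file itself.

```rust
use std::fs;

/// Parses the contents of a cgroup v2 `memory.max` file.
/// Returns `None` for "max" (no limit) or for input that is not a byte count.
pub fn parse_memory_max(contents: &str) -> Option<u64> {
    match contents.trim() {
        "max" => None,
        value => value.parse::<u64>().ok(),
    }
}

/// Reads the limit from the conventional cgroup v2 location.
/// Assumed path; it may differ depending on how cgroups are mounted.
pub fn read_container_limit() -> Option<u64> {
    fs::read_to_string("/sys/fs/cgroup/memory.max")
        .ok()
        .and_then(|contents| parse_memory_max(&contents))
}
```

The parsed limit could then be paired with `resident` in a dashboard or alert; treating `max` as `None` avoids comparing against a meaningless sentinel.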

/// Total bytes in active extents mapped by the allocator.
///
/// This is virtual memory mapped via `mmap()` for active extents. Per jemalloc docs,
/// `mapped > active` always, but there is **no strict ordering with `resident`**.
///
/// Why no ordering with resident:
/// - `mapped` excludes inactive extents (even those with dirty pages).
/// - `resident` includes dirty pages but only counts physically-backed memory.
///
/// This metric helps understand virtual memory usage patterns. For container memory
/// limits, focus on `resident` instead since only physical memory is enforced.
const JEMALLOC_MAPPED_BYTES: &str = "jemalloc_mapped_bytes";
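The documented orderings (`allocated <= active`, `mapped > active`, and no fixed relation between `resident` and `mapped`) can double as a sanity check on a snapshot read after a single epoch advance. A minimal sketch; the `Snapshot` struct and `check_orderings` function are hypothetical, and the check deliberately leaves `resident` versus `mapped` alone.

```rust
/// Hypothetical snapshot of the jemalloc statistics polled by this module.
pub struct Snapshot {
    pub allocated: u64,
    pub active: u64,
    pub resident: u64,
    pub mapped: u64,
}

/// Returns the documented orderings that the snapshot violates.
/// `resident` vs `mapped` is intentionally not checked, because jemalloc
/// guarantees no ordering between them.
pub fn check_orderings(s: &Snapshot) -> Vec<&'static str> {
    let mut violations = Vec::new();
    if s.allocated > s.active {
        violations.push("allocated > active");
    }
    if s.mapped <= s.active {
        violations.push("mapped <= active");
    }
    violations
}
```

A snapshot with `resident` above `mapped` passes this check, which matches the "no strict ordering" note above.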

/// Total bytes in virtual memory mappings retained for future reuse.
///
/// Instead of unmapping unused virtual memory (e.g. via `munmap()`), jemalloc can keep
/// the virtual address mapping, typically without physical pages behind it, for faster
/// reuse later. This is controlled by `opt.retain` (enabled by default on 64-bit Linux).
///
/// High `retained` values are normal and generally don't consume physical memory. This
/// metric helps understand jemalloc's virtual memory management but rarely requires action.
const JEMALLOC_RETAINED_BYTES: &str = "jemalloc_retained_bytes";

/// Total bytes dedicated to jemalloc metadata.
///
/// jemalloc maintains internal data structures for tracking allocations, arenas,
/// thread caches, extent maps, etc. This overhead scales with allocation count
/// and arena count, not allocation size.
///
/// Typical overhead is 1-3% of allocated memory. Higher ratios may indicate too
/// many small allocations or too many arenas (`narenas` setting).
const JEMALLOC_METADATA_BYTES: &str = "jemalloc_metadata_bytes";
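The 1-3% guideline above is a ratio of `metadata` to `allocated`. A minimal sketch of computing it; the 3% cut-off is taken from the comment above, and `metadata_ratio` and `metadata_overhead_is_high` are hypothetical names not exported by this module.

```rust
/// Upper end of the "typical" metadata overhead quoted in the docs above.
const TYPICAL_MAX_METADATA_RATIO: f64 = 0.03;

/// Metadata overhead relative to application allocations.
/// Returns `None` when nothing is allocated, since the ratio is undefined.
pub fn metadata_ratio(metadata: u64, allocated: u64) -> Option<f64> {
    if allocated == 0 {
        None
    } else {
        Some(metadata as f64 / allocated as f64)
    }
}

/// True when metadata overhead exceeds the typical range.
pub fn metadata_overhead_is_high(metadata: u64, allocated: u64) -> bool {
    metadata_ratio(metadata, allocated).map_or(false, |r| r > TYPICAL_MAX_METADATA_RATIO)
}
```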

/// Memory fragmentation ratio: `(resident - allocated) / resident`.
///
/// Measures how efficiently physical memory is being used:
/// - **0.0**: Perfect efficiency (all resident memory is allocated). Rare in practice.
/// - **0.1-0.3**: Healthy range for most workloads.
/// - **0.3-0.5**: Moderate fragmentation. Consider investigating if memory-constrained.
/// - **>0.5**: Significant fragmentation. May indicate allocation pattern issues or a
///   need for decay tuning.
///
/// High fragmentation can occur from:
/// - Bursty allocation patterns (memory freed but decay hasn't run).
/// - Many small allocations with varying lifetimes.
/// - Size class mismatches (allocations don't fit jemalloc's size buckets well).
///
/// To reduce fragmentation: lower decay times, reduce `tcache_max`, or investigate
/// allocation patterns with jemalloc's heap profiling.
const JEMALLOC_FRAGMENTATION_RATIO: &str = "jemalloc_fragmentation_ratio";
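The ratio and the bands listed above map directly to code. A minimal sketch mirroring the calculation in the polling loop of `spawn_jemalloc_metrics_task`, including its `resident == 0` guard; the `FragmentationLevel` enum and `classify` function are hypothetical, the 0.0-0.1 range (which the list leaves unlabelled) is folded into "healthy", and the shared boundaries 0.3 and 0.5 are assumed to count as "moderate".

```rust
/// Hypothetical bucketing of the fragmentation ratio into the documented bands.
#[derive(Debug, PartialEq)]
pub enum FragmentationLevel {
    Healthy,
    Moderate,
    Significant,
}

/// `(resident - allocated) / resident`, or 0.0 when nothing is resident.
pub fn fragmentation_ratio(allocated: u64, resident: u64) -> f64 {
    if resident == 0 {
        return 0.0;
    }
    (resident as f64 - allocated as f64) / resident as f64
}

/// Maps a ratio onto the bands: < 0.3 healthy, 0.3..=0.5 moderate, > 0.5 significant.
pub fn classify(ratio: f64) -> FragmentationLevel {
    if ratio > 0.5 {
        FragmentationLevel::Significant
    } else if ratio >= 0.3 {
        FragmentationLevel::Moderate
    } else {
        FragmentationLevel::Healthy
    }
}
```

For example, 75 MiB allocated out of 100 MiB resident gives a ratio of 0.25, which falls in the healthy band.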

/// Polling interval for jemalloc statistics.
const POLL_INTERVAL: Duration = Duration::from_secs(10);

/// Label key for pipeline identifier.
const PIPELINE_ID_LABEL: &str = "pipeline_id";

/// Label key for application type.
const APP_TYPE_LABEL: &str = "app_type";

/// Application type value for the replicator.
const APP_TYPE_VALUE: &str = "etl-replicator-app";

/// Registers jemalloc metric descriptions with the global metrics recorder.
fn register_metrics() {
    describe_gauge!(
        JEMALLOC_ALLOCATED_BYTES,
        Unit::Bytes,
        "Total bytes allocated by the application"
    );
    describe_gauge!(
        JEMALLOC_ACTIVE_BYTES,
        Unit::Bytes,
        "Total bytes in active pages allocated by the application"
    );
    describe_gauge!(
        JEMALLOC_RESIDENT_BYTES,
        Unit::Bytes,
        "Total bytes in physically resident data pages mapped by the allocator"
    );
    describe_gauge!(
        JEMALLOC_MAPPED_BYTES,
        Unit::Bytes,
        "Total bytes in active extents mapped by the allocator"
    );
    describe_gauge!(
        JEMALLOC_RETAINED_BYTES,
        Unit::Bytes,
        "Total bytes in virtual memory mappings retained for future reuse"
    );
    describe_gauge!(
        JEMALLOC_METADATA_BYTES,
        Unit::Bytes,
        "Total bytes dedicated to jemalloc metadata"
    );
    describe_gauge!(
        JEMALLOC_FRAGMENTATION_RATIO,
        Unit::Count,
        "Memory fragmentation ratio: (resident - allocated) / resident. Lower is better, >0.5 indicates significant fragmentation"
    );
}
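Once a Prometheus exporter is installed as the metrics recorder, each described gauge surfaces as `# HELP` and `# TYPE` lines followed by one sample per label set. The sketch below hand-renders that text exposition format for a single gauge to show what the labels set in the polling loop produce. It is an illustration only, not the output of the exporter this crate uses, whose label order and formatting may differ; `render_gauge` is a hypothetical name, label values are assumed not to need escaping, and the pipeline id `42` is made up.

```rust
/// Renders one gauge sample in the Prometheus text exposition format.
/// Hand-written illustration; label values are assumed not to need escaping.
pub fn render_gauge(name: &str, help: &str, labels: &[(&str, &str)], value: f64) -> String {
    // Join labels as key="value" pairs separated by commas.
    let label_str = labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{v}\""))
        .collect::<Vec<_>>()
        .join(",");
    // `{{` and `}}` emit literal braces around the label set.
    format!("# HELP {name} {help}\n# TYPE {name} gauge\n{name}{{{label_str}}} {value}\n")
}
```

Rendering `jemalloc_allocated_bytes` with `pipeline_id="42"` and `app_type="etl-replicator-app"` at 1 MiB yields a sample line ending in `1048576`.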

/// Spawns a background task that periodically polls jemalloc statistics.
///
/// The task wakes roughly every `POLL_INTERVAL` (10 seconds), since it sleeps for
/// that long after each update, and refreshes the Prometheus gauges with current
/// allocator statistics. Uses MIB-based access for efficient repeated polling.
///
/// This function should be called after [`etl_telemetry::metrics::init_metrics`]
/// to ensure the metrics recorder is installed. It must also be called from within
/// a Tokio runtime, because it uses `tokio::spawn`.
pub fn spawn_jemalloc_metrics_task(pipeline_id: u64) {
    register_metrics();

    let pipeline_id_str = pipeline_id.to_string();

    tokio::spawn(async move {
        // Initialize MIBs once for efficient repeated lookups.
        // MIBs translate string keys to numeric indices, avoiding string parsing on each read.
        let epoch_mib = match epoch::mib() {
            Ok(mib) => mib,
            Err(err) => {
                warn!("failed to initialize jemalloc epoch MIB: {err}");
                return;
            }
        };
        let allocated_mib = match stats::allocated::mib() {
            Ok(mib) => mib,
            Err(err) => {
                warn!("failed to initialize jemalloc allocated MIB: {err}");
                return;
            }
        };
        let active_mib = match stats::active::mib() {
            Ok(mib) => mib,
            Err(err) => {
                warn!("failed to initialize jemalloc active MIB: {err}");
                return;
            }
        };
        let resident_mib = match stats::resident::mib() {
            Ok(mib) => mib,
            Err(err) => {
                warn!("failed to initialize jemalloc resident MIB: {err}");
                return;
            }
        };
        let mapped_mib = match stats::mapped::mib() {
            Ok(mib) => mib,
            Err(err) => {
                warn!("failed to initialize jemalloc mapped MIB: {err}");
                return;
            }
        };
        let retained_mib = match stats::retained::mib() {
            Ok(mib) => mib,
            Err(err) => {
                warn!("failed to initialize jemalloc retained MIB: {err}");
                return;
            }
        };
        let metadata_mib = match stats::metadata::mib() {
            Ok(mib) => mib,
            Err(err) => {
                warn!("failed to initialize jemalloc metadata MIB: {err}");
                return;
            }
        };

        loop {
            // Advance epoch to refresh cached statistics.
            if let Err(err) = epoch_mib.advance() {
                warn!("failed to advance jemalloc epoch: {err}");
                tokio::time::sleep(POLL_INTERVAL).await;
                continue;
            }

            // Read all statistics. A failed read falls back to 0, so the affected gauge
            // reports 0 for this cycle instead of stopping the task.
            let allocated = allocated_mib.read().unwrap_or(0) as f64;
            let active = active_mib.read().unwrap_or(0) as f64;
            let resident = resident_mib.read().unwrap_or(0) as f64;
            let mapped = mapped_mib.read().unwrap_or(0) as f64;
            let retained = retained_mib.read().unwrap_or(0) as f64;
            let metadata = metadata_mib.read().unwrap_or(0) as f64;

            // Update gauges with pipeline_id and app_type labels.
            gauge!(
                JEMALLOC_ALLOCATED_BYTES,
                PIPELINE_ID_LABEL => pipeline_id_str.clone(),
                APP_TYPE_LABEL => APP_TYPE_VALUE,
            )
            .set(allocated);
            gauge!(
                JEMALLOC_ACTIVE_BYTES,
                PIPELINE_ID_LABEL => pipeline_id_str.clone(),
                APP_TYPE_LABEL => APP_TYPE_VALUE,
            )
            .set(active);
            gauge!(
                JEMALLOC_RESIDENT_BYTES,
                PIPELINE_ID_LABEL => pipeline_id_str.clone(),
                APP_TYPE_LABEL => APP_TYPE_VALUE,
            )
            .set(resident);
            gauge!(
                JEMALLOC_MAPPED_BYTES,
                PIPELINE_ID_LABEL => pipeline_id_str.clone(),
                APP_TYPE_LABEL => APP_TYPE_VALUE,
            )
            .set(mapped);
            gauge!(
                JEMALLOC_RETAINED_BYTES,
                PIPELINE_ID_LABEL => pipeline_id_str.clone(),
                APP_TYPE_LABEL => APP_TYPE_VALUE,
            )
            .set(retained);
            gauge!(
                JEMALLOC_METADATA_BYTES,
                PIPELINE_ID_LABEL => pipeline_id_str.clone(),
                APP_TYPE_LABEL => APP_TYPE_VALUE,
            )
            .set(metadata);

            // Calculate fragmentation ratio: (resident - allocated) / resident.
            // A ratio of 0 means no fragmentation, >0.5 indicates significant fragmentation.
            let fragmentation = if resident > 0.0 {
                (resident - allocated) / resident
            } else {
                0.0
            };
            gauge!(
                JEMALLOC_FRAGMENTATION_RATIO,
                PIPELINE_ID_LABEL => pipeline_id_str.clone(),
                APP_TYPE_LABEL => APP_TYPE_VALUE,
            )
            .set(fragmentation);

            debug!(
                allocated_mb = allocated / 1_048_576.0,
                resident_mb = resident / 1_048_576.0,
                fragmentation_ratio = fragmentation,
                "jemalloc stats updated"
            );

            tokio::time::sleep(POLL_INTERVAL).await;
        }
    });
}
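The per-cycle logic of the polling loop in `spawn_jemalloc_metrics_task` (advance the epoch, skip the cycle on failure, read with a 0 fallback, derive the fragmentation ratio) does not depend on Tokio or the metrics recorder. A minimal sketch of factoring it out so it can be unit-tested with a fake. `StatsSource`, `FakeSource`, `Cycle`, and `poll_once` are hypothetical names, errors are modelled as `String` rather than `tikv_jemalloc_ctl`'s error type, and only two of the six statistics are shown.

```rust
/// Hypothetical abstraction over the jemalloc calls used in the loop,
/// so the per-cycle logic can be exercised without a real allocator.
pub trait StatsSource {
    /// Mirrors `epoch_mib.advance()`: refresh cached statistics.
    fn advance(&mut self) -> Result<(), String>;
    /// Mirrors `allocated_mib.read()`.
    fn allocated(&self) -> Result<usize, String>;
    /// Mirrors `resident_mib.read()`.
    fn resident(&self) -> Result<usize, String>;
}

/// Values computed in one polling cycle.
#[derive(Debug, PartialEq)]
pub struct Cycle {
    pub allocated: f64,
    pub resident: f64,
    pub fragmentation: f64,
}

/// One loop iteration: `None` when the epoch could not be advanced (the real
/// task logs a warning, sleeps, and continues); failed reads fall back to 0,
/// matching `unwrap_or(0)` in the loop.
pub fn poll_once<S: StatsSource>(source: &mut S) -> Option<Cycle> {
    source.advance().ok()?;
    let allocated = source.allocated().unwrap_or(0) as f64;
    let resident = source.resident().unwrap_or(0) as f64;
    let fragmentation = if resident > 0.0 {
        (resident - allocated) / resident
    } else {
        0.0
    };
    Some(Cycle { allocated, resident, fragmentation })
}

/// Fake source for exercising `poll_once` without jemalloc.
pub struct FakeSource {
    pub advance_ok: bool,
    pub allocated_bytes: Result<usize, String>,
    pub resident_bytes: Result<usize, String>,
}

impl StatsSource for FakeSource {
    fn advance(&mut self) -> Result<(), String> {
        if self.advance_ok {
            Ok(())
        } else {
            Err("epoch advance failed".to_string())
        }
    }

    fn allocated(&self) -> Result<usize, String> {
        self.allocated_bytes.clone()
    }

    fn resident(&self) -> Result<usize, String> {
        self.resident_bytes.clone()
    }
}
```

With this split, the spawned task would only call `poll_once`, publish the gauges, and sleep. Keeping the side effects at the edges is the design choice being illustrated, not how the module is currently structured.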

© 2026 Coveralls, Inc