xapi-project / xen-api · build 12746018289 · job 1
Coverage: 78% (master: 80%)
Last build branch: dev/pau/majmin · default branch: master
Ran 13 Jan 2025 11:18AM UTC · 37 files · run time 1s
13 Jan 2025 11:14AM UTC coverage: 78.273%. Remained the same
12746018289.1 · push · github · web-flow
CP-52709: use timeslices shorter than 50ms (#6177)

# Changing the default OCaml thread switch timeslice from 50ms

The default OCaml 4.x timeslice for switching between threads is 50ms:
when more than one OCaml thread is runnable, each is allowed to run for
up to 50ms, after which (at various safepoints) the runtime can switch
to another runnable thread.
When the runtime lock is released (e.g. while C code or syscalls run),
another OCaml thread, if any is waiting, is allowed to run immediately.

However, 50ms is too long: it inserts large latencies into the handling
of API calls. With, say, four runnable threads, a request can wait up to
3 × 50ms = 150ms before its thread is scheduled at all.

OTOH, if the timeslice is too short, we waste CPU time on:
* the overhead of the Thread.yield system call, and the cost of
switching threads at the OS level
* potentially more L1/L2 cache misses when we switch between multiple
OCaml threads on the same CPU
* potential loss of branch predictor history
* potentially more L3 cache misses (though on a hypervisor with VMs
running, L3 is mostly taken up by the VMs anyway; we can only rely on
L1/L2 staying with us)

A microbenchmark has shown that a timeslice as small as 0.5ms can
strike an optimal balance between latency and overhead: values lower
than that lose performance due to increased overhead, and values higher
than that lose performance due to increased latency:


![auto_p](https://github.com/user-attachments/assets/3751291b-8f64-4d70-9a65-9c3fdb053955)

![auto_pr](https://github.com/user-attachments/assets/3b710484-87ba-488a-9507-7916c85aab20)

(The microbenchmark measures the number of CPU cycles spent simulating
an API call with various working-set sizes and timeslice settings.)

This is all hardware-dependent, though, so a future PR will introduce an
autotune service that measures the yield overhead and the L1/L2 cache
refill overhead, and calculates an optimal timeslice for that particular
hardware/Xen/kernel combination.
(And while we're at it, we can also tweak the minor heap size to match
~half of the CPU's L2 cache.)


3462 of 4423 relevant lines covered (78.27%)

0.78 hits per line

Source Files on job python3.11 - 12746018289.1
  • 9c5c8dde on github