xapi-project / xen-api · build 12746018289 · job 1
Coverage: 78% (master: 80%)
Last build branch: dev/pau/majmin · default branch: master
Ran 13 Jan 2025 11:18AM UTC · 37 files · run time 1s
13 Jan 2025 11:14AM UTC coverage: 78.273%. Remained the same
12746018289.1 · push · github · web-flow
CP-52709: use timeslices shorter than 50ms (#6177)

# Changing the default OCaml thread switch timeslice from 50ms

The default OCaml 4.x timeslice for switching between threads is 50ms:
when more than one OCaml thread is runnable, each is allowed to run for
up to 50ms, after which (at various safepoints) the runtime can switch
to another runnable thread.
When the runtime lock is released (e.g. while C code or syscalls run),
another OCaml thread, if any is waiting, is allowed to run immediately.

However, 50ms is too long: it inserts large latencies into the handling
of API calls. With, say, four runnable threads, a request can wait up to
3 × 50ms = 150ms before its thread is scheduled at all.

OTOH, if the timeslice is too short, we waste CPU time on:
* the overhead of the Thread.yield system call, and the cost of
switching threads at the OS level
* potentially more L1/L2 cache misses when we switch between multiple
OCaml threads on the same CPU
* potential loss of branch predictor history
* potentially more L3 cache misses (though on a hypervisor with VMs
running, L3 is mostly taken up by the VMs anyway; we can only rely on
L1/L2 staying with us)

A microbenchmark has shown that a timeslice as small as 0.5ms can
strike an optimal balance between latency and overhead: values lower
than that lose performance due to increased overhead, and values higher
than that lose performance due to increased latency:


![auto_p](https://github.com/user-attachments/assets/3751291b-8f64-4d70-9a65-9c3fdb053955)

![auto_pr](https://github.com/user-attachments/assets/3b710484-87ba-488a-9507-7916c85aab20)

(The microbenchmark measures the number of CPU cycles spent simulating
an API call with various working-set sizes and timeslice settings.)

This is all hardware-dependent, though, so a future PR will introduce an
autotune service that measures the yield overhead and the L1/L2 cache
refill overhead, and calculates an optimal timeslice for that particular
hardware/Xen/kernel combination.
(And while we're at it, we can also tweak the minor heap size to match
~half of the CPU's L2 cache.)


3462 of 4423 relevant lines covered (78.27%)

0.78 hits per line

Source Files on job python3.11 - 12746018289.1
  • 9c5c8dde on github