
xapi-project / xen-api / 12746018289
Coverage: 78% (master: 80%)
Last build branch: private/changleli/fix-xenops-cache
Default branch: master
Ran 13 Jan 2025 11:15AM UTC · Jobs 1 · Files 37 · Run time 1min

13 Jan 2025 11:14AM UTC coverage: 78.273%. Remained the same.
Build 12746018289 · push · github · web-flow
CP-52709: use timeslices shorter than 50ms (#6177)

# Changing the default OCaml thread switch timeslice from 50ms

The default OCaml 4.x timeslice for switching between threads is 50ms:
if there is more than one active OCaml thread, each is allowed to run
for up to 50ms, after which (at the next safepoint) the runtime can
switch to another runnable thread.
When the runtime lock is released (while C code or syscalls run),
another OCaml thread may start running immediately, if any is waiting.
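To make the latency impact concrete, here is a minimal toy model (not the OCaml runtime itself): with a single runtime lock and a fixed timeslice, a thread that becomes runnable can wait for every other runnable thread to exhaust its slice first.

```python
# Toy model of timeslice-induced scheduling delay. This is an
# illustrative upper bound, not a measurement of the OCaml runtime.
def worst_case_wait_ms(timeslice_ms, runnable_threads):
    """Worst-case delay before a newly runnable thread gets the
    runtime lock: every other runnable thread runs a full slice."""
    return (runnable_threads - 1) * timeslice_ms

# With the default 50ms timeslice and 4 busy OCaml threads, an API
# request's thread can sit for up to 150ms before it runs at all.
print(worst_case_wait_ms(50, 4))    # 150
print(worst_case_wait_ms(0.5, 4))   # 1.5
```

With a 0.5ms timeslice the same worst case drops to 1.5ms, which is why shortening the slice directly reduces API-call latency.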

However, 50ms is too long: it inserts large latencies into the
handling of API calls.

On the other hand, if the timeslice is too short, we waste CPU time on:
* the overhead of the Thread.yield system call, and the cost of
switching threads at the OS level
* potentially more L1/L2 cache misses if multiple OCaml threads
alternate on the same CPU
* potential loss of branch predictor history
* potentially more L3 cache misses (although on a hypervisor with VMs
running, L3 will be mostly taken up by VMs anyway; we can only rely on
L1/L2 staying with us)
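The trade-off above can be sketched with a simple cost model: each switch pays a roughly fixed price (yield syscall plus cache refill), so the fraction of CPU time lost to switching shrinks as the timeslice grows. The constants below are purely illustrative assumptions, not measured values from the PR.

```python
# Hedged sketch of the timeslice/overhead trade-off. yield_cost_us and
# cache_refill_us are assumed, illustrative constants.
def overhead_fraction(timeslice_us, yield_cost_us=2.0, cache_refill_us=10.0):
    """Fraction of CPU time spent on switching rather than useful work,
    assuming one switch (and one cache refill) per timeslice."""
    switch_cost = yield_cost_us + cache_refill_us
    return switch_cost / (timeslice_us + switch_cost)

for t_us in (100, 500, 5000, 50000):
    print(t_us, round(overhead_fraction(t_us), 4))
```

Under these assumptions the overhead is already small at a 500us (0.5ms) slice and shrinks only marginally beyond that, while latency keeps growing linearly with the slice, which is the shape of the trade-off the microbenchmark below explores.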

A microbenchmark has shown that timeslices as small as 0.5ms might
strike an optimal balance between latency and overhead: values lower
than that lose performance due to increased overhead, and values higher
than that lose performance due to increased latency:


![auto_p](https://github.com/user-attachments/assets/3751291b-8f64-4d70-9a65-9c3fdb053955)

![auto_pr](https://github.com/user-attachments/assets/3b710484-87ba-488a-9507-7916c85aab20)

(the microbenchmark measures the number of CPU cycles spent simulating
an API call with various working set sizes and timeslice settings)

This is all hardware-dependent though, and a future PR will introduce an
autotune service that measures the yield overhead and L1/L2 cache refill
overhead and calculates an optimal timeslice for that particular
hardware/Xen/kernel combination.
(While we're at it, we can also tweak the minor heap size to match
~half of the CPU's L2 cache.)


3462 of 4423 relevant lines covered (78.27%)

0.78 hits per line

Jobs
ID  Job ID                      Ran                      Files  Coverage
1   python3.11 - 12746018289.1  13 Jan 2025 11:15AM UTC  37     78.27%
GitHub Action Run
  • Github Actions Build #12746018289
  • 9c5c8dde on github
  • Prev Build on feature/perf (#12668871257)
  • Next Build on feature/perf (#12766698809)

© 2025 Coveralls, Inc