
xapi-project / xen-api / 19861229644
Last build branch: xen420
Default branch: master
Ran: 02 Dec 2025 02:02PM UTC
Jobs: 1
Files: 34
Run time: 1min

02 Dec 2025 02:01PM UTC coverage: 80.459%. Remained the same.
Build 19861229644 · commit 28eeaeac · push via github (web-flow)
CA-420968: avoid large performance hit on small NUMA nodes (#6763)

NUMA-optimized placement can cause a large performance hit on machines
with small NUMA nodes and VMs with a large number of vCPUs. For example:
a machine with 2 sockets, each of which can run at most 32 vCPUs in a
single socket (NUMA node), and a VM with 32 vCPUs.

Usually Xen tries to spread the load across physical cores and avoid
the hyperthread siblings (when the machine is sufficiently idle, or the
workload is bursty), e.g. using CPUs 0, 2, 4, etc.
But when NUMA placement is used, all the vCPUs must be in the same NUMA
node. If that NUMA node doesn't have enough physical cores, then Xen
has no choice but to use CPUs 0, 1, 2, 3, etc., packing sibling threads.
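
To make the capacity arithmetic concrete, here is a minimal OCaml sketch
(illustrative only; `cores_per_node` and its parameters are hypothetical
names, not xenopsd's API):

```ocaml
(* Illustrative sketch, not xenopsd code: with SMT, a NUMA node's core
   capacity is its PCPU count divided by the threads per core. *)
let cores_per_node ~pcpus_per_node ~threads_per_core =
  pcpus_per_node / threads_per_core

let () =
  (* The example host above: 32 PCPUs per node, 2 threads per core,
     i.e. only 16 physical cores per node, fewer than the 32 vCPUs. *)
  assert (cores_per_node ~pcpus_per_node:32 ~threads_per_core:2 = 16)
```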

Hyperthread siblings share resources, and using both at the same time
incurs a big performance hit, depending on the workload. We've also seen
this previously with Xen's core-scheduling support (which is off by
default).

Avoid this by "requesting" `threads_per_core` times more vCPUs for each
VM, which makes the placement algorithm choose the next size up in
terms of NUMA nodes (i.e. instead of a single NUMA node, use 2 or 3 as
needed, falling back to using all nodes).
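
A minimal sketch of that idea, assuming a placement step that sizes the
NUMA-node set from a CPU demand (`numa_nodes_needed` and its parameters
are hypothetical names; the real change lives elsewhere in xen-api's
placement code):

```ocaml
(* Hedged sketch: inflate the vCPU demand by threads_per_core so a VM
   that would only fit on one node by packing hyperthread siblings
   spills onto more nodes instead. *)
let numa_nodes_needed ~vcpus ~threads_per_core ~cpus_per_node ~total_nodes =
  (* Pretend every vCPU needs a whole physical core. *)
  let effective_vcpus = vcpus * threads_per_core in
  (* Round up to whole nodes; fall back to all nodes if the demand
     exceeds the machine. *)
  let needed = (effective_vcpus + cpus_per_node - 1) / cpus_per_node in
  min needed total_nodes

let () =
  (* 32-vCPU VM on the example host: 64 effective vCPUs -> both nodes. *)
  assert (numa_nodes_needed ~vcpus:32 ~threads_per_core:2
            ~cpus_per_node:32 ~total_nodes:2 = 2);
  (* An 8-vCPU VM still fits on a single node, as before. *)
  assert (numa_nodes_needed ~vcpus:8 ~threads_per_core:2
            ~cpus_per_node:32 ~total_nodes:2 = 1)
```

With this sizing, large VMs trade some memory locality for full cores,
while small VMs keep the single-node placement described below.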

The potential gain from reducing memory latency with a NUMA optimized
placement (~20% on Intel Memory Latency Checker: Idle latency) is
outweighed by the potential loss due to reduced CPU capacity (40%-75% on
OpenSSL, POV-Ray, and OpenVINO), so this is the correct tradeoff.

If the NUMA node is large enough, or if the VM has a small number of
vCPUs, then we still try to use a single NUMA node as we did previously.

The performance difference can be reproduced and verified easily by
running `openssl speed -multi 32 rsa4096` on a 32 vCPU VM on a host that
has 2 NUMA nodes, with 32 PCPUs each, and 2 threads per core.

3504 of 4355 relevant lines covered (80.46%)

0.8 hits per line

Jobs
ID  Job ID                      Ran                      Files  Coverage
1   python3.11 - 19861229644.1  02 Dec 2025 02:02PM UTC  34     80.46%
Source Files on build 19861229644: 34 files (changed: 0, source changed: 0, coverage changed: 0)