
xapi-project / xen-api / 19861480446 / 1

Build:
DEFAULT BRANCH: master
Ran 02 Dec 2025 02:13PM UTC
Files 34
Run time 2s

02 Dec 2025 02:01PM UTC coverage: 80.459%. Remained the same
19861480446.1

push · github · web-flow
CA-420968: avoid large performance hit on small NUMA nodes (#6763)

NUMA-optimized placement can cause a large performance hit on machines
with small NUMA nodes and VMs with a large number of vCPUs: for example,
a machine with 2 sockets, where each socket (NUMA node) can run at most
32 vCPUs, hosting a VM with 32 vCPUs.

Usually Xen would try to spread the load across actual cores and avoid
the hyperthread siblings (when the machine is sufficiently idle, or the
workload is bursty), e.g. using CPUs 0, 2, 4, etc.
But when NUMA placement is used, all the vCPUs must be in the same NUMA
node. If that NUMA node doesn't have enough cores, then Xen has no
choice but to use CPUs 0, 1, 2, 3, etc.
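The capacity problem above can be illustrated with the numbers from the example (a hypothetical sketch, not xapi code):

```python
# One NUMA node from the example: 16 physical cores x 2 hyperthreads
# = 32 logical CPUs per node, 2 nodes in the machine.
cores_per_node = 16
threads_per_core = 2
total_nodes = 2
vcpus = 32

# Unconstrained, Xen can place each vCPU on its own physical core
# (CPUs 0, 2, 4, ... in the example), avoiding sibling contention.
physical_cores_total = total_nodes * cores_per_node
fits_on_whole_cores = vcpus <= physical_cores_total
print(fits_on_whole_cores)   # -> True

# Confined to one NUMA node, only 16 physical cores are available, so
# the 32 vCPUs must double up on hyperthread siblings (CPUs 0, 1, 2, ...).
fits_on_one_nodes_cores = vcpus <= cores_per_node
print(fits_on_one_nodes_cores)  # -> False
```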

Hyperthread siblings share execution resources, and using both at the
same time incurs a large performance hit, depending on the workload.
We've also seen this previously with Xen's core-scheduling support
(which is off by default).

Avoid this by "requesting" `threads_per_core` times more vCPUs for each
VM, which makes the placement algorithm choose the next size up in
terms of NUMA nodes (i.e. instead of a single NUMA node, use 2 or 3 as
needed, falling back to using all nodes).
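The sizing idea can be sketched as follows (a minimal illustration of the inflated request, not the actual xenopsd implementation; the function name and parameters are hypothetical):

```python
import math

def nodes_needed(vcpus, threads_per_core, cores_per_node, total_nodes):
    """How many NUMA nodes to spread a VM across so that each vCPU can
    get a whole physical core, capped at the number of nodes."""
    # Inflate the request by threads_per_core, per the commit message.
    requested = vcpus * threads_per_core
    logical_cpus_per_node = cores_per_node * threads_per_core
    needed = math.ceil(requested / logical_cpus_per_node)
    # Fall back to using all nodes when even that is not enough.
    return min(needed, total_nodes)

# 32-vCPU VM, 2 threads/core, 16 cores per node, 2 nodes: use both nodes.
print(nodes_needed(32, 2, 16, 2))  # -> 2
# A small VM still fits on a single node's physical cores, as before.
print(nodes_needed(8, 2, 16, 2))   # -> 1
```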

The potential gain from reducing memory latency with a NUMA optimized
placement (~20% on Intel Memory Latency Checker: Idle latency) is
outweighed by the potential loss due to reduced CPU capacity (40%-75% on
OpenSSL, POV-Ray, and OpenVINO), so this is the correct tradeoff.

If the NUMA node is large enough, or if the VM has a small number of
vCPUs, then we still try to use a single NUMA node as we did previously.

The performance difference can be reproduced and verified easily by
running `openssl speed -multi 32 rsa4096` on a 32 vCPU VM on a host that
has 2 NUMA nodes, with 32 PCPUs each, and 2 threads per core.

3504 of 4355 relevant lines covered (80.46%)

0.8 hits per line

Source Files on job python3.11 - 19861480446.1
Commit 28eeaeac on github