
xapi-project / xen-api / 19861229644
Last build branch: xen420
Default branch: master
Ran: 02 Dec 2025 02:02PM UTC
Jobs: 1
Files: 34
Run time: 1min

02 Dec 2025 02:01PM UTC coverage: 80.459%. Remained the same.
Build 19861229644 · commit 28eeaeac · push via github (web-flow)
CA-420968: avoid large performance hit on small NUMA nodes (#6763)

NUMA-optimized placement can cause a large performance hit on machines
with small NUMA nodes and VMs with a large number of vCPUs. For example:
a machine with 2 sockets, each of which can run at most 32 vCPUs in a
single socket (NUMA node), and a VM with 32 vCPUs.

Usually Xen tries to spread the load across physical cores and avoid
the hyperthread siblings (when the machine is sufficiently idle, or the
workload is bursty), e.g. using CPUs 0, 2, 4, etc.
But when NUMA placement is used, all the vCPUs must be in the same NUMA
node. If that NUMA node doesn't have enough physical cores, then Xen
has no choice but to use CPUs 0, 1, 2, 3, etc., packing sibling threads.
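
To make the capacity arithmetic concrete, here is a minimal OCaml sketch
(illustrative only; `cores_per_node` and its parameters are hypothetical
names, not xenopsd's API):

```ocaml
(* Illustrative sketch, not xenopsd code: with SMT, a NUMA node's core
   capacity is its PCPU count divided by the threads per core. *)
let cores_per_node ~pcpus_per_node ~threads_per_core =
  pcpus_per_node / threads_per_core

let () =
  (* The example host above: 32 PCPUs per node, 2 threads per core,
     i.e. only 16 physical cores per node, fewer than the 32 vCPUs. *)
  assert (cores_per_node ~pcpus_per_node:32 ~threads_per_core:2 = 16)
```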

Hyperthread siblings share resources, and using both at the same time
incurs a big performance hit, depending on the workload. We've also seen
this previously with Xen's core-scheduling support (which is off by
default).

Avoid this by "requesting" `threads_per_core` times more vCPUs for each
VM, which makes the placement algorithm choose the next size up in
terms of NUMA nodes (i.e. instead of a single NUMA node, use 2 or 3 as
needed, falling back to using all nodes).
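
A minimal sketch of that idea, assuming a placement step that sizes the
NUMA-node set from a CPU demand (`numa_nodes_needed` and its parameters
are hypothetical names; the real change lives elsewhere in xen-api's
placement code):

```ocaml
(* Hedged sketch: inflate the vCPU demand by threads_per_core so a VM
   that would only fit on one node by packing hyperthread siblings
   spills onto more nodes instead. *)
let numa_nodes_needed ~vcpus ~threads_per_core ~cpus_per_node ~total_nodes =
  (* Pretend every vCPU needs a whole physical core. *)
  let effective_vcpus = vcpus * threads_per_core in
  (* Round up to whole nodes; fall back to all nodes if the demand
     exceeds the machine. *)
  let needed = (effective_vcpus + cpus_per_node - 1) / cpus_per_node in
  min needed total_nodes

let () =
  (* 32-vCPU VM on the example host: 64 effective vCPUs -> both nodes. *)
  assert (numa_nodes_needed ~vcpus:32 ~threads_per_core:2
            ~cpus_per_node:32 ~total_nodes:2 = 2);
  (* An 8-vCPU VM still fits on a single node, as before. *)
  assert (numa_nodes_needed ~vcpus:8 ~threads_per_core:2
            ~cpus_per_node:32 ~total_nodes:2 = 1)
```

With this sizing, large VMs trade some memory locality for full cores,
while small VMs keep the single-node placement described below.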

The potential gain from reducing memory latency with a NUMA optimized
placement (~20% on Intel Memory Latency Checker: Idle latency) is
outweighed by the potential loss due to reduced CPU capacity (40%-75% on
OpenSSL, POV-Ray, and OpenVINO), so this is the correct tradeoff.

If the NUMA node is large enough, or if the VM has a small number of
vCPUs, then we still try to use a single NUMA node as we did previously.

The performance difference can be reproduced and verified easily by
running `openssl speed -multi 32 rsa4096` on a 32 vCPU VM on a host that
has 2 NUMA nodes, with 32 PCPUs each, and 2 threads per core.

3504 of 4355 relevant lines covered (80.46%)

0.8 hits per line

Jobs
ID  Job ID                      Ran                      Files  Coverage
1   python3.11 - 19861229644.1  02 Dec 2025 02:02PM UTC  34     80.46%
Source Files on build 19861229644: 34 files (changed: 0, source changed: 0, coverage changed: 0)