
xapi-project / xen-api / 19861480446 / 1

Build:
DEFAULT BRANCH: master
Ran 02 Dec 2025 02:13PM UTC
Files 34
Run time 2s

02 Dec 2025 02:01PM UTC coverage: 80.459%. Remained the same
19861480446.1

push · github · web-flow
CA-420968: avoid large performance hit on small NUMA nodes (#6763)

NUMA-optimized placement can cause a large performance hit on machines
with small NUMA nodes and VMs with a large number of vCPUs: for example,
a machine with 2 sockets, where each socket (NUMA node) can run at most
32 vCPUs, hosting a VM with 32 vCPUs.

Usually Xen would try to spread the load across actual cores and avoid
the hyperthread siblings (when the machine is sufficiently idle, or the
workload is bursty), e.g. using CPUs 0, 2, 4, etc.
But when NUMA placement is used, all the vCPUs must be in the same NUMA
node. If that NUMA node doesn't have enough cores, then Xen has no
choice but to use CPUs 0, 1, 2, 3, etc.
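The capacity problem above can be illustrated with the numbers from the example (a hypothetical sketch, not xapi code):

```python
# One NUMA node from the example: 16 physical cores x 2 hyperthreads
# = 32 logical CPUs per node, 2 nodes in the machine.
cores_per_node = 16
threads_per_core = 2
total_nodes = 2
vcpus = 32

# Unconstrained, Xen can place each vCPU on its own physical core
# (CPUs 0, 2, 4, ... in the example), avoiding sibling contention.
physical_cores_total = total_nodes * cores_per_node
fits_on_whole_cores = vcpus <= physical_cores_total
print(fits_on_whole_cores)   # -> True

# Confined to one NUMA node, only 16 physical cores are available, so
# the 32 vCPUs must double up on hyperthread siblings (CPUs 0, 1, 2, ...).
fits_on_one_nodes_cores = vcpus <= cores_per_node
print(fits_on_one_nodes_cores)  # -> False
```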

Hyperthread siblings share execution resources, and using both at the
same time incurs a large performance hit, depending on the workload.
We've also seen this previously with Xen's core-scheduling support
(which is off by default).

Avoid this by "requesting" `threads_per_core` times more vCPUs for each
VM, which makes the placement algorithm choose the next size up in
terms of NUMA nodes (i.e. instead of a single NUMA node, use 2 or 3 as
needed, falling back to using all nodes).
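The sizing idea can be sketched as follows (a minimal illustration of the inflated request, not the actual xenopsd implementation; the function name and parameters are hypothetical):

```python
import math

def nodes_needed(vcpus, threads_per_core, cores_per_node, total_nodes):
    """How many NUMA nodes to spread a VM across so that each vCPU can
    get a whole physical core, capped at the number of nodes."""
    # Inflate the request by threads_per_core, per the commit message.
    requested = vcpus * threads_per_core
    logical_cpus_per_node = cores_per_node * threads_per_core
    needed = math.ceil(requested / logical_cpus_per_node)
    # Fall back to using all nodes when even that is not enough.
    return min(needed, total_nodes)

# 32-vCPU VM, 2 threads/core, 16 cores per node, 2 nodes: use both nodes.
print(nodes_needed(32, 2, 16, 2))  # -> 2
# A small VM still fits on a single node's physical cores, as before.
print(nodes_needed(8, 2, 16, 2))   # -> 1
```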

The potential gain from reducing memory latency with a NUMA optimized
placement (~20% on Intel Memory Latency Checker: Idle latency) is
outweighed by the potential loss due to reduced CPU capacity (40%-75% on
OpenSSL, POV-Ray, and OpenVINO), so this is the correct tradeoff.

If the NUMA node is large enough, or if the VM has a small number of
vCPUs, then we still try to use a single NUMA node as we did previously.

The performance difference can be reproduced and verified easily by
running `openssl speed -multi 32 rsa4096` on a 32 vCPU VM on a host that
has 2 NUMA nodes, with 32 PCPUs each, and 2 threads per core.

3504 of 4355 relevant lines covered (80.46%)

0.8 hits per line

Source Files on job python3.11 - 19861480446.1
Commit 28eeaeac on github