grpc / grpc-java / #19720
89%

Build:
DEFAULT BRANCH: master
Ran 07 Mar 2025 06:47PM UTC
Jobs 1
Files 614
Run time 1min
07 Mar 2025 06:33PM UTC coverage: 88.517% (+0.006%) from 88.511%
#19720

push

github

web-flow
xds: Fix cluster selection races when updating config selector

Listener2.onResult() doesn't require running in the sync context, so
when called from the sync context it is guaranteed not to do its
processing immediately (instead, it schedules work into the sync
context).

The code was doing an update dance: 1) update service config to add new
cluster, 2) update config selector to use new cluster, 3) update service
config to remove old clusters. But onResult() wasn't being processed
immediately, so the actual execution order was 2, 1, 3, which leaves a
small window where RPCs will fail. onResult2(), however, does run
immediately. And since ca4819ac6, updateBalancingState() updates the
picker immediately.
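
The reordering can be reproduced with a small sketch. MiniSyncContext
is a hypothetical, single-threaded stand-in for a synchronization
context (it is not the grpc-java class), and the numbers are only
markers for the three updates. Work submitted while the context is
already draining is queued, so the deferred steps 1 and 3 run after the
immediate step 2.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class UpdateOrderSketch {
  /** Hypothetical stand-in for a sync context; single-threaded on purpose. */
  static final class MiniSyncContext {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private boolean draining;

    void execute(Runnable task) {
      queue.add(task);
      if (draining) {
        return; // already inside the context: runs after the current task
      }
      draining = true;
      try {
        Runnable next;
        while ((next = queue.poll()) != null) {
          next.run();
        }
      } finally {
        draining = false;
      }
    }
  }

  /** Returns the order in which the three update steps actually ran. */
  static List<Integer> runUpdateDance() {
    MiniSyncContext ctx = new MiniSyncContext();
    List<Integer> order = new ArrayList<>();
    ctx.execute(() -> {
      ctx.execute(() -> order.add(1)); // step 1: add new cluster (deferred)
      order.add(2);                    // step 2: swap config selector (immediate)
      ctx.execute(() -> order.add(3)); // step 3: remove old clusters (deferred)
    });
    return order;
  }

  public static void main(String[] args) {
    System.out.println(runUpdateDance()); // prints [2, 1, 3]
  }
}
```

If step 1 and step 3 ran inline instead of being queued, the recorded
order would be 1, 2, 3, which is the order the update dance intended.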

cleanUpRoutes() was also racy because it updated the routingConfig
before swapping to the new config selector, so RPCs could fail saying
there was no route instead of the useful error message. Even with the
opposite order, some RPCs may be executing the while loop of
selectConfig(), trying to acquire a cluster. The code unreffed the
clusters before updating the routingConfig, so those RPCs could go into
a tight loop until the routingConfig was updated. Also, once the
routingConfig was updated to EMPTY those RPCs would similarly
see the wrong error message. To give the correct error message,
selectConfig() must fail such RPCs directly, and once it can do that
there's no need to stop using the config selector in error cases. This
has the benefit of fewer moving parts and more consistent threading
among cases.
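
A minimal sketch of the "fail such RPCs directly" idea, under stated
assumptions: RoutingSnapshot, retain(), useCluster() and failWith() are
hypothetical names for illustration, not the XdsNameResolver code. The
selection loop reads one snapshot per iteration, returns the snapshot's
error if it carries one, and otherwise retries only when the cluster
reference could not be taken.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class SelectConfigSketch {
  /** Hypothetical immutable snapshot: either a cluster name or an error. */
  static final class RoutingSnapshot {
    final String cluster; // null when error is set
    final String error;   // null when cluster is set

    RoutingSnapshot(String cluster, String error) {
      this.cluster = cluster;
      this.error = error;
    }
  }

  private final Map<String, AtomicInteger> clusterRefs = new ConcurrentHashMap<>();
  private volatile RoutingSnapshot routing = new RoutingSnapshot(null, "no config yet");

  /** Takes a reference unless the count has already dropped to zero. */
  private boolean retain(String cluster) {
    AtomicInteger refs = clusterRefs.get(cluster);
    if (refs == null) {
      return false;
    }
    int count;
    do {
      count = refs.get();
      if (count == 0) {
        return false;
      }
    } while (!refs.compareAndSet(count, count + 1));
    return true;
  }

  /** Selects a cluster for one RPC, or fails it with the snapshot's error. */
  String select() {
    while (true) {
      RoutingSnapshot snapshot = routing;
      if (snapshot.error != null) {
        return "ERROR: " + snapshot.error; // fail directly with the useful message
      }
      if (retain(snapshot.cluster)) {
        return "OK: " + snapshot.cluster;
      }
      // The cluster was released concurrently; re-read the snapshot.
    }
  }

  void useCluster(String cluster) {
    clusterRefs.put(cluster, new AtomicInteger(1)); // the resolver's own reference
    routing = new RoutingSnapshot(cluster, null);
  }

  void failWith(String error) {
    RoutingSnapshot old = routing;
    routing = new RoutingSnapshot(null, error); // publish the error first
    if (old.cluster != null) {
      clusterRefs.get(old.cluster).decrementAndGet(); // then drop the reference
    }
  }
}
```

Publishing the error snapshot before releasing the reference means a
looping caller sees the error on its next iteration rather than
spinning on a cluster it can no longer retain.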

The added test was able to detect the race 2% of the time. The slower
the code/machine, the more reliably the test failed. ca4819ac6 along
with this commit reduced it to 0 failures in 1000 runs.

Discovered when investigating b/394850611

34573 of 39058 relevant lines covered (88.52%)

0.89 hits per line
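
As a quick consistency check on the figures above, the percentage can be
recomputed from the covered and relevant line counts. CoverageMath is a
hypothetical helper name; the two counts are the ones reported on this
page.

```java
import java.util.Locale;

class CoverageMath {
  /** Coverage as a percentage of relevant lines. */
  static double percent(long covered, long relevant) {
    return 100.0 * covered / relevant;
  }

  public static void main(String[] args) {
    double pct = percent(34573, 39058);
    // Locale.ROOT keeps the decimal point independent of the machine's locale.
    System.out.println(String.format(Locale.ROOT, "%.3f%%", pct)); // prints 88.517%
  }
}
```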

Uncovered Existing Lines

Lines  Coverage  ∆       File
2      81.05%    -2.11%  ../okhttp/src/main/java/io/grpc/okhttp/ExceptionHandlingFrameWriter.java
3      95.07%    -0.44%  ../core/src/main/java/io/grpc/internal/RetriableStream.java
3      92.38%    -1.43%  ../xds/src/main/java/io/grpc/xds/client/ControlPlaneClient.java
15     95.21%    0.02%   ../xds/src/main/java/io/grpc/xds/XdsNameResolver.java
Jobs
ID  Job ID    Ran                      Files  Coverage
1   #19720.1  07 Mar 2025 06:47PM UTC  614    88.52%
Source Files on build #19720
  • Tree
  • List 614
  • Changed 9
  • Source Changed 0
  • Coverage Changed 9
  • d82613a7 on github