pantsbuild / pants / 19529437518

20 Nov 2025 07:44AM UTC coverage: 78.884% (-1.4%) from 80.302%
nfpm.native_libs: Add RPM package depends from packaged pex_binaries (#22899)

## PR Series Overview

This is the second in a series of PRs that introduces a new backend:
`pants.backend.nfpm.native_libs`
Initially, the backend will be available as:
`pants.backend.experimental.nfpm.native_libs`

I proposed this new backend (originally named `bindeps`) in discussion
#22396.

This backend will inspect ELF bin/lib files (like `lib*.so`) in packaged
contents (for this PR series, only in `pex_binary` targets) to identify
package dependency metadata and inject that metadata on the relevant
`nfpm_deb_package` or `nfpm_rpm_package` targets. Effectively, it will
provide an approximation of these native packager features:
- `rpm`: `rpmdeps` + `elfdeps`
- `deb`: `dh_shlibdeps` + `dpkg-shlibdeps` (these substitute
`${shlibs:Depends}` in debian control files)
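
As a rough illustration of the kind of inspection described above, the sketch below reads the ELF identification bytes to decide whether a file is an ELF object and whether it is 32- or 64-bit, then formats a library SONAME in the style that `elfdeps`-generated RPM dependencies usually take (for example `libz.so.1()(64bit)`). This is not the backend's code: `elf_class` and `rpm_soname_requirement` are hypothetical helper names, and a real implementation would also need to parse the dynamic section to find the SONAMEs a binary needs.

```python
from __future__ import annotations

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file (e_ident[EI_MAG0..3])


def elf_class(header: bytes) -> int | None:
    """Return 32 or 64 for an ELF header, or None if the bytes are not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    # e_ident[EI_CLASS]: 1 = ELFCLASS32, 2 = ELFCLASS64
    return {1: 32, 2: 64}.get(header[4])


def rpm_soname_requirement(soname: str, bits: int, version: str = "") -> str:
    """Format a SONAME as an elfdeps-style requirement (hypothetical helper)."""
    if bits == 64:
        # 64-bit dependencies carry a "(64bit)" marker after the version slot.
        return f"{soname}({version})(64bit)"
    return f"{soname}({version})" if version else soname


# Usage: classify a header, then format the requirement for a needed library.
bits = elf_class(b"\x7fELF\x02\x01\x01")
requirement = rpm_soname_requirement("libz.so.1", bits or 64)
```

Working only from file bytes, with no distro tooling or package database, is what would let this kind of check run on any host.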

### Goal: Host-agnostic package builds

This pants backend is designed to be host-agnostic, like
[nFPM](https://nfpm.goreleaser.com/).

Native packaging tools are often restricted to a single release of a
single distro. Unlike native package builders, this new pants backend
does not use any of those distro-specific or distro-release-specific
utilities or local package databases. This new backend should be able to
run (and so help build deb and rpm packages) anywhere that pants can
run (macOS, rpm-based Linux distros, deb-based Linux distros, other Linux
distros, docker, ...).

### Previous PRs in series

- #22873

## PR Overview

This PR adds rules in `nfpm.native_libs` to add package dependency
metadata to `nfpm_rpm_package`. The 2 new rules are:

- `inject_native_libs_dependencies_in_package_fields`:

    - An implementation of the polymorphic rule `inject_nfpm_package_fields`.
      This rule is low priority (`priority = 2`) so that in-repo plugins can
      override/augment what it injects. (See #22864)

    - Rule logic overview:
        - find any pex_binaries that will be packaged in an `nfpm_rpm_package`
   ... (continued)

96 of 118 new or added lines in 3 files covered. (81.36%)

910 existing lines in 53 files now uncovered.

73897 of 93678 relevant lines covered (78.88%)

3.21 hits per line

Source File: /src/python/pants/backend/tools/semgrep/rules.py (58.7% covered)
# Copyright 2023 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).
from __future__ import annotations

import itertools
import logging
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass
from pathlib import PurePath

from pants.backend.python.util_rules import pex
from pants.backend.python.util_rules.pex import VenvPexProcess, create_venv_pex
from pants.core.goals.lint import LintResult, LintTargetsRequest
from pants.core.util_rules.partitions import Partition, Partitions
from pants.core.util_rules.source_files import SourceFilesRequest, determine_source_files
from pants.engine.addresses import Address
from pants.engine.fs import CreateDigest, FileContent, MergeDigests, PathGlobs, Paths
from pants.engine.intrinsics import (
    create_digest,
    digest_to_snapshot,
    execute_process,
    merge_digests,
    path_globs_to_paths,
)
from pants.engine.process import ProcessCacheScope
from pants.engine.rules import Rule, collect_rules, concurrently, implicitly, rule
from pants.engine.unions import UnionRule
from pants.option.global_options import GlobalOptions
from pants.util.logging import LogLevel
from pants.util.strutil import pluralize

from .subsystem import SemgrepFieldSet, SemgrepSubsystem

logger = logging.getLogger(__name__)


_SEMGREPIGNORE_FILE_NAME = ".semgrepignore"
_DEFAULT_SEMGREP_CONFIG_DIR = ".semgrep"


class SemgrepLintRequest(LintTargetsRequest):
    field_set_type = SemgrepFieldSet
    tool_subsystem = SemgrepSubsystem  # type: ignore[assignment]


@dataclass(frozen=True)
class PartitionMetadata:
    config_files: frozenset[PurePath]

    @property
    def description(self) -> str:
        return ", ".join(sorted(str(path) for path in self.config_files))


@dataclass
class AllSemgrepConfigs:
    configs_by_dir: dict[PurePath, set[PurePath]]

    def ancestor_configs(self, address: Address) -> Iterable[PurePath]:
        # TODO: introspect the semgrep rules and determine which (if any) apply to the files, e.g. a
        # Python file shouldn't depend on a .semgrep.yml that doesn't have any 'python' or 'generic'
        # rules, and similarly if there's path inclusions/exclusions.
        # TODO: this would be better as actual dependency inference (e.g. allows inspection, manual
        # addition/exclusion), but that can only infer 'full' dependencies and it is wrong (e.g. JVM
        # things break) for real code files to depend on this sort of non-code linter config; requires
        # dependency scopes or similar (https://github.com/pantsbuild/pants/issues/12794)
        spec = PurePath(address.spec_path)

        for ancestor in itertools.chain([spec], spec.parents):
            yield from self.configs_by_dir.get(ancestor, [])


def _group_by_semgrep_dir(
    all_config_files: Paths, all_config_dir_files: Paths, config_name: str
) -> AllSemgrepConfigs:
    configs_by_dir: dict[PurePath, set[PurePath]] = {}
    for config_path in all_config_files.files:
        # Rules like foo/semgrep.yaml should apply to the project at foo/
        path = PurePath(config_path)
        configs_by_dir.setdefault(path.parent, set()).add(path)

    for config_path in all_config_dir_files.files:
        # Rules like foo/bar/.semgrep/baz.yaml and foo/bar/.semgrep/baz/qux.yaml should apply to the
        # project at foo/bar/
        path = PurePath(config_path)
        config_directory = next(
            parent.parent for parent in path.parents if parent.name == config_name
        )
        configs_by_dir.setdefault(config_directory, set()).add(path)

    return AllSemgrepConfigs(configs_by_dir)


@rule
async def find_all_semgrep_configs(semgrep: SemgrepSubsystem) -> AllSemgrepConfigs:
    config_file_globs: tuple[str, ...] = ()
    config_dir_globs: tuple[str, ...] = ()

    if semgrep.config_name is None:
        config_file_globs = ("**/.semgrep.yml", "**/.semgrep.yaml")
        config_dir_globs = (
            f"**/{_DEFAULT_SEMGREP_CONFIG_DIR}/**/*.yaml",
            f"**/{_DEFAULT_SEMGREP_CONFIG_DIR}/**/*.yml",
        )
    elif semgrep.config_name.endswith((".yaml", ".yml")):
        config_file_globs = (f"**/{semgrep.config_name}",)
    else:
        config_dir_globs = (
            f"**/{semgrep.config_name}/**/*.yaml",
            f"**/{semgrep.config_name}/**/*.yml",
        )

    all_config_files = await path_globs_to_paths(PathGlobs(config_file_globs))
    all_config_dir_files = await path_globs_to_paths(PathGlobs(config_dir_globs))
    return _group_by_semgrep_dir(
        all_config_files,
        all_config_dir_files,
        (semgrep.config_name or _DEFAULT_SEMGREP_CONFIG_DIR),
    )


@dataclass(frozen=True)
class RelevantSemgrepConfigsRequest:
    field_set: SemgrepFieldSet


class RelevantSemgrepConfigs(frozenset[PurePath]):
    pass


@rule
async def infer_relevant_semgrep_configs(
    request: RelevantSemgrepConfigsRequest, all_semgrep: AllSemgrepConfigs
) -> RelevantSemgrepConfigs:
    return RelevantSemgrepConfigs(all_semgrep.ancestor_configs(request.field_set.address))


@rule
async def partition(
    request: SemgrepLintRequest.PartitionRequest[SemgrepFieldSet],
    semgrep: SemgrepSubsystem,
) -> Partitions:
    if semgrep.skip:
        return Partitions()

    all_configs = await concurrently(
        infer_relevant_semgrep_configs(RelevantSemgrepConfigsRequest(field_set), **implicitly())
        for field_set in request.field_sets
    )

    # partition by the sets of configs that apply to each input
    by_config = defaultdict(list)
    for field_set, configs in zip(request.field_sets, all_configs):
        if configs:
            by_config[configs].append(field_set)

    return Partitions(
        Partition(tuple(field_sets), PartitionMetadata(configs))
        for configs, field_sets in by_config.items()
    )


# We have a hard-coded settings file to side-step
# https://github.com/returntocorp/semgrep/issues/7102, and also provide more cacheability, NB. both
# keys are required.
_DEFAULT_SETTINGS = FileContent(
    path="__semgrep_settings.yaml",
    content=b"anonymous_user_id: 00000000-0000-0000-0000-000000000000\nhas_shown_metrics_notification: true",
)


@rule(desc="Lint with Semgrep", level=LogLevel.DEBUG)
async def lint(
    request: SemgrepLintRequest.Batch[SemgrepFieldSet, PartitionMetadata],
    semgrep: SemgrepSubsystem,
    global_options: GlobalOptions,
) -> LintResult:
    config_files, ignore_files, semgrep_pex, input_files, settings = await concurrently(
        digest_to_snapshot(
            **implicitly(PathGlobs(str(s) for s in request.partition_metadata.config_files))
        ),
        digest_to_snapshot(**implicitly(PathGlobs([_SEMGREPIGNORE_FILE_NAME]))),
        create_venv_pex(**implicitly(semgrep.to_pex_request())),
        determine_source_files(
            SourceFilesRequest(field_set.source for field_set in request.elements)
        ),
        create_digest(CreateDigest([_DEFAULT_SETTINGS])),
    )

    input_digest = await merge_digests(
        MergeDigests(
            (
                input_files.snapshot.digest,
                config_files.digest,
                settings,
                ignore_files.digest,
            )
        )
    )

    cache_scope = ProcessCacheScope.PER_SESSION if semgrep.force else ProcessCacheScope.SUCCESSFUL

    # TODO: https://github.com/pantsbuild/pants/issues/18430 support running this with --autofix
    # under the fix goal... but not all rules have fixes, so we need to be running with
    # --error/checking exit codes, which FixResult doesn't currently support.
    result = await execute_process(
        **implicitly(
            VenvPexProcess(
                semgrep_pex,
                argv=(
                    "scan",
                    *(f"--config={f}" for f in config_files.files),
                    "--jobs={pants_concurrency}",
                    "--error",
                    *semgrep.args,
                    # we don't pass the target files directly because that overrides .semgrepignore
                    # (https://github.com/returntocorp/semgrep/issues/4978), so instead we just tell its
                    # traversal to include all the source files in this partition. Unfortunately this
                    # include is implicitly unrooted (i.e. as if it was **/path/to/file), and so may
                    # pick up other files if the names match. The highest risk of this is within the
                    # semgrep PEX.
                    *(f"--include={f}" for f in input_files.files),
                    f"--exclude={semgrep_pex.pex_filename}",
                ),
                extra_env={
                    "SEMGREP_FORCE_COLOR": "true",
                    # disable various global state/network requests
                    "SEMGREP_SETTINGS_FILE": _DEFAULT_SETTINGS.path,
                    "SEMGREP_ENABLE_VERSION_CHECK": "0",
                    "SEMGREP_SEND_METRICS": "off",
                },
                input_digest=input_digest,
                concurrency_available=len(input_files.files),
                description=f"Run Semgrep on {pluralize(len(input_files.files), 'file')}.",
                level=LogLevel.DEBUG,
                cache_scope=cache_scope,
            )
        )
    )

    return LintResult.create(request, result, output_simplifier=global_options.output_simplifier())


def rules() -> Iterable[Rule | UnionRule]:
    return [*collect_rules(), *SemgrepLintRequest.rules(), *pex.rules()]
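
The config grouping, ancestor lookup, and partitioning in the listing above can be tried outside the Pants engine. The following is a minimal standalone sketch that mirrors that logic with plain strings in place of Pants' `Paths`, `Address`, and field-set types; the function names `group_by_dir`, `ancestor_configs`, and `partition` and the sample paths are simplifications made up for illustration, not part of the Pants API.

```python
from __future__ import annotations

import itertools
from collections import defaultdict
from pathlib import PurePath


def group_by_dir(
    config_files: list[str], config_dir_files: list[str], config_name: str
) -> dict[PurePath, set[PurePath]]:
    """Map each project directory to the semgrep config files that apply to it."""
    configs_by_dir: dict[PurePath, set[PurePath]] = {}
    for raw in config_files:
        # A file such as foo/.semgrep.yml applies to the project at foo/.
        path = PurePath(raw)
        configs_by_dir.setdefault(path.parent, set()).add(path)
    for raw in config_dir_files:
        # A file under foo/bar/.semgrep/ applies to the project at foo/bar/.
        path = PurePath(raw)
        project = next(p.parent for p in path.parents if p.name == config_name)
        configs_by_dir.setdefault(project, set()).add(path)
    return configs_by_dir


def ancestor_configs(
    configs_by_dir: dict[PurePath, set[PurePath]], spec_path: str
) -> frozenset[PurePath]:
    """Collect configs from the target's directory and every ancestor directory."""
    spec = PurePath(spec_path)
    found: set[PurePath] = set()
    for ancestor in itertools.chain([spec], spec.parents):
        found.update(configs_by_dir.get(ancestor, ()))
    return frozenset(found)


def partition(
    configs_by_dir: dict[PurePath, set[PurePath]], spec_paths: list[str]
) -> dict[frozenset[PurePath], list[str]]:
    """Group targets by the exact set of configs that applies to them."""
    by_config: dict[frozenset[PurePath], list[str]] = defaultdict(list)
    for spec_path in spec_paths:
        configs = ancestor_configs(configs_by_dir, spec_path)
        if configs:  # targets with no applicable config are skipped
            by_config[configs].append(spec_path)
    return dict(by_config)


configs = group_by_dir(
    config_files=[".semgrep.yml", "foo/.semgrep.yml"],
    config_dir_files=["foo/bar/.semgrep/baz.yaml"],
    config_name=".semgrep",
)
parts = partition(configs, ["foo/bar", "foo/other", "top"])
for config_set, targets in parts.items():
    print(sorted(str(c) for c in config_set), "->", targets)
```

Because the partition key is the full set of applicable configs, two targets share a Semgrep invocation only when exactly the same config files apply to both.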