
idanmoradarthas / DataScienceUtils / 21079409936
100%

Build:
DEFAULT BRANCH: master
Ran 16 Jan 2026 08:11PM UTC
Jobs 12
Files 7
Run time 1min
16 Jan 2026 08:09PM UTC coverage: 99.7% (-0.3%) from 100.0%
21079409936 · push · github · web-flow

Refactor `append_tags_to_frame` to use `MultiLabelBinarizer` (#87)

* refactor: Use MultiLabelBinarizer in append_tags_to_frame

Replaced `CountVectorizer` with `MultiLabelBinarizer` in the `append_tags_to_frame` function.

The previous implementation used `CountVectorizer` with a custom tokenizer, which was not the most suitable tool for the task. `MultiLabelBinarizer` is a more direct and efficient choice for creating a binary matrix from pre-tokenized tags.

The refactoring includes manual implementations for the `min_df` and `max_features` parameters to ensure that the function's behavior remains identical to the original implementation.

- Replaced `CountVectorizer` with `MultiLabelBinarizer`.
- Manually implemented `min_df` filtering based on document frequency.
- Manually implemented `max_features` selection of the most frequent tags.
- Ensured that the function signature and all existing tests remain unchanged and pass.
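The approach described above can be sketched as follows. This is a minimal illustration, not the code from `ds_utils/strings.py`: the helper name `binarize_tags`, its signature, and the tie-breaking rule for `max_features` are assumptions made for the example, and `min_df` is treated only as an absolute row count.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer


def binarize_tags(tag_lists, min_df=1, max_features=None):
    """Hypothetical helper: binarize pre-tokenized tags, then filter columns.

    Returns a (matrix, tag_names) pair. Not the library's actual implementation.
    """
    mlb = MultiLabelBinarizer()
    matrix = mlb.fit_transform(tag_lists)  # shape (n_rows, n_tags), values 0/1
    names = np.asarray(mlb.classes_)

    # Document frequency: the number of rows in which each tag appears.
    doc_freq = matrix.sum(axis=0)

    # min_df: keep tags that appear in at least `min_df` rows
    # (assumption: integer counts only, no proportions).
    keep = doc_freq >= min_df
    matrix, names, doc_freq = matrix[:, keep], names[keep], doc_freq[keep]

    # max_features: keep the most frequent tags.
    if max_features is not None and len(names) > max_features:
        # Stable sort on negative frequency; ties keep alphabetical order
        # (assumption: the original tie-breaking rule is not stated).
        top = np.argsort(-doc_freq, kind="stable")[:max_features]
        top.sort()  # restore alphabetical column order
        matrix, names = matrix[:, top], names[top]

    return matrix, names.tolist()


tags = [["ml", "python"], ["python"], ["ml", "nlp", "python"], ["stats"]]

matrix, names = binarize_tags(tags, min_df=2)
print(names)            # ['ml', 'python']
print(matrix.tolist())  # [[1, 1], [0, 1], [1, 1], [0, 0]]
```

If `min_df` is higher than every tag's document frequency, the sketch returns a matrix with zero columns and an empty name list, which is the kind of "no tags left after filtering" case a caller would need to handle.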

* refactor: Use MultiLabelBinarizer in append_tags_to_frame

Replaced `CountVectorizer` with `MultiLabelBinarizer` in the `append_tags_to_frame` function and incorporated feedback from the code review.

Changes based on code review feedback:
- Added a clearer comment to the sorting logic for `max_features`.
- Deduplicated code by creating a `_prepare_tags` helper function.
- Added a new test case to handle the edge case where no tags are left after filtering.

31 of 33 new or added lines in 1 file covered. (93.94%)

665 of 667 relevant lines covered (99.7%)

11.96 hits per line
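
The percentages above follow directly from the line counts. A small sketch of the arithmetic, where `coverage_percent` is a hypothetical helper name and not part of any Coveralls API:

```python
def coverage_percent(covered, relevant):
    """Covered lines as a percentage of relevant lines."""
    return 100.0 * covered / relevant


diff_coverage = coverage_percent(31, 33)     # new or added lines in this build
total_coverage = coverage_percent(665, 667)  # all relevant lines

print(f"{diff_coverage:.2f}%")   # 93.94%
print(f"{total_coverage:.1f}%")  # 99.7%
```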

New Missed Lines in Diff

Lines  Coverage (%)  ∆       File
2      96.67         -3.33%  ds_utils/strings.py

Jobs

ID  Job ID                                       Ran                      Files  Coverage (%)
1   windows-latest-python-3.10 - 21079409936.1   16 Jan 2026 08:14PM UTC  7      99.7
2   ubuntu-latest-python-3.10 - 21079409936.2    16 Jan 2026 08:14PM UTC  7      99.7
3   macos-latest-python-3.13 - 21079409936.3     16 Jan 2026 08:14PM UTC  7      99.7
4   macos-latest-python-3.11 - 21079409936.4     16 Jan 2026 08:14PM UTC  7      99.7
5   ubuntu-latest-python-3.12 - 21079409936.5    16 Jan 2026 08:14PM UTC  7      99.7
6   windows-latest-python-3.13 - 21079409936.6   16 Jan 2026 08:14PM UTC  7      99.7
7   ubuntu-latest-python-3.11 - 21079409936.7    16 Jan 2026 08:14PM UTC  7      99.7
8   windows-latest-python-3.12 - 21079409936.8   16 Jan 2026 08:14PM UTC  7      99.7
9   macos-latest-python-3.10 - 21079409936.9     16 Jan 2026 08:14PM UTC  7      99.7
10  ubuntu-latest-python-3.13 - 21079409936.10   16 Jan 2026 08:14PM UTC  7      99.7
11  macos-latest-python-3.12 - 21079409936.11    16 Jan 2026 08:14PM UTC  7      99.7
12  windows-latest-python-3.11 - 21079409936.12  16 Jan 2026 08:14PM UTC  7      99.7

  • 45b8e5c0 on github