idanmoradarthas / DataScienceUtils / 21079409936 / 8
100%
master: 100%

Build:
DEFAULT BRANCH: master
Ran 16 Jan 2026 08:14PM UTC
Files 7
Run time 1s

16 Jan 2026 08:09PM UTC coverage: 99.7% (-0.3%) from 100.0%
21079409936.8

push

github

web-flow
Refactor `append_tags_to_frame` to use `MultiLabelBinarizer` (#87)

* refactor: Use MultiLabelBinarizer in append_tags_to_frame

Replaced `CountVectorizer` with `MultiLabelBinarizer` in the `append_tags_to_frame` function.

The previous implementation used `CountVectorizer` with a custom tokenizer, which was not the most suitable tool for the task. `MultiLabelBinarizer` is a more direct and efficient choice for creating a binary matrix from pre-tokenized tags.

The refactoring includes manual implementations for the `min_df` and `max_features` parameters to ensure that the function's behavior remains identical to the original implementation.

- Replaced `CountVectorizer` with `MultiLabelBinarizer`.
- Manually implemented `min_df` filtering based on document frequency.
- Manually implemented `max_features` selection of the most frequent tags.
- Ensured that the function signature and all existing tests remain unchanged and pass.
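
The steps listed above can be sketched roughly as follows. This is an illustration only, not the code from the commit: `binarize_tags` is a hypothetical name, `min_df` is assumed to be an absolute row count, and the tie-breaking rule for `max_features` (alphabetical, via a stable sort) is an assumption rather than a confirmed match for the original `CountVectorizer` behavior.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer


def binarize_tags(tags, min_df=1, max_features=None):
    """Hypothetical helper: one-hot encode lists of tags with frequency filters."""
    mlb = MultiLabelBinarizer()
    matrix = mlb.fit_transform(tags)   # dense 0/1 array, one column per distinct tag
    columns = np.array(mlb.classes_)   # tags in sorted order

    # min_df (assumed to be an absolute count): keep tags seen in at least min_df rows
    doc_freq = matrix.sum(axis=0)
    keep = doc_freq >= min_df
    matrix, columns, doc_freq = matrix[:, keep], columns[keep], doc_freq[keep]

    # max_features: keep the most frequent tags; the stable sort breaks ties
    # alphabetically, and np.sort restores the original column order
    if max_features is not None and len(columns) > max_features:
        top = np.sort(np.argsort(-doc_freq, kind="stable")[:max_features])
        matrix, columns = matrix[:, top], columns[top]

    return pd.DataFrame(matrix, columns=columns)


rows = [["a", "b"], ["a"], ["a", "c"], ["b"]]
print(binarize_tags(rows, min_df=2))
```

Because the matrix is binary, a column sum is the tag's document frequency, so the same vector serves both filters.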

* refactor: Use MultiLabelBinarizer in append_tags_to_frame

Replaced `CountVectorizer` with `MultiLabelBinarizer` in the `append_tags_to_frame` function and incorporated feedback from the code review.

Changes based on code review feedback:
- Added a clearer comment to the sorting logic for `max_features`.
- Deduplicated code by creating a `_prepare_tags` helper function.
- Added a new test case to handle the edge case where no tags are left after filtering.

- Replaced `CountVectorizer` with `MultiLabelBinarizer`.
- Manually implemented `min_df` filtering based on document frequency.
- Manually implemented... (continued)

665 of 667 relevant lines covered (99.7%)
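
As a quick check of the figures above, the percentage and the change from the previous 100.0% can be reproduced from the line counts. The variable names are illustrative only.

```python
covered, relevant = 665, 667

coverage = 100 * covered / relevant   # percentage of relevant lines that were hit
delta = coverage - 100.0              # change relative to the previous 100.0%

print(f"{coverage:.1f}% ({delta:+.1f}%)")
```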

1.0 hits per line

Source Files on job windows-latest-python-3.12 - 21079409936.8
  • Tree
  • List 7
  • Changed 1
  • Source Changed 0
  • Coverage Changed 1
  • 45b8e5c0 on github
© 2026 Coveralls, Inc