deepset-ai / haystack / 21899435517

11 Feb 2026 09:23AM UTC coverage: 92.636% (-0.003%) from 92.639%
push

github

web-flow
feat: MarkdownHeaderSplitter (#9660)

* implement md-header-splitter and add tests

* rework md-header splitter to rewrite md-header levels

* remove deprecated test

* Update haystack/components/preprocessors/markdown_header_splitter.py

use haystack logging

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* use native types

* move to haystack logging

* docstrings improvements

* Update haystack/components/preprocessors/markdown_header_splitter.py

remove temp toc

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* fix CustomDocumentSplitter arguments

* remove header prefix from content

* rework split_id assignment to avoid collisions

* remove unneeded dese methods

* cleanup

* cleanup

* add tests

cleanup

* move initialization of secondary-splitter out of run method

* move _custom_document_splitter to class method

* remove the _CustomDocumentSplitter class; splitting logic is now encapsulated within the MarkdownHeaderSplitter class as private methods

* return to standard form-feed character and add tests for page break handling

* quit exposing splitting_function param since it shouldn't be changed anyway

* remove test section in module

* add license header

* add release note

* minor refactor for type safety

* Update haystack/components/preprocessors/markdown_header_splitter.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* remove unneeded release notes entries

* improved documentation for methods

* improve method naming

* improved page-number assignment & added return in docstring

minor cleanup

* unified page-counting

* simplify conditional secondary-split initialization and usage

* fix linting error

* clearly specify the use of ATX-style headers (#) only

* reference doc_id when logging no headers found

* initialize md-header pattern as private variable once

* add example for inferring header levels to docstring

* imp... (continued)
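
The commit notes above describe splitting on ATX-style (`#`) headers only, compiling the header pattern once, removing the header prefix from content, and deriving page numbers from page breaks. A minimal sketch of that idea in plain Python follows. It is not the Haystack implementation: the function name `split_by_atx_headers`, the returned dictionary keys, and the regular expression are hypothetical, chosen only for illustration, and the sketch assumes the form-feed character (`\f`) marks a page break.

```python
import re

# Hypothetical pattern for ATX-style headers: 1 to 6 '#' characters,
# whitespace, then the title. Compiled once at module level.
_HEADER_PATTERN = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def split_by_atx_headers(text: str) -> list[dict]:
    """Split markdown text into one chunk per ATX header (illustrative only).

    Text before the first header is ignored in this sketch.
    """
    matches = list(_HEADER_PATTERN.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # The chunk body runs from the end of this header line
        # to the start of the next header (or the end of the text).
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Assumption: each form feed before the header starts a new page.
        page_number = text.count("\f", 0, match.start()) + 1
        chunks.append(
            {
                "header": match.group(2),
                "level": len(match.group(1)),
                "content": text[start:end].strip(),  # header prefix not included
                "page_number": page_number,
            }
        )
    return chunks


if __name__ == "__main__":
    sample = "# Intro\nHello\n\f\n## Details\nMore text\n"
    for chunk in split_by_atx_headers(sample):
        print(chunk)
```

The sketch does not treat `#` lines inside fenced code blocks specially and does not handle Setext-style (underlined) headers; a real splitter would need to decide how to handle both.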

15183 of 16390 relevant lines covered (92.64%)

0.93 hits per line
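
The reported percentages can be checked directly from the line counts above. The short sketch below recomputes the figure; the helper name `coverage_percent` is hypothetical, and the rounding to two or three decimals is an assumption about how the page displays the value.

```python
def coverage_percent(covered: int, relevant: int) -> float:
    """Return line coverage as a percentage of relevant lines."""
    return 100 * covered / relevant


if __name__ == "__main__":
    pct = coverage_percent(15183, 16390)
    print(f"{pct:.3f}%")  # three decimals, as in the build summary line
    print(f"{pct:.2f}%")  # two decimals, as in the lines-covered line
```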

Source File

99.44
haystack/core/component/component.py


Source Not Available

© 2026 Coveralls, Inc