ml6team / fondant / 7486960489 / 1
91%
main: 91%

Build:
DEFAULT BRANCH: main
Ran 11 Jan 2024 09:51AM UTC
Files 19
Run time 1s

11 Jan 2024 09:47AM UTC coverage: 91.836%. Remained the same
7486960489.1

push

github

web-flow
Add load from pdf component (#765)

Fixes https://github.com/ml6team/fondant-use-cases/issues/54

This PR adds a component that loads PDF documents from local and remote
storage.

The implementation differs from the suggested solution at
[#54](https://github.com/ml6team/fondant-use-cases/issues/54) since:
* Accumulating different loaders and loading each document individually
seems inefficient, since it would require initializing a client, temp
storage, ... on every invocation
[link](https://github.com/langchain-ai/langchain/blob/04caf07de/libs/community/langchain_community/document_loaders/gcs_file.py#L62)
* The langchain cloud loaders don't have a unified interface
* Each loader would require specific arguments to be passed (fsspec, in
contrast, is much simpler)
* Only the Google loader allows defining a custom loader class; the rest
use the `Unstructured` loader, which requires many system and CUDA
dependencies to install (a lot of overhead for just loading PDFs)

The current implementation copies the PDFs to temporary local storage
and loads them lazily using the `PyPDFDirectoryLoader`. The assumption
for now is that the loaded docs won't exceed the storage of the device,
which should hold for most use cases. Later on, we can think about how
to optimize this further.
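
The copy-then-load-lazily approach described above can be sketched
roughly as follows. This is a minimal illustration, not the component's
actual code: `fetch_with_fsspec` and `lazy_load_pdfs` are hypothetical
names, and the PDF parser is passed in as a plain callable rather than
wired to `PyPDFDirectoryLoader`, so the sketch stays independent of any
particular loader API.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterator


def fetch_with_fsspec(source_uri: str, target_dir: str) -> None:
    """Copy every PDF found under source_uri into target_dir (flat layout)."""
    # Third-party dependency, imported lazily so the rest of the sketch
    # can be exercised without fsspec installed.
    from fsspec.core import url_to_fs

    fs, root = url_to_fs(source_uri)
    for remote_path in fs.find(root):
        if remote_path.endswith(".pdf"):
            # Assumption: basenames are unique; duplicates would overwrite.
            fs.get(remote_path, str(Path(target_dir) / Path(remote_path).name))


def lazy_load_pdfs(
    source_uri: str,
    fetch: Callable[[str, str], None],
    parse: Callable[[Path], Iterator[str]],
) -> Iterator[str]:
    """Copy PDFs to a temporary directory once, then yield parsed items lazily."""
    # The temporary directory lives only while the generator is being
    # consumed; it is removed once the generator is exhausted or closed.
    with tempfile.TemporaryDirectory() as tmp_dir:
        fetch(source_uri, tmp_dir)
        for pdf_path in sorted(Path(tmp_dir).glob("*.pdf")):
            yield from parse(pdf_path)
```

Passing `fetch_with_fsspec` as `fetch` would let one code path serve
local, GCS, or S3 URIs, provided the matching fsspec backend is
installed. The copy happens once per invocation, which is the setup
cost the per-document loaders would otherwise pay repeatedly.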

1766 of 1923 relevant lines covered (91.84%)

0.92 hits per line

Source Files on job test-3.10 - 7486960489.1
  • b422fc31 on github