ml6team / fondant / 7486960489
91%

Build:
DEFAULT BRANCH: main
Ran 11 Jan 2024 09:50AM UTC
Jobs 3
Files 19
Run time 3s
11 Jan 2024 09:47AM UTC coverage: 91.94% (remained the same)
Build 7486960489 · push · github · web-flow
Add load from pdf component (#765)

Fixes https://github.com/ml6team/fondant-use-cases/issues/54

This PR adds functionality to load PDF documents from different local
and remote storage backends.

The implementation differs from the solution suggested in
[#54](https://github.com/ml6team/fondant-use-cases/issues/54) because:
* Accumulating different loaders and loading each document individually
seems inefficient, since it would require initializing a client, temp
storage, ... on every invocation
[link](https://github.com/langchain-ai/langchain/blob/04caf07de/libs/community/langchain_community/document_loaders/gcs_file.py#L62)
* The langchain cloud loaders don't have a unified interface
* Each would require specific arguments to be passed (in contrast,
fsspec is much simpler)
* Only the Google loader allows defining a custom loader class; the
rest use the `Unstructured` loader, which requires many system and
CUDA dependencies to be installed (a lot of overhead just for
loading PDFs)

The current implementation copies the PDFs to temporary local storage
and then loads them lazily using `PyPDFDirectoryLoader`. For now, we
assume that the loaded documents won't exceed the device's storage,
which should hold for most use cases. Later on, we can think about how
to optimize this further.
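A minimal sketch of this copy-then-load-lazily pattern, using only the Python standard library. It is not the component's actual code: the function `stage_and_load_lazily` and the `read_document` callback are hypothetical, a plain local directory stands in for the fsspec-backed remote storage, and `read_document` stands in for the PDF parsing that `PyPDFDirectoryLoader` performs.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterator, Tuple


def stage_and_load_lazily(
    source_dir: str,
    read_document: Callable[[Path], str],
    pattern: str = "*.pdf",
) -> Iterator[Tuple[str, str]]:
    """Copy matching files into one temp directory, then yield them one at a time.

    Hypothetical helper: `source_dir` stands in for remote storage and
    `read_document` stands in for a real PDF parser.
    """
    # One temporary directory per call, not one per document.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Stage every matching file locally in a single pass.
        for src in sorted(Path(source_dir).glob(pattern)):
            shutil.copy2(src, tmp_path / src.name)
        # Yield documents lazily; only one is parsed per iteration step.
        for local in sorted(tmp_path.glob(pattern)):
            yield local.name, read_document(local)
        # The temp directory is removed once the generator finishes or is closed.
```

Because staging happens once per call rather than once per document, the client and temp-storage setup cost described in the first bullet is paid only once, at the price of needing enough local disk for all staged files.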

1768 of 1923 relevant lines covered (91.94%)

2.75 hits per line
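The percentage in "1768 of 1923 relevant lines covered" is covered lines divided by relevant lines. A quick check in Python (`coverage_percent` is a hypothetical helper, not part of Coveralls):

```python
def coverage_percent(covered: int, relevant: int) -> float:
    """Line coverage as a percentage, rounded to two decimals."""
    if relevant <= 0:
        raise ValueError("relevant must be positive")
    return round(100 * covered / relevant, 2)


print(coverage_percent(1768, 1923))  # 91.94
```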

Jobs

| ID | Job ID | Ran | Files | Coverage (%) |
|---|---|---|---|---|
| 1 | test-3.10 - 7486960489.1 | 11 Jan 2024 09:51AM UTC | 0 | 91.84 |
| 2 | test-3.9 - 7486960489.2 | 11 Jan 2024 09:51AM UTC | 0 | 91.83 |
| 3 | test-3.8 - 7486960489.3 | 11 Jan 2024 09:51AM UTC | 0 | 91.93 |

Source Files on build 7486960489
Detailed source file information is not available for this build.
  • b422fc31 on github
  • Prev Build on main (#7486899348)
  • Next Build on main (#7488843898)