
wesm / parquet-cpp / 488 / 2
88%
master: 88%

Build:
DEFAULT BRANCH: master
Ran 18 Jan 2017 12:25AM UTC
Files 1233
Run time 1min
18 Jan 2017 12:10AM UTC coverage: 93.142% (-0.06%) from 93.202%
488.2

push

travis-ci

wesm
PARQUET-820: Decoders should directly emit arrays with spacing for null entries

Old:

```
In [3]: import pyarrow.io as paio
   ...: import pyarrow.parquet as pq
   ...:
   ...: with open('yellow_tripdata_2016-01.parquet', 'rb') as f:
   ...:     buf = f.read()
   ...: buf = paio.buffer_from_bytes(buf)
   ...:
   ...: def read_parquet():
   ...:   reader = paio.BufferReader(buf)
   ...:   df = pq.read_table(reader)
   ...:
   ...: %timeit read_parquet()
   ...:
1 loop, best of 3: 1.21 s per loop
```

New:

```
In [1]: import pyarrow.io as paio
   ...: import pyarrow.parquet as pq
   ...:
   ...: with open('yellow_tripdata_2016-01.parquet', 'rb') as f:
   ...:     buf = f.read()
   ...: buf = paio.buffer_from_bytes(buf)
   ...:
   ...: def read_parquet():
   ...:   reader = paio.BufferReader(buf)
   ...:   df = pq.read_table(reader)
   ...:
   ...: %timeit read_parquet()
   ...:
1 loop, best of 3: 906 ms per loop
```
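Outside IPython, the `%timeit` measurements in this message can be approximated with the standard-library `timeit` module. The following is a minimal sketch, not the harness used for the numbers here: `best_of` and `stand_in_workload` are hypothetical names, and the workload is a placeholder for the `read_parquet()` function defined in the sessions.

```python
import timeit


def best_of(func, repeat=3, number=1):
    """Return the best (minimum) per-call time in seconds.

    Mirrors the "N loops, best of 3" style of report: run `number`
    calls per trial, repeat the trial `repeat` times, keep the fastest.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def stand_in_workload():
    # Hypothetical placeholder; substitute read_parquet() to time the
    # actual Parquet read.
    return sum(i * i for i in range(100_000))


best = best_of(stand_in_workload)
print(f"1 loop, best of 3: {best * 1000:.1f} ms per loop")
```

Taking the minimum rather than the mean reduces the influence of unrelated system noise on the reported time.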

Arrow->Pandas conversion for comparison:

```
In [5]: %timeit df.to_pandas()
1 loop, best of 3: 567 ms per loop
```

All benchmarks were run on a single-core CPU.
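To put the timings in relative terms, the arithmetic can be checked with a few lines of Python. The three numbers are copied from the sessions in this message; the helper name `relative_reduction` is only illustrative.

```python
# Timings reported in this message, in seconds.
old_read_s = 1.21      # read_parquet() before this change
new_read_s = 0.906     # read_parquet() after this change
to_pandas_s = 0.567    # df.to_pandas() for comparison


def relative_reduction(old, new):
    """Fraction of the old runtime that was saved."""
    return (old - new) / old


reduction = relative_reduction(old_read_s, new_read_s)
speedup = old_read_s / new_read_s
pandas_ratio = to_pandas_s / new_read_s

print(f"read time reduced by {reduction:.1%} ({speedup:.2f}x speedup)")
# read time reduced by 25.1% (1.34x speedup)
print(f"to_pandas takes {pandas_ratio:.0%} of the new read time")
# to_pandas takes 63% of the new read time
```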

I have to add better test coverage before this can go in. There is still some room for future improvements that won't be done in this PR:
 * `DefinitionLevelsToBitmap` should be done in the DefinitionLevelsDecoder
 * `GetBatchWithDictSpaced` is something for a vectorization/bitmap ninja.
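For readers unfamiliar with the idea, "spaced" decoding can be sketched in pure Python. This is an illustration under stated assumptions, not parquet-cpp's implementation: the function names `definition_levels_to_valid_bits` and `decode_spaced` are hypothetical, the column is assumed to be flat (non-nested) so a slot is non-null exactly when its definition level equals the maximum, and the bitmap uses the least-significant-bit-first layout of Arrow validity bitmaps.

```python
def definition_levels_to_valid_bits(def_levels, max_def_level):
    """Build a validity bitmap (LSB-first) from definition levels.

    Assumes a flat column: a slot holds a value only when its
    definition level equals max_def_level.
    """
    bitmap = bytearray((len(def_levels) + 7) // 8)
    for i, level in enumerate(def_levels):
        if level == max_def_level:
            bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


def decode_spaced(dense_values, valid_bits, num_slots, fill=0):
    """Place densely stored values into an array with gaps for nulls.

    Null slots keep the placeholder `fill`; readers are expected to
    consult the bitmap rather than the placeholder.
    """
    out = [fill] * num_slots
    next_value = 0
    for i in range(num_slots):
        if valid_bits[i // 8] & (1 << (i % 8)):
            out[i] = dense_values[next_value]
            next_value += 1
    return out


def_levels = [1, 0, 1, 1, 0]          # 1 = value present, 0 = null
valid_bits = definition_levels_to_valid_bits(def_levels, max_def_level=1)
spaced = decode_spaced([10, 20, 30], valid_bits, num_slots=len(def_levels))
print(spaced)  # [10, 0, 20, 30, 0]
```

Emitting the output in this layout directly avoids a second pass that would otherwise copy dense values into their final, null-padded positions.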

Author: Uwe L. Korn <uwelk@xhochy.com>
Author: Korn, Uwe <Uwe.Korn@blue-yonder.com>

Closes #218 from xhochy/PARQUET-820 and squashes the following commits:

e6db697 [Korn, Uwe] Add INIT_BITSET macro
8f17db9 [Korn, Uwe] Use arrow::TypeTraits
8dcab1b [Uwe L. Korn] Adjust documentation for ReadBatchSpaced
798bc83 [Uwe L. Korn] Test ReadSpaced
9dc6dc0 [Uwe L. Korn] Test DecodeSpaced
ccb70dc [Uwe L. Korn] Add fast path for non-nullable-batches
6f99191 [Uwe L. Korn] Move bit reading into a macro
393d99a [Uwe L. Korn] Explicitly mark overrides
3424ae3 [Uwe L. Korn] Make more use of the bitmaps
685ad34 [Uwe L. Korn] Remove unused include
9b0f105 [Uwe L. Korn] Use bitset in the whole GetBatchWithDict loop
907c165 [Uwe L. Korn] Use bitset in literalbatch
0ec4b38 [Uwe L. Korn] Remove unused code
f6c4b5e [Uwe L. Korn] ninja format
cbf0176 [Uwe L. Korn] DecodeSpaced in dictionary encoder
3dfa43b [Uwe L. Korn] Directly read valid_bits
15aa324 [Uwe L. Korn] Only use ReadSpaced where needed
96dd347 [Korn, Uwe] PARQUET-820: Decoders should directly emit arrays with spacing for null entries

10946 of 11752 relevant lines covered (93.14%)

69744.17 hits per line

  • 65e7db19 on github