apache / parquet-cpp / 1071 / 2

Build:
DEFAULT BRANCH: master
Ran 18 Jan 2017 02:42AM UTC
Files 1233
Run time 1min

18 Jan 2017 12:10AM UTC coverage: 93.142% (-0.01%) from 93.152%
1071.2

push

travis-ci

wesm
PARQUET-820: Decoders should directly emit arrays with spacing for null entries

Old:

```
In [3]: import pyarrow.io as paio
   ...: import pyarrow.parquet as pq
   ...:
   ...: with open('yellow_tripdata_2016-01.parquet', 'rb') as f:
   ...:     buf = f.read()
   ...: buf = paio.buffer_from_bytes(buf)
   ...:
   ...: def read_parquet():
   ...:   reader = paio.BufferReader(buf)
   ...:   df = pq.read_table(reader)
   ...:
   ...: %timeit read_parquet()
   ...:
1 loop, best of 3: 1.21 s per loop
```

New:

```
In [1]: import pyarrow.io as paio
   ...: import pyarrow.parquet as pq
   ...:
   ...: with open('yellow_tripdata_2016-01.parquet', 'rb') as f:
   ...:     buf = f.read()
   ...: buf = paio.buffer_from_bytes(buf)
   ...:
   ...: def read_parquet():
   ...:   reader = paio.BufferReader(buf)
   ...:   df = pq.read_table(reader)
   ...:
   ...: %timeit read_parquet()
   ...:
1 loop, best of 3: 906 ms per loop
```

Arrow->Pandas conversion for comparison:

```
In [5]: %timeit df.to_pandas()
1 loop, best of 3: 567 ms per loop
```

All benchmarks were run on a single-core CPU.
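
To put the timings above side by side, here is a small sketch that computes the relative change from the reported best-of-3 figures. The variable names are only illustrative; the numbers are copied from the `%timeit` output above.

```python
# Best-of-3 timings reported above, converted to seconds.
old_s = 1.21          # "Old" read_parquet()
new_s = 0.906         # "New" read_parquet()
to_pandas_s = 0.567   # df.to_pandas(), for comparison

# Fraction of the old read time that was saved, and the speedup factor.
reduction = (old_s - new_s) / old_s
speedup = old_s / new_s

print(f"read_table: {reduction:.1%} less time ({speedup:.2f}x speedup)")
print(f"to_pandas takes {to_pandas_s / new_s:.2f} of the new read time")
```

That works out to roughly a quarter less time spent in `read_table`.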

I have to add better test coverage before this can go in. There is still some room for future improvements that won't be done in this PR:
 * `DefinitionLevelsToBitmap` should be done in the DefinitionLevelsDecoder
 * `GetBatchWithDictSpaced` is something for a vectorization/bitmap ninja.
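
For readers unfamiliar with the idea, the following is a minimal pure-Python sketch of what "spaced" decoding means. It is not the parquet-cpp implementation. The function names `definition_levels_to_bitmap` and `decode_spaced` are hypothetical, and the sketch assumes a flat optional column (maximum definition level 1) and an LSB-first validity bitmap as used by Arrow.

```python
def definition_levels_to_bitmap(def_levels, max_def_level):
    """Build a validity bitmap (one bit per slot) from definition levels.

    A slot is valid when its definition level equals max_def_level.
    Returns (valid_bits, null_count).
    """
    valid_bits = bytearray((len(def_levels) + 7) // 8)
    null_count = 0
    for i, level in enumerate(def_levels):
        if level == max_def_level:
            valid_bits[i // 8] |= 1 << (i % 8)  # LSB-first bit order
        else:
            null_count += 1
    return valid_bits, null_count


def decode_spaced(dense_values, valid_bits, num_slots, fill=0):
    """Write densely stored non-null values into their final slots.

    Null slots keep the placeholder `fill`, so the result already has
    the spacing the output array needs.
    """
    spaced = [fill] * num_slots
    dense = iter(dense_values)
    for i in range(num_slots):
        if valid_bits[i // 8] & (1 << (i % 8)):
            spaced[i] = next(dense)
    return spaced


# Five slots, two of them null; only three values are physically stored.
def_levels = [1, 0, 1, 1, 0]
valid_bits, null_count = definition_levels_to_bitmap(def_levels, max_def_level=1)
spaced = decode_spaced([10, 20, 30], valid_bits, len(def_levels))
print(spaced, null_count)
```

Emitting the values with gaps already in place is what lets the reader skip a separate pass that shifts values around to make room for nulls.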

Author: Uwe L. Korn <uwelk@xhochy.com>
Author: Korn, Uwe <Uwe.Korn@blue-yonder.com>

Closes #218 from xhochy/PARQUET-820 and squashes the following commits:

e6db697 [Korn, Uwe] Add INIT_BITSET macro
8f17db9 [Korn, Uwe] Use arrow::TypeTraits
8dcab1b [Uwe L. Korn] Adjust documentation for ReadBatchSpaced
798bc83 [Uwe L. Korn] Test ReadSpaced
9dc6dc0 [Uwe L. Korn] Test DecodeSpaced
ccb70dc [Uwe L. Korn] Add fast path for non-nullable-batches
6f99191 [Uwe L. Korn] Move bit reading into a macro
393d99a [Uwe L. Korn] Explicitly mark overrides
3424ae3 [Uwe L. Korn] Make more use of the bitmaps
685ad34 [Uwe L. Korn] Remove unused include
9b0f105 [Uwe L. Korn] Use bitset in the whole GetBatchWithDict loop
907c165 [Uwe L. Korn] Use bitset in literalbatch
0ec4b38 [Uwe L. Korn] Remove unused code
f6c4b5e [Uwe L. Korn] ninja format
cbf0176 [Uwe L. Korn] DecodeSpaced in dictionary encoder
3dfa43b [Uwe L. Korn] Directly read valid_bits
15aa324 [Uwe L. Korn] Only use ReadSpaced where needed
96dd347 [Korn, Uwe] PARQUET-820: Decoders should directly emit arrays with spacing for null entries

10946 of 11752 relevant lines covered (93.14%)

69726.18 hits per line

  • 65e7db19 on github