
apache / parquet-cpp / 1071

Build:
DEFAULT BRANCH: master
Ran 18 Jan 2017 02:41AM UTC
Jobs 2
Files 1233
Run time 2min
pending completion
1071

push

travis-ci

wesm
PARQUET-820: Decoders should directly emit arrays with spacing for null entries

Old:

```
In [3]: import pyarrow.io as paio
   ...: import pyarrow.parquet as pq
   ...:
   ...: with open('yellow_tripdata_2016-01.parquet', 'r') as f:
   ...:     buf = f.read()
   ...: buf = paio.buffer_from_bytes(buf)
   ...:
   ...: def read_parquet():
   ...:   reader = paio.BufferReader(buf)
   ...:   df = pq.read_table(reader)
   ...:
   ...: %timeit read_parquet()
   ...:
1 loop, best of 3: 1.21 s per loop
```

New:

```
In [1]: import pyarrow.io as paio
   ...: import pyarrow.parquet as pq
   ...:
   ...: with open('yellow_tripdata_2016-01.parquet', 'r') as f:
   ...:     buf = f.read()
   ...: buf = paio.buffer_from_bytes(buf)
   ...:
   ...: def read_parquet():
   ...:   reader = paio.BufferReader(buf)
   ...:   df = pq.read_table(reader)
   ...:
   ...: %timeit read_parquet()
   ...:
1 loop, best of 3: 906 ms per loop
```

Arrow->Pandas conversion for comparison:

```
In [5]: %timeit df.to_pandas()
1 loop, best of 3: 567 ms per loop
```

All benchmarks were run on a single-core CPU.
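
For orientation, the two `%timeit` results can be turned into a relative improvement. This is only arithmetic on the numbers quoted in this message (1.21 s and 906 ms); the variable names are illustrative, and no new measurement is implied.

```python
# Best-of-3 timings copied from the "Old" and "New" sessions in this message.
old_s = 1.21    # seconds per loop before the change
new_s = 0.906   # seconds per loop after the change (906 ms)

speedup = old_s / new_s                 # how many times faster the new path is
reduction = (old_s - new_s) / old_s     # fraction of the read time saved

print(f"speedup: {speedup:.2f}x, time saved: {reduction:.1%}")
```

On these figures, the read is roughly 1.34x faster, which is about a quarter less time per `read_table` call.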

I have to add better test coverage before this can go in. There is still some room for future improvements that won't be done in this PR:
 * `DefinitionLevelsToBitmap` should be done in the DefinitionLevelsDecoder
 * `GetBatchWithDictSpaced` is something for a vectorization/bitmap ninja.
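
To illustrate what "spaced" output means here, the following is a minimal Python sketch, not the parquet-cpp implementation. It assumes a flat (non-nested) column where an entry is non-null exactly when its definition level equals the maximum definition level, and a validity bitmap packed least-significant-bit first. The function names `definition_levels_to_bitmap` and `decode_spaced` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple


def definition_levels_to_bitmap(
    def_levels: Sequence[int], max_def_level: int
) -> Tuple[bytearray, int]:
    """Pack one validity bit per entry (LSB-first) and count the nulls."""
    bitmap = bytearray((len(def_levels) + 7) // 8)
    null_count = 0
    for i, level in enumerate(def_levels):
        if level == max_def_level:
            bitmap[i // 8] |= 1 << (i % 8)   # entry i holds a value
        else:
            null_count += 1                  # entry i is null
    return bitmap, null_count


def decode_spaced(
    dense_values: Sequence[int],
    valid_bits: bytearray,
    num_entries: int,
    fill: int = 0,
) -> List[int]:
    """Place densely stored values at their valid slots, leaving gaps for nulls."""
    out = [fill] * num_entries
    values = iter(dense_values)
    for i in range(num_entries):
        if valid_bits[i // 8] & (1 << (i % 8)):
            out[i] = next(values)
    return out


# Five entries, two of them null; only the three non-null values are stored.
def_levels = [1, 0, 1, 1, 0]
bitmap, null_count = definition_levels_to_bitmap(def_levels, max_def_level=1)
spaced = decode_spaced([10, 20, 30], bitmap, len(def_levels))
print(bin(bitmap[0]), null_count, spaced)
```

The spaced list has one slot per entry, so it lines up with the validity bitmap and needs no second pass to re-insert nulls. The slot content at null positions (`fill`) is arbitrary in this sketch.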

Author: Uwe L. Korn <uwelk@xhochy.com>
Author: Korn, Uwe <Uwe.Korn@blue-yonder.com>

Closes #218 from xhochy/PARQUET-820 and squashes the following commits:

e6db697 [Korn, Uwe] Add INIT_BITSET macro
8f17db9 [Korn, Uwe] Use arrow::TypeTraits
8dcab1b [Uwe L. Korn] Adjust documentation for ReadBatchSpaced
798bc83 [Uwe L. Korn] Test ReadSpaced
9dc6dc0 [Uwe L. Korn] Test DecodeSpaced
ccb70dc [Uwe L. Korn] Add fast path for non-nullable-batches
6f99191 [Uwe L. Korn] Move bit reading into a macro
393d99a [Uwe L. Korn] Explicitly mark overrides
3424ae3 [Uwe L. Korn] Make more use of the bitmaps
685ad34 [Uwe L. Korn] Remove unused include
9b0f105 [Uwe L. Korn] Use bitset in the whole GetBatchWithDict loop
907c165 [Uwe L. Korn] Use bitset in literalbatch
0ec4b38 [Uwe L. Korn] Remove unused code
f6c4b5e [Uwe L. Korn] ninja format
cbf0176 [Uwe L. Korn] DecodeSpaced in dictionary encoder
3dfa43b [Uwe L. Korn] Directly read valid_bits
15aa324 [Uwe L. Korn] Only use ReadSpaced where needed
96dd347 [Korn, Uwe] PARQUET-820: Decoders should directly emit arrays with spacing for null entries

10946 of 11752 relevant lines covered (93.14%)

139449.99 hits per line

Jobs
ID  Job ID  Ran                      Files  Coverage
1   1071.1  18 Jan 2017 02:41AM UTC  0      93.14
2   1071.2  18 Jan 2017 02:42AM UTC  0      93.14
Source Files on build 1071
Detailed source file information is not available for this build.
  • 65e7db19 on github