
lightcopy / parquet-index / 283 / 5
97%
master: 97%

Build:
DEFAULT BRANCH: master
Ran 18 Mar 2017 12:16AM UTC
Files 22
Run time 1s
18 Mar 2017 12:11AM UTC coverage: 95.66% (-0.004%) from 95.664%
TEST_SPARK_VERSION="2.0.2" TEST_SPARK_RELEASE="spark-2.0.2-bin-hadoop2.6"

add futures for pruning indexed partitions (#74)

This PR replaces the sequential resolution in `foldFilter` with futures, executing the `resolveSupported` method in parallel for each file in a Parquet partition; the partitions themselves are still resolved sequentially.
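The idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `IndexedFile`, `resolveFile`, `resolvePartition` and `resolveAll` are hypothetical names, and the min/max check is only a placeholder for whatever `resolveSupported` does per file.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelResolveSketch {
  // Hypothetical stand-in for a file entry in an indexed Parquet partition.
  final case class IndexedFile(path: String, minValue: Int, maxValue: Int)

  // Hypothetical placeholder for per-file filter resolution: returns true
  // when the file may contain `value` and therefore has to be scanned.
  def resolveFile(file: IndexedFile, value: Int): Boolean =
    file.minValue <= value && value <= file.maxValue

  // Files within one partition are resolved in parallel, one future per file.
  def resolvePartition(files: Seq[IndexedFile], value: Int): Seq[IndexedFile] = {
    val futures = files.map { file =>
      Future { (file, resolveFile(file, value)) }
    }
    // Future.sequence keeps the results in the same order as the input files.
    val resolved = Await.result(Future.sequence(futures), 30.seconds)
    resolved.collect { case (file, true) => file }
  }

  // Partitions are still processed one after another; empty ones are dropped.
  def resolveAll(partitions: Seq[Seq[IndexedFile]], value: Int): Seq[Seq[IndexedFile]] =
    partitions.map(files => resolvePartition(files, value)).filter(_.nonEmpty)
}
```

Starting every future before waiting on any of them is what makes this resemble eager loading: all files in a partition are resolved up front, even on a warm index where the per-file work is already cheap.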

Manual testing shows roughly a 1.5-2x speedup when filtering index partitions where a large portion of the files is scanned. It also adds a small overhead when filtering partitions on a cached index (20ms vs 35ms). The approach is similar to eager loading.

Benchmarks:
Each test includes a cold start and a warm start (the same query, but with all necessary column filters already loaded by the previous step).

### Dataset with 1000 partitions
#### Search 1 record
`master`
```
Applying index filters: IsNotNull(strid),EqualTo(strid,35732)
Filtered indexed partitions in 438.53 ms
Post-Scan filters: isnotnull(strid#13),(strid#13 = 35732)

Applying index filters: IsNotNull(strid),EqualTo(strid,35732)
Filtered indexed partitions in 37.332 ms
Post-Scan filters: isnotnull(strid#28),(strid#28 = 35732)
```

`filter-resolution`
```
Applying index filters: IsNotNull(strid),EqualTo(strid,35732)
Filtered indexed partitions in 365.721 ms
Post-Scan filters: isnotnull(strid#7),(strid#7 = 35732)

Applying index filters: IsNotNull(strid),EqualTo(strid,35732)
Filtered indexed partitions in 39.74 ms
Post-Scan filters: isnotnull(strid#22),(strid#22 = 35732)
```

#### Search all records
`master`
```
Applying index filters: IsNotNull(col4),EqualTo(col4,value)
Filtered indexed partitions in 641.424 ms
Post-Scan filters: isnotnull(col4#11),(col4#11 = value)
```

`filter-resolution`
```
Applying index filters: IsNotNull(col4),EqualTo(col4,value)
Filtered indexed partitions in 390.783 ms
Post-Scan filters: isnotnull(col4#11),(col4#11 = value)
```

### Dataset with 400 partitions
#### Search 1 record
`master`
```
Applying index filters: IsNotNull(code),EqualTo(code,339382)
Filtered indexed partitions in 1178.762 ms
Post-Scan filters: isnotnull(code#8),(code#8 = 339382)

Applying index filters: IsNotNull(code),EqualTo(code,339382)
Filtered indexed partitions in 17.909 ms
Post-Scan filters: isnotnull(code#21),(code#21 = 339382)
```

`filter-resolution`
```
Applying index filters: IsNotNull(code),EqualTo(code,339382)
Filtered indexed partitions in 509.551 ms
Post-Scan filters: isnotnull(code#8),(code#8 = 339382)

Applying index filters: IsNotNull(code),EqualTo(code,339382)
Filtered indexed partitions in 12.801 ms
Post-Scan filters: isnotnull(code#21),(code#21 = 339382)
```
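To compare the two branches, the timings can be pulled out of the log output and turned into ratios. The snippet below is a small sketch for doing that; `SpeedupSketch`, `timings` and `speedup` are hypothetical helper names, and the only assumption about the log format is the `Filtered indexed partitions in ... ms` line shown in the benchmarks.

```scala
object SpeedupSketch {
  // Matches the timing line printed in the benchmark logs.
  private val Timing = """Filtered indexed partitions in ([0-9.]+) ms""".r

  // Extracts all timings (in milliseconds) from log output, in order of appearance.
  def timings(log: String): List[Double] =
    Timing.findAllMatchIn(log).map(_.group(1).toDouble).toList

  // A ratio above 1 means the branch is faster than master.
  def speedup(masterMs: Double, branchMs: Double): Double = masterMs / branchMs
}
```

Applied to the cold-start timings above, `speedup` gives roughly 1.2x (1000 partitions, 1 record), 1.6x (1000 partitions, all records) and 2.3x (400 partitions, 1 record).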

Closes #56.

1058 of 1106 relevant lines covered (95.66%)

0.96 hits per line

Source Files on job 283.5 (TEST_SPARK_VERSION="2.0.2" TEST_SPARK_RELEASE="spark-2.0.2-bin-hadoop2.6")
  • bc3b5028 on github