lightcopy / parquet-index
96%
master: 97%

LAST BUILD BRANCH: v0.5.0
DEFAULT BRANCH: master
Repo Added 25 Nov 2016 08:16AM UTC
Files 23
LAST BUILD ON BRANCH dictionary-filter-2
Build 288 (push, via travis-ci, committer web-flow): pending completion

add futures for pruning indexed partitions (#74)

This PR replaces the sequential resolution in `foldFilter` with futures: the `resolveSupported` method now runs in parallel for each file in a Parquet partition. Partitions themselves are still resolved sequentially.

Manual testing shows a 1.5-2x performance improvement when filtering index partitions in cases where a large portion of the files is scanned. It also introduces a small overhead when filtering partitions on a cached index (20ms vs 35ms). The approach is similar to eager loading.
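The idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FileStatus`, `Partition`, `resolveFile`, `keepPartition`, and `prune` are hypothetical names, the min/max check merely stands in for whatever `resolveSupported` does per file, and the 30-second timeout is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for the index types; not the parquet-index API.
case class FileStatus(path: String, min: Int, max: Int)
case class Partition(files: Seq[FileStatus])

object FuturePruning {
  // Hypothetical per-file check: could this file contain `value`?
  def resolveFile(file: FileStatus, value: Int): Boolean =
    file.min <= value && value <= file.max

  // Files of one partition are checked in parallel; the partition is kept
  // if at least one of its files may contain the value.
  def keepPartition(partition: Partition, value: Int): Boolean = {
    val checks = partition.files.map(f => Future(resolveFile(f, value)))
    Await.result(Future.sequence(checks), 30.seconds).exists(identity)
  }

  // Partitions are still walked one after another.
  def prune(partitions: Seq[Partition], value: Int): Seq[Partition] =
    partitions.filter(p => keepPartition(p, value))
}
```

Creating and awaiting futures has a fixed cost per partition, which is one plausible reason a warm (cached) lookup could get slightly slower while cold scans over many files get faster.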

Benchmarks:
Each test includes a cold start and a warm start (the same query, but with all necessary column filters already loaded by the previous step).

### Dataset with 1000 partitions
#### Search 1 record
`master`
```
Applying index filters: IsNotNull(strid),EqualTo(strid,35732)
Filtered indexed partitions in 438.53 ms
Post-Scan filters: isnotnull(strid#13),(strid#13 = 35732)

Applying index filters: IsNotNull(strid),EqualTo(strid,35732)
Filtered indexed partitions in 37.332 ms
Post-Scan filters: isnotnull(strid#28),(strid#28 = 35732)
```

`filter-resolution`
```
Applying index filters: IsNotNull(strid),EqualTo(strid,35732)
Filtered indexed partitions in 365.721 ms
Post-Scan filters: isnotnull(strid#7),(strid#7 = 35732)

Applying index filters: IsNotNull(strid),EqualTo(strid,35732)
Filtered indexed partitions in 39.74 ms
Post-Scan filters: isnotnull(strid#22),(strid#22 = 35732)
```

#### Search all records
`master`
```
Applying index filters: IsNotNull(col4),EqualTo(col4,value)
Filtered indexed partitions in 641.424 ms
Post-Scan filters: isnotnull(col4#11),(col4#11 = value)
```

`filter-resolution`
```
Applying index filters: IsNotNull(col4),EqualTo(col4,value)
Filtered indexed partitions in 390.783 ms
Post-Scan filters: isnotnull(col4#11),(col4#11 = value)
```
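For reference, the speedups implied by the 1000-partition timings above can be worked out with a few lines of Scala. The `Speedup` object and its field names are illustrative only; the numbers are copied from the logs, with `master` as the baseline.

```scala
object Speedup {
  // Ratio of baseline time to new time; values above 1.0 mean the new code is faster.
  def speedup(baselineMs: Double, newMs: Double): Double = baselineMs / newMs

  val searchOneCold: Double = speedup(438.53, 365.721)  // first query, filters not yet loaded
  val searchOneWarm: Double = speedup(37.332, 39.74)    // same query, filters already loaded
  val searchAll: Double     = speedup(641.424, 390.783) // query matching all records
}
```

These come out to roughly 1.2x for the cold single-record search and 1.6x for the all-records search, while the warm single-record search is marginally slower (ratio just under 1).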

### Dataset with 400 partitions
#### Search 1 record
`master`
```
Applying index filters: IsNotNull(code),EqualTo(co... (continued)
```

1058 of 1106 relevant lines covered (95.66%)

0.96 hits per line

Source Files on dictionary-filter-2
Detailed source file information is not available for this build.

Recent builds

| Builds | Branch | Commit | Type | Ran | Committer | Via | Coverage |
|---|---|---|---|---|---|---|---|
| 288 | dictionary-filter-2 | add futures for pruning indexed partitions (#74) This PR updates code for sequential resolution of `foldFilter` by using futures and executing `resolveSupported` method in parallel for each file in Parquet partition; partitions are still resolved... | push | 22 Mar 2017 06:57AM UTC | web-flow | travis-ci | pending completion |
| 186 | dictionary-filter-2 | update docs | push | 04 Feb 2017 04:18AM UTC | sadikovi | travis-ci | pending completion |
| 185 | dictionary-filter-2 | add dict filter and tests | push | 04 Feb 2017 03:33AM UTC | sadikovi | travis-ci | pending completion |
| 184 | dictionary-filter-2 | add report for bloom filter oom (#46) | push | 04 Feb 2017 02:40AM UTC | web-flow | travis-ci | pending completion |
See All Builds (302)