95%
master: 95%

Ran 13 Jul 2020 06:55PM UTC

Files 6

Run time 4min

Committed 13 Jul 2020 06:50PM UTC

coverage: 27.017% (+0.02%) from 26.999%

Job # TF_EAGER="0" TF_KERAS="0" TF_VERSION="1.14.0" KERAS_VERSION="2.2.5"

Build Type push

travis-ci-com

Committed by web-flow

Commit Message

Correct normalization scheme; deprecate `batch_size`

The existing code normalized as `norm = sqrt(batch_size / total_iterations)`, where `total_iterations` = (number of fits per epoch) * (number of epochs in restart). However, `total_iterations = total_samples / batch_size`, so `norm = batch_size * sqrt(1 / total_samples)`, where `total_samples` = (samples per epoch) * (number of epochs in restart). This makes `norm` scale _linearly_ with `batch_size`, whereas the authors' normalization scales with its square root.

Users who never changed `batch_size` during training are unaffected. (λ = λ_norm * sqrt(b / BT), where b is the batch size, B the number of training points, and T the number of epochs; λ_norm is the value we pick, our "guess". Normalization is meant to ensure that a guess that works well for `batch_size=32` also works well for `batch_size=16`. If `batch_size` never changes, performance depends only on the guess.)
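The difference between the two schemes can be checked numerically. The sketch below is not the library's code: the function names `old_norm` and `paper_norm` and the dataset size are assumptions chosen for illustration, and the formulas are taken directly from the commit message above.

```python
import math

def old_norm(batch_size, samples_per_epoch, epochs):
    # Pre-fix scheme from the commit message: sqrt(batch_size / total_iterations),
    # with total_iterations = (fits per epoch) * (epochs in restart).
    total_iterations = (samples_per_epoch / batch_size) * epochs
    return math.sqrt(batch_size / total_iterations)

def paper_norm(batch_size, samples_per_epoch, epochs):
    # sqrt(b / (B * T)) with b = batch_size, B = samples_per_epoch, T = epochs.
    return math.sqrt(batch_size / (samples_per_epoch * epochs))

# Hypothetical dataset: 6400 samples per epoch, 10 epochs per restart.
for b in (16, 32, 64):
    print(b, old_norm(b, 6400, 10), paper_norm(b, 6400, 10))
```

Doubling `batch_size` doubles `old_norm` but multiplies `paper_norm` only by `sqrt(2)`, which is the linear versus square-root behaviour described above.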

Main change [here](https://github.com/OverLordGoldDragon/keras-adamw/pull/53/files#diff-220519926b87c12115d2f727803fbe6bR19), closing #52.

**Updating existing code**: for a choice of λ_norm that previously worked well, multiply it by `sqrt(batch_size)`. Ex: with `batch_size=32`, `Dense(bias_regularizer=l2(1e-4))` --> `Dense(bias_regularizer=l2(1e-4 * sqrt(32)))`.
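The conversion can be sketched as a small helper. `convert_l2_factor` is a hypothetical name, not part of keras-adamw, and the dataset numbers are illustrative. The sanity check assumes the old and corrected normalizations are the ones given in the commit message.

```python
import math

def convert_l2_factor(old_factor, batch_size):
    """Rescale a factor tuned under the old scheme (hypothetical helper)."""
    return old_factor * math.sqrt(batch_size)

new_factor = convert_l2_factor(1e-4, 32)  # about 5.66e-4

# Sanity check with illustrative numbers: the effective decay is unchanged.
B, T, b = 6400, 10, 32  # samples per epoch, epochs, batch size
old_effective = 1e-4 * math.sqrt(b / ((B / b) * T))   # old normalization
new_effective = new_factor * math.sqrt(b / (B * T))   # sqrt(b / BT)
assert math.isclose(old_effective, new_effective)
```

The converted value would then be passed where the old one was, for example `l2(new_factor)` in place of `l2(1e-4)`.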

Run Details

365 of 1351 relevant lines covered (27.02%)

0.27 hits per line

OverLordGoldDragon / keras-adamw / 195 / 1

Source Files on job 195.1 (TF_EAGER="0" TF_KERAS="0" TF_VERSION="1.14.0" KERAS_VERSION="2.2.5")
