OverLordGoldDragon/keras-adamw | Coveralls - Test Coverage History & Statistics

LAST BUILD ON BRANCH v1.36
branch: v1.36


pending completion

Build # 196

Build Type: push
Via: travis-ci-com
Committed by: web-flow

Commit Message

Correct normalization scheme; deprecate `batch_size`

The existing code normalized as `norm = sqrt(batch_size / total_iterations)`, where `total_iterations` = (number of fits per epoch) * (number of epochs in restart). However, `total_iterations = total_samples / batch_size`, where `total_samples` = (samples per epoch) * (epochs in restart), so `norm = batch_size * sqrt(1 / total_samples)`. This makes `norm` scale _linearly_ with `batch_size`, whereas the authors' scheme scales with its square root.

Users who never changed `batch_size` throughout training will be unaffected. (λ = λ_norm * sqrt(b / BT), where b is the batch size, B the total number of training points, and T the total number of epochs; λ_norm is what we pick, our "guess". The point of normalization is that a guess that works well for `batch_size=32` should also work well for `batch_size=16`. If `batch_size` is never changed, performance is affected only by the guess.)
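To make the scaling argument concrete, here is a minimal Python sketch comparing the two schemes. The function names (`old_norm`, `paper_norm`) and the sample counts are illustrative assumptions, not identifiers from keras-adamw; only the two formulas come from the commit message.

```python
import math

def old_norm(batch_size, samples_per_epoch, epochs):
    # Pre-fix scheme described above: sqrt(batch_size / total_iterations),
    # with total_iterations = (fits per epoch) * (epochs in restart).
    total_iterations = (samples_per_epoch / batch_size) * epochs
    return math.sqrt(batch_size / total_iterations)

def paper_norm(batch_size, samples_per_epoch, epochs):
    # Authors' scheme: sqrt(b / (B * T)).
    return math.sqrt(batch_size / (samples_per_epoch * epochs))

# Hypothetical dataset: 3200 samples per epoch, 10 epochs per restart.
B, T = 3200, 10
for b in (16, 32, 64):
    print(b, old_norm(b, B, T), paper_norm(b, B, T))

# Doubling batch_size doubles old_norm (linear scaling) ...
ratio_old = old_norm(32, B, T) / old_norm(16, B, T)
# ... but multiplies paper_norm by sqrt(2) only.
ratio_paper = paper_norm(32, B, T) / paper_norm(16, B, T)
```

Since `b / (B * T)` equals `1 / total_iterations`, the authors' factor can be computed from the iteration count alone, which is consistent with `batch_size` being deprecated here.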

Main change [here](https://github.com/OverLordGoldDragon/keras-adamw/pull/53/files#diff-220519926b87c12115d2f727803fbe6bR19), closing #52.

**Updating existing code**: for a choice of λ_norm that previously worked well, multiply it by `sqrt(batch_size)`. Ex: with `batch_size=32`, `Dense(bias_regularizer=l2(1e-4))` --> `Dense(bias_regularizer=l2(1e-4 * sqrt(32)))`.
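The `sqrt(batch_size)` factor follows from requiring the effective decay λ to be the same before and after the change. A small sketch of that check, where `migrate_lambda_norm` and the dataset numbers are hypothetical and used only for illustration:

```python
import math

def migrate_lambda_norm(old_lambda_norm, batch_size):
    # Rescale a previously tuned value so the effective decay is unchanged.
    return old_lambda_norm * math.sqrt(batch_size)

# Hypothetical setup: batch_size 32, 3200 samples per epoch, 10 epochs.
b, B, T = 32, 3200, 10
old_factor = b * math.sqrt(1 / (B * T))   # pre-fix normalization
new_factor = math.sqrt(b / (B * T))       # corrected normalization

old_lambda_norm = 1e-4
new_lambda_norm = migrate_lambda_norm(old_lambda_norm, b)

effective_before = old_lambda_norm * old_factor
effective_after = new_lambda_norm * new_factor
```

Here `effective_before` and `effective_after` agree up to floating-point error, so a model tuned under the old scheme should see the same weight decay once its regularizer values are rescaled.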

Run Details

1317 of 1351 relevant lines covered (97.48%)

1.99 hits per line


Recent builds

Builds	Branch	Commit	Type	Ran	Committer	Via	Coverage
196	v1.36	Correct normalization scheme; deprecate `batch_size`	push	13 Jul 2020 07:00PM UTC	web-flow	travis-ci-com	pending completion
194	v1.36	Correct normalization scheme; deprecate `batch_size`	push	13 Jul 2020 06:30PM UTC	OverLordGoldDragon	travis-ci-com	pending completion

See All Builds (149)

OverLordGoldDragon / keras-adamw
97%
master: 95%
