
SwamyDev / reinforcement / 69
Coverage: 100% (master: 100%)

LAST BUILD BRANCH: dependabot/pip/tensorflow-1.15.4
DEFAULT BRANCH: master
Ran 15 Sep 2019 02:19PM UTC
Jobs 1
Files 21
Run time 1s
Build #69 · push · travis-ci

SwamyDev
Test baseline propagation & normalization

For the REINFORCE algorithm to converge, it is important that the value
estimates of the baseline also contain the accumulated reward which the
agent uses as a signal. Otherwise, the baseline adjustment is too far
off to be useful. Hence we introduced the one-dimensional walk
environment, where the reward needs to propagate back to the start
position for the agent to stably learn the optimal policy. Additionally,
the environment tests whether the baseline estimates are done on
trajectory batches. This is important because of the edge case where the
baseline would estimate perfect values for each state, reducing the
advantage signals to almost 0 and resulting in no useful signal at all.
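
A minimal sketch of the two properties described above, assuming nothing
about the repository's actual API (the OneDimensionalWalk,
discounted_returns, and batch_advantages names are hypothetical, and a
plain mean return stands in for whatever baseline estimator the project
actually uses):

    import numpy as np

    class OneDimensionalWalk:
        """Toy 1-D walk: the agent starts at position 0 and the only
        reward is +1 at the goal, so learning a stable policy at the
        start depends on that reward propagating back through the
        returns. Hypothetical, not the repository's environment."""

        def __init__(self, length=5):
            self.length = length
            self.pos = 0

        def reset(self):
            self.pos = 0
            return self.pos

        def step(self, action):
            # action: 0 moves left (clamped at 0), 1 moves right
            self.pos = max(self.pos + (1 if action == 1 else -1), 0)
            done = self.pos >= self.length
            return self.pos, (1.0 if done else 0.0), done

    def discounted_returns(rewards, gamma=0.99):
        # Accumulate backwards so the terminal reward reaches the start.
        returns = np.zeros(len(rewards))
        acc = 0.0
        for t in reversed(range(len(rewards))):
            acc = rewards[t] + gamma * acc
            returns[t] = acc
        return returns

    def batch_advantages(reward_trajectories, gamma=0.99):
        # Fit the baseline on the whole batch of trajectories. A baseline
        # fitted per state could match each return exactly, shrinking
        # every advantage to ~0 and leaving REINFORCE with no signal.
        returns = [discounted_returns(r, gamma) for r in reward_trajectories]
        baseline = np.concatenate(returns).mean()
        return [g - baseline for g in returns]  # advantage = G_t - b

Because the baseline is a single batch-level estimate, the returns still
differ from it and the advantages stay informative; a per-state oracle
baseline would drive every advantage to zero.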

470 of 470 relevant lines covered (100.0%)

1.0 hits per line

Jobs
ID   Job ID   Ran                        Files   Coverage
1    69.1     15 Sep 2019 02:19PM UTC    0       100.0%
Travis Job 69.1
Source Files on build 69
Detailed source file information is not available for this build.
  • Travis Build #69
  • 2e924382 on github
  • Prev Build on learning-tests (#68)
  • Next Build on learning-tests (#71)