The old tests were very large, which made it unclear what each one was
testing and made it hard to localize regressions in specific behaviors.

The new tests:
- Test one specific behavior each, which makes a breakage much easier to
  localize.
- Additionally check that small common chunks at the beginning or end do
  not create context controls.
- Have unique names.
- Return promises instead of using the `done()` callback for async.
- Use `Array(n).fill(x)` instead of hard-to-read `for` loops.
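
A minimal sketch of the last two points, for illustration only. The names
`repeatLine`, `renderAsync` and `testRendersAllLines` are hypothetical and
are not taken from the actual tests; `renderAsync` merely stands in for
whatever async code a test exercises.

```javascript
// Hypothetical helper: builds n identical lines without a for loop.
function repeatLine(n, line) {
  return Array(n).fill(line);
}

// Replaces a loop such as:
//   for (let i = 0; i < 3; i++) lines.push('same');
const lines = repeatLine(3, 'same');

// Hypothetical async unit under test (an assumption for this sketch).
function renderAsync(content) {
  return Promise.resolve(content.length);
}

// The test body returns the promise, so a runner that supports
// promise-returning tests can wait on it without a done() callback.
function testRendersAllLines() {
  return renderAsync(lines).then(count => {
    if (count !== 3) {
      throw new Error(`expected 3 lines, got ${count}`);
    }
    return count;
  });
}
```

A failed expectation rejects the returned promise, so there is no
callback that the test could forget to invoke.
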
Change-Id: I3145f37ed59448328aab7e8821e0a9a2d3e8e209