Post-commit tests policies

Post-commit tests validate that Beam works correctly in a live environment. The tests also catch errors that are hard to predict in the design and implementation stages.

Even though post-commit tests run after the code is merged into the repository, it is important that the tests pass reliably. Jenkins executes post-commit tests against the HEAD of the master branch, so a failing post-commit test means there is a problem with the HEAD build. In addition, post-commit tests are time-consuming to run, and test failures are often hard to triage.

Policies

To ensure that Beam’s post-commit tests are reliable and healthy, the Beam community follows these post-commit test policies:

Post-commit test failure scenarios

When a post-commit test fails, follow the provided steps for your situation.

I found a test failure

  1. Create a GitHub issue and assign it to yourself.
  2. Do a high-level triage of the failure.
  3. Assign the issue to a relevant person.

I was assigned an issue for a test failure

  1. Roll back the culprit change.
  2. If you determine that the rollback will take longer than 8 hours, disable the test temporarily while you roll back the change or create a fix.

Note: Rollback is always the first course of action. If a fix is trivial, open a pull request with the proposed fix while performing the rollback.
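
The decision described above can be summarized as a small sketch. This is only an illustration of the policy, not tooling that Beam provides: the function name `plan_actions`, its parameters, and the action strings are hypothetical. Only the 8-hour threshold comes from the policy text.

```python
from datetime import timedelta

# The 8-hour threshold comes from the policy above; every name here is illustrative.
ROLLBACK_TIME_LIMIT = timedelta(hours=8)


def plan_actions(estimated_rollback, fix_is_trivial=False):
    """Return the ordered actions for an assignee of a post-commit failure."""
    # Rollback is always the first course of action.
    actions = ["roll back the culprit change"]
    # A slow rollback means the test is disabled temporarily to restore signal.
    if estimated_rollback > ROLLBACK_TIME_LIMIT:
        actions.append("temporarily disable the failing test")
    # A trivial fix can be proposed in parallel with the rollback.
    if fix_is_trivial:
        actions.append("open a pull request with the proposed fix")
    return actions
```

For example, `plan_actions(timedelta(hours=12))` returns both the rollback and the temporary test disablement, while a two-hour estimate returns only the rollback.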

My change was rolled back due to a test failure

After the rollback, there is time for deeper investigation. Start by looking at the GitHub issue to see the background information for the rollback. These scenarios are all common:

These are all valid reasons for rollback. Maintaining clear signal is the highest priority.

The high-level steps are the same:

  1. Create a fix and re-run the post-commit tests.
  2. Implement new pre-commit tests that will catch similar failures before future code is merged into the repository.
  3. Open a new PR that contains your fix and the new pre-commit tests.

If the bug is not in your code, here is how to “create a fix”:

  1. File a GitHub issue for the existing bug, if one does not already exist. Remember that a flaky test is a critical bug. Other bad tests are similar: they may fail for arbitrary reasons that have nothing to do with what is being tested, which makes our signal unreliable.
  2. Mark the problematic test to be skipped, with a link to the GitHub issue.
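
As a minimal sketch of the second step, assuming the problematic test is a Python test written with the standard `unittest` module, the skip reason can carry the issue link. The class name, test names, and the `FLAKY_ISSUE` URL (including its `<issue-number>` placeholder) are hypothetical. Tests in other languages would use their own framework's skip or ignore mechanism.

```python
import unittest

# Hypothetical placeholder: replace with the real GitHub issue for the failure.
FLAKY_ISSUE = "https://github.com/apache/beam/issues/<issue-number>"


class ExamplePipelineTest(unittest.TestCase):
    # The skip reason links to the issue so the test is not forgotten.
    @unittest.skip(f"Flaky, tracked in {FLAKY_ISSUE}")
    def test_flaky_behavior(self):
        self.fail("never runs while the test is skipped")

    def test_stable_behavior(self):
        self.assertEqual(sum([1, 2, 3]), 6)


def run_example():
    """Run the example test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExamplePipelineTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Running `run_example()` reports the flaky test as skipped, with the issue link in the skip reason, while the remaining test still runs. Remove the decorator once the issue is resolved.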

References

  1. Keeping post-commit tests green mailing list proposal thread.