Pre-commit Slowness Triage Guide

Beam pre-commit jobs are suites of tests run automatically on Jenkins build machines for each pull request (PR) submitted to apache/beam. For more information, including the difference between pre-commits and post-commits, see testing.

What are fast pre-commits?

Pre-commit tests are required to pass before a pull request (PR) is merged. Slow tests slow down Beam’s development process. The goal is for 95% of pre-commit jobs to complete within 30 minutes, whether they pass or fail.

More precisely, the 95th percentile of running time over the past 4 weeks should be below 30 minutes, where running time is the time a job spends in the Jenkins queue plus the time it spends executing.
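The target above can be checked with a short calculation. The following is a minimal sketch, not the method used by the Beam metrics site: the `(finished_at, queue_minutes, run_minutes)` record format and all function names are hypothetical, and the nearest-rank percentile is one of several reasonable definitions.

```python
from datetime import datetime, timedelta

TARGET_MINUTES = 30          # target from the text above
WINDOW = timedelta(weeks=4)  # look-back window from the text above


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    # Integer ceiling of pct% of the sample size avoids float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def p95_running_time(jobs, now):
    """jobs: iterable of (finished_at, queue_minutes, run_minutes) tuples.

    The tuple layout is an assumption made for this sketch.
    Running time is queue time plus execution time.
    """
    cutoff = now - WINDOW
    totals = [queue + run for finished_at, queue, run in jobs
              if finished_at >= cutoff]
    return percentile(totals, 95)


def meets_target(jobs, now):
    """True if the 95th percentile of running time is below the target."""
    return p95_running_time(jobs, now) < TARGET_MINUTES
```

Jobs that finished before the 4-week window are ignored, so one slow job from months ago does not affect the result.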

Determining Slowness

There are two main signs of slowness:

  1. Pre-commit jobs are timing out after 30 minutes. This can be determined from the console log of a job.
  2. Pre-commits aren’t timing out, but the total wait time for pre-commit results exceeds 30 minutes.

Pre-commit Dashboard

The Beam Community Metrics site contains a Pre-Commit Tests dashboard showing job timing trends. You can modify the time window (defaults to 7 days) or filter down to a specific test suite by clicking on it.

[Figure: example pre-commit duration dashboard]

Triage Process

  1. Search for existing issues
  2. Create a new issue if needed: Apache JIRA
    • Project: Beam
    • Components: testing, anything else relevant
    • Label: precommit
    • Reference this page in the description.
  3. Determine where the slowness is coming from and identify the underlying issues. Open additional issues if needed, for example when the slowness has more than one cause.
  4. Assign the issue as appropriate, e.g., to the test’s or PR’s author.

Resolution

It is important that we quickly fix slow pre-commit tests. See pre-commit test policies for details.

Possible Causes and Solutions

This section lists some starting points for fixing pre-commit slowness.

Resource Exhaustion

Have a look at the graphs in the Pre-Commit Tests dashboard. Does the rise in total duration match the rise in queuing time? If so, the slowness might be unrelated to this specific pre-commit job.

Example in which total and queuing durations mostly rise and fall together:

[Figure: graph of pre-commit times]

Since Jenkins machines are a limited resource, other jobs can affect pre-commit queuing times. Try to figure out whether other jobs have recently become slower or more frequent, or whether new jobs have been introduced.
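Whether total duration tracks queuing time can also be estimated numerically instead of by eye. The sketch below is one possible approach, assuming you have exported two equal-length series of per-job (or per-day) durations in minutes. The function names and the 0.8 threshold are illustrative assumptions, not values used by the Beam project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat series cannot explain changes in the other one.
        return 0.0
    return cov / sqrt(var_x * var_y)


def queue_driven(total_minutes, queue_minutes, threshold=0.8):
    """Heuristic: True if total duration closely tracks queuing time.

    The threshold is an arbitrary assumption for this sketch.
    """
    return pearson(total_minutes, queue_minutes) >= threshold
```

A high correlation suggests that slowness comes from contention for Jenkins machines. A low one points back at the job itself.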

Another option is to look at adding more Jenkins machines.

Slow individual tests

Sometimes a pre-commit job is slowed down due to one or more tests. One way of determining if this is the case is by looking at individual test timings.
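If a test report is available as a file, individual timings can be ranked with a few lines of code. This is a minimal sketch that assumes the report uses the common JUnit XML layout, with `testcase` elements carrying `classname`, `name`, and a `time` attribute in seconds. The sample report and its test names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical report content, for illustration only.
SAMPLE_REPORT = """<testsuite name="example" tests="3">
  <testcase classname="pkg.FooTest" name="testFast" time="0.12"/>
  <testcase classname="pkg.FooTest" name="testSlow" time="42.5"/>
  <testcase classname="pkg.BarTest" name="testMedium" time="3.0"/>
</testsuite>"""


def slowest_tests(junit_xml, top_n=5):
    """Return up to top_n (test name, seconds) pairs, slowest first."""
    root = ET.fromstring(junit_xml)
    timings = []
    for case in root.iter("testcase"):
        name = f"{case.get('classname', '')}.{case.get('name', '')}"
        # A missing time attribute is treated as zero seconds.
        timings.append((name, float(case.get("time", 0))))
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings[:top_n]
```

Calling `slowest_tests(SAMPLE_REPORT, top_n=2)` returns the two slowest entries, which are the first candidates for refactoring or for moving out of pre-commit.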

Where to find individual test timings:

Sometimes tests can be made faster by refactoring. A test that spends a lot of time waiting (such as an integration test) could be made to run concurrently with the other tests.
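The idea of overlapping waiting-bound tests can be illustrated generically. The sketch below is not how Beam's build configures test parallelism. It assumes the checks are independent and spend most of their time waiting, and it uses `time.sleep` as a stand-in for a call to an external service. All names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_service(name, delay_seconds):
    """Stand-in for an integration check that mostly waits on I/O."""
    time.sleep(delay_seconds)
    return f"{name}: ok"


def run_concurrently(checks, max_workers=4):
    """Run (name, delay_seconds) checks in threads.

    Results are returned in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(wait_for_service, name, delay)
                   for name, delay in checks]
        return [future.result() for future in futures]
```

Because the threads wait at the same time, the wall-clock time approaches that of the slowest check, not the sum of all of them. This only helps when the tests do not share mutable state.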

If a test is determined to be too slow to be part of pre-commit tests, it should be removed from pre-commit and placed in post-commit instead. In addition, ensure that the code covered by the removed test is covered by a unit test in pre-commit.

Slow integration tests

Integration test slowdowns may be caused by dependent services.

References