RunInference Metrics

This example demonstrates and explains the metrics that are available when you use the RunInference transform to perform inference with a machine learning model. The example pipeline reads a list of sentences, tokenizes the text, and uses RunInference with the transformer-based model distilbert-base-uncased-finetuned-sst-2-english to classify each piece of text into one of two classes.

When you run the pipeline with the Dataflow runner, different RunInference metrics are available depending on whether you run on a CPU or a GPU. This example demonstrates both types of metrics.

The following diagram shows the file structure for the entire pipeline.

runinference_metrics/
├── pipeline/
│   ├── __init__.py
│   ├── options.py
│   └── transformations.py
├── __init__.py
├── config.py
├── main.py
└── setup.py

pipeline/transformations.py contains the code for beam.DoFn and additional functions that are used for the pipeline.
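
The exact bodies of these transforms live in the repository; the following is a minimal sketch of what Tokenize and PostProcessor could look like, assuming a Hugging Face tokenizer and a two-class output. The padding length and the label decoding are illustrative choices, not the example's exact code.

  import torch
  import apache_beam as beam
  from transformers import AutoTokenizer

  class Tokenize(beam.DoFn):
    def __init__(self, model_name: str):
      self._model_name = model_name

    def setup(self):
      # Load the tokenizer once per DoFn instance instead of once per element.
      self._tokenizer = AutoTokenizer.from_pretrained(self._model_name)

    def process(self, text: str):
      # Pad to a fixed length so that RunInference can batch elements together.
      tokens = self._tokenizer(
          text, return_tensors="pt", padding="max_length", max_length=128)
      # Drop the batch dimension; RunInference re-batches elements itself.
      tokens = {k: torch.squeeze(v) for k, v in tokens.items()}
      # Key each element by its text so the prediction stays paired with it.
      yield text, tokens

  class PostProcessor(beam.DoFn):
    def process(self, element):
      text, prediction_result = element
      # The model outputs logits for the two classes; argmax picks the winner.
      logits = prediction_result.inference["logits"]
      yield {"text": text, "label": int(torch.argmax(logits))}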

pipeline/options.py contains the pipeline options to configure the Dataflow pipeline.
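
A minimal sketch of what such a helper might look like, assuming the config values sketched below and a T4 accelerator for the GPU path; the exact options the example passes may differ:

  from apache_beam.options.pipeline_options import PipelineOptions

  def get_pipeline_options(mode: str, device: str, cfg) -> PipelineOptions:
    # `cfg` is the config module. These are standard Dataflow options.
    options = {
        "project": cfg.PROJECT_ID,
        "region": cfg.REGION,
        "num_workers": cfg.NUM_WORKERS,
        "setup_file": "./setup.py",
        "runner": "DataflowRunner" if mode == "cloud" else "DirectRunner",
    }
    if device == "GPU":
      # Attach an accelerator to the workers via a Dataflow service option.
      options["dataflow_service_options"] = [
          "worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver"
      ]
    return PipelineOptions(**options)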

config.py defines variables that are used multiple times, like the Google Cloud PROJECT_ID and NUM_WORKERS.
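
For illustration, config.py might look like the following; every value here is a placeholder, and MODEL_STATE_DICT_PATH is an assumed extra setting used by the model-handler sketch later on this page:

  # All values are placeholders; replace them with your own settings.
  PROJECT_ID = "your-project-id"
  REGION = "us-central1"
  NUM_WORKERS = 1
  TOKENIZER_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
  # Assumed setting: where the model's state dict is stored, for example on GCS.
  MODEL_STATE_DICT_PATH = "gs://your-bucket/distilbert/pytorch_model.bin"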

setup.py defines the packages and requirements for the pipeline to run.
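
A typical setup.py for a pipeline like this one is short; the dependency list below is an assumption based on the packages the example uses:

  import setuptools

  setuptools.setup(
      name="runinference-metrics",
      version="1.0.0",
      packages=setuptools.find_packages(),
      install_requires=[
          "apache-beam[gcp]",
          "torch",
          "transformers",
      ],
  )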

main.py contains the pipeline code and additional functions used for running the pipeline.

Run the Pipeline

Install the required packages. To run this example, you need access to a Google Cloud project, and you need to configure the Google Cloud variables, such as PROJECT_ID and REGION, in the config.py file. To use GPUs, follow the setup instructions in the PyTorch GPU minimal pipeline example on GitHub.

  1. Dataflow with CPU: python main.py --mode cloud --device CPU
  2. Dataflow with GPU: python main.py --mode cloud --device GPU

The pipeline includes the following steps:

  1. Create a list of texts to use as an input using beam.Create.
  2. Tokenize the text.
  3. Use RunInference to do inference.
  4. Postprocess the output of RunInference.
The following snippet from main.py shows these steps:

  with beam.Pipeline(options=pipeline_options) as pipeline:
    _ = (
        pipeline
        | "Create inputs" >> beam.Create(inputs)
        | "Tokenize" >> beam.ParDo(Tokenize(cfg.TOKENIZER_NAME))
        | "Inference" >> RunInference(model_handler=KeyedModelHandler(model_handler))
        | "Decode Predictions" >> beam.ParDo(PostProcessor()))

RunInference Metrics

As mentioned previously, we benchmarked the performance of RunInference using Dataflow on both CPU and GPU. You can view these metrics in the Google Cloud console, or you can query them programmatically with the following line after the pipeline finishes:

metrics = pipeline.result.metrics().query(beam.metrics.MetricsFilter())
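
Note that pipeline.result becomes available once the with beam.Pipeline(...) block exits. From the queried results you can, for example, print every counter and the mean of every distribution; a minimal sketch:

  for counter in metrics["counters"]:
    print(f"{counter.key.metric.name}: {counter.result}")
  for dist in metrics["distributions"]:
    print(f"{dist.key.metric.name} (mean): {dist.result.mean}")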

The following image shows a snapshot of different metrics in the Google Cloud console when using Dataflow on GPU:

[Image: RunInference GPU metrics rendered in the Google Cloud console]

Some metrics commonly used for benchmarking are:

  - num_inferences: represents the total number of calls to run_inference().
  - inference_batch_latency_micro_secs_MEAN: represents the average time taken to perform inference across all batches of examples, measured in microseconds.
  - inference_request_batch_size_COUNT: represents the total number of samples across all batches of examples used for inference.
  - inference_request_batch_byte_size_MEAN: represents the average size of all elements across all batches of examples, measured in bytes.
  - model_byte_size_MEAN: represents the memory consumed to load and initialize the model, measured in bytes.
  - load_model_latency_milli_secs_MEAN: represents the average time taken to load and initialize the model, measured in milliseconds.

You can also derive other relevant metrics, such as in the following example.
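
For instance, because a distribution's sum equals its count multiplied by its mean, the total time spent inside run_inference can be recovered from the batch-latency distribution. A minimal sketch, using the metric name from Beam's RunInference implementation:

  # Total inference time across all batches, in microseconds.
  latency_filter = beam.metrics.MetricsFilter().with_name(
      "inference_batch_latency_micro_secs")
  latency = pipeline.result.metrics().query(latency_filter)["distributions"][0].result
  total_inference_micro_secs = latency.count * latency.mean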