Table Row Inference Sklearn Streaming

Model: Scikit-learn classifier on structured table data (Beam.Row)
Accelerator: CPU-based inference (fixed batch size)
Host: 10 × n1-standard-4 (4 vCPUs, 15 GB RAM), autoscaling up to 20 workers (THROUGHPUT_BASED)

This streaming pipeline performs inference on continuous table rows using RunInference with a Scikit-learn model. It reads messages from Pub/Sub, applies windowing, runs batched inference while preserving the table schema, and writes results to BigQuery via streaming inserts.
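The core of the inference step described above — batching rows, running the scikit-learn model on the whole batch, and re-attaching each prediction to its source row so the table schema is preserved — can be sketched outside of Beam with plain scikit-learn. The field names, toy model, and row layout below are illustrative assumptions, not the pipeline's actual schema:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative rows mimicking beam.Row records with named fields
# (the real pipeline's schema will differ).
rows = [
    {"id": 1, "f0": 0.1, "f1": 2.0},
    {"id": 2, "f0": 1.5, "f1": 0.3},
    {"id": 3, "f0": 0.9, "f1": 1.1},
]
FEATURES = ["f0", "f1"]

# Toy model standing in for the pipeline's pre-trained classifier.
X_train = np.array([[0.0, 2.0], [2.0, 0.0]])
y_train = np.array([0, 1])
model = LogisticRegression().fit(X_train, y_train)


def run_batched_inference(batch, model, features=FEATURES):
    """Predict on a whole batch at once, then attach each prediction
    back to its source row so the original schema survives."""
    X = np.array([[row[f] for f in features] for row in batch])
    preds = model.predict(X)
    return [{**row, "prediction": int(p)} for row, p in zip(batch, preds)]


results = run_batched_inference(rows, model)
```

In the actual pipeline, this batching and schema round-trip is handled by RunInference's model handler; the sketch only shows why batching matters: one vectorized `predict` call per batch instead of one per row.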

The following graphs show various metrics when running the Table Row Inference Sklearn Streaming pipeline. See the glossary for definitions.

The full pipeline implementation is available here.

What is the estimated cost to run the pipeline?

RunTime and EstimatedCost

How have various metrics changed when running the pipeline with different Beam SDK versions?

AvgThroughputBytesPerSec by Version

AvgThroughputElementsPerSec by Version

How have various metrics changed over time when running the pipeline?

AvgThroughputBytesPerSec by Date

AvgThroughputElementsPerSec by Date

See also Table Row Inference Sklearn Batch for the batch variant of this pipeline.