Table Row Inference Sklearn Streaming

Model: Scikit-learn classifier on structured table data (Beam.Row)
Accelerator: CPU-based inference (fixed batch size)
Host: 10 × n1-standard-4 (4 vCPUs, 15 GB RAM), autoscaling up to 20 workers (THROUGHPUT_BASED)

This streaming pipeline performs inference on continuous table rows using RunInference with a Scikit-learn model. It reads messages from Pub/Sub, applies windowing, runs batched inference while preserving the table schema, and writes results to BigQuery via streaming inserts.
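The core of the inference step described above — batching rows, running the scikit-learn model on the whole batch, and re-attaching each prediction to its source row so the table schema is preserved — can be sketched outside of Beam with plain scikit-learn. The field names, toy model, and row layout below are illustrative assumptions, not the pipeline's actual schema:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative rows mimicking beam.Row records with named fields
# (the real pipeline's schema will differ).
rows = [
    {"id": 1, "f0": 0.1, "f1": 2.0},
    {"id": 2, "f0": 1.5, "f1": 0.3},
    {"id": 3, "f0": 0.9, "f1": 1.1},
]
FEATURES = ["f0", "f1"]

# Toy model standing in for the pipeline's pre-trained classifier.
X_train = np.array([[0.0, 2.0], [2.0, 0.0]])
y_train = np.array([0, 1])
model = LogisticRegression().fit(X_train, y_train)


def run_batched_inference(batch, model, features=FEATURES):
    """Predict on a whole batch at once, then attach each prediction
    back to its source row so the original schema survives."""
    X = np.array([[row[f] for f in features] for row in batch])
    preds = model.predict(X)
    return [{**row, "prediction": int(p)} for row, p in zip(batch, preds)]


results = run_batched_inference(rows, model)
```

In the actual pipeline, this batching and schema round-trip is handled by RunInference's model handler; the sketch only shows why batching matters: one vectorized `predict` call per batch instead of one per row.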

The following graphs show various metrics when running the Table Row Inference Sklearn Streaming pipeline. See the glossary for definitions.

The full pipeline implementation is available here.

What is the estimated cost to run the pipeline?

RunTime and EstimatedCost

How have various metrics changed when running the pipeline with different Beam SDK versions?

AvgThroughputBytesPerSec by Version

AvgThroughputElementsPerSec by Version

How have various metrics changed over time when running the pipeline?

AvgThroughputBytesPerSec by Date

AvgThroughputElementsPerSec by Date

See also Table Row Inference Sklearn Batch for the batch variant of this pipeline.