Case Studies

Apache Beam powers many of today’s leading projects, industry-specific use cases, and startups.

Mass Ad Bidding With Beam at

Apache Beam powers’s global ads bidding and performance infrastructure, supporting 1M+ queries monthly for workflows across multiple data systems scanning 2 PB+ of analytical data and terabytes of transactional data. Apache Beam accelerated processing by 36x and expedited time-to-market by as much as 4x.

's PPC Team
Marketing Technology Department

Self-service Machine Learning Workflows and Scaling MLOps with Apache Beam

Apache Beam has future-proofed Credit Karma’s data and ML platform for scalability and efficiency, enabling MLOps with unified pipelines, processing 5-10 TB daily at 5K events per second, and managing 20K+ ML features.

Avneesh Pratap
Senior Data Engineer II @ Credit Karma
Raj Katakam
Senior ML Engineer II @ Credit Karma

Powering Streaming and Real-time ML at Intuit

We feel that the runner agnosticism of Apache Beam affords flexibility and future-proofs our Stream Processing Platform as new runtimes are developed. Apache Beam enabled the democratization of stream processing at Intuit and the migration of many batch jobs to streaming applications.

Nick Hwang
Engineering Manager, Stream Processing Platform @ Intuit

Real-time ML with Beam at Lyft

The Lyft Marketplace team aims to improve our business efficiency by being nimble to real-world dynamics. Apache Beam has enabled us to meet the goal of having a robust and scalable ML infrastructure for improving model accuracy with features in real time. These real-time features support critical functions such as Forecasting, Primetime, and Dispatch.

Ravi Kiran Magham
Software Engineer @ Lyft

Real-time Event Stream Processing at Scale for Palo Alto Networks

Palo Alto Networks is a global cybersecurity leader that processes hundreds of billions of security events per day in real time, which is on the high end of the industry. Apache Beam provides a high-performing, reliable, and resilient data processing framework to support this scale. With Apache Beam, Palo Alto Networks ultimately achieved high performance and low latency, and reduced processing costs by 60%.

Talat Uyarer
Sr Principal Software Engineer

Visual Apache Beam Pipeline Design and Orchestration with Apache Hop

Apache Hop is an open source data orchestration and engineering platform that extends Apache Beam with visual pipeline lifecycle management. Neo4j’s Chief Solutions Architect and Apache Hop’s co-founder, Matt Casters, sees Apache Beam as a driving force behind Hop.

Matt Casters
Chief Solutions Architect, Neo4j, Apache Hop co-founder

Scalability and Cost Optimization for Search Engine's Workloads

Dive into the Czech search engine’s experience of scaling the on-premises infrastructure to learn more about the benefits of byte-based data shuffling and the use cases where Apache Beam portability and abstraction bring the utmost value.

Marek Simunek
Senior Software Engineer @

Four Apache Technologies Combined for Fun and Profit

Ricardo, the largest online marketplace in Switzerland, uses Apache Beam to stream-process platform data and to enable the Data Intelligence team to provide scalable data integration, analytics, and smart services.

Tobias Kaymak
Senior Data Engineer @ Ricardo

Also used by

Mozilla is the non-profit behind the Firefox browser. This use case focuses on the complexity that comes from moving data from one system to another safely, modeling data as it passes from one transform to another, handling errors, testing the system, and organizing the code to make the pipeline configurable for different source and destination systems in their open source codebase for ingesting telemetry data from Firefox clients.
Developed at Spotify and built on top of Apache Beam for Python, Klio is an open source framework that lets researchers and engineers build smarter data pipelines for processing audio and other media files, easily and at scale.
Kio is a set of Kotlin extensions for Apache Beam that implement a fluent-like API for the Java SDK.
GraalSystems is a cloud native data platform providing support for Beam, Spark, TensorFlow, Samza, and many other data processing solutions. At the heart of our architecture is a set of distributed processing and analytics modules using Beam to route over 2 billion events per day from our Apache Pulsar clusters. For our clients, we also run more than 2,000 Beam jobs per day at a very large scale on our production platform.
Oriel Research Therapeutics (ORT) is a startup company in the greater Boston area that provides early detection services for multiple medical conditions, using cutting-edge Artificial Intelligence technologies and Next Generation Sequencing (NGS). ORT uses Apache Beam pipelines to process over 1 million samples of genomics and clinical information. The processed data is used by ORT in detecting leukemia, sepsis, and other medical conditions.
eBay is an American e-commerce company that provides business-to-consumer and consumer-to-consumer sales through its website. They build feature pipelines with Apache Beam to unify feature extraction and selection across online and offline environments, speed up end-to-end iteration for model training, evaluation, and serving, and support different types of features (streaming, runtime, batch). eBay uses Apache Beam as the foundation of its streaming feature SDK, integrating with Kafka, Hadoop, Flink, Airflow, and other systems at eBay.
GOGA Data Analysis and Consulting is a company based in Japan that specializes in analytics of geospatial and mapping data. They use Apache Beam and Cloud Dataflow for a smooth data transformation process for analytical purposes. This use case focuses on handling multiple extraction, geocoding, and insertion processes by wrangling the data and making an API call for each record based on the location provided.

Akvelon is a software engineering company that helps start-ups, SMBs, and Fortune 500 companies unlock the full potential of cloud, data, and AI/ML to empower their strategic advantage. The Akvelon team has deep expertise in integrating Apache Beam with diverse data processing ecosystems and is an enthusiastic Apache Beam community contributor.