Case Studies

Apache Beam powers many of today’s leading projects, industry-specific use cases, and startups.

Real-time ML with Beam at Lyft

The Lyft Marketplace team aims to improve our business efficiency by being nimble to real-world dynamics. Apache Beam has enabled us to meet the goal of having a robust and scalable ML infrastructure for improving model accuracy with features in real time. These real-time features support critical functions like Forecasting, Primetime, and Dispatch.

Ravi Kiran Magham
Software Engineer @ Lyft

Real-time Event Stream Processing at Scale for Palo Alto Networks

Palo Alto Networks is a global cybersecurity leader that processes hundreds of billions of security events per day in real time, a volume at the high end of the industry. Apache Beam provides a high-performing, reliable, and resilient data processing framework to support this scale. With Apache Beam, Palo Alto Networks ultimately achieved high performance and low latency, and reduced processing costs by 60%.

Talat Uyarer
Sr Principal Software Engineer

Visual Apache Beam Pipeline Design and Orchestration with Apache Hop

Apache Hop is an open source data orchestration and engineering platform that extends Apache Beam with visual pipeline lifecycle management. Neo4j’s Chief Solutions Architect and Apache Hop’s co-founder, Matt Casters, sees Apache Beam as a driving force behind Hop.

Matt Casters
Chief Solutions Architect, Neo4j, Apache Hop co-founder

Scalability and Cost Optimization for Search Engine's Workloads

Dive into the experience of seznam.cz, the Czech search engine, in scaling its on-premises infrastructure to learn more about the benefits of byte-based data shuffling and the use cases where Apache Beam portability and abstraction bring the most value.

Marek Simunek
Senior Software Engineer @ seznam.cz

Four Apache Technologies Combined for Fun and Profit

Ricardo, the largest online marketplace in Switzerland, uses Apache Beam to stream-process platform data, enabling its Data Intelligence team to provide scalable data integration, analytics, and smart services.

Tobias Kaymak
Senior Data Engineer @ Ricardo

Also used by

Mozilla is the non-profit organization behind the Firefox browser. This use case focuses on the complexity that comes from moving data from one system to another safely, modeling data as it passes from one transform to another, handling errors, testing the system, and organizing the code to make the pipeline configurable for different source and destination systems in Mozilla's open source codebase for ingesting telemetry data from Firefox clients.
TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines based on Apache Beam.
Scio is a Scala API for Apache Beam and Google Cloud Dataflow inspired by Apache Spark and Scalding.
Developed at Spotify and built on top of Apache Beam for Python, Klio is an open source framework that lets researchers and engineers build smarter data pipelines for processing audio and other media files, easily and at scale.
Kio is a set of Kotlin extensions for Apache Beam that implement a fluent-like API for the Java SDK.
Oriel Research Therapeutics (ORT) is a startup company in the greater Boston area that provides early detection services for multiple medical conditions, using cutting-edge artificial intelligence technologies and Next Generation Sequencing (NGS). ORT uses Apache Beam pipelines to process over 1 million samples of genomics and clinical information. ORT uses the processed data to detect leukemia, sepsis, and other medical conditions.
eBay is an American e-commerce company that provides business-to-consumer and consumer-to-consumer sales through its website. It builds feature pipelines with Apache Beam to unify feature extraction and selection across online and offline environments, speed up end-to-end iteration for model training, evaluation, and serving, and support different types of features (streaming, runtime, batch). eBay uses Apache Beam as the foundation of its streaming feature SDK, integrating with Kafka, Hadoop, Flink, Airflow, and other systems at eBay.
Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.
GOGA Data Analysis and Consulting is a company based in Japan that specializes in analytics of geospatial and mapping data. They use Apache Beam and Cloud Dataflow for a smooth data transformation process for analytical purposes. This use case focuses on handling multiple extraction, geocoding, and insertion processes by wrangling each record and issuing an API request for it based on the location provided.
Akvelon is a software engineering company that helps start-ups, SMBs, and Fortune 500 companies unlock the full potential of cloud, data, and AI/ML to empower their strategic advantage. The Akvelon team has deep expertise in integrating Apache Beam with diverse data processing ecosystems and is an enthusiastic Apache Beam community contributor.