Case Studies

Apache Beam powers many of today’s leading projects, industry-specific use cases, and startups.

Revolutionizing Real-Time Stream Processing: 4 Trillion Events Daily at LinkedIn

Apache Beam serves as the backbone of LinkedIn's streaming infrastructure, handling the near real-time processing of an astounding 4 trillion events daily through 3,000+ pipelines and powering personalized experiences for LinkedIn's vast network of over 950 million members worldwide. The adoption of Apache Beam brought a series of impressive improvements, including 2x cost optimization depending on the use case, an acceleration in abuse labeling from days to minutes, and a more than 6% improvement in detecting logged-in scraping profiles.

Bingfeng Xia
Engineering Manager @LinkedIn
Xinyu Liu
Senior Staff Engineer @LinkedIn
Go to the case study

High-Performing and Efficient Transactional Data Processing for OCTO Technology’s Clients

With Apache Beam, OCTO accelerated the migration of one of France's largest grocery retailers to streaming processing for transactional data. By leveraging Apache Beam's powerful transforms and robust streaming capabilities, OCTO achieved a 5x reduction in infrastructure costs and a 4x boost in performance. The streaming Apache Beam pipelines now process over 100 million rows daily, consolidating hundreds of gigabytes of transactional data with over a terabyte of external state in under 3 hours, a task that was not feasible without Apache Beam's controlled aggregation (see the sketch below).

OCTO Technology's Data Engineering Team
Large Retail Client Project
Go to the case study
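
The consolidation described in the OCTO case relies on Beam's windowed, keyed aggregation. Below is a minimal, hypothetical sketch of that pattern in the Beam Python SDK; the file paths, CSV column layout, and keys are illustrative assumptions, not OCTO's actual pipeline.

```python
import apache_beam as beam
from apache_beam.transforms import window

# Hypothetical sketch: key each transaction by store, assign event-time
# timestamps, and consolidate amounts per store and hour with CombinePerKey.
with beam.Pipeline() as p:
    _ = (
        p
        | "ReadTransactions" >> beam.io.ReadFromText("gs://example-bucket/transactions/*.csv")  # assumed source
        | "Parse" >> beam.Map(lambda line: line.split(","))
        | "WithEventTime" >> beam.Map(
            lambda f: window.TimestampedValue((f[0], float(f[2])), float(f[1])))  # (store_id, amount) at epoch seconds f[1]
        | "HourlyWindows" >> beam.WindowInto(window.FixedWindows(60 * 60))
        | "SumPerStore" >> beam.CombinePerKey(sum)
        | "Format" >> beam.MapTuple(lambda store, total: f"{store},{total:.2f}")
        | "WriteConsolidated" >> beam.io.WriteToText("gs://example-bucket/consolidated")  # assumed sink
    )
```

The same CombinePerKey shape can also run in streaming mode, with triggers controlling when per-window aggregates are emitted, which is what keeps large consolidations bounded in time.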

High-Performance Quantitative Risk Analysis with Apache Beam at HSBC

HSBC finds Apache Beam to be more than a data processing framework: it is also a computational platform and a risk engine. Apache Beam allowed for 100x scaling and 2x faster performance of HSBC's XVA pipelines, accelerated time-to-market by 24x, and simplified data distribution for modeling future scenarios with Monte Carlo simulations (see the sketch below), powering quantitative risk analysis for forecasting and decision-making.

Chup Cheng
VP of XVA and CCR Capital Analytics @ HSBC
Andrzej Golonka
Lead Assistant Vice President @ HSBC
Go to the case study
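
As a loose illustration of distributing Monte Carlo scenario generation across a pipeline, here is a minimal Beam Python sketch: each trade is fanned out into many simulated scenarios and the results are aggregated per trade. The trades, scenario count, shock distribution, and valuation are toy assumptions, not HSBC's XVA models.

```python
import random

import apache_beam as beam

NUM_SCENARIOS = 1000  # assumed number of simulated scenarios per trade

def simulate(trade):
    """Toy valuation of one trade under many random market scenarios."""
    trade_id, notional = trade
    for _ in range(NUM_SCENARIOS):
        shock = random.gauss(0.0, 0.02)           # toy market move
        yield trade_id, notional * (1.0 + shock)  # toy scenario value

with beam.Pipeline() as p:
    _ = (
        p
        | "Trades" >> beam.Create([("trade-1", 1_000_000.0), ("trade-2", 250_000.0)])  # toy portfolio
        | "FanOutScenarios" >> beam.FlatMap(simulate)
        | "MeanValuePerTrade" >> beam.combiners.Mean.PerKey()
        | "Print" >> beam.Map(print)
    )
```

The fan-out happens in FlatMap, so the runner can distribute scenario valuation across workers and reduce the results with a combiner, the shape that lets this kind of workload scale horizontally.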

Efficient Streaming Analytics: Making the Web a Safer Place with Project Shield

Project Shield defends the websites of over 3K vulnerable organizations in more than 150 countries against DDoS attacks, with the mission of protecting freedom of speech. Its Apache Beam streaming pipelines process about 3 TB of log data daily at over 10,000 queries per second, producing real-time user-facing analytics, tailored traffic rate limits, and defense recommendations (see the sketch below). Apache Beam enabled the delivery of these critical metrics at scale with a ~2x efficiency gain, supporting Project Shield's goal of eliminating DDoS attacks as a weapon for silencing journalists and others who speak the truth, and ultimately of making the web a safer place.

Marc Howard
Founding Engineer @ Project Shield
Chad Hansen
Founding Engineer @ Project Shield
Go to the case study
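
As a rough illustration of the streaming analytics described in the Project Shield case, here is a minimal Beam Python sketch that counts requests per protected hostname in one-minute windows, the kind of signal that rate limits and defense recommendations can be derived from. The Pub/Sub topics and log fields are assumptions, not Project Shield's actual setup.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

# Hypothetical sketch: per-host request counts over one-minute windows.
options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    _ = (
        p
        | "ReadLogs" >> beam.io.ReadFromPubSub(topic="projects/example/topics/request-logs")  # assumed topic
        | "ParseJson" >> beam.Map(json.loads)
        | "KeyByHost" >> beam.Map(lambda rec: (rec["host"], 1))  # assumed field name
        | "OneMinuteWindows" >> beam.WindowInto(window.FixedWindows(60))
        | "CountPerHost" >> beam.CombinePerKey(sum)
        | "Format" >> beam.MapTuple(
            lambda host, n: json.dumps({"host": host, "requests": n}).encode("utf-8"))
        | "Publish" >> beam.io.WriteToPubSub(topic="projects/example/topics/host-request-counts")  # assumed topic
    )
```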

Mass Ad Bidding With Beam at Booking.com

Apache Beam powers Booking.com's global ads bidding and performance infrastructure, supporting 1M+ queries monthly for workflows across multiple data systems, scanning 2 PB+ of analytical data and terabytes of transactional data. Apache Beam accelerated processing by 36x and expedited time-to-market by as much as 4x.

Booking.com's PPC Team
Marketing Technology Department
Go to the case study

Self-service Machine Learning Workflows and Scaling MLOps with Apache Beam

Apache Beam has future-proofed Credit Karma’s data and ML platform for scalability and efficiency, enabling MLOps with unified pipelines, processing 5-10 TB daily at 5K events per second, and managing 20K+ ML features.

Avneesh Pratap
Senior Data Engineer II @ Credit Karma
Raj Katakam
Senior ML Engineer II @ Credit Karma
Go to the case study

Powering Streaming and Real-time ML at Intuit

We feel that the runner agnosticism of Apache Beam affords flexibility and future-proofs our Stream Processing Platform as new runtimes are developed. Apache Beam enabled the democratization of stream processing at Intuit and the migration of many batch jobs to streaming applications.

Nick Hwang
Engineering Manager, Stream Processing Platform @ Intuit
Go to the case study

Real-time ML with Beam at Lyft

Lyft Marketplace team aims to improve our business efficiency by being nimble to real-world dynamics. Apache Beam has enabled us to meet the goal of having a robust and scalable ML infrastructure for improving model accuracy with features in real-time. These real-time features support critical functions like Forecasting, Primetime, Dispatch.

Ravi Kiran Magham
Software Engineer @ Lyft
Go to the case study

Real-time Event Stream Processing at Scale for Palo Alto Networks

Palo Alto Networks is a global cybersecurity leader that deals with processing hundreds of billions of security events per day in real-time, which is on the high end of the industry. Apache Beam provides a high-performing, reliable, and resilient data processing framework to support this scale. With Apache Beam, Palo Alto Networks ultimately achieved high performance and low latency, and reduced processing costs by 60%.

Talat Uyarer
Sr Principal Software Engineer
Go to the case study

Visual Apache Beam Pipeline Design and Orchestration with Apache Hop

Apache Hop is an open source data orchestration and engineering platform that extends Apache Beam with visual pipeline lifecycle management. Matt Casters, Neo4j's Chief Solutions Architect and Apache Hop's co-founder, sees Apache Beam as a driving force behind Hop.

Matt Casters
Chief Solutions Architect, Neo4j, Apache Hop co-founder
Go to the case study

Scalability and Cost Optimization for Search Engine's Workloads

Dive into the Czech search engine's experience of scaling its on-premises infrastructure to learn more about the benefits of byte-based data shuffling and the use cases where Apache Beam portability and abstraction bring the most value.

Marek Simunek
Senior Software Engineer @ seznam.cz
Go to the case study

Four Apache Technologies Combined for Fun and Profit

Ricardo, the largest online marketplace in Switzerland, uses Apache Beam to stream-process platform data and enables the Data Intelligence team to provide scalable data integration, analytics, and smart services.

Tobias Kaymak
Senior Data Engineer @ Ricardo
Go to the case study

Also used by

Mozilla is the non-profit behind the Firefox browser. This use case focuses on the complexity of safely moving data from one system to another in Mozilla's open source codebase for ingesting telemetry data from Firefox clients: modeling data as it passes from one transform to another, handling errors (see the dead-letter sketch after this list), testing the system, and organizing the code so the pipeline can be configured for different source and destination systems.
Developed at Spotify and built on top of Apache Beam for Python, Klio is an open source framework that lets researchers and engineers build smarter data pipelines for processing audio and other media files, easily and at scale.
Kio is a set of Kotlin extensions for Apache Beam that implements a fluent-style API on top of the Beam Java SDK.
GraalSystems is a cloud-native data platform that supports Beam, Spark, TensorFlow, Samza, and many other data processing solutions. At the heart of its architecture is a set of distributed processing and analytics modules that use Beam to route over 2 billion events per day from its Apache Pulsar clusters. For its clients, GraalSystems also runs more than 2,000 Beam jobs per day at very large scale in its production platform.
Oriel Research Therapeutics (ORT) is a startup in the greater Boston area that provides early detection services for multiple medical conditions, utilizing cutting-edge artificial intelligence technologies and Next Generation Sequencing (NGS). ORT uses Apache Beam pipelines to process over 1 million samples of genomic and clinical information. The processed data is used by ORT to detect leukemia, sepsis, and other medical conditions.
eBay is an American e-commerce company that provides business-to-consumer and consumer-to-consumer sales through its website. eBay builds feature pipelines with Apache Beam to unify feature extraction and selection across online and offline environments, speed up end-to-end iteration for model training, evaluation, and serving, and support different types of features (streaming, runtime, batch). eBay's streaming feature SDK is built on Apache Beam as a foundation for integrating with Kafka, Hadoop, Flink, Airflow, and other systems at eBay.
GOGA Data Analysis and Consulting is a company based in Japan that specializes in analytics of geospatial and mapping data. They use Apache Beam and Cloud Dataflow for a smooth data transformation process for analytical purposes. This use case focuses on handling multiple extraction, geocoding, and insertion steps, wrangling each record and issuing geocoding API calls based on the location it provides.
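
One recurring concern in the Mozilla entry above is handling errors while moving data between systems. A common Beam approach is the dead-letter pattern using tagged outputs: records that fail to parse are routed to a separate output instead of failing the pipeline. The following minimal Python sketch shows the general idea; the paths and parsing logic are illustrative, not Mozilla's actual ingestion code.

```python
import json

import apache_beam as beam

class ParseOrDeadLetter(beam.DoFn):
    """Parse a raw record; route anything malformed to an 'errors' output."""
    def process(self, element):
        try:
            yield json.loads(element)
        except (ValueError, TypeError):
            yield beam.pvalue.TaggedOutput("errors", element)

with beam.Pipeline() as p:
    results = (
        p
        | "Read" >> beam.io.ReadFromText("gs://example-bucket/raw-telemetry/*.json")  # assumed source
        | "Parse" >> beam.ParDo(ParseOrDeadLetter()).with_outputs("errors", main="parsed")
    )
    (results.parsed
     | "Reserialize" >> beam.Map(json.dumps)
     | "WriteParsed" >> beam.io.WriteToText("gs://example-bucket/parsed"))
    results.errors | "WriteErrors" >> beam.io.WriteToText("gs://example-bucket/errors")
```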

Akvelon is a software engineering company that helps start-ups, SMBs, and Fortune 500 companies unlock the full potential of cloud, data, and AI/ML to strengthen their strategic advantage. The Akvelon team has deep expertise in integrating Apache Beam with diverse data processing ecosystems and is an enthusiastic contributor to the Apache Beam community.