Case Studies
Apache Beam powers many of today’s leading projects, industry-specific use cases, and startups.
Building data abstractions with streaming at Yelp
At Yelp, Apache Beam allows teams to create custom streaming pipelines using Python, eliminating the need to switch to Scala or Java. This reduces the learning curve for Python developers and minimizes friction, while providing the flexibility to utilize existing Python libraries.
Revolutionizing Real-Time Stream Processing: 4 Trillion Events Daily at LinkedIn
Apache Beam serves as the backbone of LinkedIn's streaming infrastructure, handling the near real-time processing of an astounding 4 trillion events daily through 3,000+ pipelines and thus powering personalized experiences for LinkedIn's vast network of over 950 million members worldwide. The adoption of Apache Beam brought about a series of impressive enhancements, including 2x cost optimization depending on the use case, an acceleration from days to minutes in labeling abuse, and a more than 6% improvement in detecting logged-in scraping profiles.
High-Performing and Efficient Transactional Data Processing for OCTO Technology’s Clients
With Apache Beam, OCTO accelerated the migration of one of France's largest grocery retailers to streaming processing for transactional data. By leveraging Apache Beam's powerful transforms and robust streaming capabilities, they achieved a 5x reduction in infrastructure costs and a 4x boost in performance. The streaming Apache Beam pipelines now process over 100 million rows daily, consolidating hundreds of gigabytes of transactional data with over a terabyte of external state in under 3 hours, a task that was not feasible without Apache Beam's controlled aggregation.
High-Performance Quantitative Risk Analysis with Apache Beam at HSBC
HSBC finds Apache Beam to be more than a data processing framework. It is also a computational platform and a risk engine that allowed for 100x scaling and 2x faster performance of HSBC’s XVA pipelines, accelerated time-to-market by 24x, and simplified data distribution for modeling future scenarios with Monte Carlo simulations, powering quantitative risk analysis for forecasting and decision-making.
Efficient Streaming Analytics: Making the Web a Safer Place with Project Shield
Project Shield defends the websites of over 3K vulnerable organizations in >150 countries against DDoS attacks, with the mission of protecting freedom of speech. Its Apache Beam streaming pipelines process about 3 TB of log data daily at >10,000 queries per second, producing real-time user-facing analytics, tailored traffic rate limits, and defense recommendations. Apache Beam enabled the delivery of these critical metrics at scale with a ~2x efficiency gain, supporting Project Shield's goal of eliminating DDoS attacks as a weapon for silencing journalists and others who speak the truth, and ultimately making the web a safer place.
Mass Ad Bidding With Beam at Booking.com
Apache Beam powers Booking.com’s global ads bidding and performance infrastructure, supporting 1M+ queries monthly for workflows across multiple data systems scanning 2 PB+ of analytical data and terabytes of transactional data. Apache Beam accelerated processing by 36x and expedited time-to-market by as much as 4x.
Self-service Machine Learning Workflows and Scaling MLOps with Apache Beam
Apache Beam has future-proofed Credit Karma’s data and ML platform for scalability and efficiency, enabling MLOps with unified pipelines, processing 5-10 TB daily at 5K events per second, and managing 20K+ ML features.
Powering Streaming and Real-time ML at Intuit
We feel that the runner agnosticism of Apache Beam affords flexibility and future-proofs our Stream Processing Platform as new runtimes are developed. Apache Beam enabled the democratization of stream processing at Intuit and the migration of many batch jobs to streaming applications.
Real-time ML with Beam at Lyft
The Lyft Marketplace team aims to improve our business efficiency by responding nimbly to real-world dynamics. Apache Beam has enabled us to meet our goal of a robust and scalable ML infrastructure that improves model accuracy with real-time features. These real-time features support critical functions like Forecasting, Primetime, and Dispatch.
Real-time Event Stream Processing at Scale for Palo Alto Networks
Palo Alto Networks is a global cybersecurity leader that processes hundreds of billions of security events per day in real time, among the highest volumes in the industry. Apache Beam provides a high-performing, reliable, and resilient data processing framework to support this scale. With Apache Beam, Palo Alto Networks achieved high performance and low latency while reducing processing costs by 60%.
Visual Apache Beam Pipeline Design and Orchestration with Apache Hop
Apache Hop is an open source data orchestration and engineering platform that extends Apache Beam with visual pipeline lifecycle management. Neo4j’s Chief Solution Architect and Apache Hop’s co-founder, Matt Casters, sees Apache Beam as a driving force behind Hop.
Scalability and Cost Optimization for Search Engine's Workloads
Dive into the Czech search engine’s experience of scaling the on-premises infrastructure to learn more about the benefits of byte-based data shuffling and the use cases where Apache Beam portability and abstraction bring the utmost value.
Four Apache Technologies Combined for Fun and Profit
Ricardo, the largest online marketplace in Switzerland, uses Apache Beam to stream-process platform data, enabling its Data Intelligence team to provide scalable data integration, analytics, and smart services.
Also used by
Akvelon is a software engineering company that helps startups, SMBs, and Fortune 500 companies unlock the full potential of cloud, data, and AI/ML to empower their strategic advantage. The Akvelon team has deep expertise in integrating Apache Beam with diverse data processing ecosystems and is an enthusiastic contributor to the Apache Beam community.