An Interactive Overview of Beam
Here you can find a collection of the interactive notebooks available for Apache Beam, which are hosted in Colab. The notebooks allow you to interactively play with the code and see how your changes affect the pipeline. You don’t need to install anything or modify your computer in any way to use these notebooks.
You can also try an Apache Beam pipeline using the Java, Python, and Go SDKs.
Get started
Learn the basics
In this notebook we go through the basics of what Apache Beam is and how to get started.
We learn what a data pipeline, a PCollection, and a PTransform are, as well as some basic transforms like Map, FlatMap, Filter, Combine, and GroupByKey.
Reading and writing data
In this notebook we go through some examples of how to read and write data to and from different data formats.
We introduce the built-in ReadFromText and WriteToText transforms.
We also see how we can read from CSV files, read from a SQLite database, write fixed-sized batches of elements, and write windows of elements.
Windowing
In this notebook we go through how to aggregate data based on time intervals, including in streaming pipelines.
We introduce GlobalWindow, FixedWindows, SlidingWindows, and Sessions.
DataFrames
Beam DataFrames provide a pandas-like DataFrame API to declare Beam pipelines. To learn more about Beam DataFrames, take a look at the Beam DataFrames overview page.
Transforms
Check the Python transform catalog for a complete list of the available transforms.
Element-wise transforms
Map
Applies a simple one-to-one mapping function over each element in the collection.
FlatMap
Applies a simple one-to-many mapping function over each element in the collection. The many elements are flattened into the resulting collection.
Filter
Given a predicate, filter out all elements that don’t satisfy that predicate.
Partition
Separates elements in a collection into multiple output collections.
ParDo
A transform for generic parallel processing. It’s recommended to use Map, FlatMap, Filter, or other more specific transforms when possible.
Last updated on 2024/10/05