An Interactive Overview of Beam

Here you can find a collection of the interactive notebooks available for Apache Beam, which are hosted in Colab. The notebooks let you run and modify the code interactively and see how your changes affect the pipeline. You don’t need to install anything or modify your computer in any way to use these notebooks.

You can also try an Apache Beam pipeline using the Java, Python, and Go SDKs.

Get started

Learn the basics

In this notebook we go through the basics of what Apache Beam is and how to get started. We learn what a data pipeline, a PCollection, and a PTransform are, as well as some basic transforms like Map, FlatMap, Filter, Combine, and GroupByKey.

Run in Colab

Reading and writing data

In this notebook we go through some examples of how to read and write data in different formats. We introduce the built-in ReadFromText and WriteToText transforms. We also see how to read from CSV files, read from a SQLite database, write fixed-size batches of elements, and write windows of elements.

Run in Colab

Windowing

In this notebook we go through how to aggregate data based on time intervals, including in streaming pipelines. We introduce the GlobalWindow, FixedWindows, SlidingWindows, and Sessions.

Run in Colab

DataFrames

Beam DataFrames provide a pandas-like DataFrame API to declare Beam pipelines. To learn more about Beam DataFrames, take a look at the Beam DataFrames overview page.

Run in Colab

Transforms

Check the Python transform catalog for a complete list of the available transforms.

Element-wise transforms

Map

Applies a simple one-to-one mapping function over each element in the collection.

Run in Colab

FlatMap

Applies a simple one-to-many mapping function over each element in the collection. The resulting elements are flattened into a single output collection.

Run in Colab

Filter

Given a predicate, filters out all elements that don’t satisfy that predicate.

Run in Colab

Partition

Separates elements in a collection into multiple output collections.

Run in Colab

ParDo

A transform for generic parallel processing. It’s recommended to use Map, FlatMap, Filter, or other more specific transforms when possible.

Run in Colab