Apache Beam Typescript SDK quickstart
This quickstart shows you how to run an example pipeline written with the Apache Beam Typescript SDK, using the Direct Runner. The Direct Runner executes pipelines locally on your machine.
If you’re interested in contributing to the Apache Beam Typescript codebase, see the Contribution Guide.
Set up your development environment
Make sure you have a Node.js development environment installed. If you don’t, you can download and install it from the downloads page.
Because the SDK makes extensive use of cross-language transforms, we recommend having Python 3 and Java available on the system as well.
Clone the GitHub repository
Clone or download the apache/beam-starter-typescript GitHub repository and change into the beam-starter-typescript directory.
Install the project dependencies
Run the following command to install the project’s dependencies.
Compile the pipeline
The pipeline is then built with
Run the quickstart
Run the following command:
The output is similar to the following:
The lines might appear in a different order.
Explore the code
The main code file for this quickstart is app.ts (GitHub). The code performs the following steps:
- Define a Beam pipeline that:
  - Creates an initial PCollection.
  - Applies a transform (map) to the PCollection.
- Run the pipeline, using the Direct Runner.
Create a pipeline
A Pipeline is simply a callable that takes a single root object. The Pipeline function builds up the graph of transformations to be executed.
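The idea of a pipeline as a callable that builds a graph before anything runs can be sketched in plain TypeScript. This is a toy model under stated assumptions, not the apache-beam implementation: ToyRoot, ToyPCollection, createToyPipeline, and runToyPipeline are hypothetical names invented for illustration, and the uppercase mapping is an arbitrary stand-in for a real transform.

```typescript
// Toy stand-ins for illustration only. These are NOT the apache-beam classes.
class ToyPCollection<T> {
  readonly steps: string[]; // names of the transforms recorded so far
  readonly compute: () => T[]; // deferred work, invoked only by a runner

  constructor(steps: string[], compute: () => T[]) {
    this.steps = steps;
    this.compute = compute;
  }

  // Records a "map" step and returns a new collection; nothing executes yet.
  map<U>(fn: (element: T) => U): ToyPCollection<U> {
    return new ToyPCollection<U>([...this.steps, "map"], () =>
      this.compute().map(fn)
    );
  }
}

class ToyRoot {
  // Records a "create" step backed by an in-memory array.
  create<T>(elements: T[]): ToyPCollection<T> {
    return new ToyPCollection<T>(["create"], () => [...elements]);
  }
}

// A pipeline is a callable that takes a single root object.
type ToyPipeline<T> = (root: ToyRoot) => ToyPCollection<T>;

// Hypothetical factory: returns the callable, it does not run anything.
function createToyPipeline(inputText: string): ToyPipeline<string> {
  return (root) =>
    root.create(["Hello", "World!", inputText]).map((s) => s.toUpperCase());
}

// A toy runner: builds the graph by calling the pipeline, then executes it.
function runToyPipeline<T>(pipeline: ToyPipeline<T>): T[] {
  const graph = pipeline(new ToyRoot());
  return graph.compute();
}
```

Calling createToyPipeline("Beam") only returns a function. The recorded steps and the results exist only once a runner invokes that function with a root, which is the separation between defining and running that the quickstart relies on.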
Create an initial PCollection
The PCollection abstraction represents a potentially distributed, multi-element data set. A Beam pipeline needs a source of data to populate an initial PCollection. The source can be bounded (with a known, fixed size) or unbounded (with unlimited size).
This example uses the Create method to create a PCollection from an in-memory array of strings. The resulting PCollection contains the strings “Hello”, “World!”, and a user-provided input string.
root.apply(beam.create(["Hello", "World!", input_text]))
Apply a transform to the PCollection
Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. This example uses the Map transform, which maps the elements of a collection into a new collection:
.map(printAndReturn);
For convenience, PCollection has a map method, but more generally transforms are applied with .apply(someTransform()).
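The relationship between a convenience method such as map and the general apply form can be sketched with a toy collection type. This is an illustration under stated assumptions rather than the apache-beam API: ToyCollection and keepLongerThan are hypothetical names, and the toy executes eagerly where a real pipeline would only record the transform.

```typescript
// Toy stand-in for illustration only. This is NOT the apache-beam PCollection.
class ToyCollection<T> {
  readonly elements: T[];

  constructor(elements: T[]) {
    this.elements = elements;
  }

  // General form: hand the whole collection to a transform.
  apply<U>(
    transform: (input: ToyCollection<T>) => ToyCollection<U>
  ): ToyCollection<U> {
    return transform(this);
  }

  // Convenience form: element-wise mapping expressed through apply.
  map<U>(fn: (element: T) => U): ToyCollection<U> {
    return this.apply<U>(
      (input) => new ToyCollection<U>(input.elements.map(fn))
    );
  }
}

// Hypothetical reusable transform, built by a factory like someTransform().
function keepLongerThan(minLength: number) {
  return (input: ToyCollection<string>): ToyCollection<string> =>
    new ToyCollection<string>(
      input.elements.filter((s) => s.length > minLength)
    );
}

const toyWords = new ToyCollection<string>(["Hello", "World!", "Hi"]);
const viaMap = toyWords.map((s) => s.length); // element-wise shortcut
const viaApply = toyWords.apply(keepLongerThan(2)); // general transform
```

In this sketch map is only a shortcut for one common shape of transform. Anything that needs to see the whole collection, such as the filter above, goes through apply.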
Run the pipeline
To run the pipeline, a runner is created (possibly with some options)
createRunner(options)
and then its run method is invoked on the pipeline callable created above.
.run(createPipeline(...));
A Beam runner runs a Beam pipeline on a specific platform. If you don’t specify a runner, the Direct Runner is the default. The Direct Runner runs the pipeline locally on your machine. It is meant for testing and development, rather than being optimized for efficiency. For more information, see Using the Direct Runner.
For production workloads, you typically use a distributed runner that runs the
pipeline on a big data processing system such as Apache Flink, Apache Spark, or
Google Cloud Dataflow. These systems support massively parallel processing.
Different runners can be requested via the runner property on options, e.g. createRunner({runner: "dataflow"}) or createRunner({runner: "flink"}).
In this example, this value can be passed in via the command line as --runner=..., e.g. to run on Dataflow one would write:
node dist/src/main.js \
--runner=dataflow \
--project=${PROJECT_ID} \
--tempLocation=gs://${GCS_BUCKET}/wordcount-js/temp --region=${REGION}
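To make the link between these flags and the options object concrete, here is a minimal sketch of turning --key=value arguments into a plain object of the kind passed to createRunner(options). The parseFlags helper is hypothetical and is not how the starter project necessarily parses its arguments; the project, bucket, and region values are placeholders.

```typescript
// Hypothetical helper, not part of apache-beam or the starter project.
// Turns ["--runner=dataflow", ...] into { runner: "dataflow", ... }.
function parseFlags(argv: string[]): Record<string, string> {
  const parsed: Record<string, string> = {};
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue; // skip positional arguments
    const eq = arg.indexOf("=");
    if (eq === -1) continue; // this sketch ignores flags without a value
    // Split on the first "=" only, so values may themselves contain "=".
    parsed[arg.slice(2, eq)] = arg.slice(eq + 1);
  }
  return parsed;
}

// Placeholder values standing in for ${PROJECT_ID}, ${GCS_BUCKET}, ${REGION}.
const parsedOptions = parseFlags([
  "--runner=dataflow",
  "--project=my-project",
  "--tempLocation=gs://my-bucket/wordcount-js/temp",
  "--region=us-central1",
]);
```

In a real program the array would come from process.argv.slice(2), and the resulting object would carry the runner property described above.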
Next Steps
- Learn more about the Beam SDK for Typescript and look through the Typescript SDK API reference.
- Take a self-paced tour through our Learning Resources.
- Dive in to some of our favorite Videos and Podcasts.
- Join the Beam users@ mailing list.
Please don’t hesitate to reach out if you encounter any issues!
Last updated on 2024/11/20