Using the JStorm Runner

The JStorm Runner can be used to execute Beam pipelines using JStorm, while providing:

Like a native JStorm topology, users can execute Beam topology with local mode, standalone cluster or jstorm-on-yarn cluster.

The Beam Capability Matrix documents the currently supported capabilities of the JStorm Runner.

JStorm Runner prerequisites and setup

The JStorm runner currently supports JStorm version 2.5.0-SNAPSHOT.

You can add a dependency on the latest version of the JStorm runner by adding the following to your pom.xml:

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-jstorm</artifactId>
  <version>2.2.0</version>
</dependency>

Deploying JStorm with your application

To run against a Standalone cluster, you can package your program with all Beam dependencies into a fat jar, and then submit the topology with the following command.

jstorm jar WordCount.jar org.apache.beam.examples.WordCount --runner=org.apache.beam.runners.jstorm.JStormRunner

If you don’t want to package a fat jar, you can upload the Beam dependencies onto all cluster nodes($JSTORM_HOME/lib/ext/beam) first. When you submit a topology with argument "--external-libs beam", JStorm will load the Beam dependencies automatically.

jstorm jar WordCount.jar org.apache.beam.examples.WordCount --external-libs beam  --runner=org.apache.beam.runners.jstorm.JStormRunner

To learn about deploying a JStorm cluster, please refer to JStorm cluster deploy

Pipeline options for the JStorm Runner

When executing your pipeline with the JStorm Runner, you should consider the following pipeline options.

Field Description Default Value
runner The pipeline runner to use. This option allows you to determine the pipeline runner at runtime. Set to JStormRunner to run using JStorm.
topologyConfig System topology config of JStorm DefaultMapValueFactory.class
workerNumber Worker number of topology 1
parallelism Global parallelism number of a component 1
parallelismMap Parallelism number of a specified composite PTransform DefaultMapValueFactory.class
exactlyOnceTopology Indicate if it is an exactly once topology false
localMode Indicate if the topology is running on local machine or distributed cluster false
localModeExecuteTimeSec Executing time(sec) of topology on local mode. 60

Additional notes

Monitoring your job

You can monitor your job with the JStorm UI, which displays all JStorm system metrics and Beam metrics. For testing on local mode, you can retreive the Beam metrics with the metrics method of PipelineResult.