Using the JStorm Runner

The JStorm Runner can be used to execute Beam pipelines using JStorm, while providing:

Like a native JStorm topology, users can execute Beam topology with local mode, standalone cluster or jstorm-on-yarn cluster.

The Beam Capability Matrix documents the currently supported capabilities of the JStorm Runner.

JStorm Runner prerequisites and setup

The JStorm runner currently supports JStorm version 2.5.0-SNAPSHOT.

You can add a dependency on the latest version of the JStorm runner by adding the following to your pom.xml:

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-jstorm</artifactId>
  <version>2.55.1</version>
</dependency>

Deploying JStorm with your application

To run against a Standalone cluster, you can package your program with all Beam dependencies into a fat jar, and then submit the topology with the following command.

jstorm jar WordCount.jar org.apache.beam.examples.WordCount --runner=org.apache.beam.runners.jstorm.JStormRunner

If you don’t want to package a fat jar, you can upload the Beam dependencies onto all cluster nodes($JSTORM_HOME/lib/ext/beam) first. When you submit a topology with argument "--external-libs beam", JStorm will load the Beam dependencies automatically.

jstorm jar WordCount.jar org.apache.beam.examples.WordCount --external-libs beam  --runner=org.apache.beam.runners.jstorm.JStormRunner

Pipeline options for the JStorm Runner

When executing your pipeline with the JStorm Runner, you should consider the following pipeline options.

FieldDescriptionDefault Value
runnerThe pipeline runner to use. This option allows you to determine the pipeline runner at runtime.Set to JStormRunner to run using JStorm.
topologyConfigSystem topology config of JStormDefaultMapValueFactory.class
workerNumberWorker number of topology1
parallelismGlobal parallelism number of a component1
parallelismMapParallelism number of a specified composite PTransformDefaultMapValueFactory.class
exactlyOnceTopologyIndicate if it is an exactly once topologyfalse
localModeIndicate if the topology is running on local machine or distributed clusterfalse
localModeExecuteTimeSecExecuting time(sec) of topology on local mode.60

Additional notes

Monitoring your job

You can monitor your job with the JStorm UI, which displays all JStorm system metrics and Beam metrics. For testing on local mode, you can retrieve the Beam metrics with the metrics method of PipelineResult.