Using the JStorm Runner
The JStorm Runner can be used to execute Beam pipelines using JStorm, while providing:
- High throughput and low latency.
- At-least-once and exactly-once fault tolerance.
Like a native JStorm topology, users can execute Beam topology with local mode, standalone cluster or jstorm-on-yarn cluster.
The Beam Capability Matrix documents the currently supported capabilities of the JStorm Runner.
JStorm Runner prerequisites and setup
The JStorm runner currently supports JStorm version 2.5.0-SNAPSHOT.
You can add a dependency on the latest version of the JStorm runner by adding the following to your pom.xml:
Deploying JStorm with your application
To run against a Standalone cluster, you can package your program with all Beam dependencies into a fat jar, and then submit the topology with the following command.
jstorm jar WordCount.jar org.apache.beam.examples.WordCount --runner=org.apache.beam.runners.jstorm.JStormRunner
If you don’t want to package a fat jar, you can upload the Beam dependencies onto all cluster nodes($JSTORM_HOME/lib/ext/beam
) first.
When you submit a topology with argument "--external-libs beam"
, JStorm will load the Beam dependencies automatically.
jstorm jar WordCount.jar org.apache.beam.examples.WordCount --external-libs beam --runner=org.apache.beam.runners.jstorm.JStormRunner
Pipeline options for the JStorm Runner
When executing your pipeline with the JStorm Runner, you should consider the following pipeline options.
Field | Description | Default Value |
---|---|---|
runner | The pipeline runner to use. This option allows you to determine the pipeline runner at runtime. | Set to JStormRunner to run using JStorm. |
topologyConfig | System topology config of JStorm | DefaultMapValueFactory.class |
workerNumber | Worker number of topology | 1 |
parallelism | Global parallelism number of a component | 1 |
parallelismMap | Parallelism number of a specified composite PTransform | DefaultMapValueFactory.class |
exactlyOnceTopology | Indicate if it is an exactly once topology | false |
localMode | Indicate if the topology is running on local machine or distributed cluster | false |
localModeExecuteTimeSec | Executing time(sec) of topology on local mode. | 60 |
Additional notes
Monitoring your job
You can monitor your job with the JStorm UI, which displays all JStorm system metrics and Beam metrics. For testing on local mode, you can retrieve the Beam metrics with the metrics method of PipelineResult.
Last updated on 2024/04/22
Have you found everything you were looking for?
Was it all useful and clear? Is there anything that you would like to change? Let us know!