Using the Apache Gearpump Runner

The Apache Gearpump Runner can be used to execute Beam pipelines using Apache Gearpump (incubating). When you are running your pipeline with Gearpump Runner you just need to create a jar file containing your job and then it can be executed on a regular Gearpump distributed cluster, or a local cluster which is useful for development and debugging of your pipeline.

The Gearpump Runner and Gearpump are suitable for large scale, continuous jobs, and provide:

The Beam Capability Matrix documents the currently supported capabilities of the Gearpump Runner.

Writing Beam Pipeline with Gearpump Runner

To use the Gearpump Runner in a distributed mode, you have to setup a Gearpump cluster first by following the Gearpump setup quickstart.

Suppose you are writing a Beam pipeline, you can add a dependency on the latest version of the Gearpump runner by adding to your pom.xml to enable Gearpump runner. And your Beam application should also pack Beam SDK explicitly and here is a snippet of example pom.xml:

<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-gearpump</artifactId>
    <version>2.1.0</version>
  </dependency>

  <dependency>
    <groupId>org.apache.gearpump</groupId>
    <artifactId>gearpump-streaming_2.11</artifactId>
    <version>${gearpump.version}</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>org.apache.gearpump</groupId>
    <artifactId>gearpump-core_2.11</artifactId>
    <version>${gearpump.version}</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>2.1.0</version>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <createDependencyReducedPom>false</createDependencyReducedPom>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <shadedArtifactAttached>true</shadedArtifactAttached>
            <shadedClassifierName>shaded</shadedClassifierName>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

After running mvn package, run ls target and you should see your application jar like:

{your_application}-{version}-shaded.jar

Executing the pipeline on a Gearpump cluster

To run against a Gearpump cluster simply run:

gear app -jar /path/to/{your_application}-{version}-shaded.jar com.beam.examples.BeamPipeline --runner=GearpumpRunner ...

Monitoring your application

You can monitor a running Gearpump application using Gearpump’s Dashboard. Please follow the Gearpump Start UI to start the dashboard.

Pipeline options for the Gearpump Runner

When executing your pipeline with the Gearpump Runner, you should consider the following pipeline options.

Field Description Default Value
runner The pipeline runner to use. This option allows you to determine the pipeline runner at runtime. Set to GearpumpRunner to run using Gearpump.
parallelism The parallelism for Gearpump processor. 1
applicationName The application name for Gearpump runner. beam_gearpump_app