SparkReceiver IO
SparkReceiverIO is a transform for reading data from an Apache Spark Receiver as an unbounded source.
Spark Receivers support
SparkReceiverIO currently supports Apache Spark Receiver.
Requirements for Spark Receiver:
- Version of Spark should be 2.4.*.
- Spark Receiver should support working with offsets.
- Spark Receiver should implement the HasOffset interface.
- Records should have a numeric field that represents the record offset.
For more details please see SparkReceiverIO readme.
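To make the requirements above more concrete, here is a minimal, dependency-free sketch of an offset-aware receiver. The nested HasOffset interface is only a local stand-in for the interface named above (its setStartOffset method is an assumption that should be checked against the Beam sources), and CountingReceiver with its poll method is hypothetical; a real receiver would extend Spark's Receiver class instead.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetAwareReceiverSketch {
  // Local stand-in for the HasOffset interface (assumed shape, not the Beam class itself).
  interface HasOffset {
    void setStartOffset(Long offset);
  }

  // Hypothetical receiver that can resume from a given offset.
  static class CountingReceiver implements HasOffset {
    private long startOffset = 0L;

    @Override
    public void setStartOffset(Long offset) {
      // Treat a missing offset as "start from the beginning".
      this.startOffset = offset == null ? 0L : offset;
    }

    // Produces `count` records, each carrying its own numeric offset field.
    List<String> poll(int count) {
      List<String> records = new ArrayList<>();
      for (long o = startOffset; o < startOffset + count; o++) {
        records.add("{\"offset\": " + o + "}");
      }
      return records;
    }
  }

  public static void main(String[] args) {
    CountingReceiver receiver = new CountingReceiver();
    receiver.setStartOffset(5L);
    // prints [{"offset": 5}, {"offset": 6}, {"offset": 7}]
    System.out.println(receiver.poll(3));
  }
}
```

The point of the sketch is that every record exposes a numeric offset and that the receiver can be told where to start, which is what allows reading to resume from a known position.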
Streaming reading using SparkReceiverIO
In order to read from Spark Receiver you will need to pass:
- getOffsetFn, which is a SerializableFunction that defines how to get a Long record offset from a record.
- receiverBuilder, which is needed for building instances of Spark Receiver that use Apache Beam mechanisms instead of the Spark environment.
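As an illustration of getOffsetFn, the following dependency-free sketch extracts a Long offset from a record. It assumes a hypothetical record layout (a JSON-like string with a numeric offset field) and uses a local SerializableFn interface as a stand-in for Beam's SerializableFunction; in a real pipeline the same lambda body would be written against SerializableFunction<String, Long>.

```java
import java.io.Serializable;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OffsetFnSketch {
  // Stand-in for Beam's SerializableFunction: a function that is also Serializable.
  interface SerializableFn<I, O> extends Function<I, O>, Serializable {}

  // Assumed record layout: a JSON-like string containing a numeric "offset" field.
  static final Pattern OFFSET = Pattern.compile("\"offset\"\\s*:\\s*(\\d+)");

  // Defines how to get the Long record offset from a record.
  static final SerializableFn<String, Long> GET_OFFSET_FN =
      record -> {
        Matcher m = OFFSET.matcher(record);
        if (!m.find()) {
          throw new IllegalArgumentException("Record has no numeric offset field: " + record);
        }
        return Long.parseLong(m.group(1));
      };

  public static void main(String[] args) {
    // prints 42
    System.out.println(GET_OFFSET_FN.apply("{\"offset\": 42, \"name\": \"a\"}"));
  }
}
```

Failing loudly on a record without an offset is a design choice of this sketch; a production function might instead map such records to a default value.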
You can easily create a receiverBuilder object by passing the following parameters:
- Class of your Spark Receiver.
- Constructor arguments needed to create an instance of your Spark Receiver.
For example:
// In this example, MyReceiver accepts a MyConfig object as its only constructor parameter.
MyConfig myConfig = new MyConfig(authToken, apiServerUrl);
Object[] myConstructorArgs = new Object[] {myConfig};
ReceiverBuilder<String, MyReceiver<String>> myReceiverBuilder =
    new ReceiverBuilder<>(MyReceiver.class)
        .withConstructorArgs(myConstructorArgs);

Then you will be able to pass this receiverBuilder object to SparkReceiverIO.
For example:
Read data with optional parameters
You can also pass the following optional parameters:
- pullFrequencySec, which is the delay in seconds between polls for new records.
- startOffset, which is the inclusive start offset from which reading should begin.
- timestampFn, which is a SerializableFunction that defines how to get an Instant timestamp from a record.
For example:
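The following dependency-free sketch illustrates what two of the optional parameters mean, rather than the SparkReceiverIO calls themselves. The Event type, the fromStartOffset helper, and the epoch-millisecond field are all hypothetical, and java.time.Instant is used only to avoid external dependencies (Beam pipelines typically work with Joda-Time's Instant).

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class OptionalParamsSketch {
  // Hypothetical record type: a numeric offset plus an event time in epoch milliseconds.
  static final class Event {
    final long offset;
    final long epochMillis;

    Event(long offset, long epochMillis) {
      this.offset = offset;
      this.epochMillis = epochMillis;
    }
  }

  // Stand-in for timestampFn: defines how to get a timestamp from a record.
  static final Function<Event, Instant> TIMESTAMP_FN = e -> Instant.ofEpochMilli(e.epochMillis);

  // startOffset is inclusive: a record whose offset equals startOffset is kept.
  static List<Event> fromStartOffset(List<Event> events, long startOffset) {
    return events.stream().filter(e -> e.offset >= startOffset).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Event> events =
        Arrays.asList(new Event(0, 1000L), new Event(1, 2000L), new Event(2, 3000L));
    List<Event> kept = fromStartOffset(events, 1L);
    System.out.println(kept.size()); // prints 2
    System.out.println(TIMESTAMP_FN.apply(kept.get(0))); // prints 1970-01-01T00:00:02Z
  }
}
```

With startOffset set to 1, the record at offset 0 is skipped while the record at offset 1 itself is still read, which is what "inclusive" means here.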
Examples for specific Spark Receiver
CDAP Hubspot Receiver
// hubspotConfig is the plugin configuration object; p is the Beam Pipeline.
ReceiverBuilder<String, HubspotReceiver<String>> hubspotReceiverBuilder =
    new ReceiverBuilder<>(HubspotReceiver.class)
        .withConstructorArgs(hubspotConfig);
SparkReceiverIO.Read<String> readTransform =
    SparkReceiverIO.<String>read()
        .withGetOffsetFn(GetOffsetUtils.getOffsetFnForHubspot())
        .withSparkReceiverBuilder(hubspotReceiverBuilder);
p.apply("readFromHubspotReceiver", readTransform);

Last updated on 2025/10/29

