Class PubsubIO

java.lang.Object
org.apache.beam.sdk.io.gcp.pubsub.PubsubIO

public class PubsubIO extends Object
Read and Write PTransforms for Cloud Pub/Sub streams. These transforms create and consume unbounded PCollections.

Using local emulator

In order to use local emulator for Pubsub you should use PubsubOptions#setPubsubRootUrl(String) method to set host and port of your local emulator.

Permissions

Permission requirements depend on the PipelineRunner that is used to execute the Beam pipeline. Please refer to the documentation of corresponding PipelineRunners for more details.

Updates to the I/O connector code

For any significant updates to this I/O connector, please consider involving corresponding code reviewers mentioned here.

Example PubsubIO read usage


 // Read from a specific topic; a subscription will be created at pipeline start time.
 PCollection<PubsubMessage> messages = PubsubIO.readMessages().fromTopic(topic);

 // Read from a subscription.
 PCollection<PubsubMessage> messages = PubsubIO.readMessages().fromSubscription(subscription);

 // Read messages including attributes. All PubSub attributes will be included in the PubsubMessage.
 PCollection<PubsubMessage> messages = PubsubIO.readMessagesWithAttributes().fromTopic(topic);

 // Examples of reading different types from PubSub.
 PCollection<String> strings = PubsubIO.readStrings().fromTopic(topic);
 PCollection<MyProto> protos = PubsubIO.readProtos(MyProto.class).fromTopic(topic);
 PCollection<MyType> avros = PubsubIO.readAvros(MyType.class).fromTopic(topic);

 

Example PubsubIO write usage

Data can be written to a single topic or to a dynamic set of topics. In order to write to a single topic, the PubsubIO.Write.to(String) method can be used. For example:

 avros.apply(PubsubIO.writeAvros(MyType.class).to(topic));
 protos.apply(PubsubIO.writeProtos(MyProto.class).to(topic));
 strings.apply(PubsubIO.writeStrings().to(topic));
 
Dynamic topic destinations can be accomplished by specifying a function to extract the topic from the record using the PubsubIO.Write.to(SerializableFunction) method. For example:

 avros.apply(PubsubIO.writeAvros(MyType.class).
      to((ValueInSingleWindow<Event> quote) -> {
               String country = quote.getCountry();
               return "projects/myproject/topics/events_" + country;
              });
 
Dynamic topics can also be specified by writing PubsubMessage objects containing the topic and writing using the writeMessagesDynamic() method. For example:

 events.apply(MapElements.into(new TypeDescriptor<PubsubMessage>() {})
                         .via(e -> new PubsubMessage(
                             e.toByteString(), Collections.emptyMap()).withTopic(e.getCountry())))
 .apply(PubsubIO.writeMessagesDynamic());
 

Custom timestamps

All messages read from PubSub have a stable publish timestamp that is independent of when the message is read from the PubSub topic. By default, the publish time is used as the timestamp for all messages read and the watermark is based on that. If there is a different logical timestamp to be used, that timestamp must be published in a PubSub attribute and specified using PubsubIO.Read.withTimestampAttribute(java.lang.String). See the Javadoc for that method for the timestamp format.