Class PubsubIO.Read<T>

java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PBegin,PCollection<T>>
org.apache.beam.sdk.io.gcp.pubsub.PubsubIO.Read<T>
All Implemented Interfaces:
Serializable, HasDisplayData
Enclosing class:
PubsubIO

public abstract static class PubsubIO.Read<T> extends PTransform<PBegin,PCollection<T>>
Implementation of read methods.
  • Constructor Details

    • Read

      public Read()
  • Method Details

    • fromSubscription

      public PubsubIO.Read<T> fromSubscription(String subscription)
      Reads from the given subscription.

      See PubsubIO.PubsubSubscription.fromPath(String) for more details on the format of the subscription string.

      Multiple readers reading from the same subscription will each receive some arbitrary portion of the data. Most likely, separate readers should use their own subscriptions.
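      As a rough illustration of the subscription string, the sketch below builds and loosely checks a path of the form projects/<project>/subscriptions/<subscription>. That format comes from the Pub/Sub resource naming convention rather than from this page, and PubsubPathSketch and its methods are hypothetical helpers, not part of Beam; PubsubIO.PubsubSubscription.fromPath(String) remains the authority on what is accepted.

```java
// Hypothetical helper for illustration only; it is not part of the Beam API.
final class PubsubPathSketch {
    private PubsubPathSketch() {}

    // Builds a subscription path, assuming the "projects/<project>/subscriptions/<subscription>" form.
    static String subscriptionPath(String project, String subscription) {
        return String.format("projects/%s/subscriptions/%s", project, subscription);
    }

    // Loose structural check: four non-empty segments with the expected keywords.
    // It does not validate the characters allowed in project or subscription names.
    static boolean looksLikeSubscriptionPath(String path) {
        String[] parts = path.split("/", -1);
        return parts.length == 4
            && parts[0].equals("projects")
            && !parts[1].isEmpty()
            && parts[2].equals("subscriptions")
            && !parts[3].isEmpty();
    }
}
```

      The resulting string is what would be passed to fromSubscription(String); giving each reader its own subscription avoids readers splitting one subscription's messages between them.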

    • fromSubscription

      public PubsubIO.Read<T> fromSubscription(ValueProvider<String> subscription)
      Like fromSubscription(String) but with a ValueProvider.
    • fromTopic

      public PubsubIO.Read<T> fromTopic(String topic)
      Creates and returns a transform for reading from a Cloud Pub/Sub topic. Mutually exclusive with fromSubscription(String).

      See PubsubIO.PubsubTopic.fromPath(String) for more details on the format of the topic string.

      The Beam runner will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by the runner.

    • fromTopic

      public PubsubIO.Read<T> fromTopic(ValueProvider<String> topic)
      Like fromTopic(String) but with a ValueProvider.
    • withDeadLetterTopic

      public PubsubIO.Read<T> withDeadLetterTopic(String deadLetterTopic)
      Creates and returns a transform for writing read failures out to a dead-letter topic.

      The message written to the dead-letter will contain three attributes:

      • exceptionClassName: The type of exception that was thrown.
      • exceptionMessage: The message in the exception
      • pubsubMessageId: The message id of the original Pub/Sub message if it was read in, otherwise ""
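      The three attributes listed above can be pictured as a small string map. The sketch below is an assumption-laden illustration, not Beam's implementation: DeadLetterAttributesSketch is a hypothetical class, and using the fully qualified class name for exceptionClassName and an empty string for a null exception message are guesses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of the Beam API.
final class DeadLetterAttributesSketch {
    private DeadLetterAttributesSketch() {}

    // Builds the three documented attributes for a failed read.
    // originalMessageId may be null when the message was never read in.
    static Map<String, String> attributesFor(Exception failure, String originalMessageId) {
        Map<String, String> attributes = new LinkedHashMap<>();
        // Assumption: the fully qualified class name is used as the exception type.
        attributes.put("exceptionClassName", failure.getClass().getName());
        // Assumption: a missing exception message is represented as an empty string.
        attributes.put("exceptionMessage", failure.getMessage() == null ? "" : failure.getMessage());
        // Documented behavior: "" when the original message id is unavailable.
        attributes.put("pubsubMessageId", originalMessageId == null ? "" : originalMessageId);
        return attributes;
    }
}
```

      A consumer of the dead-letter topic could branch on exceptionClassName and use pubsubMessageId to correlate the failure with the original message.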

      The PubsubClient.PubsubClientFactory used in the PubsubIO.Write transform for errors will be the same as used in the final PubsubIO.Read transform.

      If parsing errors (or similar) are possible, set a dead-letter topic with this method to avoid wasting resources and to include more error details with the message written to Pub/Sub. Otherwise, the Pub/Sub topic should have a dead-letter configuration set up to avoid an infinite retry loop.

      Only failures that result from the PubsubIO.Read configuration (e.g. parsing errors) will be sent to the dead-letter topic. To handle errors that occur after a successful read, set up a separate PubsubIO.Write transform. Delivery errors require configuring Pub/Sub itself to write to the dead-letter topic after a certain number of failed attempts.

      See PubsubIO.PubsubTopic.fromPath(String) for more details on the format of the deadLetterTopic string.

      This functionality is mutually exclusive with withErrorHandler(ErrorHandler).

    • withDeadLetterTopic

      public PubsubIO.Read<T> withDeadLetterTopic(ValueProvider<String> deadLetterTopic)
      Like withDeadLetterTopic(String) but with a ValueProvider.
    • withClientFactory

      public PubsubIO.Read<T> withClientFactory(PubsubClient.PubsubClientFactory factory)
      The default client used to communicate with Pub/Sub is the PubsubJsonClient, created by the PubsubJsonClient.PubsubJsonClientFactory. This method allows changing the Pub/Sub client by providing another PubsubClient.PubsubClientFactory, such as the PubsubGrpcClientFactory.
    • withTimestampAttribute

      public PubsubIO.Read<T> withTimestampAttribute(String timestampAttribute)
      When reading from Cloud Pub/Sub where record timestamps are provided as Pub/Sub message attributes, specifies the name of the attribute that contains the timestamp.

      The timestamp value is expected to be represented in the attribute as either:

      • a numerical value representing the number of milliseconds since the Unix epoch. For example, if using the Joda time classes, Instant.getMillis() returns the correct value for this attribute.
      • a String in RFC 3339 format. For example, 2015-10-29T23:41:41.123Z. The sub-second component of the timestamp is optional, and digits beyond the first three (i.e., time units smaller than milliseconds) will be ignored.

      If timestampAttribute is not provided, the timestamp will be taken from the Pub/Sub message's publish timestamp. All windowing will be done relative to these timestamps.

      By default, windows are emitted based on an estimate of when this source is likely done producing data for a given timestamp (referred to as the Watermark; see AfterWatermark for more details). Any late data will be handled by the trigger specified with the windowing strategy – by default it will be output immediately.

      Note that the system can guarantee that no late data will ever be seen when it assigns timestamps by arrival time (i.e. timestampAttribute is not provided).
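      To make the two accepted attribute encodings concrete, the sketch below converts either form to milliseconds since the Unix epoch. It uses java.time rather than Beam's own parsing code, and TimestampAttributeSketch is a hypothetical class, so treat it as an approximation of the documented behavior rather than the actual implementation.

```java
import java.time.OffsetDateTime;

// Hypothetical helper for illustration only; it is not part of the Beam API.
final class TimestampAttributeSketch {
    private TimestampAttributeSketch() {}

    // Interprets the attribute value as epoch milliseconds if it is numeric,
    // otherwise as an RFC 3339 date-time string.
    static long toEpochMillis(String attributeValue) {
        try {
            return Long.parseLong(attributeValue);
        } catch (NumberFormatException notNumeric) {
            // toEpochMilli() drops any precision finer than milliseconds,
            // matching the note that extra sub-second digits are ignored.
            // Throws DateTimeParseException if the value is not a valid date-time.
            return OffsetDateTime.parse(attributeValue).toInstant().toEpochMilli();
        }
    }
}
```

      Under these assumptions, "1446162101123" and "2015-10-29T23:41:41.123Z" denote the same instant, and "2015-10-29T23:41:41.123456Z" is truncated to that instant as well.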

    • withIdAttribute

      public PubsubIO.Read<T> withIdAttribute(String idAttribute)
      When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub message attributes, specifies the name of the attribute containing the unique identifier. The value of the attribute can be any string that uniquely identifies this record.

      Pub/Sub cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream. If idAttribute is not provided, Beam cannot guarantee that no duplicate data will be delivered, and deduplication of the stream will be strictly best effort.
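      The idea behind an ID attribute can be sketched as a filter that passes only the first delivery of each identifier. This is a simplified, in-memory illustration with a hypothetical IdDedupSketch class; it says nothing about how a Beam runner actually stores or expires identifiers, and a real deduplicator would need bounded state.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; it is not part of the Beam API.
final class IdDedupSketch {
    // Identifiers seen so far. Unbounded here for simplicity.
    private final Set<String> seenIds = new HashSet<>();

    // Returns true the first time an identifier is seen and false on redelivery.
    boolean isFirstDelivery(String recordId) {
        return seenIds.add(recordId);
    }
}
```

      For this to work, the publisher must set the same attribute value on every redelivery of a record, which is why the value only needs to be a string that uniquely identifies the record.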

    • withCoderAndParseFn

      public PubsubIO.Read<T> withCoderAndParseFn(Coder<T> coder, SimpleFunction<PubsubMessage,T> parseFn)
      Causes the source to return a PubsubMessage that includes Pub/Sub attributes, and uses the given parsing function to transform the PubsubMessage into an output type. A Coder for the output type T must be registered or set on the output via PCollection.setCoder(Coder).
    • withErrorHandler

      public PubsubIO.Read<T> withErrorHandler(ErrorHandler<BadRecord,?> badRecordErrorHandler)
      Configures the Pub/Sub read with an alternate error handler. When a message is read from Pub/Sub but fails to parse, the message and the parse failure information will be sent to the error handler. See ErrorHandler for more details on configuring an error handler. This functionality is mutually exclusive with withDeadLetterTopic(String).
    • withValidation

      public PubsubIO.Read<T> withValidation()
      Enables validation of the Pub/Sub read.
    • expand

      public PCollection<T> expand(PBegin input)
      Description copied from class: PTransform
      Override this method to specify how this PTransform should be expanded on the given InputT.

      NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.

      Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

      Specified by:
      expand in class PTransform<PBegin,PCollection<T>>
    • validate

      public void validate(PipelineOptions options)
      Description copied from class: PTransform
      Called before running the Pipeline to verify this transform is fully and correctly specified.

      By default, does nothing.

      Overrides:
      validate in class PTransform<PBegin,PCollection<T>>
    • populateDisplayData

      public void populateDisplayData(DisplayData.Builder builder)
      Description copied from class: PTransform
      Register display data for the given transform or component.

      populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

      By default, does not register any display data. Implementors may override this method to provide their own display data.

      Specified by:
      populateDisplayData in interface HasDisplayData
      Overrides:
      populateDisplayData in class PTransform<PBegin,PCollection<T>>
      Parameters:
      builder - The builder to populate with display data.