Class PubsubIO.Read<T>
- All Implemented Interfaces:
Serializable, HasDisplayData
- Enclosing class:
PubsubIO
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
Constructor Summary
Constructors
- Read()
Method Summary
Modifier and Type, Method, Description
- expand: Override this method to specify how this PTransform should be expanded on the given InputT.
- fromSubscription(String subscription): Reads from the given subscription.
- fromSubscription(ValueProvider<String> subscription): Like fromSubscription(String) but with a ValueProvider.
- fromTopic(String topic): Creates and returns a transform for reading from a Cloud Pub/Sub topic.
- fromTopic(ValueProvider<String> topic): Like fromTopic(String) but with a ValueProvider.
- void populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component.
- void validate(PipelineOptions options): Called before running the Pipeline to verify this transform is fully and correctly specified.
- withClientFactory: The default Pub/Sub client is the PubsubJsonClient, created by the PubsubJsonClient.PubsubJsonClientFactory.
- withCoderAndParseFn(Coder<T> coder, SimpleFunction<PubsubMessage, T> parseFn): Causes the source to return a PubsubMessage that includes Pubsub attributes, and uses the given parsing function to transform the PubsubMessage into an output type.
- withDeadLetterTopic(String deadLetterTopic): Creates and returns a transform for writing read failures out to a dead-letter topic.
- withDeadLetterTopic(ValueProvider<String> deadLetterTopic): Like withDeadLetterTopic(String) but with a ValueProvider.
- withErrorHandler(ErrorHandler<BadRecord, ?> badRecordErrorHandler): Configures the PubSub read with an alternate error handler.
- withIdAttribute(String idAttribute): When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub message attributes, specifies the name of the attribute containing the unique identifier.
- withTimestampAttribute(String timestampAttribute): When reading from Cloud Pub/Sub where record timestamps are provided as Pub/Sub message attributes, specifies the name of the attribute that contains the timestamp.
- withValidation: Enable validation of the PubSub Read.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate
Constructor Details
Read
public Read()
Method Details
fromSubscription
Reads from the given subscription.
See PubsubIO.PubsubSubscription.fromPath(String) for more details on the format of the subscription string.
Multiple readers reading from the same subscription will each receive some arbitrary portion of the data. Most likely, separate readers should use their own subscriptions.
fromSubscription
Like fromSubscription(String) but with a ValueProvider.
fromTopic
Creates and returns a transform for reading from a Cloud Pub/Sub topic. Mutually exclusive with fromSubscription(String).
See PubsubIO.PubsubTopic.fromPath(String) for more details on the format of the topic string.
The Beam runner will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by the runner.
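Both fromTopic and fromSubscription take a resource path string. The sketch below shows the commonly used Pub/Sub path shapes, `projects/<project>/topics/<topic>` and `projects/<project>/subscriptions/<subscription>`. It is a simplified illustration, not Beam's parser: the class name `PubsubPathSketch` and its segment check are assumptions made for this example, and the fromPath documentation remains the authority on accepted formats.

```java
/** Illustrative only: builds Pub/Sub resource path strings. Not Beam's own parser. */
final class PubsubPathSketch {

  private PubsubPathSketch() {}

  /** Returns a path of the form projects/<project>/topics/<topic>. */
  static String topicPath(String project, String topic) {
    return "projects/" + requireSegment(project) + "/topics/" + requireSegment(topic);
  }

  /** Returns a path of the form projects/<project>/subscriptions/<subscription>. */
  static String subscriptionPath(String project, String subscription) {
    return "projects/" + requireSegment(project)
        + "/subscriptions/" + requireSegment(subscription);
  }

  // Simplified check (an assumption for this sketch): a segment must be
  // non-empty and must not contain '/'. The real naming rules are stricter.
  private static String requireSegment(String segment) {
    if (segment == null || segment.isEmpty() || segment.contains("/")) {
      throw new IllegalArgumentException("Invalid path segment: " + segment);
    }
    return segment;
  }

  public static void main(String[] args) {
    System.out.println(topicPath("my-project", "my-topic"));
    System.out.println(subscriptionPath("my-project", "my-subscription"));
  }
}
```

The resulting strings are what would be passed to fromTopic(String) or fromSubscription(String); only one of the two may be set on a given read.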
fromTopic
Like fromTopic(String) but with a ValueProvider.
withDeadLetterTopic
Creates and returns a transform for writing read failures out to a dead-letter topic.
The message written to the dead-letter topic will contain three attributes:
- exceptionClassName: The type of exception that was thrown.
- exceptionMessage: The message in the exception.
- pubsubMessageId: The message id of the original Pub/Sub message if it was read in,
otherwise "
"
The PubsubClient.PubsubClientFactory used in the PubsubIO.Write transform for errors will be the same as used in the final PubsubIO.Read transform.
If there might be a parsing error (or similar), then this should be set up on the topic to avoid wasting resources and to provide more error details with the message written to Pub/Sub. Otherwise, the Pub/Sub topic should have a dead-letter configuration set up to avoid an infinite retry loop.
Only failures that result from the PubsubIO.Read configuration (e.g. parsing errors) will be sent to the dead-letter topic. To capture errors that occur after a successful read, the pipeline must set up its own PubsubIO.Write transform. Errors with delivery require configuring Pub/Sub itself to write to the dead-letter topic after a certain number of failed attempts.
See PubsubIO.PubsubTopic.fromPath(String) for more details on the format of the deadLetterTopic string.
This functionality is mutually exclusive with withErrorHandler(ErrorHandler).
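As a rough illustration of the three dead-letter attributes listed for withDeadLetterTopic, the following sketch assembles them from an exception and an optional message id. It is not Beam's implementation. The class name, the use of `getClass().getName()` for the exception type, and the caller-supplied `missingIdPlaceholder` (standing in for whatever value is written when no message id is available) are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: shapes the attributes attached to a dead-letter message. */
final class DeadLetterAttributesSketch {

  private DeadLetterAttributesSketch() {}

  /**
   * Builds the three attributes named in the documentation.
   *
   * @param error the failure raised while reading or parsing
   * @param messageId the original Pub/Sub message id, or null if none was read
   * @param missingIdPlaceholder hypothetical stand-in used when messageId is null
   */
  static Map<String, String> attributesFor(
      Throwable error, String messageId, String missingIdPlaceholder) {
    Map<String, String> attributes = new LinkedHashMap<>();
    // Assumption: the fully qualified class name is used as the exception type.
    attributes.put("exceptionClassName", error.getClass().getName());
    attributes.put("exceptionMessage", String.valueOf(error.getMessage()));
    attributes.put("pubsubMessageId", messageId != null ? messageId : missingIdPlaceholder);
    return attributes;
  }

  public static void main(String[] args) {
    System.out.println(
        attributesFor(new IllegalStateException("bad payload"), "12345", "unknown-id"));
  }
}
```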
withDeadLetterTopic
Like withDeadLetterTopic(String) but with a ValueProvider.
withClientFactory
The default Pub/Sub client is the PubsubJsonClient, created by the PubsubJsonClient.PubsubJsonClientFactory. A different PubsubClient.PubsubClientFactory, such as the PubsubGrpcClientFactory, can be supplied instead.
withTimestampAttribute
When reading from Cloud Pub/Sub where record timestamps are provided as Pub/Sub message attributes, specifies the name of the attribute that contains the timestamp.
The timestamp value is expected to be represented in the attribute as either:
- a numerical value representing the number of milliseconds since the Unix epoch. For example, if using the Joda time classes, Instant.getMillis() returns the correct value for this attribute.
- a String in RFC 3339 format. For example, 2015-10-29T23:41:41.123Z. The sub-second component of the timestamp is optional, and digits beyond the first three (i.e., time units smaller than milliseconds) will be ignored.
If timestampAttribute is not provided, the timestamp will be taken from the Pubsub message's publish timestamp. All windowing will be done relative to these timestamps.
By default, windows are emitted based on an estimate of when this source is likely done producing data for a given timestamp (referred to as the Watermark; see AfterWatermark for more details). Any late data will be handled by the trigger specified with the windowing strategy; by default it will be output immediately.
Note that the system can guarantee that no late data will ever be seen when it assigns timestamps by arrival time (i.e. timestampAttribute is not provided).
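The two accepted encodings of the timestamp attribute can be sketched as follows. This is a minimal approximation using `java.time`, not Beam's parsing code: the class and method names are hypothetical, and `OffsetDateTime.parse` accepts ISO-8601 offset date-times, which covers common RFC 3339 strings but is not guaranteed to match Beam's accepted inputs exactly.

```java
import java.time.OffsetDateTime;

/** Illustrative only: interprets a timestamp attribute value as epoch milliseconds. */
final class TimestampAttributeSketch {

  private TimestampAttributeSketch() {}

  /**
   * Accepts either a decimal number of milliseconds since the Unix epoch or an
   * RFC 3339 style string such as 2015-10-29T23:41:41.123Z.
   */
  static long toEpochMillis(String attributeValue) {
    try {
      // First encoding: a plain number of milliseconds since the epoch.
      return Long.parseLong(attributeValue);
    } catch (NumberFormatException notNumeric) {
      // Not a number, so fall through to the date-time encoding.
    }
    // Second encoding: a date-time string. toEpochMilli() drops any precision
    // finer than milliseconds, mirroring the "digits beyond the first three
    // will be ignored" rule.
    return OffsetDateTime.parse(attributeValue).toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    System.out.println(toEpochMillis("2015-10-29T23:41:41.123Z"));
    System.out.println(toEpochMillis("2015-10-29T23:41:41.123456Z"));
  }
}
```

Both calls in `main` print the same value, because the extra sub-millisecond digits in the second input are discarded.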
withIdAttribute
When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub message attributes, specifies the name of the attribute containing the unique identifier. The value of the attribute can be any string that uniquely identifies this record.
Pub/Sub cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream. If idAttribute is not provided, Beam cannot guarantee that no duplicate data will be delivered, and deduplication of the stream will be strictly best effort.
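To show why a unique identifier attribute helps with withIdAttribute, here is a minimal sketch of attribute-based deduplication. It only illustrates the idea and is not how any Beam runner implements it: the class name is hypothetical, the in-memory set grows without bound, and letting messages without the attribute pass through is an assumption.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: drops redelivered messages by a unique id attribute. */
final class IdAttributeDedupSketch {

  private final String idAttribute;
  private final Set<String> seenIds = new HashSet<>();

  IdAttributeDedupSketch(String idAttribute) {
    this.idAttribute = idAttribute;
  }

  /**
   * Returns true if the message should be processed, false if a message with
   * the same id attribute value was already accepted.
   */
  boolean accept(Map<String, String> attributes) {
    String id = attributes.get(idAttribute);
    if (id == null) {
      // Assumption for this sketch: without an id there is nothing to compare.
      return true;
    }
    // Set.add returns false when the id was already present.
    return seenIds.add(id);
  }

  public static void main(String[] args) {
    IdAttributeDedupSketch dedup = new IdAttributeDedupSketch("uid");
    System.out.println(dedup.accept(java.util.Collections.singletonMap("uid", "a")));
    System.out.println(dedup.accept(java.util.Collections.singletonMap("uid", "a")));
  }
}
```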
withCoderAndParseFn
public PubsubIO.Read<T> withCoderAndParseFn(Coder<T> coder, SimpleFunction<PubsubMessage, T> parseFn)
Causes the source to return a PubsubMessage that includes Pubsub attributes, and uses the given parsing function to transform the PubsubMessage into an output type. A Coder for the output type T must be registered or set on the output via PCollection.setCoder(Coder).
withErrorHandler
Configures the PubSub read with an alternate error handler. When a message is read from PubSub, but fails to parse, the message and the parse failure information will be sent to the error handler. See ErrorHandler for more details on configuring an Error Handler. This functionality is mutually exclusive with withDeadLetterTopic(String).
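The routing described for withErrorHandler, where messages that fail to parse are handed to a handler together with the failure, can be sketched in plain Java. This is a conceptual illustration with hypothetical names and types (raw `byte[]` payloads, a `BiConsumer` as the handler); it does not use or reproduce Beam's ErrorHandler or BadRecord classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Illustrative only: sends unparseable payloads to an error handler. */
final class ParseWithErrorHandlerSketch {

  private ParseWithErrorHandlerSketch() {}

  /**
   * Applies parseFn to each payload. Payloads that parse are returned; payloads
   * whose parse throws are passed to errorHandler along with the exception.
   */
  static <T> List<T> parseAll(
      List<byte[]> payloads,
      Function<byte[], T> parseFn,
      BiConsumer<byte[], Exception> errorHandler) {
    List<T> parsed = new ArrayList<>();
    for (byte[] payload : payloads) {
      try {
        parsed.add(parseFn.apply(payload));
      } catch (RuntimeException e) {
        // The failing record and its failure details go to the handler
        // instead of stopping the whole read.
        errorHandler.accept(payload, e);
      }
    }
    return parsed;
  }
}
```

A caller might pass a parse function that decodes UTF-8 text into integers and a handler that collects the failures for later inspection.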
withValidation
Enable validation of the PubSub Read.
expand
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PBegin, PCollection<T>>
validate
Description copied from class: PTransform
Called before running the Pipeline to verify this transform is fully and correctly specified.
By default, does nothing.
- Overrides:
validate in class PTransform<PBegin, PCollection<T>>
populateDisplayData
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PBegin, PCollection<T>>
- Parameters:
builder - The builder to populate with display data.