public abstract static class PubsubIO.Read<T> extends PTransform<PBegin,PCollection<T>>
name, resourceHints
Constructor and Description |
---|
Read() |
Modifier and Type | Method and Description |
---|---|
PCollection<T> |
expand(PBegin input)
Override this method to specify how this
PTransform should be expanded on the given
InputT . |
PubsubIO.Read<T> |
fromSubscription(java.lang.String subscription)
Reads from the given subscription.
|
PubsubIO.Read<T> |
fromSubscription(ValueProvider<java.lang.String> subscription)
Like
subscription() but with a ValueProvider . |
PubsubIO.Read<T> |
fromTopic(java.lang.String topic)
Creates and returns a transform for reading from a Cloud Pub/Sub topic.
|
PubsubIO.Read<T> |
fromTopic(ValueProvider<java.lang.String> topic)
Like
fromTopic(String) but with a ValueProvider . |
void |
populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
|
PubsubIO.Read<T> |
withClientFactory(PubsubClient.PubsubClientFactory factory)
The default client to write to Pub/Sub is the
PubsubJsonClient , created by the PubsubJsonClient.PubsubJsonClientFactory . |
PubsubIO.Read<T> |
withCoderAndParseFn(Coder<T> coder,
SimpleFunction<PubsubMessage,T> parseFn)
Causes the source to return a PubsubMessage that includes Pubsub attributes, and uses the
given parsing function to transform the PubsubMessage into an output type.
|
PubsubIO.Read<T> |
withDeadLetterTopic(java.lang.String deadLetterTopic)
Creates and returns a transform for writing read failures out to a dead-letter topic.
|
PubsubIO.Read<T> |
withDeadLetterTopic(ValueProvider<java.lang.String> deadLetterTopic)
Like
withDeadLetterTopic(String) but with a ValueProvider . |
PubsubIO.Read<T> |
withIdAttribute(java.lang.String idAttribute)
When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub
message attributes, specifies the name of the attribute containing the unique identifier.
|
PubsubIO.Read<T> |
withTimestampAttribute(java.lang.String timestampAttribute)
When reading from Cloud Pub/Sub where record timestamps are provided as Pub/Sub message
attributes, specifies the name of the attribute that contains the timestamp.
|
compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setResourceHints, toString, validate, validate
public PubsubIO.Read<T> fromSubscription(java.lang.String subscription)
See PubsubIO.PubsubSubscription.fromPath(String)
for more details on the format of
the subscription
string.
Multiple readers reading from the same subscription will each receive some arbitrary portion of the data. Most likely, separate readers should use their own subscriptions.
public PubsubIO.Read<T> fromSubscription(ValueProvider<java.lang.String> subscription)
subscription()
but with a ValueProvider
.public PubsubIO.Read<T> fromTopic(java.lang.String topic)
fromSubscription(String)
.
See PubsubIO.PubsubTopic.fromPath(String)
for more details on the format of the
topic
string.
The Beam runner will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by the runner.
public PubsubIO.Read<T> fromTopic(ValueProvider<java.lang.String> topic)
fromTopic(String)
but with a ValueProvider
.public PubsubIO.Read<T> withDeadLetterTopic(java.lang.String deadLetterTopic)
The message written to the dead-letter will contain three attributes:
The PubsubClient.PubsubClientFactory
used in the PubsubIO.Write
transform for
errors will be the same as used in the final PubsubIO.Read
transform.
If there might be a parsing error (or similar), then this should be set up on the topic to avoid wasting resources and to provide more error details with the message written to Pub/Sub. Otherwise, the Pub/Sub topic should have a dead-letter configuration set up to avoid an infinite retry loop.
Only failures that result from the PubsubIO.Read
configuration (e.g. parsing errors) will
be sent to the dead-letter topic. Errors that occur after a successful read will need to set
up their own PubsubIO.Write
transform. Errors with delivery require configuring Pub/Sub itself
to write to the dead-letter topic after a certain number of failed attempts.
See PubsubIO.PubsubTopic.fromPath(String)
for more details on the format of the
deadLetterTopic
string.
public PubsubIO.Read<T> withDeadLetterTopic(ValueProvider<java.lang.String> deadLetterTopic)
withDeadLetterTopic(String)
but with a ValueProvider
.public PubsubIO.Read<T> withClientFactory(PubsubClient.PubsubClientFactory factory)
PubsubJsonClient
, created by the PubsubJsonClient.PubsubJsonClientFactory
. This function allows to change the Pub/Sub client
by providing another PubsubClient.PubsubClientFactory
like the PubsubGrpcClientFactory
.public PubsubIO.Read<T> withTimestampAttribute(java.lang.String timestampAttribute)
The timestamp value is expected to be represented in the attribute as either:
Instant.getMillis()
returns the
correct value for this attribute.
2015-10-29T23:41:41.123Z
. The
sub-second component of the timestamp is optional, and digits beyond the first three
(i.e., time units smaller than milliseconds) will be ignored.
If timestampAttribute
is not provided, the timestamp will be taken from the Pubsub
message's publish timestamp. All windowing will be done relative to these timestamps.
By default, windows are emitted based on an estimate of when this source is likely done
producing data for a given timestamp (referred to as the Watermark; see AfterWatermark
for more details). Any late data will be handled by the trigger specified
with the windowing strategy – by default it will be output immediately.
Note that the system can guarantee that no late data will ever be seen when it assigns
timestamps by arrival time (i.e. timestampAttribute
is not provided).
public PubsubIO.Read<T> withIdAttribute(java.lang.String idAttribute)
Pub/Sub cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream.
If idAttribute
is not provided, Beam cannot guarantee that no duplicate data will be
delivered, and deduplication of the stream will be strictly best effort.
public PubsubIO.Read<T> withCoderAndParseFn(Coder<T> coder, SimpleFunction<PubsubMessage,T> parseFn)
PCollection.setCoder(Coder)
.public PCollection<T> expand(PBegin input)
PTransform
PTransform
should be expanded on the given
InputT
.
NOTE: This method should not be called directly. Instead apply the PTransform
should
be applied to the InputT
using the apply
method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
expand
in class PTransform<PBegin,PCollection<T>>
public void populateDisplayData(DisplayData.Builder builder)
PTransform
populateDisplayData(DisplayData.Builder)
is invoked by Pipeline runners to collect
display data via DisplayData.from(HasDisplayData)
. Implementations may call super.populateDisplayData(builder)
in order to register display data in the current namespace,
but should otherwise use subcomponent.populateDisplayData(builder)
to use the namespace
of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
populateDisplayData
in interface HasDisplayData
populateDisplayData
in class PTransform<PBegin,PCollection<T>>
builder
- The builder to populate with display data.HasDisplayData