Class PubsubUnboundedSource
java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PBegin,PCollection<PubsubMessage>>
org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSource
- All Implemented Interfaces:
Serializable
,HasDisplayData
Users should use
instead.
invalid reference
PubsubIO#read
A PTransform which streams messages from Pubsub.
- The underlying implementation in an
UnboundedSource
which receives messages in batches and hands them out one at a time. - The watermark (either in Pubsub processing time or custom timestamp time) is estimated by keeping track of the minimum of the last minutes worth of messages. This assumes Pubsub delivers the oldest (in Pubsub processing time) available message at least once a minute, and that custom timestamps are 'mostly' monotonic with Pubsub processing time. Unfortunately both of those assumptions are fragile. Thus the estimated watermark may get ahead of the 'true' watermark and cause some messages to be late.
- Checkpoints are used both to ACK received messages back to Pubsub (so that they may be retired on the Pubsub end), and to NACK already consumed messages should a checkpoint need to be restored (so that Pubsub will resend those messages promptly).
- The backlog is determined by each reader using the messages which have been pulled from Pubsub but not yet consumed downstream. The backlog does not take account of any messages queued by Pubsub for the subscription. Unfortunately there is currently no API to determine the size of the Pubsub queue's backlog.
- The subscription must already exist.
- The subscription timeout is read whenever a reader is started. However it is not checked thereafter despite the timeout being user-changeable on-the-fly.
- We log vital stats every 30 seconds.
- Though some background threads may be used by the underlying transport all Pubsub calls are
blocking. We rely on the underlying runner to allow multiple
UnboundedSource.UnboundedReader
instances to execute concurrently and thus hide latency.
- See Also:
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
-
Constructor Summary
ConstructorsConstructorDescriptionPubsubUnboundedSource
(com.google.api.client.util.Clock clock, PubsubClient.PubsubClientFactory pubsubFactory, @Nullable ValueProvider<PubsubClient.ProjectPath> project, @Nullable ValueProvider<PubsubClient.TopicPath> topic, @Nullable ValueProvider<PubsubClient.SubscriptionPath> subscription, @Nullable String timestampAttribute, @Nullable String idAttribute, boolean needsAttributes) Construct an unbounded source to consume from the Pubsubsubscription
.PubsubUnboundedSource
(PubsubClient.PubsubClientFactory pubsubFactory, @Nullable ValueProvider<PubsubClient.ProjectPath> project, @Nullable ValueProvider<PubsubClient.TopicPath> topic, @Nullable ValueProvider<PubsubClient.SubscriptionPath> subscription, @Nullable String timestampAttribute, @Nullable String idAttribute, boolean needsAttributes) Construct an unbounded source to consume from the Pubsubsubscription
.PubsubUnboundedSource
(PubsubClient.PubsubClientFactory pubsubFactory, @Nullable ValueProvider<PubsubClient.ProjectPath> project, @Nullable ValueProvider<PubsubClient.TopicPath> topic, @Nullable ValueProvider<PubsubClient.SubscriptionPath> subscription, @Nullable String timestampAttribute, @Nullable String idAttribute, boolean needsAttributes, boolean needsMessageId) Construct an unbounded source to consume from the Pubsubsubscription
. -
Method Summary
Modifier and TypeMethodDescriptionOverride this method to specify how thisPTransform
should be expanded on the givenInputT
.Get the id attribute.boolean
boolean
boolean
Get the project path.Get the subscription being read from.Get theValueProvider
for the subscription being read from.Get the timestamp attribute.getTopic()
Get the topic being read from.Get theValueProvider
for the topic being read from.Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
PubsubUnboundedSource
public PubsubUnboundedSource(PubsubClient.PubsubClientFactory pubsubFactory, @Nullable ValueProvider<PubsubClient.ProjectPath> project, @Nullable ValueProvider<PubsubClient.TopicPath> topic, @Nullable ValueProvider<PubsubClient.SubscriptionPath> subscription, @Nullable String timestampAttribute, @Nullable String idAttribute, boolean needsAttributes) Construct an unbounded source to consume from the Pubsubsubscription
. -
PubsubUnboundedSource
public PubsubUnboundedSource(com.google.api.client.util.Clock clock, PubsubClient.PubsubClientFactory pubsubFactory, @Nullable ValueProvider<PubsubClient.ProjectPath> project, @Nullable ValueProvider<PubsubClient.TopicPath> topic, @Nullable ValueProvider<PubsubClient.SubscriptionPath> subscription, @Nullable String timestampAttribute, @Nullable String idAttribute, boolean needsAttributes) Construct an unbounded source to consume from the Pubsubsubscription
. -
PubsubUnboundedSource
public PubsubUnboundedSource(PubsubClient.PubsubClientFactory pubsubFactory, @Nullable ValueProvider<PubsubClient.ProjectPath> project, @Nullable ValueProvider<PubsubClient.TopicPath> topic, @Nullable ValueProvider<PubsubClient.SubscriptionPath> subscription, @Nullable String timestampAttribute, @Nullable String idAttribute, boolean needsAttributes, boolean needsMessageId) Construct an unbounded source to consume from the Pubsubsubscription
.
-
-
Method Details
-
getProject
Get the project path. -
getTopic
Get the topic being read from. -
getTopicProvider
Get theValueProvider
for the topic being read from. -
getSubscription
Get the subscription being read from. -
getSubscriptionProvider
Get theValueProvider
for the subscription being read from. -
getTimestampAttribute
Get the timestamp attribute. -
getIdAttribute
Get the id attribute. -
getNeedsAttributes
public boolean getNeedsAttributes() -
getNeedsMessageId
public boolean getNeedsMessageId() -
getNeedsOrderingKey
public boolean getNeedsOrderingKey() -
expand
Description copied from class:PTransform
Override this method to specify how thisPTransform
should be expanded on the givenInputT
.NOTE: This method should not be called directly. Instead apply the
PTransform
should be applied to theInputT
using theapply
method.Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand
in classPTransform<PBegin,
PCollection<PubsubMessage>>
-