Class PubsubUnboundedSink
java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PCollection<PubsubMessage>,PDone>
org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSink
- All Implemented Interfaces:
Serializable
,HasDisplayData
A PTransform which streams messages to Pubsub.
- The underlying implementation is just a
GroupByKey
followed by aParDo
which publishes as a side effect. (In the future we want to design and switch to a customUnboundedSink
implementation so as to gain access to system watermark and end-of-pipeline cleanup.) - We try to send messages in batches while also limiting send latency.
- No stats are logged. Rather some counters are used to keep track of elements and batches.
- Though some background threads are used by the underlying netty system all actual Pubsub
calls are blocking. We rely on the underlying runner to allow multiple
DoFn
instances to execute concurrently and hide latency. - A failed bundle will cause messages to be resent. Thus we rely on the Pubsub consumer to dedup messages.
- See Also:
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
-
Constructor Summary
ConstructorsConstructorDescriptionPubsubUnboundedSink
(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey) PubsubUnboundedSink
(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, int publishBatchSize, int publishBatchBytes) PubsubUnboundedSink
(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, int publishBatchSize, int publishBatchBytes, String pubsubRootUrl) PubsubUnboundedSink
(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, String pubsubRootUrl) -
Method Summary
Modifier and TypeMethodDescriptionexpand
(PCollection<PubsubMessage> input) Override this method to specify how thisPTransform
should be expanded on the givenInputT
.Get the id attribute.boolean
Get the timestamp attribute.getTopic()
Get the topic being written to.Get theValueProvider
for the topic being written to.Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey) -
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, String pubsubRootUrl) -
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, int publishBatchSize, int publishBatchBytes) -
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, int publishBatchSize, int publishBatchBytes, String pubsubRootUrl)
-
-
Method Details
-
getTopic
Get the topic being written to. -
getTopicProvider
Get theValueProvider
for the topic being written to. -
getTimestampAttribute
Get the timestamp attribute. -
getIdAttribute
Get the id attribute. -
getPublishBatchWithOrderingKey
public boolean getPublishBatchWithOrderingKey() -
expand
Description copied from class:PTransform
Override this method to specify how thisPTransform
should be expanded on the givenInputT
.NOTE: This method should not be called directly. Instead apply the
PTransform
should be applied to theInputT
using theapply
method.Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand
in classPTransform<PCollection<PubsubMessage>,
PDone>
-