Class PubsubUnboundedSink
java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PCollection<PubsubMessage>,PDone>
org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSink
- All Implemented Interfaces:
Serializable,HasDisplayData
A PTransform which streams messages to Pubsub.
- The underlying implementation is just a
GroupByKeyfollowed by aParDowhich publishes as a side effect. (In the future we want to design and switch to a customUnboundedSinkimplementation so as to gain access to system watermark and end-of-pipeline cleanup.) - We try to send messages in batches while also limiting send latency.
- No stats are logged. Rather some counters are used to keep track of elements and batches.
- Though some background threads are used by the underlying netty system all actual Pubsub
calls are blocking. We rely on the underlying runner to allow multiple
DoFninstances to execute concurrently and hide latency. - A failed bundle will cause messages to be resent. Thus we rely on the Pubsub consumer to dedup messages.
- See Also:
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints -
Constructor Summary
ConstructorsConstructorDescriptionPubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey) PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, int publishBatchSize, int publishBatchBytes) PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, int publishBatchSize, int publishBatchBytes, String pubsubRootUrl) PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, String pubsubRootUrl) -
Method Summary
Modifier and TypeMethodDescriptionexpand(PCollection<PubsubMessage> input) Override this method to specify how thisPTransformshould be expanded on the givenInputT.Get the id attribute.booleanGet the timestamp attribute.getTopic()Get the topic being written to.Get theValueProviderfor the topic being written to.Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey) -
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, String pubsubRootUrl) -
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, int publishBatchSize, int publishBatchBytes) -
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<PubsubClient.TopicPath> topic, String timestampAttribute, String idAttribute, int numShards, boolean publishBatchWithOrderingKey, int publishBatchSize, int publishBatchBytes, String pubsubRootUrl)
-
-
Method Details
-
getTopic
Get the topic being written to. -
getTopicProvider
Get theValueProviderfor the topic being written to. -
getTimestampAttribute
Get the timestamp attribute. -
getIdAttribute
Get the id attribute. -
getPublishBatchWithOrderingKey
public boolean getPublishBatchWithOrderingKey() -
expand
Description copied from class:PTransformOverride this method to specify how thisPTransformshould be expanded on the givenInputT.NOTE: This method should not be called directly. Instead apply the
PTransformshould be applied to theInputTusing theapplymethod.Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expandin classPTransform<PCollection<PubsubMessage>,PDone>
-