Class PubsubSchemaIOProvider
- All Implemented Interfaces:
SchemaIOProvider
SchemaIOProvider for reading and writing JSON/AVRO payloads with
PubsubIO.
Schema
The data schema passed to from(String, Row, Schema) must either be of the nested or
flat style.
Nested style
If nested structure is used, the required fields included in the Pubsub message model are 'event_timestamp', 'attributes', and 'payload'.
Flat style
If flat structure is used, the required fields include just 'event_timestamp'. Every other
field is assumed part of the payload. See PubsubMessageToRow for details.
Configuration
configurationSchema() consists of two attributes, timestampAttributeKey and
deadLetterQueue.
timestampAttributeKey
An optional attribute key of the Pubsub message from which to extract the event timestamp. If not specified, the message publish time will be used as event timestamp.
This attribute has to conform to the same requirements as in PubsubIO.Read.withTimestampAttribute(String)
Short version: it has to be either millis since epoch or string in RFC 3339 format.
If the attribute is specified then event timestamps will be extracted from the specified attribute. If it is not specified then message publish timestamp will be used.
deadLetterQueue
deadLetterQueue is an optional topic path which will be used as a dead letter queue.
Messages that cannot be processed will be sent to this topic. If it is not specified then exception will be thrown for errors during processing causing the pipeline to crash.
-
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final Schemastatic final Schema.FieldTypestatic final Schema.FieldType -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionReturns the expected schema of the configuration object.org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIOProduce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object.Returns an id that uniquely represents this IO.booleanIndicates whether the dataSchema value is necessary.
-
Field Details
-
ATTRIBUTE_MAP_FIELD_TYPE
-
ATTRIBUTE_ARRAY_ENTRY_SCHEMA
-
ATTRIBUTE_ARRAY_FIELD_TYPE
-
-
Constructor Details
-
PubsubSchemaIOProvider
public PubsubSchemaIOProvider()
-
-
Method Details
-
identifier
Returns an id that uniquely represents this IO.- Specified by:
identifierin interfaceSchemaIOProvider
-
configurationSchema
Returns the expected schema of the configuration object. Note this is distinct from the schema of the data source itself.- Specified by:
configurationSchemain interfaceSchemaIOProvider
-
from
public org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO from(String location, Row configuration, Schema dataSchema) Produce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object.- Specified by:
fromin interfaceSchemaIOProvider
-
requiresDataSchema
public boolean requiresDataSchema()Description copied from interface:SchemaIOProviderIndicates whether the dataSchema value is necessary.- Specified by:
requiresDataSchemain interfaceSchemaIOProvider
-
isBounded
- Specified by:
isBoundedin interfaceSchemaIOProvider
-