Class PubsubSchemaIOProvider
- All Implemented Interfaces:
SchemaIOProvider
SchemaIOProvider
for reading and writing JSON/AVRO payloads with
PubsubIO
.
Schema
The data schema passed to from(String, Row, Schema)
must either be of the nested or
flat style.
Nested style
If nested structure is used, the required fields included in the Pubsub message model are 'event_timestamp', 'attributes', and 'payload'.
Flat style
If flat structure is used, the required fields include just 'event_timestamp'. Every other
field is assumed part of the payload. See PubsubMessageToRow
for details.
Configuration
configurationSchema()
consists of two attributes, timestampAttributeKey and
deadLetterQueue.
timestampAttributeKey
An optional attribute key of the Pubsub message from which to extract the event timestamp. If not specified, the message publish time will be used as event timestamp.
This attribute has to conform to the same requirements as in PubsubIO.Read.withTimestampAttribute(String)
Short version: it has to be either millis since epoch or string in RFC 3339 format.
If the attribute is specified then event timestamps will be extracted from the specified attribute. If it is not specified then message publish timestamp will be used.
deadLetterQueue
deadLetterQueue is an optional topic path which will be used as a dead letter queue.
Messages that cannot be processed will be sent to this topic. If it is not specified then exception will be thrown for errors during processing causing the pipeline to crash.
-
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final Schema
static final Schema.FieldType
static final Schema.FieldType
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionReturns the expected schema of the configuration object.org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO
Produce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object.Returns an id that uniquely represents this IO.boolean
Indicates whether the dataSchema value is necessary.
-
Field Details
-
ATTRIBUTE_MAP_FIELD_TYPE
-
ATTRIBUTE_ARRAY_ENTRY_SCHEMA
-
ATTRIBUTE_ARRAY_FIELD_TYPE
-
-
Constructor Details
-
PubsubSchemaIOProvider
public PubsubSchemaIOProvider()
-
-
Method Details
-
identifier
Returns an id that uniquely represents this IO.- Specified by:
identifier
in interfaceSchemaIOProvider
-
configurationSchema
Returns the expected schema of the configuration object. Note this is distinct from the schema of the data source itself.- Specified by:
configurationSchema
in interfaceSchemaIOProvider
-
from
public org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO from(String location, Row configuration, Schema dataSchema) Produce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object.- Specified by:
from
in interfaceSchemaIOProvider
-
requiresDataSchema
public boolean requiresDataSchema()Description copied from interface:SchemaIOProvider
Indicates whether the dataSchema value is necessary.- Specified by:
requiresDataSchema
in interfaceSchemaIOProvider
-
isBounded
- Specified by:
isBounded
in interfaceSchemaIOProvider
-