@Internal @AutoService(value=SchemaIOProvider.class) public class PubsubSchemaIOProvider extends java.lang.Object implements SchemaIOProvider
SchemaIOProvider
for reading and writing JSON payloads with PubsubIO
.
The data schema passed to from(String, Row, Schema)
must either be of the nested or
flat style.
If nested structure is used, the required fields included in the Pubsub message model are 'event_timestamp', 'attributes', and 'payload'.
If flat structure is used, the required fields include just 'event_timestamp'. Every other
field is assumed part of the payload. See PubsubMessageToRow
for details.
configurationSchema()
consists of two attributes, timestampAttributeKey and
deadLetterQueue.
An optional attribute key of the Pubsub message from which to extract the event timestamp. If not specified, the message publish time will be used as event timestamp.
This attribute has to conform to the same requirements as in PubsubIO.Read.withTimestampAttribute(String)
Short version: it has to be either millis since epoch or string in RFC 3339 format.
If the attribute is specified then event timestamps will be extracted from the specified attribute. If it is not specified then message publish timestamp will be used.
deadLetterQueue is an optional topic path which will be used as a dead letter queue.
Messages that cannot be processed will be sent to this topic. If it is not specified then exception will be thrown for errors during processing causing the pipeline to crash.
Modifier and Type | Field and Description |
---|---|
static Schema.FieldType |
TIMESTAMP |
static Schema.FieldType |
VARCHAR |
Constructor and Description |
---|
PubsubSchemaIOProvider() |
Modifier and Type | Method and Description |
---|---|
Schema |
configurationSchema()
Returns the expected schema of the configuration object.
|
org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO |
from(java.lang.String location,
Row configuration,
Schema dataSchema)
Produce a SchemaIO given a String representing the data's location, the schema of the data that
resides there, and some IO-specific configuration object.
|
java.lang.String |
identifier()
Returns an id that uniquely represents this IO.
|
PCollection.IsBounded |
isBounded() |
boolean |
requiresDataSchema()
Indicates whether the dataSchema value is necessary.
|
public static final Schema.FieldType VARCHAR
public static final Schema.FieldType TIMESTAMP
public java.lang.String identifier()
identifier
in interface SchemaIOProvider
public Schema configurationSchema()
configurationSchema
in interface SchemaIOProvider
public org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO from(java.lang.String location, Row configuration, Schema dataSchema)
from
in interface SchemaIOProvider
public boolean requiresDataSchema()
SchemaIOProvider
requiresDataSchema
in interface SchemaIOProvider
public PCollection.IsBounded isBounded()
isBounded
in interface SchemaIOProvider