Class BigQueryIO.Write<T>
- All Implemented Interfaces:
Serializable, HasDisplayData
- Enclosing class:
BigQueryIO
- See Also:
BigQueryIO.write()
-
Nested Class Summary
Nested Classes
Modifier and Type / Class / Description
static enum BigQueryIO.Write.CreateDisposition: An enumeration type for the BigQuery create disposition strings.
static enum BigQueryIO.Write.Method: Determines the method used to insert data in BigQuery.
static enum BigQueryIO.Write.SchemaUpdateOption: An enumeration type for the BigQuery schema update options strings.
static enum BigQueryIO.Write.WriteDisposition: An enumeration type for the BigQuery write disposition strings.
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
-
Constructor Summary
Constructors
Write()
-
Method Summary
Method and Description
expand(PCollection<T> input): Override this method to specify how this PTransform should be expanded on the given InputT.
getMethod()
getTable(): Returns the table reference, or null.
ignoreInsertIds(): Setting this option to true disables insertId-based data deduplication offered by BigQuery.
ignoreUnknownValues(): Accept rows that contain values that do not match the schema.
optimizedWrites(): If true, enables new codepaths that are expected to use less resources while writing to BigQuery.
populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component.
skipInvalidRows(): Insert all valid rows of a request, even if invalid rows exist.
to(TableReference table): Writes to the given table, specified as a TableReference.
to(String tableSpec): Writes to the given table, specified in the format described in BigQueryHelpers.parseTableSpec(java.lang.String).
to(DynamicDestinations<T, ?> dynamicDestinations): Writes to the table and schema specified by the DynamicDestinations object.
to(ValueProvider<String> tableSpec): Same as to(String), but with a ValueProvider.
to(SerializableFunction<ValueInSingleWindow<T>, TableDestination> tableFunction): Writes to the table specified by the given table function.
useAvroLogicalTypes(): Enables interpreting logical types into their corresponding types (e.g. TIMESTAMP), instead of only using their raw types (e.g. LONG).
useBeamSchema(): If true, then the BigQuery schema will be inferred from the input schema.
validate(PipelineOptions pipelineOptions): Called before running the Pipeline to verify this transform is fully and correctly specified.
withAutoSchemaUpdate(boolean autoSchemaUpdate): If true, enables automatically detecting BigQuery table schema updates.
withAutoSharding(): If true, enables using a dynamically determined number of shards to write to BigQuery.
withAvroFormatFunction(SerializableFunction<AvroWriteRequest<T>, GenericRecord> avroFormatFunction): Formats the user's type into a GenericRecord to be written to BigQuery.
withAvroSchemaFactory(SerializableFunction<@Nullable TableSchema, Schema> avroSchemaFactory): Uses the specified function to convert a TableSchema to a Schema.
withAvroWriter(SerializableFunction<Schema, DatumWriter<T>> writerFactory): Writes the user's type as Avro using the supplied DatumWriter.
withAvroWriter(SerializableFunction<AvroWriteRequest<T>, AvroT> avroFormatFunction, SerializableFunction<Schema, DatumWriter<AvroT>> writerFactory): Converts the user's type to an Avro record using the supplied avroFormatFunction.
withBigLakeConfiguration(Map<String, String> bigLakeConfiguration): Specifies a configuration to create BigLake tables.
withClustering(): Allows writing to clustered tables when to(SerializableFunction) or to(DynamicDestinations) is used.
withClustering(Clustering clustering): Specifies the clustering fields to use when writing to a single output table.
withCreateDisposition(BigQueryIO.Write.CreateDisposition createDisposition): Specifies whether the table should be created if it does not exist.
withCustomGcsTempLocation(ValueProvider<String> customGcsTempLocation): Provides a custom location on GCS for storing temporary files to be loaded via BigQuery batch load jobs.
withDefaultMissingValueInterpretation(AppendRowsRequest.MissingValueInterpretation missingValueInterpretation): Specify how missing values should be interpreted when there is a default value in the schema.
withDeterministicRecordIdFn(SerializableFunction<T, String> toUniqueIdFunction)
withDirectWriteProtos(boolean directWriteProtos)
withErrorHandler(ErrorHandler<BadRecord, ?> errorHandler)
withExtendedErrorInfo(): Enables extended error information by enabling WriteResult.getFailedInsertsWithErr().
withFailedInsertRetryPolicy(InsertRetryPolicy retryPolicy): Specifies a policy for handling failed inserts.
withFormatFunction(SerializableFunction<T, TableRow> formatFunction): Formats the user's type into a TableRow to be written to BigQuery.
withFormatRecordOnFailureFunction(SerializableFunction<T, TableRow> formatFunction): If an insert failure occurs, this function is applied to the originally supplied T element.
withJsonClustering(ValueProvider<String> jsonClustering): The same as withClustering(Clustering), but takes a JSON-serialized Clustering object in a deferred ValueProvider.
withJsonSchema(String jsonSchema): Similar to withSchema(TableSchema) but takes in a JSON-serialized TableSchema.
withJsonSchema(ValueProvider<String> jsonSchema): Same as withJsonSchema(String) but using a deferred ValueProvider.
withJsonTimePartitioning(ValueProvider<String> partitioning): The same as withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning), but takes a JSON-serialized object.
withKmsKey(String kmsKey)
withLoadJobProjectId(String loadJobProjectId): Set the project the BigQuery load job will be initiated from.
withLoadJobProjectId(ValueProvider<String> loadJobProjectId)
withMaxBytesPerPartition(long maxBytesPerPartition): Control how much data will be assigned to a single BigQuery load job.
withMaxFileSize(long maxFileSize): Controls the maximum byte size per file to be loaded into BigQuery.
withMaxFilesPerBundle(int maxFilesPerBundle): Control how many files will be written concurrently by a single worker when using BigQuery load jobs before spilling to a shuffle.
withMaxFilesPerPartition(int maxFilesPerPartition): Controls how many files will be assigned to a single BigQuery load job.
withMaxRetryJobs(int maxRetryJobs): If set, this will set the maximum number of retries of batch load jobs.
withMethod(BigQueryIO.Write.Method method): Choose the method used to write data to BigQuery.
withNumFileShards(int numFileShards): Control how many file shards are written when using BigQuery load jobs.
withNumStorageWriteApiStreams(int numStorageWriteApiStreams): Control how many parallel streams are used when using Storage API writes.
withoutValidation(): Disables BigQuery table validation.
withPrimaryKey(List<String> primaryKey)
withPropagateSuccessfulStorageApiWrites(boolean propagateSuccessfulStorageApiWrites): If set to true, then all successful writes will be propagated to WriteResult and accessible via the WriteResult.getSuccessfulStorageApiInserts() method.
withPropagateSuccessfulStorageApiWrites(Predicate<String> columnsToPropagate): If called, then all successful writes will be propagated to WriteResult and accessible via the WriteResult.getSuccessfulStorageApiInserts() method.
withRowMutationInformationFn(SerializableFunction<T, RowMutationInformation> updateFn): Allows upserting and deleting rows for tables with a primary key defined.
withSchema(TableSchema schema): Uses the specified schema for rows to be written.
withSchema(ValueProvider<TableSchema> schema): Same as withSchema(TableSchema) but using a deferred ValueProvider.
withSchemaFromView(PCollectionView<Map<String, String>> view): Allows the schemas for each table to be computed within the pipeline itself.
withSchemaUpdateOptions(Set<BigQueryIO.Write.SchemaUpdateOption> schemaUpdateOptions): Allows the schema of the destination table to be updated as a side effect of the write.
withSuccessfulInsertsPropagation(boolean propagateSuccessful): If true, enables the propagation of the successfully inserted TableRows as part of the WriteResult object when using BigQueryIO.Write.Method.STREAMING_INSERTS.
withTableDescription(String tableDescription): Specifies the table description.
withTestServices(BigQueryServices testServices)
withTimePartitioning(TimePartitioning partitioning): Allows newly created tables to include a TimePartitioning class.
withTimePartitioning(ValueProvider<TimePartitioning> partitioning): Like withTimePartitioning(TimePartitioning) but using a deferred ValueProvider.
withTriggeringFrequency(Duration triggeringFrequency): Choose the frequency at which file writes are triggered.
withWriteDisposition(BigQueryIO.Write.WriteDisposition writeDisposition): Specifies what to do with existing data in the table, in case the table already exists.
withWriteTempDataset(String writeTempDataset): Temporary dataset.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate
-
Constructor Details
-
Write
public Write()
-
-
Method Details
-
getMethod
-
to
Writes to the given table, specified in the format described in BigQueryHelpers.parseTableSpec(java.lang.String).
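For orientation, a minimal usage sketch (assumed names: a PCollection<TableRow> called rows and a TableSchema called schema; the table spec is a placeholder):
  // A minimal sketch; "rows" and "schema" are assumed to exist, names are placeholders.
  rows.apply("WriteToBigQuery",
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table")
          .withSchema(schema)
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
          .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
-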
to
Writes to the given table, specified as a TableReference.
-
to
Same as to(String), but with a ValueProvider.
-
to
public BigQueryIO.Write<T> to(SerializableFunction<ValueInSingleWindow<T>, TableDestination> tableFunction)
Writes to the table specified by the given table function. The table is a function of ValueInSingleWindow, so it can be determined by the value or by the window.
If the function produces destinations configured with clustering fields, ensure that withClustering() is also set so that the clustering configurations get properly encoded and decoded.
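A minimal sketch of a value-dependent destination; the routing field, table names, and surrounding variables are assumptions:
  // Route each element to a per-region table; "quotes" is an assumed PCollection<TableRow>.
  quotes.apply(
      BigQueryIO.writeTableRows()
          .to((ValueInSingleWindow<TableRow> value) -> {
            String region = (String) value.getValue().get("region"); // assumed routing field
            return new TableDestination("my-project:my_dataset.quotes_" + region, null);
          })
          .withSchema(schema) // an assumed TableSchema shared by all destinations
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-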
to
Writes to the table and schema specified by the DynamicDestinations object.
If any of the returned destinations are configured with clustering fields, ensure that the passed DynamicDestinations object returns TableDestinationCoderV3 when DynamicDestinations.getDestinationCoder() is called.
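A minimal DynamicDestinations sketch, assuming TableRow elements that carry a "type" field used as the routing key; table and field names are placeholders:
  // One destination table per event "type"; all destinations share one assumed schema.
  events.apply(
      BigQueryIO.writeTableRows()
          .to(new DynamicDestinations<TableRow, String>() {
            @Override
            public String getDestination(ValueInSingleWindow<TableRow> element) {
              return (String) element.getValue().get("type"); // assumed routing key
            }
            @Override
            public TableDestination getTable(String type) {
              return new TableDestination("my-project:my_dataset.events_" + type, null);
            }
            @Override
            public TableSchema getSchema(String type) {
              // Assumed shared schema; in practice this can vary per destination.
              return new TableSchema().setFields(Arrays.asList(
                  new TableFieldSchema().setName("type").setType("STRING"),
                  new TableFieldSchema().setName("payload").setType("STRING")));
            }
          })
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-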
withFormatFunction
Formats the user's type into a TableRow to be written to BigQuery.
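A minimal sketch, assuming a hypothetical user type WeatherReading and a PCollection<WeatherReading> named readings:
  // WeatherReading is a hypothetical user type with stationId and tempC fields.
  readings.apply(
      BigQueryIO.<WeatherReading>write()
          .to("my-project:my_dataset.readings")
          .withFormatFunction((WeatherReading r) ->
              new TableRow().set("station_id", r.stationId).set("temp_c", r.tempC))
          .withSchema(new TableSchema().setFields(Arrays.asList(
              new TableFieldSchema().setName("station_id").setType("STRING"),
              new TableFieldSchema().setName("temp_c").setType("FLOAT"))))
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-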
withFormatRecordOnFailureFunction
public BigQueryIO.Write<T> withFormatRecordOnFailureFunction(SerializableFunction<T, TableRow> formatFunction)
If an insert failure occurs, this function is applied to the originally supplied T element.
For the BigQueryIO.Write.Method.STREAMING_INSERTS method, the resulting TableRow will be accessed via WriteResult.getFailedInsertsWithErr().
For the BigQueryIO.Write.Method.STORAGE_WRITE_API and BigQueryIO.Write.Method.STORAGE_API_AT_LEAST_ONCE methods, the resulting TableRow will be accessed via WriteResult.getFailedStorageApiInserts().
-
withAvroFormatFunction
public BigQueryIO.Write<T> withAvroFormatFunction(SerializableFunction<AvroWriteRequest<T>, GenericRecord> avroFormatFunction)
Formats the user's type into a GenericRecord to be written to BigQuery. The GenericRecords are written as Avro using the standard GenericDatumWriter.
This is mutually exclusive with withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<T, com.google.api.services.bigquery.model.TableRow>); only one may be set.
-
withAvroWriter
public BigQueryIO.Write<T> withAvroWriter(SerializableFunction<Schema, DatumWriter<T>> writerFactory)
Writes the user's type as Avro using the supplied DatumWriter.
This is mutually exclusive with withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<T, com.google.api.services.bigquery.model.TableRow>); only one may be set.
Overwrites withAvroFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.io.gcp.bigquery.AvroWriteRequest<T>, org.apache.avro.generic.GenericRecord>) if it has been set.
-
withAvroWriter
public <AvroT> BigQueryIO.Write<T> withAvroWriter(SerializableFunction<AvroWriteRequest<T>, AvroT> avroFormatFunction, SerializableFunction<Schema, DatumWriter<AvroT>> writerFactory)
Converts the user's type to an Avro record using the supplied avroFormatFunction. Records are then written using the supplied writer instances returned from writerFactory.
This is mutually exclusive with withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<T, com.google.api.services.bigquery.model.TableRow>); only one may be set.
Overwrites withAvroFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.io.gcp.bigquery.AvroWriteRequest<T>, org.apache.avro.generic.GenericRecord>) if it has been set.
-
withAvroSchemaFactory
public BigQueryIO.Write<T> withAvroSchemaFactory(SerializableFunction<@Nullable TableSchema, Schema> avroSchemaFactory)
Uses the specified function to convert a TableSchema to a Schema.
If not specified, the TableSchema will automatically be converted to an Avro schema.
-
withSchema
Uses the specified schema for rows to be written.
The schema is required only if writing to a table that does not already exist, and BigQueryIO.Write.CreateDisposition is set to BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED.
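A minimal sketch of building the schema in code; field names and types are placeholders:
  // A minimal sketch; field names and types are placeholders.
  TableSchema schema = new TableSchema().setFields(Arrays.asList(
      new TableFieldSchema().setName("user_id").setType("STRING").setMode("REQUIRED"),
      new TableFieldSchema().setName("score").setType("INTEGER"),
      new TableFieldSchema().setName("created_at").setType("TIMESTAMP")));
  rows.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.scores")
      .withSchema(schema)
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-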
withSchema
Same as withSchema(TableSchema) but using a deferred ValueProvider.
-
withJsonSchema
Similar to withSchema(TableSchema) but takes in a JSON-serialized TableSchema.
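A minimal sketch of an equivalent JSON-serialized schema; the fields are placeholders:
  // The JSON mirrors a TableSchema with placeholder fields.
  String jsonSchema =
      "{\"fields\": ["
          + "{\"name\": \"user_id\", \"type\": \"STRING\", \"mode\": \"REQUIRED\"},"
          + "{\"name\": \"score\", \"type\": \"INTEGER\"}"
          + "]}";
  rows.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.scores")
      .withJsonSchema(jsonSchema)
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-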
withJsonSchema
Same as withJsonSchema(String) but using a deferred ValueProvider.
-
withSchemaFromView
Allows the schemas for each table to be computed within the pipeline itself.
The input is a map-valued PCollectionView mapping string tablespecs to JSON-formatted TableSchemas. Tablespecs must be in the same format as taken by to(String).
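A minimal sketch, assuming a PCollection<KV<String, String>> named tableSchemas that maps tablespecs to JSON-formatted schemas, and an assumed dynamic destination function:
  // Build the side input view of tablespec -> JSON schema.
  PCollectionView<Map<String, String>> schemaView = tableSchemas.apply(View.asMap());
  rows.apply(BigQueryIO.writeTableRows()
      .to(tableFunction) // an assumed SerializableFunction producing TableDestinations
      .withSchemaFromView(schemaView)
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-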
withTimePartitioning
Allows newly created tables to include a TimePartitioning class. Can only be used when writing to a single table. If to(SerializableFunction) or to(DynamicDestinations) is used to write dynamic tables, time partitioning can be directly set in the returned TableDestination.
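A minimal sketch of daily partitioning on a placeholder timestamp column:
  // Daily partitioning on an assumed "event_ts" column present in the schema.
  rows.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.events")
      .withSchema(schema) // an assumed TableSchema containing "event_ts"
      .withTimePartitioning(new TimePartitioning().setType("DAY").setField("event_ts"))
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-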
withTimePartitioning
Like withTimePartitioning(TimePartitioning) but using a deferred ValueProvider.
-
withJsonTimePartitioning
The same as withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning), but takes a JSON-serialized object.
-
withClustering
Specifies the clustering fields to use when writing to a single output table. If to(SerializableFunction) or to(DynamicDestinations) is used to write to dynamic tables, the fields here will be ignored; call withClustering() instead.
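A minimal sketch of clustering a single destination table on placeholder columns:
  // Cluster on two assumed columns; the schema is assumed to contain them.
  rows.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.events")
      .withSchema(schema)
      .withTimePartitioning(new TimePartitioning().setType("DAY").setField("event_ts"))
      .withClustering(new Clustering().setFields(Arrays.asList("customer_id", "region")))
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-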
withJsonClustering
The same as withClustering(Clustering), but takes a JSON-serialized Clustering object in a deferred ValueProvider. For example: {"fields": ["column1", "column2", "column3"]}
-
withClustering
Allows writing to clustered tables when to(SerializableFunction) or to(DynamicDestinations) is used. The returned TableDestination objects should specify the clustering fields per table. If writing to a single table, use withClustering(Clustering) instead to pass a Clustering instance that specifies the static clustering fields to use.
Setting this option enables use of TableDestinationCoderV3, which encodes clustering information. The updated coder is compatible with non-clustered tables, so it can be freely set for newly deployed pipelines, but note that pipelines using an older coder must be drained before setting this option, since TableDestinationCoderV3 will not be able to read state written with a previous version.
-
withCreateDisposition
public BigQueryIO.Write<T> withCreateDisposition(BigQueryIO.Write.CreateDisposition createDisposition)
Specifies whether the table should be created if it does not exist.
-
withWriteDisposition
Specifies what to do with existing data in the table, in case the table already exists.
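A minimal sketch combining both dispositions, with placeholder names:
  // Create the table if needed and truncate its contents before writing.
  rows.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.daily_snapshot")
      .withSchema(schema) // an assumed TableSchema
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
      .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
-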
withSchemaUpdateOptions
public BigQueryIO.Write<T> withSchemaUpdateOptions(Set<BigQueryIO.Write.SchemaUpdateOption> schemaUpdateOptions)
Allows the schema of the destination table to be updated as a side effect of the write.
This configuration applies only when writing to BigQuery with BigQueryIO.Write.Method.FILE_LOADS as the method.
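A minimal sketch allowing file loads to add or relax fields, with assumed names:
  // Let load jobs add new nullable fields or relax REQUIRED fields on the destination table.
  rows.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.events")
      .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
      .withSchema(updatedSchema) // an assumed TableSchema containing the new fields
      .withSchemaUpdateOptions(EnumSet.of(
          BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
          BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_RELAXATION))
      .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
-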
withTableDescription
Specifies the table description. -
withBigLakeConfiguration
Specifies a configuration to create BigLake tables. The following options are available:
- connectionId (REQUIRED): the name of your cloud resource connection.
- storageUri (REQUIRED): the path to your GCS folder where data will be written to. This sink will create sub-folders for each project, dataset, and table destination. For example, if you specify a storageUri of "gs://foo/bar" and write to table "my_project.my_dataset.my_table", your data will be written under "gs://foo/bar/my_project/my_dataset/my_table/".
- fileFormat (OPTIONAL): defaults to "parquet".
- tableFormat (OPTIONAL): defaults to "iceberg".
NOTE: This is only supported with the Storage Write API methods.
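A minimal sketch of passing the configuration map (ImmutableMap is Guava); the connection name and bucket are placeholders:
  // Connection and bucket names are placeholders.
  rows.apply(BigQueryIO.writeTableRows()
      .to("my_project.my_dataset.my_table")
      .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
      .withBigLakeConfiguration(ImmutableMap.of(
          "connectionId", "my-project.us.my-connection",
          "storageUri", "gs://foo/bar")));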
-
withFailedInsertRetryPolicy
Specifies a policy for handling failed inserts.
Currently this is only allowed when writing an unbounded collection to BigQuery. Bounded collections are written using batch load jobs, so we don't get per-element failures. Unbounded collections are written using streaming inserts, so we have access to per-element insert results.
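A minimal sketch that retries transient errors and collects the failed rows; names are placeholders:
  // Retry transient streaming-insert errors and inspect whatever still fails.
  WriteResult result = rows.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.events")
      .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
      .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
  PCollection<TableRow> failed = result.getFailedInserts();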
-
withoutValidation
Disables BigQuery table validation. -
withMethod
Choose the method used to write data to BigQuery. See the Javadoc on BigQueryIO.Write.Method for information and restrictions of the different methods.
-
withRowMutationInformationFn
public BigQueryIO.Write<T> withRowMutationInformationFn(SerializableFunction<T, RowMutationInformation> updateFn)
Allows upserting and deleting rows for tables with a primary key defined. Provides a mapping function that determines how a row is applied to BigQuery (upsert, or delete) along with a sequence number for ordering operations.
This is supported when using the BigQueryIO.Write.Method.STORAGE_API_AT_LEAST_ONCE insert method, and with either BigQueryIO.Write.CreateDisposition.CREATE_NEVER or BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED. For CREATE_IF_NEEDED, a primary key must be specified using withPrimaryKey(java.util.List<java.lang.String>).
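A rough sketch of applying change records as upserts. The RowMutationInformation.of(...) overloads have varied across Beam releases, so verify the exact factory signature against your SDK version; field and table names are placeholders:
  // Apply each change record as an UPSERT ordered by an assumed sequence number field.
  changes.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.accounts")
      .withMethod(BigQueryIO.Write.Method.STORAGE_API_AT_LEAST_ONCE)
      .withPrimaryKey(Arrays.asList("account_id"))
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
      .withSchema(schema) // an assumed TableSchema
      .withRowMutationInformationFn(row -> RowMutationInformation.of(
          RowMutationInformation.MutationType.UPSERT,
          ((Number) row.get("sequence_number")).longValue())));
-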
withDirectWriteProtos
-
withLoadJobProjectId
Set the project the BigQuery load job will be initiated from. This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS. If omitted, the project of the destination table is used.
-
withLoadJobProjectId
-
withTriggeringFrequency
Choose the frequency at which file writes are triggered.
This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS or BigQueryIO.Write.Method.STORAGE_WRITE_API, and only when writing an unbounded PCollection.
Every triggeringFrequency duration, a BigQuery load job will be generated for all the data written since the last load job. BigQuery has limits on how many load jobs can be triggered per day, so be careful not to set this duration too low, or you may exceed daily quota. Often this is set to 5 or 10 minutes to ensure that the project stays well under the BigQuery quota. See Quota Policy for more information about BigQuery quotas.
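A minimal sketch of periodic file loads from an unbounded input; names are placeholders:
  // Stream an unbounded PCollection into BigQuery via load jobs every 5 minutes.
  unboundedRows.apply(BigQueryIO.writeTableRows()
      .to("my-project:my_dataset.events")
      .withSchema(schema) // an assumed TableSchema
      .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
      .withTriggeringFrequency(Duration.standardMinutes(5)) // org.joda.time.Duration
      .withNumFileShards(100) // or .withAutoSharding()
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));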
-
withNumFileShards
Control how many file shards are written when using BigQuery load jobs. Applicable only when also setting withTriggeringFrequency(org.joda.time.Duration). To let the runner determine the sharding at runtime, set withAutoSharding() instead.
-
withNumStorageWriteApiStreams
Control how many parallel streams are used when using Storage API writes.
For streaming pipelines, this is applicable when withTriggeringFrequency(org.joda.time.Duration) is also set. To let the runner determine the sharding at runtime, set this to zero or use withAutoSharding() instead.
For batch pipelines, this inserts a redistribute. To avoid the reshuffle and keep the pipeline parallelism as-is, set this to zero.
-
withPropagateSuccessfulStorageApiWrites
public BigQueryIO.Write<T> withPropagateSuccessfulStorageApiWrites(boolean propagateSuccessfulStorageApiWrites)
If set to true, then all successful writes will be propagated to WriteResult and accessible via the WriteResult.getSuccessfulStorageApiInserts() method.
-
withPropagateSuccessfulStorageApiWrites
public BigQueryIO.Write<T> withPropagateSuccessfulStorageApiWrites(Predicate<String> columnsToPropagate)
If called, then all successful writes will be propagated to WriteResult and accessible via the WriteResult.getSuccessfulStorageApiInserts() method. The predicate allows filtering out columns from appearing in the resulting PCollection. The argument to the predicate is the name of the field to potentially be included in the output. Nested fields are presented using dot notation, e.g. "a.b.c". If you want a nested field included, you must ensure that the predicate returns true for every parent field; e.g. if you want field "a.b.c" included, the predicate must return true for "a", for "a.b", and for "a.b.c".
The predicate will be invoked repeatedly for every field in every message, so it is recommended that it be as lightweight as possible, e.g. looking up fields in a hash table instead of searching a list of field names.
-
withCustomGcsTempLocation
Provides a custom location on GCS for storing temporary files to be loaded via BigQuery batch load jobs. See "Usage with templates" in the BigQueryIO documentation for discussion.
-
withExtendedErrorInfo
Enables extended error information by enabling WriteResult.getFailedInsertsWithErr().
At the moment this only works if using BigQueryIO.Write.Method.STREAMING_INSERTS. See withMethod(Method).
-
skipInvalidRows
Insert all valid rows of a request, even if invalid rows exist. This is only applicable when the write method is set to BigQueryIO.Write.Method.STREAMING_INSERTS. The default value is false, which causes the entire request to fail if any invalid rows exist.
-
ignoreUnknownValues
Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors. -
useAvroLogicalTypes
Enables interpreting logical types into their corresponding types (e.g. TIMESTAMP) instead of only using their raw types (e.g. LONG).
-
ignoreInsertIds
Setting this option to true disables insertId based data deduplication offered by BigQuery. For more information, please see https://cloud.google.com/bigquery/streaming-data-into-bigquery#disabling_best_effort_de-duplication. -
withKmsKey
-
withPrimaryKey
-
withDefaultMissingValueInterpretation
public BigQueryIO.Write<T> withDefaultMissingValueInterpretation(com.google.cloud.bigquery.storage.v1.AppendRowsRequest.MissingValueInterpretation missingValueInterpretation)
Specify how missing values should be interpreted when there is a default value in the schema. Options are to take the default value or to write an explicit null (not an option if the field is also required). Note: this is only used when using one of the Storage Write API insert methods.
-
optimizedWrites
If true, enables new codepaths that are expected to use less resources while writing to BigQuery. Not enabled by default in order to maintain backwards compatibility. -
useBeamSchema
If true, then the BigQuery schema will be inferred from the input schema. If no formatFunction is set, then BigQueryIO will automatically turn the input records into TableRows that match the schema.
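A minimal sketch, assuming a hypothetical POJO registered for Beam schema inference:
  // Purchase is a hypothetical POJO whose schema Beam can infer.
  @DefaultSchema(JavaFieldSchema.class)
  public class Purchase {
    public String userId;
    public double amount;
  }
  // "purchases" is an assumed PCollection<Purchase> with a Beam schema attached.
  purchases.apply(BigQueryIO.<Purchase>write()
      .to("my-project:my_dataset.purchases")
      .useBeamSchema()
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
-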
withAutoSharding
If true, enables using a dynamically determined number of shards to write to BigQuery. This can be used for BigQueryIO.Write.Method.FILE_LOADS, BigQueryIO.Write.Method.STREAMING_INSERTS and BigQueryIO.Write.Method.STORAGE_WRITE_API. Only applicable to unbounded data. If using BigQueryIO.Write.Method.FILE_LOADS, numFileShards set via withNumFileShards(int) will be ignored.
-
withMaxRetryJobs
If set, this will set the maximum number of retries of batch load jobs.
-
withSuccessfulInsertsPropagation
If true, enables the propagation of the successfully inserted TableRows on BigQuery as part of the WriteResult object when using BigQueryIO.Write.Method.STREAMING_INSERTS. By default this property is set to true. In cases where a pipeline won't make use of the insert results, this property can be set to false, which lets the pipeline drop those inserted TableRows and reclaim worker resources.
-
withAutoSchemaUpdate
If true, enables automatically detecting BigQuery table schema updates. Table schema updates are usually noticed within several minutes. Only supported when using one of the STORAGE_API insert methods. -
withDeterministicRecordIdFn
public BigQueryIO.Write<T> withDeterministicRecordIdFn(SerializableFunction<T, String> toUniqueIdFunction) -
withTestServices
-
withMaxFilesPerBundle
Control how many files will be written concurrently by a single worker when using BigQuery load jobs before spilling to a shuffle. When data comes into this transform, it is written to one file per destination per worker. When there are more files than maxFilesPerBundle (DEFAULT: 20), the data is shuffled (i.e. Grouped By Destination), and written to files one-by-one-per-worker. This flag sets the maximum number of files that a single worker can write concurrently before shuffling the data. This flag should be used with caution. Setting a high number can increase the memory pressure on workers, and setting a low number can make a pipeline slower (due to the need to shuffle data). -
withMaxFileSize
Controls the maximum byte size per file to be loaded into BigQuery. If the amount of data written to one file reaches this threshold, we will close that file and continue writing in a new file.
The default value (4 TiB) respects BigQuery's maximum number of source URIs per job configuration.
-
withMaxFilesPerPartition
Controls how many files will be assigned to a single BigQuery load job. If the number of files increases past this threshold, we will spill it over into multiple load jobs as necessary.
The default value (10,000 files) respects BigQuery's maximum number of source URIs per job configuration.
-
withMaxBytesPerPartition
Control how much data will be assigned to a single BigQuery load job. If the amount of data flowing into one BatchLoads partition exceeds this value, that partition will be handled via multiple load jobs.
The default value (11 TiB) respects BigQuery's maximum size per load job limit and is appropriate for most use cases. Reducing the value of this parameter can improve stability when loading to tables with complex schemas containing thousands of fields.
-
withWriteTempDataset
Temporary dataset. When writing to BigQuery from large file loads, BigQueryIO.write() will create temporary tables in a dataset to store staging data from partitions. With this option, you can set an existing dataset in which to create the temporary tables. BigQueryIO will create temporary tables in that dataset, and will remove them once they are no longer needed. No other tables in the dataset will be modified. Remember that the dataset must exist and your job needs permissions to create and remove tables inside that dataset.
-
withErrorHandler
-
validate
Description copied from class: PTransform
Called before running the Pipeline to verify this transform is fully and correctly specified.
By default, does nothing.
- Overrides:
validate in class PTransform<PCollection<T>, WriteResult>
-
expand
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PCollection<T>, WriteResult>
-
populateDisplayData
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PCollection<T>, WriteResult>
- Parameters:
builder - The builder to populate with display data.
-
getTable
Returns the table reference, or null.
-