public abstract static class BigQueryIO.Write<T> extends PTransform<PCollection<T>,WriteResult>
Implementation of BigQueryIO.write().

| Modifier and Type | Class and Description |
|---|---|
| static class | BigQueryIO.Write.CreateDisposition: An enumeration type for the BigQuery create disposition strings. |
| static class | BigQueryIO.Write.Method: Determines the method used to insert data in BigQuery. |
| static class | BigQueryIO.Write.SchemaUpdateOption: An enumeration type for the BigQuery schema update options strings. |
| static class | BigQueryIO.Write.WriteDisposition: An enumeration type for the BigQuery write disposition strings. |
Fields inherited from class PTransform: annotations, displayData, name, resourceHints

| Constructor and Description |
|---|
| Write() | 
| Modifier and Type | Method and Description | 
|---|---|
| WriteResult | expand(PCollection<T> input): Override this method to specify how this PTransform should be expanded on the given InputT. |
| abstract BigQueryIO.Write.Method | getMethod() |
| @Nullable ValueProvider<TableReference> | getTable(): Returns the table reference, or null. |
| BigQueryIO.Write<T> | ignoreInsertIds(): Setting this option to true disables insertId based data deduplication offered by BigQuery. |
| BigQueryIO.Write<T> | ignoreUnknownValues(): Accept rows that contain values that do not match the schema. |
| BigQueryIO.Write<T> | optimizedWrites(): If true, enables new codepaths that are expected to use fewer resources while writing to BigQuery. |
| void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |
| BigQueryIO.Write<T> | skipInvalidRows(): Insert all valid rows of a request, even if invalid rows exist. |
| BigQueryIO.Write<T> | to(DynamicDestinations<T,?> dynamicDestinations): Writes to the table and schema specified by the DynamicDestinations object. |
| BigQueryIO.Write<T> | to(SerializableFunction<ValueInSingleWindow<T>,TableDestination> tableFunction): Writes to the table specified by the given table function. |
| BigQueryIO.Write<T> | to(java.lang.String tableSpec): Writes to the given table, specified in the format described in BigQueryHelpers.parseTableSpec(java.lang.String). |
| BigQueryIO.Write<T> | to(TableReference table): Writes to the given table, specified as a TableReference. |
| BigQueryIO.Write<T> | to(ValueProvider<java.lang.String> tableSpec): Same as to(String), but with a ValueProvider. |
| BigQueryIO.Write<T> | useAvroLogicalTypes(): Enables interpreting logical types into their corresponding types. |
| BigQueryIO.Write<T> | useBeamSchema(): If true, then the BigQuery schema will be inferred from the input schema. |
| void | validate(PipelineOptions pipelineOptions): Called before running the Pipeline to verify this transform is fully and correctly specified. |
| BigQueryIO.Write<T> | withAutoSchemaUpdate(boolean autoSchemaUpdate): If true, enables automatically detecting BigQuery table schema updates. |
| BigQueryIO.Write<T> | withAutoSharding(): If true, enables using a dynamically determined number of shards to write to BigQuery. |
| BigQueryIO.Write<T> | withAvroFormatFunction(SerializableFunction<AvroWriteRequest<T>,GenericRecord> avroFormatFunction): Formats the user's type into a GenericRecord to be written to BigQuery. |
| BigQueryIO.Write<T> | withAvroSchemaFactory(SerializableFunction<TableSchema,Schema> avroSchemaFactory): Uses the specified function to convert a TableSchema to a Schema. |
| <AvroT> BigQueryIO.Write<T> | withAvroWriter(SerializableFunction<AvroWriteRequest<T>,AvroT> avroFormatFunction, SerializableFunction<Schema,DatumWriter<AvroT>> writerFactory): Converts the user's type to an avro record using the supplied avroFormatFunction. |
| BigQueryIO.Write<T> | withAvroWriter(SerializableFunction<Schema,DatumWriter<T>> writerFactory): Writes the user's type as avro using the supplied DatumWriter. |
| BigQueryIO.Write<T> | withClustering(): Allows writing to clustered tables when to(SerializableFunction) or to(DynamicDestinations) is used. |
| BigQueryIO.Write<T> | withClustering(Clustering clustering): Specifies the clustering fields to use when writing to a single output table. |
| BigQueryIO.Write<T> | withCreateDisposition(BigQueryIO.Write.CreateDisposition createDisposition): Specifies whether the table should be created if it does not exist. |
| BigQueryIO.Write<T> | withCustomGcsTempLocation(ValueProvider<java.lang.String> customGcsTempLocation): Provides a custom location on GCS for storing temporary files to be loaded via BigQuery batch load jobs. |
| BigQueryIO.Write<T> | withDefaultMissingValueInterpretation(com.google.cloud.bigquery.storage.v1.AppendRowsRequest.MissingValueInterpretation missingValueInterpretation): Specify how missing values should be interpreted when there is a default value in the schema. |
| BigQueryIO.Write<T> | withDeterministicRecordIdFn(SerializableFunction<T,java.lang.String> toUniqueIdFunction) |
| BigQueryIO.Write<T> | withDirectWriteProtos(boolean directWriteProtos) |
| BigQueryIO.Write<T> | withErrorHandler(ErrorHandler<BadRecord,?> errorHandler) |
| BigQueryIO.Write<T> | withExtendedErrorInfo(): Enables extended error information by enabling WriteResult.getFailedInsertsWithErr(). |
| BigQueryIO.Write<T> | withFailedInsertRetryPolicy(InsertRetryPolicy retryPolicy): Specifies a policy for handling failed inserts. |
| BigQueryIO.Write<T> | withFormatFunction(SerializableFunction<T,TableRow> formatFunction): Formats the user's type into a TableRow to be written to BigQuery. |
| BigQueryIO.Write<T> | withFormatRecordOnFailureFunction(SerializableFunction<T,TableRow> formatFunction): If an insert failure occurs, this function is applied to the originally supplied row T. |
| BigQueryIO.Write<T> | withJsonSchema(java.lang.String jsonSchema): Similar to withSchema(TableSchema) but takes in a JSON-serialized TableSchema. |
| BigQueryIO.Write<T> | withJsonSchema(ValueProvider<java.lang.String> jsonSchema): Same as withJsonSchema(String) but using a deferred ValueProvider. |
| BigQueryIO.Write<T> | withJsonTimePartitioning(ValueProvider<java.lang.String> partitioning): The same as withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning), but takes a JSON-serialized object. |
| BigQueryIO.Write<T> | withKmsKey(java.lang.String kmsKey) |
| BigQueryIO.Write<T> | withLoadJobProjectId(java.lang.String loadJobProjectId): Set the project the BigQuery load job will be initiated from. |
| BigQueryIO.Write<T> | withLoadJobProjectId(ValueProvider<java.lang.String> loadJobProjectId) |
| BigQueryIO.Write<T> | withMaxBytesPerPartition(long maxBytesPerPartition): Control how much data will be assigned to a single BigQuery load job. |
| BigQueryIO.Write<T> | withMaxFilesPerBundle(int maxFilesPerBundle): Control how many files will be written concurrently by a single worker when using BigQuery load jobs before spilling to a shuffle. |
| BigQueryIO.Write<T> | withMaxRetryJobs(int maxRetryJobs): If set, this will set the maximum number of retries for batch load jobs. |
| BigQueryIO.Write<T> | withMethod(BigQueryIO.Write.Method method): Choose the method used to write data to BigQuery. |
| BigQueryIO.Write<T> | withNumFileShards(int numFileShards): Control how many file shards are written when using BigQuery load jobs. |
| BigQueryIO.Write<T> | withNumStorageWriteApiStreams(int numStorageWriteApiStreams): Control how many parallel streams are used when using Storage API writes. |
| BigQueryIO.Write<T> | withoutValidation(): Disables BigQuery table validation. |
| BigQueryIO.Write<T> | withPrimaryKey(java.util.List<java.lang.String> primaryKey) |
| BigQueryIO.Write<T> | withPropagateSuccessfulStorageApiWrites(boolean propagateSuccessfulStorageApiWrites): If set to true, then all successful writes will be propagated to WriteResult and accessible via the WriteResult.getSuccessfulStorageApiInserts() method. |
| BigQueryIO.Write<T> | withRowMutationInformationFn(SerializableFunction<T,RowMutationInformation> updateFn): Allows upserting and deleting rows for tables with a primary key defined. |
| BigQueryIO.Write<T> | withSchema(TableSchema schema): Uses the specified schema for rows to be written. |
| BigQueryIO.Write<T> | withSchema(ValueProvider<TableSchema> schema): Same as withSchema(TableSchema) but using a deferred ValueProvider. |
| BigQueryIO.Write<T> | withSchemaFromView(PCollectionView<java.util.Map<java.lang.String,java.lang.String>> view): Allows the schemas for each table to be computed within the pipeline itself. |
| BigQueryIO.Write<T> | withSchemaUpdateOptions(java.util.Set<BigQueryIO.Write.SchemaUpdateOption> schemaUpdateOptions): Allows the schema of the destination table to be updated as a side effect of the write. |
| BigQueryIO.Write<T> | withSuccessfulInsertsPropagation(boolean propagateSuccessful): If true, it enables the propagation of the successfully inserted TableRows on BigQuery as part of the WriteResult object when using BigQueryIO.Write.Method.STREAMING_INSERTS. |
| BigQueryIO.Write<T> | withTableDescription(java.lang.String tableDescription): Specifies the table description. |
| BigQueryIO.Write<T> | withTestServices(BigQueryServices testServices) |
| BigQueryIO.Write<T> | withTimePartitioning(TimePartitioning partitioning): Allows newly created tables to include a TimePartitioning class. |
| BigQueryIO.Write<T> | withTimePartitioning(ValueProvider<TimePartitioning> partitioning): Like withTimePartitioning(TimePartitioning) but using a deferred ValueProvider. |
| BigQueryIO.Write<T> | withTriggeringFrequency(Duration triggeringFrequency): Choose the frequency at which file writes are triggered. |
| BigQueryIO.Write<T> | withWriteDisposition(BigQueryIO.Write.WriteDisposition writeDisposition): Specifies what to do with existing data in the table, in case the table already exists. |
| BigQueryIO.Write<T> | withWriteTempDataset(java.lang.String writeTempDataset): Temporary dataset. |
Methods inherited from class PTransform: addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate

public abstract BigQueryIO.Write.Method getMethod()

public BigQueryIO.Write<T> to(java.lang.String tableSpec)
Writes to the given table, specified in the format described in BigQueryHelpers.parseTableSpec(java.lang.String).

public BigQueryIO.Write<T> to(TableReference table)
Writes to the given table, specified as a TableReference.

public BigQueryIO.Write<T> to(ValueProvider<java.lang.String> tableSpec)
Same as to(String), but with a ValueProvider.

public BigQueryIO.Write<T> to(SerializableFunction<ValueInSingleWindow<T>,TableDestination> tableFunction)
Writes to the table specified by the given table function. The table is a function of ValueInSingleWindow, so it can be determined by the value or by the window.

If the function produces destinations configured with clustering fields, ensure that withClustering() is also set so that the clustering configurations get properly encoded and decoded.

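As a rough illustration of choosing a table from either the element or its window, the plain-Java sketch below builds table specs in the project:dataset.table form accepted by to(String). It deliberately avoids Beam types: TableRouting, byValue, byWindow, and the project, dataset, and table names are all hypothetical. In a real pipeline, logic like this would sit inside the SerializableFunction passed to to(...) and return a TableDestination.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper that only builds table spec strings; not part of Beam. */
final class TableRouting {
    // Formats a window start as a UTC day such as 20240115.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    private TableRouting() {}

    /** Routes by element value: one table per event type. */
    static String byValue(String project, String dataset, String eventType) {
        return project + ":" + dataset + ".events_" + eventType.toLowerCase(Locale.ROOT);
    }

    /** Routes by window: one table per UTC day of the window start. */
    static String byWindow(String project, String dataset, Instant windowStart) {
        return project + ":" + dataset + ".events_" + DAY.format(windowStart);
    }

    public static void main(String[] args) {
        // prints my-project:logs.events_click
        System.out.println(byValue("my-project", "logs", "Click"));
        // prints my-project:logs.events_20240115
        System.out.println(byWindow("my-project", "logs", Instant.parse("2024-01-15T23:59:59Z")));
    }
}
```

Routing by window keeps each table bounded to one day of data, while routing by value keeps each table homogeneous; which fits better depends on how the tables are queried.
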
public BigQueryIO.Write<T> to(DynamicDestinations<T,?> dynamicDestinations)
Writes to the table and schema specified by the DynamicDestinations object.

If any of the returned destinations are configured with clustering fields, ensure that the passed DynamicDestinations object returns TableDestinationCoderV3 when DynamicDestinations.getDestinationCoder() is called.

public BigQueryIO.Write<T> withFormatFunction(SerializableFunction<T,TableRow> formatFunction)
Formats the user's type into a TableRow to be written to BigQuery.

public BigQueryIO.Write<T> withFormatRecordOnFailureFunction(SerializableFunction<T,TableRow> formatFunction)
If an insert failure occurs, this function is applied to the originally supplied row T. The resulting TableRow will be accessed via WriteResult.getFailedInsertsWithErr().

public BigQueryIO.Write<T> withAvroFormatFunction(SerializableFunction<AvroWriteRequest<T>,GenericRecord> avroFormatFunction)
Formats the user's type into a GenericRecord to be written to BigQuery. The GenericRecords are written as avro using the standard GenericDatumWriter.

This is mutually exclusive with withFormatFunction(SerializableFunction<T,TableRow>); only one may be set.

public BigQueryIO.Write<T> withAvroWriter(SerializableFunction<Schema,DatumWriter<T>> writerFactory)
Writes the user's type as avro using the supplied DatumWriter.

This is mutually exclusive with withFormatFunction(SerializableFunction<T,TableRow>); only one may be set.

Overwrites withAvroFormatFunction(SerializableFunction<AvroWriteRequest<T>,GenericRecord>) if it has been set.

public <AvroT> BigQueryIO.Write<T> withAvroWriter(SerializableFunction<AvroWriteRequest<T>,AvroT> avroFormatFunction, SerializableFunction<Schema,DatumWriter<AvroT>> writerFactory)
Converts the user's type to an avro record using the supplied avroFormatFunction.

This is mutually exclusive with withFormatFunction(SerializableFunction<T,TableRow>); only one may be set.

Overwrites withAvroFormatFunction(SerializableFunction<AvroWriteRequest<T>,GenericRecord>) if it has been set.

public BigQueryIO.Write<T> withAvroSchemaFactory(SerializableFunction<TableSchema,Schema> avroSchemaFactory)
Uses the specified function to convert a TableSchema to a Schema.

If not specified, the TableSchema will automatically be converted to an avro schema.

public BigQueryIO.Write<T> withSchema(TableSchema schema)
Uses the specified schema for rows to be written.

The schema is required only if writing to a table that does not already exist, and BigQueryIO.Write.CreateDisposition is set to BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED.

public BigQueryIO.Write<T> withSchema(ValueProvider<TableSchema> schema)
Same as withSchema(TableSchema) but using a deferred ValueProvider.

public BigQueryIO.Write<T> withJsonSchema(java.lang.String jsonSchema)
Similar to withSchema(TableSchema) but takes in a JSON-serialized TableSchema.

public BigQueryIO.Write<T> withJsonSchema(ValueProvider<java.lang.String> jsonSchema)
Same as withJsonSchema(String) but using a deferred ValueProvider.

public BigQueryIO.Write<T> withSchemaFromView(PCollectionView<java.util.Map<java.lang.String,java.lang.String>> view)
Allows the schemas for each table to be computed within the pipeline itself.

The input is a map-valued PCollectionView mapping string tablespecs to JSON-formatted TableSchemas. Tablespecs must be in the same format as taken by to(String).

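To make the expected shape of that map concrete, here is a small plain-Java sketch that builds tablespec-to-JSON-schema entries by hand. The class and helper names (SchemaMapSketch, field, schema), the table names, and the columns are hypothetical; the JSON layout follows the BigQuery TableSchema representation, a fields array of name/type/mode objects. In a pipeline, a map like this would be computed by upstream transforms and exposed as the PCollectionView.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the map contents; not part of Beam. */
final class SchemaMapSketch {
    private SchemaMapSketch() {}

    /** One field entry. Inputs are assumed to need no JSON escaping. */
    static String field(String name, String type, String mode) {
        return String.format(
            "{\"name\":\"%s\",\"type\":\"%s\",\"mode\":\"%s\"}", name, type, mode);
    }

    /** Wraps field entries in the top-level "fields" array. */
    static String schema(String... fields) {
        return "{\"fields\":[" + String.join(",", fields) + "]}";
    }

    /** Keys are tablespecs in the to(String) format; values are JSON schemas. */
    static Map<String, String> schemasByTableSpec() {
        Map<String, String> schemas = new LinkedHashMap<>();
        schemas.put(
            "my-project:analytics.clicks",
            schema(field("user_id", "STRING", "REQUIRED"),
                   field("ts", "TIMESTAMP", "REQUIRED")));
        schemas.put(
            "my-project:analytics.purchases",
            schema(field("user_id", "STRING", "REQUIRED"),
                   field("amount", "NUMERIC", "NULLABLE")));
        return schemas;
    }

    public static void main(String[] args) {
        schemasByTableSpec().forEach((table, json) -> System.out.println(table + " -> " + json));
    }
}
```

A production version would normally serialize the schema with a JSON library rather than string formatting, so that unusual field names are escaped correctly.
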
public BigQueryIO.Write<T> withTimePartitioning(TimePartitioning partitioning)
Allows newly created tables to include a TimePartitioning class. Can only be used when writing to a single table. If to(SerializableFunction) or to(DynamicDestinations) is used to write dynamic tables, time partitioning can be directly set in the returned TableDestination.

public BigQueryIO.Write<T> withTimePartitioning(ValueProvider<TimePartitioning> partitioning)
Like withTimePartitioning(TimePartitioning) but using a deferred ValueProvider.

public BigQueryIO.Write<T> withJsonTimePartitioning(ValueProvider<java.lang.String> partitioning)
The same as withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning), but takes a JSON-serialized object.

public BigQueryIO.Write<T> withClustering(Clustering clustering)
Specifies the clustering fields to use when writing to a single output table. If to(SerializableFunction) or to(DynamicDestinations) is used to write to dynamic tables, the fields here will be ignored; call withClustering() instead.

public BigQueryIO.Write<T> withClustering()
Allows writing to clustered tables when to(SerializableFunction) or to(DynamicDestinations) is used. The returned TableDestination objects should specify the clustering fields per table. If writing to a single table, use withClustering(Clustering) instead to pass a Clustering instance that specifies the static clustering fields to use.

Setting this option enables use of TableDestinationCoderV3, which encodes clustering information. The updated coder is compatible with non-clustered tables, so it can be freely set for newly deployed pipelines. Note, however, that pipelines using an older coder must be drained before setting this option, since TableDestinationCoderV3 will not be able to read state written with a previous version.

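public BigQueryIO.Write<T> withCreateDisposition(BigQueryIO.Write.CreateDisposition createDisposition)
Specifies whether the table should be created if it does not exist.

public BigQueryIO.Write<T> withWriteDisposition(BigQueryIO.Write.WriteDisposition writeDisposition)
Specifies what to do with existing data in the table, in case the table already exists.

public BigQueryIO.Write<T> withSchemaUpdateOptions(java.util.Set<BigQueryIO.Write.SchemaUpdateOption> schemaUpdateOptions)
Allows the schema of the destination table to be updated as a side effect of the write.

This configuration applies only when writing to BigQuery with BigQueryIO.Write.Method.FILE_LOADS as the method.
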
public BigQueryIO.Write<T> withTableDescription(java.lang.String tableDescription)
Specifies the table description.

public BigQueryIO.Write<T> withFailedInsertRetryPolicy(InsertRetryPolicy retryPolicy)
Specifies a policy for handling failed inserts.

Currently this is allowed only when writing an unbounded collection to BigQuery. Bounded collections are written using batch load jobs, so we don't get per-element failures. Unbounded collections are written using streaming inserts, so we have access to per-element insert results.

public BigQueryIO.Write<T> withoutValidation()
Disables BigQuery table validation.

public BigQueryIO.Write<T> withMethod(BigQueryIO.Write.Method method)
Choose the method used to write data to BigQuery. See BigQueryIO.Write.Method for information about, and restrictions of, the different methods.

public BigQueryIO.Write<T> withRowMutationInformationFn(SerializableFunction<T,RowMutationInformation> updateFn)
Allows upserting and deleting rows for tables with a primary key defined.

This is supported when using the BigQueryIO.Write.Method.STORAGE_API_AT_LEAST_ONCE insert method, and with either BigQueryIO.Write.CreateDisposition.CREATE_NEVER or BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED. For CREATE_IF_NEEDED, a primary key must be specified using withPrimaryKey(java.util.List<java.lang.String>).

public BigQueryIO.Write<T> withDirectWriteProtos(boolean directWriteProtos)
public BigQueryIO.Write<T> withLoadJobProjectId(java.lang.String loadJobProjectId)
Set the project the BigQuery load job will be initiated from. This applies when the write method is BigQueryIO.Write.Method.FILE_LOADS. If omitted, the project of the destination table is used.

public BigQueryIO.Write<T> withLoadJobProjectId(ValueProvider<java.lang.String> loadJobProjectId)

public BigQueryIO.Write<T> withTriggeringFrequency(Duration triggeringFrequency)
Choose the frequency at which file writes are triggered.

This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS or BigQueryIO.Write.Method.STORAGE_WRITE_API, and only when writing an unbounded PCollection.

Every triggeringFrequency duration, a BigQuery load job will be generated for all the data written since the last load job. BigQuery limits how many load jobs can be triggered per day, so be careful not to set this duration too low, or you may exceed the daily quota. Often this is set to 5 or 10 minutes to ensure that the project stays well under the BigQuery quota. See Quota Policy for more information about BigQuery quotas.

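The arithmetic behind that advice can be sketched in a few lines of plain Java. This is a back-of-the-envelope helper, not Beam code: LoadJobBudget and its methods are hypothetical, it uses java.time.Duration for self-containment (the API above takes a Joda Duration), it assumes at least one load job per destination table per firing, and the daily limit is a parameter to be filled in from the current BigQuery quota documentation.

```java
import java.time.Duration;

/** Hypothetical helper for sanity-checking a triggering frequency; not part of Beam. */
final class LoadJobBudget {
    private LoadJobBudget() {}

    /** Number of times a trigger with the given period fires in 24 hours. */
    static long firingsPerDay(Duration triggeringFrequency) {
        long periodMillis = triggeringFrequency.toMillis();
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("triggering frequency must be at least 1 ms");
        }
        return Duration.ofDays(1).toMillis() / periodMillis;
    }

    /**
     * Checks a lower bound on daily load jobs against a limit.
     * Assumes at least one load job per destination table per firing.
     */
    static boolean fitsWithin(
            Duration triggeringFrequency, int destinationTables, long assumedDailyLimit) {
        return firingsPerDay(triggeringFrequency) * destinationTables <= assumedDailyLimit;
    }

    public static void main(String[] args) {
        System.out.println(firingsPerDay(Duration.ofMinutes(5)));   // prints 288
        System.out.println(firingsPerDay(Duration.ofMinutes(10)));  // prints 144
        // 1_000 is a placeholder limit, not BigQuery's actual quota.
        System.out.println(fitsWithin(Duration.ofMinutes(1), 1, 1_000)); // prints false (1440 > 1000)
    }
}
```

Because the estimate is a lower bound, retries and large partitions that split into several jobs will push the real count higher, so it is worth leaving headroom.
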
public BigQueryIO.Write<T> withNumFileShards(int numFileShards)
Control how many file shards are written when using BigQuery load jobs. Applicable only when also setting withTriggeringFrequency(org.joda.time.Duration). To let the runner determine the sharding at runtime, set withAutoSharding() instead.

public BigQueryIO.Write<T> withNumStorageWriteApiStreams(int numStorageWriteApiStreams)
Control how many parallel streams are used when using Storage API writes. Applies when withTriggeringFrequency(org.joda.time.Duration) is also set. To let the runner determine the sharding at runtime, set this to zero, or use withAutoSharding() instead.

public BigQueryIO.Write<T> withPropagateSuccessfulStorageApiWrites(boolean propagateSuccessfulStorageApiWrites)
If set to true, then all successful writes will be propagated to WriteResult and accessible via the WriteResult.getSuccessfulStorageApiInserts() method.

public BigQueryIO.Write<T> withCustomGcsTempLocation(ValueProvider<java.lang.String> customGcsTempLocation)
Provides a custom location on GCS for storing temporary files to be loaded via BigQuery batch load jobs. See the BigQueryIO documentation for discussion.

public BigQueryIO.Write<T> withExtendedErrorInfo()
Enables extended error information by enabling WriteResult.getFailedInsertsWithErr().

At the moment this only works if using BigQueryIO.Write.Method.STREAMING_INSERTS. See withMethod(Method).

public BigQueryIO.Write<T> skipInvalidRows()
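Insert all valid rows of a request, even if invalid rows exist. This applies when the write method is BigQueryIO.Write.Method.STREAMING_INSERTS. The default value is false, which causes the entire request to fail if any invalid rows exist.

public BigQueryIO.Write<T> ignoreUnknownValues()
Accept rows that contain values that do not match the schema.

public BigQueryIO.Write<T> useAvroLogicalTypes()
Enables interpreting logical types into their corresponding types.

public BigQueryIO.Write<T> ignoreInsertIds()
Setting this option to true disables insertId based data deduplication offered by BigQuery.

public BigQueryIO.Write<T> withKmsKey(java.lang.String kmsKey)

public BigQueryIO.Write<T> withPrimaryKey(java.util.List<java.lang.String> primaryKey)

public BigQueryIO.Write<T> withDefaultMissingValueInterpretation(com.google.cloud.bigquery.storage.v1.AppendRowsRequest.MissingValueInterpretation missingValueInterpretation)
Specify how missing values should be interpreted when there is a default value in the schema.

public BigQueryIO.Write<T> optimizedWrites()
If true, enables new codepaths that are expected to use fewer resources while writing to BigQuery.

public BigQueryIO.Write<T> useBeamSchema()
If true, then the BigQuery schema will be inferred from the input schema.

public BigQueryIO.Write<T> withAutoSharding()
If true, enables using a dynamically determined number of shards to write to BigQuery. This applies to BigQueryIO.Write.Method.FILE_LOADS, BigQueryIO.Write.Method.STREAMING_INSERTS and BigQueryIO.Write.Method.STORAGE_WRITE_API. Only applicable to unbounded data. If using BigQueryIO.Write.Method.FILE_LOADS, numFileShards set via withNumFileShards(int) will be ignored.

public BigQueryIO.Write<T> withMaxRetryJobs(int maxRetryJobs)
If set, this will set the maximum number of retries for batch load jobs.
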
public BigQueryIO.Write<T> withSuccessfulInsertsPropagation(boolean propagateSuccessful)
If true, it enables the propagation of the successfully inserted TableRows on BigQuery as part of the WriteResult object when using BigQueryIO.Write.Method.STREAMING_INSERTS. By default this property is set to true. In cases where a pipeline won't make use of the insert results, this property can be set to false, which will make the pipeline let go of those inserted TableRows and reclaim worker resources.

public BigQueryIO.Write<T> withAutoSchemaUpdate(boolean autoSchemaUpdate)
If true, enables automatically detecting BigQuery table schema updates.

public BigQueryIO.Write<T> withDeterministicRecordIdFn(SerializableFunction<T,java.lang.String> toUniqueIdFunction)

public BigQueryIO.Write<T> withTestServices(BigQueryServices testServices)

public BigQueryIO.Write<T> withMaxFilesPerBundle(int maxFilesPerBundle)
Control how many files will be written concurrently by a single worker when using BigQuery load jobs before spilling to a shuffle.

public BigQueryIO.Write<T> withMaxBytesPerPartition(long maxBytesPerPartition)
Control how much data will be assigned to a single BigQuery load job. If the data in a BatchLoads partition exceeds this value, that partition will be handled via multiple load jobs.

The default value (11 TiB) respects BigQuery's maximum size per load job limit and is appropriate for most use cases. Reducing the value of this parameter can improve stability when loading to tables with complex schemas containing thousands of fields.

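To get a feel for how this setting trades job size against job count, the sketch below computes a lower bound on the number of load jobs for a given amount of data. It is plain Java with hypothetical names (PartitionSplit, minLoadJobs); it assumes only that no single job takes more than the configured byte limit, and it ignores other limits that may also force splits.

```java
/** Hypothetical estimate of load-job counts; not part of Beam. */
final class PartitionSplit {
    static final long TIB = 1L << 40;
    // The 11 TiB default stated above, expressed in bytes.
    static final long DEFAULT_MAX_BYTES_PER_PARTITION = 11 * TIB;

    private PartitionSplit() {}

    /** Smallest number of jobs such that no job exceeds maxBytesPerPartition. */
    static long minLoadJobs(long totalBytes, long maxBytesPerPartition) {
        if (totalBytes < 0 || maxBytesPerPartition <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and the limit positive");
        }
        // Ceiling division written to avoid overflow for very large inputs.
        long whole = totalBytes / maxBytesPerPartition;
        return totalBytes % maxBytesPerPartition == 0 ? whole : whole + 1;
    }

    public static void main(String[] args) {
        long data = 25 * TIB;
        System.out.println(minLoadJobs(data, DEFAULT_MAX_BYTES_PER_PARTITION)); // prints 3
        System.out.println(minLoadJobs(data, 4 * TIB));                         // prints 7
    }
}
```

Lowering the limit makes each job smaller at the cost of more jobs, which also counts against the daily load-job quota.
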
public BigQueryIO.Write<T> withWriteTempDataset(java.lang.String writeTempDataset)
Temporary dataset. BigQueryIO.write() will create temporary tables in a dataset to store staging data from partitions. With this option, you can set an existing dataset in which to create the temporary tables. BigQueryIO will create temporary tables in that dataset, and will remove them once they are no longer needed. No other tables in the dataset will be modified. Remember that the dataset must exist and your job needs permissions to create and remove tables inside that dataset.

public BigQueryIO.Write<T> withErrorHandler(ErrorHandler<BadRecord,?> errorHandler)

public void validate(PipelineOptions pipelineOptions)
Description copied from class: PTransform
Called before running the Pipeline to verify this transform is fully and correctly specified.

By default, does nothing.

Overrides: validate in class PTransform<PCollection<T>,WriteResult>

public WriteResult expand(PCollection<T> input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.

NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.

Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

Specified by: expand in class PTransform<PCollection<T>,WriteResult>

public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.

populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

By default, does not register any display data. Implementors may override this method to provide their own display data.

Specified by: populateDisplayData in interface HasDisplayData
Overrides: populateDisplayData in class PTransform<PCollection<T>,WriteResult>
Parameters: builder - The builder to populate with display data.
See Also: HasDisplayData

public @Nullable ValueProvider<TableReference> getTable()
Returns the table reference, or null.