public abstract static class AvroIO.TypedWrite<UserT,DestinationT,OutputT> extends PTransform<PCollection<UserT>,WriteFilesResult<DestinationT>>
Implementation of AvroIO.write(java.lang.Class<T>).
Fields inherited from class PTransform: annotations, displayData, name, resourceHints
Constructor and Description |
---|
TypedWrite() |
Modifier and Type | Method and Description |
---|---|
WriteFilesResult<DestinationT> | expand(PCollection<UserT> input): Override this method to specify how this PTransform should be expanded on the given InputT. |
void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |
<NewDestinationT> AvroIO.TypedWrite<UserT,NewDestinationT,OutputT> | to(DynamicAvroDestinations<UserT,NewDestinationT,OutputT> dynamicDestinations): Deprecated. Use FileIO.write() or FileIO.writeDynamic() instead. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | to(FileBasedSink.FilenamePolicy filenamePolicy): Writes to files named according to the given FileBasedSink.FilenamePolicy. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | to(ResourceId outputPrefix): Writes to file(s) with the given output prefix. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | to(java.lang.String outputPrefix): Writes to file(s) with the given output prefix. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | to(ValueProvider<java.lang.String> outputPrefix): Like to(String). |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | toResource(ValueProvider<ResourceId> outputPrefix): Like to(ResourceId). |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withBadRecordErrorHandler(ErrorHandler<BadRecord,?> errorHandler): See FileIO.Write#withBadRecordErrorHandler(ErrorHandler) for details on usage. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withCodec(CodecFactory codec): Writes to Avro file(s) compressed using specified codec. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withDatumWriterFactory(AvroSink.DatumWriterFactory<OutputT> datumWriterFactory): Specifies a AvroSink.DatumWriterFactory to use for creating DatumWriter instances. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withFormatFunction(@Nullable SerializableFunction<UserT,OutputT> formatFunction): Specifies a format function to convert UserT to the output type. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withMetadata(java.util.Map<java.lang.String,java.lang.Object> metadata): Writes to Avro file(s) with the specified metadata. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withNoSpilling() |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withNumShards(int numShards): Configures the number of output shards produced overall (when using unwindowed writes) or per-window (when using windowed writes). |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withoutSharding(): Forces a single file as output and empty shard name template. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withSchema(Schema schema): Sets the output schema. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withShardNameTemplate(java.lang.String shardTemplate): Uses the given ShardNameTemplate for naming output files. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withSuffix(java.lang.String filenameSuffix): Configures the filename suffix for written files. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withSyncInterval(int syncInterval): Sets the approximate number of uncompressed bytes to write in each block for the AVRO container format. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withTempDirectory(ResourceId tempDirectory): Set the base directory used to generate temporary files. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withTempDirectory(ValueProvider<ResourceId> tempDirectory): Set the base directory used to generate temporary files. |
AvroIO.TypedWrite<UserT,DestinationT,OutputT> | withWindowedWrites(): Preserves windowing of input elements and writes them to files based on the element's window. |
Methods inherited from class PTransform: addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> to(java.lang.String outputPrefix)
Writes to file(s) with the given output prefix. See FileSystems for information on supported file systems.
The name of the output files will be determined by the FileBasedSink.FilenamePolicy used.
By default, a DefaultFilenamePolicy will build output filenames using the specified prefix, a shard name template (see withShardNameTemplate(String)), and a common suffix (if supplied using withSuffix(String)). This default can be overridden using to(FilenamePolicy).
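The way a prefix, a shard name template, and a suffix combine into a filename can be modelled in plain Java. This is a minimal sketch and not Beam's DefaultFilenamePolicy: it assumes a template convention in which a run of S is replaced by the zero-padded shard index and a run of N by the zero-padded shard count. The class name ShardNameSketch, the bucket path, and the template value are illustrative assumptions.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative model only; this is NOT Beam's DefaultFilenamePolicy. */
final class ShardNameSketch {
  // Assumption: a run of 'S' stands for the shard index, a run of 'N' for the shard count.
  private static final Pattern PLACEHOLDER = Pattern.compile("S+|N+");

  /** Returns prefix + expanded template + suffix for one shard. */
  static String fileName(String prefix, String template, String suffix, int shard, int numShards) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuilder sb = new StringBuilder(prefix);
    int last = 0;
    while (m.find()) {
      // Copy the literal text between placeholders unchanged.
      sb.append(template, last, m.start());
      int value = m.group().charAt(0) == 'S' ? shard : numShards;
      // Zero-pad the number to the width of the placeholder run.
      sb.append(String.format(Locale.ROOT, "%0" + m.group().length() + "d", value));
      last = m.end();
    }
    sb.append(template, last, template.length());
    return sb.append(suffix).toString();
  }

  public static void main(String[] args) {
    // prints gs://bucket/out/part-00000-of-00003.avro
    System.out.println(fileName("gs://bucket/out/part", "-SSSSS-of-NNNNN", ".avro", 0, 3));
    // An empty template leaves only prefix + suffix: prints gs://bucket/out/part.avro
    System.out.println(fileName("gs://bucket/out/part", "", ".avro", 0, 1));
  }
}
```

The second call mirrors the single-file case: with one shard and an empty template, the name is just the prefix followed by the suffix.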
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> to(ResourceId outputPrefix)
Writes to file(s) with the given output prefix. See FileSystems for information on supported file systems. This prefix is used by the DefaultFilenamePolicy to generate filenames.
By default, a DefaultFilenamePolicy will build output filenames using the specified prefix, a shard name template (see withShardNameTemplate(String)), and a common suffix (if supplied using withSuffix(String)). This default policy can be overridden using to(FilenamePolicy), in which case withShardNameTemplate(String) and withSuffix(String) should not be set.
Custom filename policies do not automatically see this prefix. If your FileBasedSink.FilenamePolicy needs it, pass the prefix into the policy object explicitly.
If withTempDirectory(ValueProvider<ResourceId>) has not been called, this filename prefix will be used to infer a directory for temporary files.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> to(ValueProvider<java.lang.String> outputPrefix)
Like to(String).
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> toResource(ValueProvider<ResourceId> outputPrefix)
Like to(ResourceId).
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> to(FileBasedSink.FilenamePolicy filenamePolicy)
Writes to files named according to the given FileBasedSink.FilenamePolicy. A directory for temporary files must be specified using withTempDirectory(ValueProvider<ResourceId>).
@Deprecated public <NewDestinationT> AvroIO.TypedWrite<UserT,NewDestinationT,OutputT> to(DynamicAvroDestinations<UserT,NewDestinationT,OutputT> dynamicDestinations)
Deprecated. Use FileIO.write() or FileIO.writeDynamic() instead.
Use a DynamicAvroDestinations object to vend FileBasedSink.FilenamePolicy objects. These objects can examine the input record when creating a FileBasedSink.FilenamePolicy. A directory for temporary files must be specified using withTempDirectory(ValueProvider<ResourceId>).
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withSyncInterval(int syncInterval)
Sets the approximate number of uncompressed bytes to write in each block for the AVRO container format.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withSchema(Schema schema)
Sets the output schema. Can only be used when the output type is GenericRecord and when not using to(DynamicAvroDestinations).
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withFormatFunction(@Nullable SerializableFunction<UserT,OutputT> formatFunction)
Specifies a format function to convert UserT to the output type. If to(DynamicAvroDestinations) is used, FileBasedSink.DynamicDestinations.formatRecord(UserT) must be used instead.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withTempDirectory(ValueProvider<ResourceId> tempDirectory)
Set the base directory used to generate temporary files.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withTempDirectory(ResourceId tempDirectory)
Set the base directory used to generate temporary files.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withShardNameTemplate(java.lang.String shardTemplate)
Uses the given ShardNameTemplate for naming output files. This option may only be used with one of the default filename-prefix to() overloads.
See DefaultFilenamePolicy for how the prefix, shard name template, and suffix are used.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withSuffix(java.lang.String filenameSuffix)
Configures the filename suffix for written files.
See DefaultFilenamePolicy for how the prefix, shard name template, and suffix are used.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withNumShards(int numShards)
Configures the number of output shards produced overall (when using unwindowed writes) or per-window (when using windowed writes).
For unwindowed writes, constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
Parameters:
numShards - the number of shards to use, or 0 to let the system decide.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withoutSharding()
Forces a single file as output and empty shard name template.
For unwindowed writes, constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
This is equivalent to .withNumShards(1).withShardNameTemplate("").
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withWindowedWrites()
Preserves windowing of input elements and writes them to files based on the element's window.
If using to(FilenamePolicy), filenames will be generated using FileBasedSink.FilenamePolicy.windowedFilename(int, int, BoundedWindow, PaneInfo, FileBasedSink.OutputFileHints). See also WriteFiles.withWindowedWrites().
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withNoSpilling()
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withCodec(CodecFactory codec)
Writes to Avro file(s) compressed using specified codec.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withDatumWriterFactory(AvroSink.DatumWriterFactory<OutputT> datumWriterFactory)
Specifies a AvroSink.DatumWriterFactory to use for creating DatumWriter instances.
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withMetadata(java.util.Map<java.lang.String,java.lang.Object> metadata)
Writes to Avro file(s) with the specified metadata.
Supported value types are String, Long, and byte[].
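Because the metadata map is typed as Map<String,Object>, an unsupported value type is only detected at run time. The sketch below shows one way to check a map against the three supported types before passing it to withMetadata. The class MetadataCheckSketch and its method requireSupportedValues are hypothetical helpers written for illustration; they are not part of Beam, and the exception type thrown by Beam itself is not assumed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pre-flight check; not part of the Beam API. */
final class MetadataCheckSketch {

  /** Throws IllegalArgumentException if any value is not a String, Long, or byte[]. */
  static void requireSupportedValues(Map<String, Object> metadata) {
    for (Map.Entry<String, Object> entry : metadata.entrySet()) {
      Object value = entry.getValue();
      boolean supported =
          value instanceof String || value instanceof Long || value instanceof byte[];
      if (!supported) {
        String type = value == null ? "null" : value.getClass().getName();
        throw new IllegalArgumentException(
            "Unsupported metadata value type for key '" + entry.getKey() + "': " + type);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Object> metadata = new LinkedHashMap<>();
    metadata.put("owner", "data-team");         // String
    metadata.put("schemaVersion", 3L);          // Long: note the L suffix
    metadata.put("checksum", new byte[] {1, 2}); // byte[]
    requireSupportedValues(metadata);            // passes

    metadata.put("retries", 3); // autoboxes to Integer, which is not a supported type
    try {
      requireSupportedValues(metadata);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A common pitfall is an int literal such as 3, which autoboxes to Integer rather than Long; writing 3L keeps the value within the supported types.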
public AvroIO.TypedWrite<UserT,DestinationT,OutputT> withBadRecordErrorHandler(ErrorHandler<BadRecord,?> errorHandler)
See FileIO.Write#withBadRecordErrorHandler(ErrorHandler) for details on usage.
public WriteFilesResult<DestinationT> expand(PCollection<UserT> input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by:
expand in class PTransform<PCollection<UserT>,WriteFilesResult<DestinationT>>
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by:
populateDisplayData in interface HasDisplayData
Overrides:
populateDisplayData in class PTransform<PCollection<UserT>,WriteFilesResult<DestinationT>>
Parameters:
builder - The builder to populate with display data.
See Also:
HasDisplayData