public abstract static class TextIO.TypedWrite<UserT,DestinationT> extends PTransform<PCollection<UserT>,WriteFilesResult<DestinationT>>
Implementation of TextIO.write().
Fields inherited from class PTransform: annotations, displayData, name, resourceHints
Constructor and Description |
---|
TypedWrite() |
Modifier and Type | Method and Description |
---|---|
WriteFilesResult<DestinationT> | expand(PCollection<UserT> input): Override this method to specify how this PTransform should be expanded on the given InputT. |
void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |
TextIO.TypedWrite<UserT,DestinationT> | skipIfEmpty(): Don't write any output files if the PCollection is empty. |
<NewDestinationT> TextIO.TypedWrite<UserT,NewDestinationT> | to(FileBasedSink.DynamicDestinations<UserT,NewDestinationT,java.lang.String> dynamicDestinations): Deprecated. Use FileIO.write() or FileIO.writeDynamic() with TextIO.sink() instead. |
TextIO.TypedWrite<UserT,DestinationT> | to(FileBasedSink.FilenamePolicy filenamePolicy): Writes to files named according to the given FileBasedSink.FilenamePolicy. |
TextIO.TypedWrite<UserT,DestinationT> | to(ResourceId filenamePrefix): Like to(String). |
TextIO.TypedWrite<UserT,DefaultFilenamePolicy.Params> | to(SerializableFunction<UserT,DefaultFilenamePolicy.Params> destinationFunction, DefaultFilenamePolicy.Params emptyDestination): Deprecated. Use FileIO.write() or FileIO.writeDynamic() with TextIO.sink() instead. |
TextIO.TypedWrite<UserT,DestinationT> | to(java.lang.String filenamePrefix): Writes to text files with the given prefix. |
TextIO.TypedWrite<UserT,DestinationT> | to(ValueProvider<java.lang.String> outputPrefix): Like to(String). |
TextIO.TypedWrite<UserT,DestinationT> | toResource(ValueProvider<ResourceId> filenamePrefix): Like to(ResourceId). |
TextIO.TypedWrite<UserT,DestinationT> | withAutoSharding() |
TextIO.TypedWrite<UserT,DestinationT> | withBadRecordErrorHandler(ErrorHandler<BadRecord,?> errorHandler): See FileIO.Write.withBadRecordErrorHandler(ErrorHandler) for details on usage. |
TextIO.TypedWrite<UserT,DestinationT> | withBatchMaxBufferingDuration(@Nullable Duration batchMaxBufferingDuration): Returns a new TextIO.TypedWrite that will batch the input records using the specified max buffering duration. |
TextIO.TypedWrite<UserT,DestinationT> | withBatchSize(@Nullable java.lang.Integer batchSize): Returns a new TextIO.TypedWrite that will batch the input records using the specified batch size. |
TextIO.TypedWrite<UserT,DestinationT> | withBatchSizeBytes(@Nullable java.lang.Integer batchSizeBytes): Returns a new TextIO.TypedWrite that will batch the input records using the specified batch size in bytes. |
TextIO.TypedWrite<UserT,DestinationT> | withCompression(Compression compression): Returns a transform for writing to text files like this one but that compresses output using the given Compression. |
TextIO.TypedWrite<UserT,DestinationT> | withDelimiter(char[] delimiter): Specifies the delimiter after each string written. |
TextIO.TypedWrite<UserT,DestinationT> | withFooter(@Nullable java.lang.String footer): Adds a footer string to each file. |
TextIO.TypedWrite<UserT,DestinationT> | withFormatFunction(@Nullable SerializableFunction<UserT,java.lang.String> formatFunction): Deprecated. Use FileIO.write() or FileIO.writeDynamic() with TextIO.sink() instead. |
TextIO.TypedWrite<UserT,DestinationT> | withHeader(@Nullable java.lang.String header): Adds a header string to each file. |
TextIO.TypedWrite<UserT,DestinationT> | withNoSpilling() |
TextIO.TypedWrite<UserT,DestinationT> | withNumShards(int numShards): Configures the number of output shards produced overall (when using unwindowed writes) or per-window (when using windowed writes). |
TextIO.TypedWrite<UserT,DestinationT> | withNumShards(@Nullable ValueProvider<java.lang.Integer> numShards): Like withNumShards(int). |
TextIO.TypedWrite<UserT,DestinationT> | withoutSharding(): Forces a single file as output and empty shard name template. |
TextIO.TypedWrite<UserT,DestinationT> | withShardNameTemplate(java.lang.String shardTemplate): Uses the given ShardNameTemplate for naming output files. |
TextIO.TypedWrite<UserT,DestinationT> | withSuffix(java.lang.String filenameSuffix): Configures the filename suffix for written files. |
TextIO.TypedWrite<UserT,DestinationT> | withTempDirectory(ResourceId tempDirectory): Set the base directory used to generate temporary files. |
TextIO.TypedWrite<UserT,DestinationT> | withTempDirectory(ValueProvider<ResourceId> tempDirectory): Set the base directory used to generate temporary files. |
TextIO.TypedWrite<UserT,DestinationT> | withWindowedWrites(): Preserves windowing of input elements and writes them to files based on the element's window. |
TextIO.TypedWrite<UserT,DestinationT> | withWritableByteChannelFactory(FileBasedSink.WritableByteChannelFactory writableByteChannelFactory): Returns a transform for writing to text files like this one but that has the given FileBasedSink.WritableByteChannelFactory to be used by the FileBasedSink during output. |
Methods inherited from class PTransform: addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
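Several overloads in the summary above are deprecated in favor of FileIO.write() or FileIO.writeDynamic() with TextIO.sink(). A minimal sketch of the static-destination form might look like the following. The class name, output directory, prefix, and sample elements are placeholders chosen for illustration, and the sketch assumes the Beam Java core SDK and a runner (for example the direct runner) are on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

// Hypothetical example class; not part of the Beam API.
public class FileIoTextSinkSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Placeholder input: a small in-memory collection of strings.
        .apply(Create.of("alpha", "beta", "gamma"))
        // FileIO.write() with TextIO.sink() writes one line per element.
        .apply(
            FileIO.<String>write()
                .via(TextIO.sink())
                .to("/tmp/beam-output") // assumed output directory
                .withPrefix("records") // assumed file name prefix
                .withSuffix(".txt"));

    pipeline.run().waitUntilFinish();
  }
}
```

FileIO.writeDynamic() follows the same pattern but additionally needs a destination function and a destination coder; see the FileIO documentation for that form.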
public TextIO.TypedWrite<UserT,DestinationT> to(java.lang.String filenamePrefix)
Writes to text files with the given prefix. The given prefix can reference any FileSystem on the classpath. This prefix is used by the DefaultFilenamePolicy to generate filenames.
By default, a DefaultFilenamePolicy will be built using the specified prefix to define the base output directory and file prefix, a shard identifier (see withNumShards(int)), and a common suffix (if supplied using withSuffix(String)).
This default policy can be overridden using to(FilenamePolicy), in which case withShardNameTemplate(String) and withSuffix(String) should not be set. Custom filename policies do not automatically see this prefix; you should explicitly pass the prefix into your FileBasedSink.FilenamePolicy object if you need it.
If withTempDirectory(ValueProvider<ResourceId>) has not been called, this filename prefix will be used to infer a directory for temporary files.
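As a rough illustration of the prefix-based form described above, the sketch below writes a small collection with a prefix, a suffix, and a fixed shard count. It goes through TextIO.write(), which (to my understanding) returns a TextIO.Write wrapper that delegates to a TypedWrite; the class name, path, and sample elements are placeholders, and the Beam core SDK plus a runner are assumed to be on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

// Hypothetical example class; not part of the Beam API.
public class TextIoPrefixSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Placeholder input: a small in-memory collection of strings.
        .apply(Create.of("alpha", "beta", "gamma"))
        .apply(
            TextIO.write()
                // The prefix supplies both the output directory and the file name prefix.
                .to("/tmp/beam-output/records") // assumed path
                .withSuffix(".txt")
                // A fixed shard count; omit this unless a specific number of files is needed.
                .withNumShards(2));

    pipeline.run().waitUntilFinish();
  }
}
```

Because withTempDirectory is not called here, the temporary directory would be inferred from the prefix, as the last paragraph above describes.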
public TextIO.TypedWrite<UserT,DestinationT> to(ResourceId filenamePrefix)
Like to(String).
public TextIO.TypedWrite<UserT,DestinationT> to(ValueProvider<java.lang.String> outputPrefix)
Like to(String).
public TextIO.TypedWrite<UserT,DestinationT> to(FileBasedSink.FilenamePolicy filenamePolicy)
Writes to files named according to the given FileBasedSink.FilenamePolicy. A directory for temporary files must be specified using withTempDirectory(ValueProvider<ResourceId>).
@Deprecated public <NewDestinationT> TextIO.TypedWrite<UserT,NewDestinationT> to(FileBasedSink.DynamicDestinations<UserT,NewDestinationT,java.lang.String> dynamicDestinations)
Deprecated. Use FileIO.write() or FileIO.writeDynamic() with TextIO.sink() instead.
Use a FileBasedSink.DynamicDestinations object to vend FileBasedSink.FilenamePolicy objects. These objects can examine the input record when creating a FileBasedSink.FilenamePolicy. A directory for temporary files must be specified using withTempDirectory(ValueProvider<ResourceId>).
@Deprecated public TextIO.TypedWrite<UserT,DefaultFilenamePolicy.Params> to(SerializableFunction<UserT,DefaultFilenamePolicy.Params> destinationFunction, DefaultFilenamePolicy.Params emptyDestination)
Deprecated. Use FileIO.write() or FileIO.writeDynamic() with TextIO.sink() instead.
Writes to dynamic destinations using the default filename policy. The destinationFunction maps each input record to a DefaultFilenamePolicy.Params object that specifies where the records should be written (base filename, file suffix, and shard template). The emptyDestination parameter specifies where empty files should be written when the written PCollection is empty.
public TextIO.TypedWrite<UserT,DestinationT> toResource(ValueProvider<ResourceId> filenamePrefix)
Like to(ResourceId).
@Deprecated public TextIO.TypedWrite<UserT,DestinationT> withFormatFunction(@Nullable SerializableFunction<UserT,java.lang.String> formatFunction)
Deprecated. Use FileIO.write() or FileIO.writeDynamic() with TextIO.sink() instead.
Specifies a format function to convert UserT to the output type. If to(DynamicDestinations) is used, FileBasedSink.DynamicDestinations.formatRecord(Object) must be used instead.
public TextIO.TypedWrite<UserT,DestinationT> withBatchSize(@Nullable java.lang.Integer batchSize)
Returns a new TextIO.TypedWrite that will batch the input records using the specified batch size. The default value is WriteFiles.FILE_TRIGGERING_RECORD_COUNT.
This option is used only for writing unbounded data with auto-sharding.
public TextIO.TypedWrite<UserT,DestinationT> withBatchSizeBytes(@Nullable java.lang.Integer batchSizeBytes)
Returns a new TextIO.TypedWrite that will batch the input records using the specified batch size in bytes. The default value is WriteFiles.FILE_TRIGGERING_BYTE_COUNT.
This option is used only for writing unbounded data with auto-sharding.
public TextIO.TypedWrite<UserT,DestinationT> withBatchMaxBufferingDuration(@Nullable Duration batchMaxBufferingDuration)
Returns a new TextIO.TypedWrite that will batch the input records using the specified max buffering duration. The default value is WriteFiles.FILE_TRIGGERING_RECORD_BUFFERING_DURATION.
This option is used only for writing unbounded data with auto-sharding.
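public TextIO.TypedWrite<UserT,DestinationT> withTempDirectory(ValueProvider<ResourceId> tempDirectory)
Set the base directory used to generate temporary files.
public TextIO.TypedWrite<UserT,DestinationT> withTempDirectory(ResourceId tempDirectory)
Set the base directory used to generate temporary files.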
public TextIO.TypedWrite<UserT,DestinationT> withShardNameTemplate(java.lang.String shardTemplate)
Uses the given ShardNameTemplate for naming output files. This option may only be used with one of the default filename-prefix to() overloads, i.e. not when using either to(FilenamePolicy) or to(DynamicDestinations).
See DefaultFilenamePolicy for how the prefix, shard name template, and suffix are used.
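To make the interplay of prefix, shard name template, and suffix concrete, here is a small standalone sketch. It is not Beam's code: it is a simplified re-implementation under the assumption (to my understanding of ShardNameTemplate) that each run of S characters is replaced by the zero-padded shard index and each run of N characters by the zero-padded shard count, with "-SSSSS-of-NNNNN" as the default unwindowed template. The class and method names are hypothetical, and the window and pane placeholders used for windowed writes are deliberately ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the Beam API.
class ShardNameSketch {
  // Matches a run of 'S' (shard index) or a run of 'N' (shard count).
  private static final Pattern RUNS = Pattern.compile("S+|N+");

  /** Expands S/N runs in the template, zero-padding to the width of each run. */
  public static String expand(String template, int shard, int numShards) {
    Matcher m = RUNS.matcher(template);
    StringBuilder out = new StringBuilder();
    int last = 0;
    while (m.find()) {
      // Copy the literal text between placeholder runs unchanged.
      out.append(template, last, m.start());
      int width = m.end() - m.start();
      int value = m.group().charAt(0) == 'S' ? shard : numShards;
      out.append(String.format("%0" + width + "d", value));
      last = m.end();
    }
    out.append(template.substring(last));
    return out.toString();
  }

  /** Assumed composition order: prefix, then expanded template, then suffix. */
  public static String fileName(
      String prefix, String template, String suffix, int shard, int numShards) {
    return prefix + expand(template, shard, numShards) + suffix;
  }

  public static void main(String[] args) {
    // Prints: out/part-00003-of-00010.txt
    System.out.println(fileName("out/part", "-SSSSS-of-NNNNN", ".txt", 3, 10));
  }
}
```

Under the same assumptions, an empty template yields just the prefix followed by the suffix, which is the file name shape produced when the shard name template is set to "".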
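public TextIO.TypedWrite<UserT,DestinationT> withSuffix(java.lang.String filenameSuffix)
Configures the filename suffix for written files. This option may only be used with one of the default filename-prefix to() overloads, i.e. not when using either to(FilenamePolicy) or to(DynamicDestinations).
See DefaultFilenamePolicy for how the prefix, shard name template, and suffix are used.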
public TextIO.TypedWrite<UserT,DestinationT> withNumShards(int numShards)
Configures the number of output shards produced overall (when using unwindowed writes) or per-window (when using windowed writes).
For unwindowed writes, constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
Parameters: numShards - the number of shards to use, or 0 to let the system decide.
public TextIO.TypedWrite<UserT,DestinationT> withNumShards(@Nullable ValueProvider<java.lang.Integer> numShards)
Like withNumShards(int). Specifying null means runner-determined sharding.
public TextIO.TypedWrite<UserT,DestinationT> withoutSharding()
Forces a single file as output and empty shard name template.
For unwindowed writes, constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
This is equivalent to .withNumShards(1).withShardNameTemplate("")
public TextIO.TypedWrite<UserT,DestinationT> withDelimiter(char[] delimiter)
Specifies the delimiter after each string written. Defaults to '\n'.
public TextIO.TypedWrite<UserT,DestinationT> withHeader(@Nullable java.lang.String header)
Adds a header string to each file. A null value will clear any previously configured header.
public TextIO.TypedWrite<UserT,DestinationT> withFooter(@Nullable java.lang.String footer)
Adds a footer string to each file. A null value will clear any previously configured footer.
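The delimiter, header, and footer options above can be combined on one write. The following is only a sketch under assumptions: the class name, path, header and footer text, and sample rows are placeholders, TextIO.write() is used as the entry point, and the Beam core SDK plus a runner are assumed to be on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

// Hypothetical example class; not part of the Beam API.
public class TextIoHeaderFooterSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Placeholder rows that already look like CSV lines.
        .apply(Create.of("1,ada", "2,grace"))
        .apply(
            TextIO.write()
                .to("/tmp/beam-output/people") // assumed path
                .withSuffix(".csv")
                // Header and footer are added to each output file.
                .withHeader("id,name")
                .withFooter("# end of file")
                // Use CRLF after each element instead of the default '\n'.
                .withDelimiter(new char[] {'\r', '\n'}));

    pipeline.run().waitUntilFinish();
  }
}
```

Since the header and footer are per file, a run that produces several shards repeats them in every shard.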
public TextIO.TypedWrite<UserT,DestinationT> withWritableByteChannelFactory(FileBasedSink.WritableByteChannelFactory writableByteChannelFactory)
Returns a transform for writing to text files like this one but that has the given FileBasedSink.WritableByteChannelFactory to be used by the FileBasedSink during output. The default value is Compression.UNCOMPRESSED.
A null value will reset the value to the default value mentioned above.
public TextIO.TypedWrite<UserT,DestinationT> withCompression(Compression compression)
Returns a transform for writing to text files like this one but that compresses output using the given Compression. The default value is Compression.UNCOMPRESSED.
public TextIO.TypedWrite<UserT,DestinationT> withWindowedWrites()
Preserves windowing of input elements and writes them to files based on the element's window.
If to(FileBasedSink.FilenamePolicy) is used, filenames will be generated using FileBasedSink.FilenamePolicy.windowedFilename(int, int, BoundedWindow, PaneInfo, FileBasedSink.OutputFileHints). See also WriteFiles.withWindowedWrites().
public TextIO.TypedWrite<UserT,DestinationT> withAutoSharding()
public TextIO.TypedWrite<UserT,DestinationT> withNoSpilling()
public TextIO.TypedWrite<UserT,DestinationT> withBadRecordErrorHandler(ErrorHandler<BadRecord,?> errorHandler)
See FileIO.Write.withBadRecordErrorHandler(ErrorHandler) for details on usage.
public TextIO.TypedWrite<UserT,DestinationT> skipIfEmpty()
Don't write any output files if the PCollection is empty.
public WriteFilesResult<DestinationT> expand(PCollection<UserT> input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by: expand in class PTransform<PCollection<UserT>,WriteFilesResult<DestinationT>>
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by: populateDisplayData in interface HasDisplayData
Overrides: populateDisplayData in class PTransform<PCollection<UserT>,WriteFilesResult<DestinationT>>
Parameters: builder - The builder to populate with display data.
See also: HasDisplayData