Package org.apache.beam.sdk.io
Class FileIO.Write<DestinationT,UserT>
java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PCollection<UserT>,WriteFilesResult<DestinationT>>
org.apache.beam.sdk.io.FileIO.Write<DestinationT,UserT>
- All Implemented Interfaces:
Serializable
,HasDisplayData
- Enclosing class:
FileIO
public abstract static class FileIO.Write<DestinationT,UserT>
extends PTransform<PCollection<UserT>,WriteFilesResult<DestinationT>>
Implementation of
FileIO.write()
and FileIO.writeDynamic()
.- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic interface
A policy for generating names for shard files. -
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionby
(Contextful<Contextful.Fn<UserT, DestinationT>> destinationFn) Likeby(org.apache.beam.sdk.transforms.SerializableFunction<UserT, DestinationT>)
, but with access to context such as side inputs.by
(SerializableFunction<UserT, DestinationT> destinationFn) Specifies how to partition elements into groups ("destinations").static FileIO.Write.FileNaming
defaultNaming
(String prefix, String suffix) static FileIO.Write.FileNaming
defaultNaming
(ValueProvider<String> prefix, ValueProvider<String> suffix) Defines a defaultFileIO.Write.FileNaming
which will use the prefix and suffix supplied to create a name based on the window, pane, number of shards, shard index, and compression.expand
(PCollection<UserT> input) Override this method to specify how thisPTransform
should be expanded on the givenInputT
.static FileIO.Write.FileNaming
relativeFileNaming
(ValueProvider<String> baseDirectory, FileIO.Write.FileNaming innerNaming) Specifies a common directory for all generated files.to
(ValueProvider<String> directory) Liketo(String)
but with aValueProvider
.via
(FileIO.Sink<UserT> sink) Likevia(Contextful)
, but uses the sameFileIO.Sink
for all destinations.via
(Contextful<Contextful.Fn<DestinationT, FileIO.Sink<UserT>>> sinkFn) Likevia(Contextful, Contextful)
, but the output type of the sink is the same as the type of the input collection.<OutputT> FileIO.Write
<DestinationT, UserT> via
(Contextful<Contextful.Fn<UserT, OutputT>> outputFn, FileIO.Sink<OutputT> sink) Likevia(Contextful, Contextful)
, but uses the same sink for all destinations.<OutputT> FileIO.Write
<DestinationT, UserT> via
(Contextful<Contextful.Fn<UserT, OutputT>> outputFn, Contextful<Contextful.Fn<DestinationT, FileIO.Sink<OutputT>>> sinkFn) Specifies how to create aFileIO.Sink
for a particular destination and how to map the element type to the sink's output type.withBadRecordErrorHandler
(ErrorHandler<BadRecord, ?> errorHandler) Configures a newFileIO.Write
with an ErrorHandler.withCompression
(Compression compression) Specifies to compress all generated shard files using the givenCompression
and, by default, append the respective extension to the filename.withDestinationCoder
(Coder<DestinationT> destinationCoder) Specifies aCoder
for the destination type, if it can not be inferred fromby(org.apache.beam.sdk.transforms.SerializableFunction<UserT, DestinationT>)
.withEmptyGlobalWindowDestination
(DestinationT emptyWindowDestination) IfwithIgnoreWindowing()
is specified, specifies a destination to be used in case the collection is empty, to generate the (only, empty) output file.Deprecated.Avoid usage of this method: its effects are complex and it will be removed in future versions of Beam.withMaxNumWritersPerBundle
(@Nullable Integer maxNumWritersPerBundle) Set the maximum number of writers created in a bundle before spilling to shuffle.withNaming
(FileIO.Write.FileNaming naming) Specifies a custom strategy for generating filenames.LikewithNaming(SerializableFunction)
but allows accessing context, such as side inputs, from the function.Specifies a custom strategy for generating filenames depending on the destination, similar towithNaming(FileNaming)
.withNumShards
(int numShards) Specifies to use a given fixed number of shards per window.withNumShards
(@Nullable ValueProvider<Integer> numShards) LikewithNumShards(int)
.withPrefix
(String prefix) Specifies a common prefix to use for all generated filenames, if using the default file naming.withPrefix
(ValueProvider<String> prefix) LikewithPrefix(String)
but with aValueProvider
.withSharding
(PTransform<PCollection<UserT>, PCollectionView<Integer>> sharding) Specifies aPTransform
to use for computing the desired number of shards in each window.withSuffix
(String suffix) Specifies a common suffix to use for all generated filenames, if using the default file naming.withSuffix
(ValueProvider<String> suffix) LikewithSuffix(String)
but with aValueProvider
.withTempDirectory
(String tempDirectory) Specifies a directory into which all temporary files will be placed.withTempDirectory
(ValueProvider<String> tempDirectory) Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
Write
public Write()
-
-
Method Details
-
defaultNaming
-
defaultNaming
public static FileIO.Write.FileNaming defaultNaming(ValueProvider<String> prefix, ValueProvider<String> suffix) Defines a defaultFileIO.Write.FileNaming
which will use the prefix and suffix supplied to create a name based on the window, pane, number of shards, shard index, and compression. Removes window when in theGlobalWindow
and pane info when it is the only firing of the pane. -
relativeFileNaming
public static FileIO.Write.FileNaming relativeFileNaming(ValueProvider<String> baseDirectory, FileIO.Write.FileNaming innerNaming) -
by
Specifies how to partition elements into groups ("destinations"). -
by
public FileIO.Write<DestinationT,UserT> by(Contextful<Contextful.Fn<UserT, DestinationT>> destinationFn) Likeby(org.apache.beam.sdk.transforms.SerializableFunction<UserT, DestinationT>)
, but with access to context such as side inputs. -
via
public <OutputT> FileIO.Write<DestinationT,UserT> via(Contextful<Contextful.Fn<UserT, OutputT>> outputFn, Contextful<Contextful.Fn<DestinationT, FileIO.Sink<OutputT>>> sinkFn) Specifies how to create aFileIO.Sink
for a particular destination and how to map the element type to the sink's output type. The sink function must create a newFileIO.Sink
instance every time it is called. -
via
public <OutputT> FileIO.Write<DestinationT,UserT> via(Contextful<Contextful.Fn<UserT, OutputT>> outputFn, FileIO.Sink<OutputT> sink) Likevia(Contextful, Contextful)
, but uses the same sink for all destinations. -
via
public FileIO.Write<DestinationT,UserT> via(Contextful<Contextful.Fn<DestinationT, FileIO.Sink<UserT>>> sinkFn) Likevia(Contextful, Contextful)
, but the output type of the sink is the same as the type of the input collection. The sink function must create a newFileIO.Sink
instance every time it is called. -
via
Likevia(Contextful)
, but uses the sameFileIO.Sink
for all destinations. -
to
Specifies a common directory for all generated files. A temporary generated sub-directory of this directory will be used as the temp directory, unless overridden bywithTempDirectory(java.lang.String)
. -
to
Liketo(String)
but with aValueProvider
. -
withPrefix
Specifies a common prefix to use for all generated filenames, if using the default file naming. Incompatible withwithNaming(org.apache.beam.sdk.io.FileIO.Write.FileNaming)
. -
withPrefix
LikewithPrefix(String)
but with aValueProvider
. -
withSuffix
Specifies a common suffix to use for all generated filenames, if using the default file naming. Incompatible withwithNaming(org.apache.beam.sdk.io.FileIO.Write.FileNaming)
. -
withSuffix
LikewithSuffix(String)
but with aValueProvider
. -
withNaming
Specifies a custom strategy for generating filenames. All generated filenames will be resolved relative to the directory specified into(java.lang.String)
, if any.Incompatible with
withSuffix(java.lang.String)
.This can only be used in combination with
FileIO.write()
but notFileIO.writeDynamic()
. -
withNaming
public FileIO.Write<DestinationT,UserT> withNaming(SerializableFunction<DestinationT, FileIO.Write.FileNaming> namingFn) Specifies a custom strategy for generating filenames depending on the destination, similar towithNaming(FileNaming)
.This can only be used in combination with
FileIO.writeDynamic()
but notFileIO.write()
. -
withNaming
public FileIO.Write<DestinationT,UserT> withNaming(Contextful<Contextful.Fn<DestinationT, FileIO.Write.FileNaming>> namingFn) LikewithNaming(SerializableFunction)
but allows accessing context, such as side inputs, from the function. -
withTempDirectory
Specifies a directory into which all temporary files will be placed. -
withTempDirectory
-
withCompression
Specifies to compress all generated shard files using the givenCompression
and, by default, append the respective extension to the filename. -
withEmptyGlobalWindowDestination
public FileIO.Write<DestinationT,UserT> withEmptyGlobalWindowDestination(DestinationT emptyWindowDestination) IfwithIgnoreWindowing()
is specified, specifies a destination to be used in case the collection is empty, to generate the (only, empty) output file. -
withDestinationCoder
Specifies aCoder
for the destination type, if it can not be inferred fromby(org.apache.beam.sdk.transforms.SerializableFunction<UserT, DestinationT>)
. -
withNumShards
Specifies to use a given fixed number of shards per window. 0 means runner-determined sharding. Specifying a non-zero value may hurt performance, because it will limit the parallelism of writing and will introduce an extraGroupByKey
operation. -
withNumShards
LikewithNumShards(int)
. Specifyingnull
means runner-determined sharding. -
withSharding
public FileIO.Write<DestinationT,UserT> withSharding(PTransform<PCollection<UserT>, PCollectionView<Integer>> sharding) Specifies aPTransform
to use for computing the desired number of shards in each window. -
withIgnoreWindowing
Deprecated.Avoid usage of this method: its effects are complex and it will be removed in future versions of Beam. Right now it exists for compatibility withWriteFiles
.Specifies to ignore windowing information in the input, and instead rewindow it to global window with the default trigger. -
withAutoSharding
-
withNoSpilling
-
withMaxNumWritersPerBundle
public FileIO.Write<DestinationT,UserT> withMaxNumWritersPerBundle(@Nullable Integer maxNumWritersPerBundle) Set the maximum number of writers created in a bundle before spilling to shuffle. Seeinvalid reference
WriteFiles#withMaxNumWritersPerBundle()
-
withBadRecordErrorHandler
public FileIO.Write<DestinationT,UserT> withBadRecordErrorHandler(ErrorHandler<BadRecord, ?> errorHandler) Configures a newFileIO.Write
with an ErrorHandler. For configuring an ErrorHandler, seeErrorHandler
. Whenever a record is formatted, or a lookup for a dynamic destination is performed, and that operation fails, the exception is passed to the error handler. This is intended to handle any errors related to the data of a record, but not any connectivity or IO errors related to the literal writing of a record. -
expand
Description copied from class:PTransform
Override this method to specify how thisPTransform
should be expanded on the givenInputT
.NOTE: This method should not be called directly. Instead apply the
PTransform
should be applied to theInputT
using theapply
method.Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand
in classPTransform<PCollection<UserT>,
WriteFilesResult<DestinationT>>
-