Class FileBasedSink<UserT,DestinationT,OutputT>
- Type Parameters:
OutputT - the type of values written to the sink.
- All Implemented Interfaces:
Serializable, HasDisplayData
- Direct Known Subclasses:
AvroSink
At pipeline construction time, the methods of FileBasedSink are called to validate the sink
and to create a FileBasedSink.WriteOperation that manages the process of writing to the sink.
The process of writing to a file-based sink is as follows:
- an optional subclass-defined initialization,
- a parallel write of bundles to temporary files, and finally,
- a rename of these temporary files to their final output filenames.
In order to ensure fault tolerance, a bundle may be executed multiple times (e.g., in the
event of failure/retry or for redundancy). However, exactly one of these executions will have its
result passed to the finalize method. Each call to FileBasedSink.Writer.open(java.lang.String) is
passed a unique bundle id when it is called by the WriteFiles transform, so even redundant or
retried bundles will have a unique way of identifying their output.
The bundle id should be used to guarantee that a bundle's output is unique. This uniqueness guarantee is important; if a bundle is to be output to a file, for example, the name of the file will encode the unique bundle id to avoid conflicts with other writers.
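The write process described above can be sketched in plain Java. This is only a model of the pattern (temporary files named by a unique bundle id, followed by a rename step), not the Beam implementation: the class TempThenRenameSketch, its methods, the use of random UUIDs as bundle ids, and the out-N-of-M.txt naming are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Plain-Java model of the temp-file-then-rename pattern; hypothetical, not the Beam API. */
public class TempThenRenameSketch {

  /** Writes one bundle to a temp file whose name encodes a unique bundle id. */
  static Path writeBundle(Path tempDir, String bundleId, List<String> records) throws IOException {
    Path tempFile = tempDir.resolve("bundle-" + bundleId + ".tmp");
    Files.write(tempFile, records, StandardCharsets.UTF_8);
    return tempFile;
  }

  /** Renames the selected temp files to final shard names such as out-0-of-2.txt. */
  static List<Path> finalizeWrite(Path outputDir, String prefix, List<Path> winners)
      throws IOException {
    List<Path> finalFiles = new ArrayList<>();
    for (int shard = 0; shard < winners.size(); shard++) {
      Path target = outputDir.resolve(prefix + "-" + shard + "-of-" + winners.size() + ".txt");
      finalFiles.add(Files.move(winners.get(shard), target, StandardCopyOption.REPLACE_EXISTING));
    }
    return finalFiles;
  }

  /** Runs the whole flow once and returns the final file names in shard order. */
  static List<String> demo() {
    try {
      Path tempDir = Files.createTempDirectory("sink-temp");
      Path outputDir = Files.createTempDirectory("sink-out");

      // The same logical bundle is executed twice (for example, a retry). Each attempt
      // gets its own id, so the two attempts never write to the same temp file.
      Path attempt1 = writeBundle(tempDir, UUID.randomUUID().toString(), List.of("a", "b"));
      Path attempt2 = writeBundle(tempDir, UUID.randomUUID().toString(), List.of("a", "b"));
      Path other = writeBundle(tempDir, UUID.randomUUID().toString(), List.of("c"));

      // Exactly one attempt per bundle is handed to the finalize step.
      List<Path> finalFiles = finalizeWrite(outputDir, "out", List.of(attempt1, other));
      Files.deleteIfExists(attempt2); // discard the attempt that was not selected

      List<String> names = new ArrayList<>();
      for (Path p : finalFiles) {
        names.add(p.getFileName().toString());
      }
      return names;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [out-0-of-2.txt, out-1-of-2.txt]
  }
}
```

Because the bundle id is part of the temporary filename, the retried attempt cannot overwrite or corrupt the output of the first attempt. The finalize step is free to pick either one.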
FileBasedSink can take a custom FileBasedSink.FilenamePolicy object to determine output
filenames, and this policy object can be used to write windowed or triggered PCollections into
separate files per window pane. This allows file output from unbounded PCollections, and also
works for bounded PCollections.
Supported file systems are those registered with FileSystems.
Nested Class Summary
Nested classes (modifier and type, class, description):
- static enum FileBasedSink.CompressionType: Deprecated. Use Compression.
- static class FileBasedSink.DynamicDestinations: A class that allows value-dependent writes in FileBasedSink.
- static class FileBasedSink.FilenamePolicy: A naming policy for output files.
- static final class FileBasedSink.FileResult: Result of a single bundle write.
- static final class FileBasedSink.FileResultCoder: A coder for FileBasedSink.FileResult objects.
- static interface FileBasedSink.OutputFileHints: Provides hints about how to generate output files, such as a suggested filename suffix.
- static interface FileBasedSink.WritableByteChannelFactory: Implementations create instances of WritableByteChannel used by FileBasedSink and related classes to allow decorating, or otherwise transforming, the raw data that would normally be written directly to the WritableByteChannel passed into FileBasedSink.WritableByteChannelFactory.create(WritableByteChannel).
- static class FileBasedSink.WriteOperation: Abstract operation that manages the process of writing to FileBasedSink.
- static class FileBasedSink.Writer: Abstract writer that writes a bundle to a FileBasedSink.
Constructor Summary
Constructors (constructor, description):
- FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?, DestinationT, OutputT> dynamicDestinations): Construct a FileBasedSink with the given temp directory, producing uncompressed files.
- FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?, DestinationT, OutputT> dynamicDestinations, Compression compression): Construct a FileBasedSink with the given temp directory and output channel type.
- FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?, DestinationT, OutputT> dynamicDestinations, FileBasedSink.WritableByteChannelFactory writableByteChannelFactory): Construct a FileBasedSink with the given temp directory and output channel type.
Method Summary
Methods (modifier and type, method, description):
- static ResourceId convertToFileResourceIfPossible(String outputPrefix): A helper function for converting a user-provided output filename prefix into a ResourceId for writing output files.
- abstract FileBasedSink.WriteOperation<DestinationT, OutputT> createWriteOperation(): Return a subclass of FileBasedSink.WriteOperation that will manage the write to the sink.
- FileBasedSink.DynamicDestinations<UserT, DestinationT, OutputT> getDynamicDestinations(): Return the FileBasedSink.DynamicDestinations used.
- ValueProvider<ResourceId> getTempDirectoryProvider(): Returns the directory inside which temporary files will be written according to the configured FileBasedSink.FilenamePolicy.
- protected final FileBasedSink.WritableByteChannelFactory getWritableByteChannelFactory(): Returns the FileBasedSink.WritableByteChannelFactory used.
- void populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component.
- void validate(PipelineOptions options)
Constructor Details

FileBasedSink
public FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?, DestinationT, OutputT> dynamicDestinations)
Construct a FileBasedSink with the given temp directory, producing uncompressed files.

FileBasedSink
public FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?, DestinationT, OutputT> dynamicDestinations, FileBasedSink.WritableByteChannelFactory writableByteChannelFactory)
Construct a FileBasedSink with the given temp directory and output channel type.

FileBasedSink
public FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?, DestinationT, OutputT> dynamicDestinations, Compression compression)
Construct a FileBasedSink with the given temp directory and output channel type.
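
The "output channel type" refers to how the raw WritableByteChannel is decorated before data is written. As a rough illustration of that idea, the following plain-Java sketch wraps a channel so that written bytes are gzip-compressed. It does not use or reproduce the Beam FileBasedSink.WritableByteChannelFactory interface: the class ChannelDecoratorSketch, the method gzipChannel, and the choice of gzip are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Plain-Java model of decorating a WritableByteChannel; hypothetical, not the Beam API. */
public class ChannelDecoratorSketch {

  /** Wraps a raw channel so that bytes written to the result are gzip-compressed. */
  static WritableByteChannel gzipChannel(WritableByteChannel raw) throws IOException {
    return Channels.newChannel(new GZIPOutputStream(Channels.newOutputStream(raw)));
  }

  /** Writes text through the decorated channel, then decompresses the stored bytes. */
  static String roundTrip(String text) {
    try {
      ByteArrayOutputStream stored = new ByteArrayOutputStream();
      WritableByteChannel raw = Channels.newChannel(stored);

      try (WritableByteChannel decorated = gzipChannel(raw)) {
        ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (buffer.hasRemaining()) {
          decorated.write(buffer);
        }
      } // closing the decorated channel finishes the gzip stream

      try (GZIPInputStream in =
          new GZIPInputStream(new ByteArrayInputStream(stored.toByteArray()))) {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello sink")); // prints hello sink
  }
}
```

The writer code only sees a WritableByteChannel, so the same write logic works whether or not the channel has been decorated.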

Method Details

convertToFileResourceIfPossible
A helper function for converting a user-provided output filename prefix into a ResourceId for writing output files. See TextIO.Write.to(String) for an example use case.
Typically, the input prefix will be something like /tmp/foo/bar, and the user would like output files to be named as /tmp/foo/bar-0-of-3.txt. Thus, this function tries to interpret the provided string as a file ResourceId path.
However, this may fail, for example if the user gives a prefix that is a directory, e.g., /, gs://my-bucket, or c://. In that case, interpreting the string as a file will fail and this function will return a directory ResourceId instead.
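
To make the naming in the example above concrete, here is a small plain-Java sketch that expands a prefix such as /tmp/foo/bar into names like /tmp/foo/bar-0-of-3.txt. The helper names shardName and allShardNames are hypothetical and are not part of Beam. In a real pipeline the final names are decided by the configured FileBasedSink.FilenamePolicy, which may format shard numbers differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that mimics the naming in the example above; not part of Beam. */
public class ShardNameSketch {

  /** Builds "<prefix>-<shard>-of-<numShards><suffix>". */
  static String shardName(String prefix, int shard, int numShards, String suffix) {
    if (shard < 0 || shard >= numShards) {
      throw new IllegalArgumentException("shard must be in [0, numShards)");
    }
    return prefix + "-" + shard + "-of-" + numShards + suffix;
  }

  /** Lists every shard name for the given prefix, in shard order. */
  static List<String> allShardNames(String prefix, int numShards, String suffix) {
    List<String> names = new ArrayList<>();
    for (int shard = 0; shard < numShards; shard++) {
      names.add(shardName(prefix, shard, numShards, suffix));
    }
    return names;
  }

  public static void main(String[] args) {
    // prints [/tmp/foo/bar-0-of-3.txt, /tmp/foo/bar-1-of-3.txt, /tmp/foo/bar-2-of-3.txt]
    System.out.println(allShardNames("/tmp/foo/bar", 3, ".txt"));
  }
}
```

This shows why the prefix needs to resolve to a file rather than a directory: the shard suffix is appended to the last path component.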

getDynamicDestinations
Return the FileBasedSink.DynamicDestinations used.

getTempDirectoryProvider
Returns the directory inside which temporary files will be written according to the configured FileBasedSink.FilenamePolicy.

validate

createWriteOperation
Return a subclass of FileBasedSink.WriteOperation that will manage the write to the sink.

populateDisplayData
Description copied from interface: HasDisplayData
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
- Specified by:
populateDisplayData in interface HasDisplayData
- Parameters:
builder - The builder to populate with display data.

getWritableByteChannelFactory
Returns the FileBasedSink.WritableByteChannelFactory used.