OutputT
- the type of values written to the sink.@Experimental(value=FILESYSTEM) public abstract class FileBasedSink<UserT,DestinationT,OutputT> extends java.lang.Object implements java.io.Serializable, HasDisplayData
At pipeline construction time, the methods of FileBasedSink are called to validate the sink
and to create a FileBasedSink.WriteOperation
that manages the process of writing to the sink.
The process of writing to file-based sink is as follows:
In order to ensure fault-tolerance, a bundle may be executed multiple times (e.g., in the
event of failure/retry or for redundancy). However, exactly one of these executions will have its
result passed to the finalize method. Each call to FileBasedSink.Writer.openWindowed(java.lang.String, org.apache.beam.sdk.transforms.windowing.BoundedWindow, org.apache.beam.sdk.transforms.windowing.PaneInfo, int, DestinationT)
or FileBasedSink.Writer.openUnwindowed(java.lang.String, int, DestinationT)
is passed a unique bundle id when it is called by the WriteFiles
transform, so even redundant or retried bundles will have a unique way of identifying their
output.
The bundle id should be used to guarantee that a bundle's output is unique. This uniqueness guarantee is important; if a bundle is to be output to a file, for example, the name of the file will encode the unique bundle id to avoid conflicts with other writers.
FileBasedSink
can take a custom FileBasedSink.FilenamePolicy
object to determine output
filenames, and this policy object can be used to write windowed or triggered PCollections into
separate files per window pane. This allows file output from unbounded PCollections, and also
works for bounded PCollecctions.
Supported file systems are those registered with FileSystems
.
Modifier and Type | Class and Description |
---|---|
static class |
FileBasedSink.CompressionType
Deprecated.
use
Compression . |
static class |
FileBasedSink.DynamicDestinations<UserT,DestinationT,OutputT>
A class that allows value-dependent writes in
FileBasedSink . |
static class |
FileBasedSink.FilenamePolicy
A naming policy for output files.
|
static class |
FileBasedSink.FileResult<DestinationT>
Result of a single bundle write.
|
static class |
FileBasedSink.FileResultCoder<DestinationT>
A coder for
FileBasedSink.FileResult objects. |
static interface |
FileBasedSink.OutputFileHints
Provides hints about how to generate output files, such as a suggested filename suffix (e.g.
|
static interface |
FileBasedSink.WritableByteChannelFactory
Implementations create instances of
WritableByteChannel used by FileBasedSink
and related classes to allow decorating, or otherwise transforming, the raw data that
would normally be written directly to the WritableByteChannel passed into FileBasedSink.WritableByteChannelFactory.create(WritableByteChannel) . |
static class |
FileBasedSink.WriteOperation<DestinationT,OutputT>
Abstract operation that manages the process of writing to
FileBasedSink . |
static class |
FileBasedSink.Writer<DestinationT,OutputT>
Abstract writer that writes a bundle to a
FileBasedSink . |
Constructor and Description |
---|
FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider,
FileBasedSink.DynamicDestinations<?,DestinationT,OutputT> dynamicDestinations)
Construct a
FileBasedSink with the given temp directory, producing uncompressed files. |
FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider,
FileBasedSink.DynamicDestinations<?,DestinationT,OutputT> dynamicDestinations,
Compression compression)
Construct a
FileBasedSink with the given temp directory and output channel type. |
FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider,
FileBasedSink.DynamicDestinations<?,DestinationT,OutputT> dynamicDestinations,
FileBasedSink.WritableByteChannelFactory writableByteChannelFactory)
Construct a
FileBasedSink with the given temp directory and output channel type. |
@Experimental(value=FILESYSTEM) public FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?,DestinationT,OutputT> dynamicDestinations)
FileBasedSink
with the given temp directory, producing uncompressed files.@Experimental(value=FILESYSTEM) public FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?,DestinationT,OutputT> dynamicDestinations, FileBasedSink.WritableByteChannelFactory writableByteChannelFactory)
FileBasedSink
with the given temp directory and output channel type.@Experimental(value=FILESYSTEM) public FileBasedSink(ValueProvider<ResourceId> tempDirectoryProvider, FileBasedSink.DynamicDestinations<?,DestinationT,OutputT> dynamicDestinations, Compression compression)
FileBasedSink
with the given temp directory and output channel type.@Experimental(value=FILESYSTEM) public static ResourceId convertToFileResourceIfPossible(java.lang.String outputPrefix)
ResourceId
for writing output files. See TextIO.Write.to(String)
for an
example use case.
Typically, the input prefix will be something like /tmp/foo/bar
, and the user would
like output files to be named as /tmp/foo/bar-0-of-3.txt
. Thus, this function tries to
interpret the provided string as a file ResourceId
path.
However, this may fail, for example if the user gives a prefix that is a directory. E.g.,
/
, gs://my-bucket
, or c://
. In that case, interpreting the string as a
file will fail and this function will return a directory ResourceId
instead.
public FileBasedSink.DynamicDestinations<UserT,DestinationT,OutputT> getDynamicDestinations()
FileBasedSink.DynamicDestinations
used.@Experimental(value=FILESYSTEM) public ValueProvider<ResourceId> getTempDirectoryProvider()
FileBasedSink.FilenamePolicy
.public void validate(PipelineOptions options)
public abstract FileBasedSink.WriteOperation<DestinationT,OutputT> createWriteOperation()
FileBasedSink.WriteOperation
that will manage the write to the sink.public void populateDisplayData(DisplayData.Builder builder)
HasDisplayData
populateDisplayData(DisplayData.Builder)
is invoked by Pipeline runners to collect
display data via DisplayData.from(HasDisplayData)
. Implementations may call
super.populateDisplayData(builder)
in order to register display data in the current
namespace, but should otherwise use subcomponent.populateDisplayData(builder)
to use
the namespace of the subcomponent.
populateDisplayData
in interface HasDisplayData
builder
- The builder to populate with display data.HasDisplayData
protected final FileBasedSink.WritableByteChannelFactory getWritableByteChannelFactory()
FileBasedSink.WritableByteChannelFactory
used.