OutputT
- the type of values written to the sink.public abstract static class FileBasedSink.WriteOperation<DestinationT,OutputT>
extends java.lang.Object
implements java.io.Serializable
FileBasedSink
.
The primary responsibilities of the WriteOperation is the management of output files. During
a write, FileBasedSink.Writer
s write bundles to temporary file locations. After the bundles have been
written,
finalizeDestination(DestinationT, org.apache.beam.sdk.transforms.windowing.BoundedWindow, java.lang.Integer, java.util.Collection<org.apache.beam.sdk.io.FileBasedSink.FileResult<DestinationT>>)
is given a list of the temporary files
containing the output bundles.
Subclass implementations of WriteOperation must implement createWriter()
to return a concrete FileBasedSinkWriter.
During the write, bundles are written to temporary files using the tempDirectory that can be
provided via the constructor of WriteOperation. These temporary files will be named {tempDirectory}/{bundleId}
, where bundleId is the unique id of the bundle. For example, if
tempDirectory is "gs://my-bucket/my_temp_output", the output for a bundle with bundle id 15723
will be "gs://my-bucket/my_temp_output/15723".
Final output files are written to the location specified by the FileBasedSink.FilenamePolicy
. If
no filename policy is specified, then the DefaultFilenamePolicy
will be used. The
directory that the files are written to is determined by the FileBasedSink.FilenamePolicy
instance.
Note that in the case of permanent failure of a bundle's write, no clean up of temporary files will occur.
If there are no elements in the PCollection being written, no output will be generated.
Modifier and Type | Field and Description |
---|---|
protected FileBasedSink<?,DestinationT,OutputT> |
sink
The Sink that this WriteOperation will write to.
|
protected boolean |
windowedWrites
Whether windowed writes are being used.
|
Constructor and Description |
---|
WriteOperation(FileBasedSink<?,DestinationT,OutputT> sink)
Constructs a WriteOperation using the default strategy for generating a temporary directory
from the base output filename.
|
WriteOperation(FileBasedSink<?,DestinationT,OutputT> sink,
ResourceId tempDirectory)
Create a new WriteOperation.
|
Modifier and Type | Method and Description |
---|---|
protected static ResourceId |
buildTemporaryFilename(ResourceId tempDirectory,
java.lang.String filename)
Constructs a temporary file resource given the temporary directory and a filename.
|
abstract FileBasedSink.Writer<DestinationT,OutputT> |
createWriter()
Clients must implement to return a subclass of
FileBasedSink.Writer . |
protected java.util.List<KV<FileBasedSink.FileResult<DestinationT>,ResourceId>> |
finalizeDestination(DestinationT dest,
@Nullable BoundedWindow window,
@Nullable java.lang.Integer numShards,
java.util.Collection<FileBasedSink.FileResult<DestinationT>> existingResults) |
FileBasedSink<?,DestinationT,OutputT> |
getSink()
Returns the FileBasedSink for this write operation.
|
ResourceId |
getTempDirectory() |
void |
removeTemporaryFiles(java.util.Collection<ResourceId> filenames) |
void |
setWindowedWrites()
Indicates that the operation will be performing windowed writes.
|
java.lang.String |
toString() |
protected final FileBasedSink<?,DestinationT,OutputT> sink
@Experimental(value=FILESYSTEM) protected boolean windowedWrites
public WriteOperation(FileBasedSink<?,DestinationT,OutputT> sink)
Without windowing, the default is a uniquely named subdirectory of the provided tempDirectory, e.g. if tempDirectory is /path/to/foo/, the temporary directory will be /path/to/foo/.temp-beam-$uuid.
With windowing, the default is a consistent named subdirectory of the provided tempDirectory, e.g. if tempDirectory is /path/to/foo/, the temporary directory will be /path/to/foo/.temp-beam. With windowing, unique subdirectories of the tempDirectory are not beneficial as they cannot be used for cleanup. By using a consistent directory, the created temp files are well-distributed beneath a common directory prefix, across both worker and pipeline executions. This is beneficial for filesystems such as GCS which can reuse autoscaling of the file metadata.
sink
- the FileBasedSink that will be used to configure this write operation.@Experimental(value=FILESYSTEM) public WriteOperation(FileBasedSink<?,DestinationT,OutputT> sink, ResourceId tempDirectory)
sink
- the FileBasedSink that will be used to configure this write operation.tempDirectory
- the base directory to be used for temporary output files.@Experimental(value=FILESYSTEM) protected static ResourceId buildTemporaryFilename(ResourceId tempDirectory, java.lang.String filename) throws java.io.IOException
java.io.IOException
public ResourceId getTempDirectory()
public abstract FileBasedSink.Writer<DestinationT,OutputT> createWriter() throws java.lang.Exception
FileBasedSink.Writer
. This method must not mutate
the state of the object.java.lang.Exception
public void setWindowedWrites()
public void removeTemporaryFiles(java.util.Collection<ResourceId> filenames) throws java.io.IOException
java.io.IOException
@Experimental(value=FILESYSTEM) protected final java.util.List<KV<FileBasedSink.FileResult<DestinationT>,ResourceId>> finalizeDestination(DestinationT dest, @Nullable BoundedWindow window, @Nullable java.lang.Integer numShards, java.util.Collection<FileBasedSink.FileResult<DestinationT>> existingResults) throws java.lang.Exception
java.lang.Exception
public FileBasedSink<?,DestinationT,OutputT> getSink()
public java.lang.String toString()
toString
in class java.lang.Object