T
- the type of values written to the sink.public abstract static class FileBasedSink.WriteOperation<T>
extends java.lang.Object
implements java.io.Serializable
FileBasedSink
.
The primary responsibilities of the WriteOperation is the management of output
files. During a write, FileBasedSink.Writer
s write bundles to temporary file
locations. After the bundles have been written,
finalize(java.lang.Iterable<org.apache.beam.sdk.io.FileBasedSink.FileResult>)
is given a list of the temporary
files containing the output bundles.
Subclass implementations of WriteOperation must implement
createWriter()
to return a concrete
FileBasedSinkWriter.
{tempDirectory}/{bundleId}
, where bundleId is the unique id of the bundle.
For example, if tempDirectory is "gs://my-bucket/my_temp_output", the output for a
bundle with bundle id 15723 will be "gs://my-bucket/my_temp_output/15723".
Final output files are written to baseOutputFilename with the format
{baseOutputFilename}-0000i-of-0000n.{extension}
where n is the total number of bundles
written and extension is the file extension. Both baseOutputFilename and extension are required
constructor arguments.
Subclass implementations can change the file naming template by supplying a value for fileNamingTemplate.
Note that in the case of permanent failure of a bundle's write, no clean up of temporary files will occur.
If there are no elements in the PCollection being written, no output will be generated.
Modifier and Type | Field and Description |
---|---|
protected FileBasedSink<T> |
sink
The Sink that this WriteOperation will write to.
|
protected ValueProvider<ResourceId> |
tempDirectory
Directory for temporary output files.
|
protected boolean |
windowedWrites
Whether windowed writes are being used.
|
Constructor and Description |
---|
WriteOperation(FileBasedSink<T> sink)
Constructs a WriteOperation using the default strategy for generating a temporary
directory from the base output filename.
|
WriteOperation(FileBasedSink<T> sink,
ResourceId tempDirectory)
Create a new WriteOperation.
|
Modifier and Type | Method and Description |
---|---|
protected java.util.Map<ResourceId,ResourceId> |
buildOutputFilenames(java.lang.Iterable<FileBasedSink.FileResult> writerResults) |
protected static ResourceId |
buildTemporaryFilename(ResourceId tempDirectory,
java.lang.String filename)
Constructs a temporary file resource given the temporary directory and a filename.
|
abstract FileBasedSink.Writer<T> |
createWriter()
Clients must implement to return a subclass of
FileBasedSink.Writer . |
void |
finalize(java.lang.Iterable<FileBasedSink.FileResult> writerResults)
Finalizes writing by copying temporary output files to their final location and optionally
removing temporary files.
|
FileBasedSink<T> |
getSink()
Returns the FileBasedSink for this write operation.
|
void |
setWindowedWrites(boolean windowedWrites)
Indicates that the operation will be performing windowed writes.
|
java.lang.String |
toString() |
protected final FileBasedSink<T> sink
protected final ValueProvider<ResourceId> tempDirectory
@Experimental(value=FILESYSTEM) protected boolean windowedWrites
public WriteOperation(FileBasedSink<T> sink)
Default is a uniquely named sibling of baseOutputFilename, e.g. if baseOutputFilename is /path/to/foo, the temporary directory will be /path/to/temp-beam-foo-$date.
sink
- the FileBasedSink that will be used to configure this write operation.@Experimental(value=FILESYSTEM) public WriteOperation(FileBasedSink<T> sink, ResourceId tempDirectory)
sink
- the FileBasedSink that will be used to configure this write operation.tempDirectory
- the base directory to be used for temporary output files.@Experimental(value=FILESYSTEM) protected static ResourceId buildTemporaryFilename(ResourceId tempDirectory, java.lang.String filename) throws java.io.IOException
java.io.IOException
public abstract FileBasedSink.Writer<T> createWriter() throws java.lang.Exception
FileBasedSink.Writer
. This
method must not mutate the state of the object.java.lang.Exception
public void setWindowedWrites(boolean windowedWrites)
public void finalize(java.lang.Iterable<FileBasedSink.FileResult> writerResults) throws java.lang.Exception
Finalization may be overridden by subclass implementations to perform customized
finalization (e.g., initiating some operation on output bundles, merging them, etc.).
writerResults
contains the filenames of written bundles.
If subclasses override this method, they must guarantee that its implementation is idempotent, as it may be executed multiple times in the case of failure or for redundancy. It is a best practice to attempt to try to make this method atomic.
writerResults
- the results of writes (FileResult).java.lang.Exception
@Experimental(value=FILESYSTEM) protected final java.util.Map<ResourceId,ResourceId> buildOutputFilenames(java.lang.Iterable<FileBasedSink.FileResult> writerResults)
public FileBasedSink<T> getSink()
public java.lang.String toString()
toString
in class java.lang.Object