public class ShardNameTemplate
extends java.lang.Object
Shard naming templates are strings that may contain placeholders for the shard number and shard count. When a filename is constructed for a particular shard, each run of upper-case 'S' characters is replaced with the shard number and each run of upper-case 'N' characters with the shard count, both zero-padded to the length of the run.
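The substitution rule can be sketched in plain Java. This is a minimal illustration, not Beam's implementation: the class `ShardNameSketch` and its `expand` method are hypothetical, and the sketch assumes that each maximal run of 'S' or 'N' is a single placeholder padded to the run's length.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ShardNameSketch {
  // Each maximal run of 'S' or 'N' is treated as one placeholder (assumption of this sketch).
  private static final Pattern PLACEHOLDER = Pattern.compile("S+|N+");

  private ShardNameSketch() {}

  /** Replaces runs of 'S' with the shard number and runs of 'N' with the shard count. */
  public static String expand(String template, int shardNumber, int numShards) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String run = m.group();
      int value = run.charAt(0) == 'S' ? shardNumber : numShards;
      // Zero-pad to the run's width; numbers wider than the run are emitted in full.
      String padded = String.format(Locale.ROOT, "%0" + run.length() + "d", value);
      m.appendReplacement(out, Matcher.quoteReplacement(padded));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(expand("-SSSSS-of-NNNNN", 0, 1000)); // -00000-of-01000
    System.out.println(expand("-SS-of-NN", 1, 99));          // -01-of-99
  }
}
```

Using `Locale.ROOT` keeps the digits ASCII regardless of the JVM's default locale.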
Left-padding the numbers allows the resulting filenames to be sorted lexicographically. If the shard number or count is too large for the space provided in the template, the results may no longer sort lexicographically. For example, the shard template "S-of-N" with 200 shards produces outputs named "0-of-200", ..., "10-of-200", ..., "100-of-200", and so on, in which "10-of-200" sorts before "2-of-200".
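A short check makes the sorting caveat concrete. The hypothetical `ShardSortDemo.names` builds `<shard>-of-<count>` names at a fixed padding width, and `sortsInShardOrder` reports whether plain string sorting preserves shard order; both are illustrations, not part of any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public final class ShardSortDemo {
  private ShardSortDemo() {}

  /** Builds "<shard>-of-<count>" names, zero-padding both numbers to {@code width} digits. */
  public static List<String> names(int numShards, int width) {
    List<String> result = new ArrayList<>();
    String format = "%0" + width + "d-of-%0" + width + "d";
    for (int shard = 0; shard < numShards; shard++) {
      result.add(String.format(Locale.ROOT, format, shard, numShards));
    }
    return result;
  }

  /** Returns true if sorting the names as strings leaves them in shard order. */
  public static boolean sortsInShardOrder(List<String> names) {
    List<String> sorted = new ArrayList<>(names);
    Collections.sort(sorted);
    return sorted.equals(names);
  }

  public static void main(String[] args) {
    // Width 1, as with "S-of-N": "10-of-200" sorts before "2-of-200".
    System.out.println(sortsInShardOrder(names(200, 1))); // false
    // Width 3, as with "SSS-of-NNN": all names have equal length, so order is kept.
    System.out.println(sortsInShardOrder(names(200, 3))); // true
  }
}
```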
Shard numbers start with 0, so the last shard number is the shard count minus one. For example, the template "-SSSSS-of-NNNNN" will be instantiated as "-00000-of-01000" for the first shard (shard 0) of a 1000-way sharded output.
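As a worked example of the zero-based numbering, the hypothetical helper below hard-codes the five-digit widths of "-SSSSS-of-NNNNN" with `String.format` instead of parsing a template, and rejects shard numbers outside the range [0, numShards).

```java
import java.util.Locale;

public final class ShardRange {
  private ShardRange() {}

  /** Fixed-width equivalent of the template "-SSSSS-of-NNNNN". */
  public static String name(int shard, int numShards) {
    if (shard < 0 || shard >= numShards) {
      throw new IllegalArgumentException("shard must be in [0, " + numShards + "), got " + shard);
    }
    return String.format(Locale.ROOT, "-%05d-of-%05d", shard, numShards);
  }

  public static void main(String[] args) {
    int numShards = 1000;
    System.out.println(name(0, numShards));             // -00000-of-01000 (first shard)
    System.out.println(name(numShards - 1, numShards)); // -00999-of-01000 (last shard)
  }
}
```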
A shard name template is typically provided along with a name prefix and suffix, which allows constructing complex paths that have embedded shard information. For example, outputs in the form "gs://bucket/path-01-of-99.txt" could be constructed by providing the individual components:
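```java
// `lines` is a PCollection<String>; TextIO.write() consumes one, so it cannot be applied to the Pipeline itself.
lines.apply(
    TextIO.write()
        .to("gs://bucket/path")
        .withShardNameTemplate("-SS-of-NN")
        .withSuffix(".txt"));
```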
In the example above, you could let users configure individual parts of the output name, such as the prefix, without requiring them to specify every component.
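A sketch of how the three components might combine into a single filename. The `OutputNames.filename` helper is hypothetical, requires Java 9+ for `Matcher.replaceAll(Function)`, and expands only the template, so in this sketch letters in the prefix or suffix are never treated as placeholders.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public final class OutputNames {
  private static final Pattern RUN = Pattern.compile("S+|N+");

  private OutputNames() {}

  /** Hypothetical helper: prefix + expanded shard template + suffix. */
  public static String filename(
      String prefix, String template, String suffix, int shard, int numShards) {
    // Only the template is scanned for placeholders (Java 9+ replaceAll with a function).
    String shardPart = RUN.matcher(template).replaceAll(r -> {
      String run = r.group();
      int value = run.charAt(0) == 'S' ? shard : numShards;
      return String.format(Locale.ROOT, "%0" + run.length() + "d", value);
    });
    return prefix + shardPart + suffix;
  }

  public static void main(String[] args) {
    // gs://bucket/path-01-of-99.txt
    System.out.println(filename("gs://bucket/path", "-SS-of-NN", ".txt", 1, 99));
  }
}
```

Because the template and suffix can stay fixed as defaults, a caller who wants a different location only needs to change the prefix argument.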
If a shard name template does not contain any 'S' placeholder, the output shard count must be 1; otherwise the same filename would be generated for multiple shards.
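A validation step enforcing this rule might look like the following. `TemplateCheck.validate` is a hypothetical helper, not a Beam method, and it treats any 'S' in the template as a shard-number placeholder.

```java
public final class TemplateCheck {
  private TemplateCheck() {}

  /**
   * Hypothetical check: without an 'S' placeholder every shard would map to the same
   * filename, so only a single shard is allowed.
   */
  public static void validate(String template, int numShards) {
    if (template.indexOf('S') < 0 && numShards != 1) {
      throw new IllegalArgumentException(
          "Template \"" + template + "\" has no 'S' placeholder, so the shard count must be 1, got "
              + numShards);
    }
  }

  public static void main(String[] args) {
    validate("-SS-of-NN", 99); // accepted: shard numbers keep names distinct
    validate("-of-NN", 1);     // accepted: only one shard
    try {
      validate("-of-NN", 4);   // rejected: four shards would share one name
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```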
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DIRECTORY_CONTAINER: Shard is a file within a directory. |
static java.lang.String | INDEX_OF_MAX: Shard name containing the index and max. |

Constructor and Description |
---|
ShardNameTemplate() |
public static final java.lang.String INDEX_OF_MAX
Shard name containing the index and max. For example: [prefix]-00000-of-00100[suffix] and [prefix]-00001-of-00100[suffix]
public static final java.lang.String DIRECTORY_CONTAINER
Shard is a file within a directory. For example: [prefix]/part-00000[suffix] and [prefix]/part-00001[suffix]
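The two constants can be exercised with a small expander. This sketch assumes the constants hold "-SSSSS-of-NNNNN" and "/part-SSSSS", which is what the examples above imply; the local copies and the `ShardTemplates.filename` helper are illustrative, not Beam's API. It scans the template character by character instead of using a regular expression.

```java
import java.util.Locale;

public final class ShardTemplates {
  // Assumed values, inferred from the documented examples (local copies, not Beam's fields).
  public static final String INDEX_OF_MAX = "-SSSSS-of-NNNNN";
  public static final String DIRECTORY_CONTAINER = "/part-SSSSS";

  private ShardTemplates() {}

  /** Builds prefix + template + suffix, replacing each run of 'S' or 'N' with a padded value. */
  public static String filename(
      String prefix, String template, String suffix, int shard, int numShards) {
    StringBuilder out = new StringBuilder(prefix);
    int i = 0;
    while (i < template.length()) {
      char c = template.charAt(i);
      if (c == 'S' || c == 'N') {
        int end = i;
        while (end < template.length() && template.charAt(end) == c) {
          end++; // extend the run of identical placeholder characters
        }
        int value = (c == 'S') ? shard : numShards;
        out.append(String.format(Locale.ROOT, "%0" + (end - i) + "d", value));
        i = end;
      } else {
        out.append(c); // literal character, copied unchanged
        i++;
      }
    }
    return out.append(suffix).toString();
  }

  public static void main(String[] args) {
    System.out.println(filename("out", INDEX_OF_MAX, ".txt", 1, 100));        // out-00001-of-00100.txt
    System.out.println(filename("out", DIRECTORY_CONTAINER, ".txt", 1, 100)); // out/part-00001.txt
  }
}
```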