Class ShardNameTemplate

java.lang.Object
org.apache.beam.sdk.io.ShardNameTemplate

public class ShardNameTemplate extends Object
Standard shard naming templates.

Shard naming templates are strings that may contain placeholders for the shard number and shard count. When constructing a filename for a particular shard number, each run of the upper-case letter 'S' is replaced with the shard number and each run of 'N' with the shard count, 0-padded to the length of the run.

Left-padding of the numbers enables lexicographical sorting of the resulting filenames. If the shard number or count is too large for the space provided in the template, the result may no longer sort lexicographically. For example, a shard template of "S-of-N", for 200 shards, will result in outputs named "0-of-200", ..., "10-of-200", ..., "100-of-200", etc.
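
The effect of padding on sort order can be checked with a small standalone sketch. It does not use the Beam API: the class PaddingSortSketch and its helpers are hypothetical names introduced for illustration, and the names are built directly with String.format rather than from a real shard template.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PaddingSortSketch {
    // Hypothetical helper: builds names like "<shard>-of-<count>", zero-padding
    // both numbers to at least `width` digits (wider numbers are not truncated).
    static List<String> names(int shardCount, int width) {
        String fmt = "%0" + width + "d-of-%0" + width + "d";
        List<String> result = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            result.add(String.format(Locale.ROOT, fmt, shard, shardCount));
        }
        return result;
    }

    // True if sorting the names as strings keeps them in shard-number order.
    static boolean sortsInShardOrder(List<String> inShardOrder) {
        List<String> sorted = new ArrayList<>(inShardOrder);
        Collections.sort(sorted);
        return sorted.equals(inShardOrder);
    }

    public static void main(String[] args) {
        // One-digit fields, comparable to the template "S-of-N".
        System.out.println(sortsInShardOrder(names(200, 1)));
        // Three-digit fields, comparable to the template "SSS-of-NNN".
        System.out.println(sortsInShardOrder(names(200, 3)));
    }
}
```

With one-digit fields, "10-of-200" sorts before "2-of-200", so the string order no longer matches the shard order; with three-digit fields the two orders agree.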

Shard numbers start with 0, so the last shard number is the shard count minus one. For example, the template "-SSSSS-of-NNNNN" will be instantiated as "-00000-of-01000" for the first shard (shard 0) of a 1000-way sharded output.
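
As a rough illustration of the substitution described above, the following standalone sketch expands a template for one shard. It is not Beam's implementation: ShardNameSketch and expand are hypothetical names, and the sketch assumes that every run of 'S' or 'N' in the template is a placeholder.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShardNameSketch {
    // Matches a run of 'S' or a run of 'N' placeholders.
    private static final Pattern PLACEHOLDER = Pattern.compile("S+|N+");

    // Hypothetical helper: replaces each run of 'S' with the shard number and
    // each run of 'N' with the shard count, zero-padded to the run's length.
    static String expand(String template, int shardNumber, int shardCount) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous placeholder and this one.
            out.append(template, last, m.start());
            int value = m.group().charAt(0) == 'S' ? shardNumber : shardCount;
            out.append(String.format(Locale.ROOT, "%0" + m.group().length() + "d", value));
            last = m.end();
        }
        out.append(template, last, template.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // First and last shard of a 1000-way sharded output.
        System.out.println(expand("-SSSSS-of-NNNNN", 0, 1000));
        System.out.println(expand("-SSSSS-of-NNNNN", 999, 1000));
    }
}
```

Because shard numbers start at 0, the last call uses shard number 999 for a shard count of 1000.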

A shard name template is typically provided along with a name prefix and suffix, which allows constructing complex paths that have embedded shard information. For example, outputs in the form "gs://bucket/path-01-of-99.txt" could be constructed by providing the individual components:


 pipeline.apply(
     TextIO.write().to("gs://bucket/path")
                 .withShardNameTemplate("-SS-of-NN")
                 .withSuffix(".txt"))
 

In the example above, parts of the output name could be made configurable, so that a user can set one component (for example, the prefix) without having to specify the others.

If a shard name template does not contain any repeating 'S', then the output shard count must be 1, as otherwise the same filename would be generated for multiple shards.
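
The constraint above can be expressed as a simple precondition. The sketch below is a hypothetical check written for illustration (ShardTemplateCheckSketch and checkTemplate are assumed names, not part of the Beam API), and it treats any 'S' in the template as a shard-number placeholder.

```java
public class ShardTemplateCheckSketch {
    // Hypothetical helper: rejects a template that would produce the same
    // filename for several shards because it has no shard-number placeholder.
    static void checkTemplate(String template, int shardCount) {
        boolean hasShardPlaceholder = template.indexOf('S') >= 0;
        if (!hasShardPlaceholder && shardCount > 1) {
            throw new IllegalArgumentException(
                "Template \"" + template + "\" has no 'S' placeholder, so "
                    + shardCount + " shards would share one filename");
        }
    }

    public static void main(String[] args) {
        checkTemplate("-SS-of-NN", 99); // accepted: shard number is part of the name
        checkTemplate("", 1);           // accepted: a single shard needs no placeholder
        try {
            checkTemplate("-of-NN", 2); // rejected: both shards would get the same name
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```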

  • Field Details

    • INDEX_OF_MAX

      public static final String INDEX_OF_MAX
      Shard name containing the index and max.

      E.g.: [prefix]-00000-of-00100[suffix] and [prefix]-00001-of-00100[suffix]

    • DIRECTORY_CONTAINER

      public static final String DIRECTORY_CONTAINER
      Shard is a file within a directory.

      E.g.: [prefix]/part-00000[suffix] and [prefix]/part-00001[suffix]

  • Constructor Details

    • ShardNameTemplate

      public ShardNameTemplate()