Class IsmFormat
java.lang.Object
org.apache.beam.runners.dataflow.internal.IsmFormat
An Ism file is a prefix-encoded, composite-key value file broken into shards. Each composite key
is composed of a fixed number of component keys. A fixed number of those component keys represent the
shard key portion; see
IsmFormat.IsmRecord and IsmFormat.IsmRecordCoder for further details on
the data format. In addition to the data, there is a bloom filter and multiple indices to allow
for efficient retrieval.
An Ism file is composed of these high level sections (in order):
- shard block
- bloom filter (See ScalableBloomFilter for details on encoding format)
- shard index
- footer (See IsmFormat.Footer for details on encoding format)
The shard block is composed of multiple copies of the following:
- data block
- data index
The data block is composed of multiple copies of the following:
- key prefix (See IsmFormat.KeyPrefix for details on encoding format)
- unshared key bytes
- value bytes
- optional 0x00 0x00 bytes followed by metadata bytes (if the 0x00 0x00 bytes are not present, then there are no metadata bytes)
1225801234 as the seed value.
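The key prefix and unshared key bytes described above can be illustrated with a minimal sketch. It assumes each key is compared byte by byte against the previously written key. The class `KeyPrefixSketch` and its methods are hypothetical names used for illustration only; they are not part of IsmFormat, and the actual on-disk encoding is defined by IsmFormat.KeyPrefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of prefix encoding; not part of IsmFormat. */
class KeyPrefixSketch {

  /** Returns the number of leading bytes the two encoded keys have in common. */
  static int sharedPrefixLength(byte[] previousKey, byte[] currentKey) {
    int limit = Math.min(previousKey.length, currentKey.length);
    int i = 0;
    while (i < limit && previousKey[i] == currentKey[i]) {
      i++;
    }
    return i;
  }

  /** Returns the bytes of currentKey that are not shared with previousKey. */
  static byte[] unsharedBytes(byte[] previousKey, byte[] currentKey) {
    int shared = sharedPrefixLength(previousKey, currentKey);
    return Arrays.copyOfRange(currentKey, shared, currentKey.length);
  }

  /** Rebuilds a full key from the previous key, the shared byte count, and the unshared bytes. */
  static byte[] reconstruct(byte[] previousKey, int shared, byte[] unshared) {
    // Copy the shared bytes, then overwrite everything after them with the unshared bytes.
    byte[] key = Arrays.copyOf(previousKey, shared + unshared.length);
    System.arraycopy(unshared, 0, key, shared, unshared.length);
    return key;
  }

  public static void main(String[] args) {
    byte[] previous = "apple".getBytes(StandardCharsets.UTF_8);
    byte[] current = "apply".getBytes(StandardCharsets.UTF_8);
    int shared = sharedPrefixLength(previous, current);
    byte[] unshared = unsharedBytes(previous, current);
    byte[] rebuilt = reconstruct(previous, shared, unshared);
    System.out.println(shared + " shared, " + unshared.length + " unshared, rebuilt="
        + new String(rebuilt, StandardCharsets.UTF_8));
  }
}
```

Only the shared and unshared byte counts plus the unshared bytes need to be stored for each record, which is why keys with long common prefixes compress well under this scheme.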
The data index is composed of N copies of the following:
- key prefix (See IsmFormat.KeyPrefix for details on encoding format)
- unshared key bytes
- byte offset to key prefix in data block (variable length long coding)
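One plausible way a reader could use such an index is a floor lookup: find the last indexed key that is not greater than the target, then start scanning the data block at the stored offset. The sketch below rests on several assumptions: keys are already fully reconstructed from their prefixes, keys are ordered by unsigned lexicographic byte comparison, and `DataIndexSketch` is a hypothetical class that is not part of IsmFormat. It requires Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of an index lookup; not part of IsmFormat. */
class DataIndexSketch {

  // Assumption: keys are ordered by unsigned lexicographic byte comparison.
  private static final Comparator<byte[]> UNSIGNED_ORDER = Arrays::compareUnsigned;

  // Maps a fully reconstructed key to the byte offset of its key prefix in the data block.
  private final TreeMap<byte[], Long> entries = new TreeMap<>(UNSIGNED_ORDER);

  void add(byte[] key, long offsetInDataBlock) {
    entries.put(key, offsetInDataBlock);
  }

  /**
   * Returns the offset stored for the last indexed key that is less than or equal to target,
   * or -1 if target sorts before every indexed key.
   */
  long seekOffset(byte[] target) {
    Map.Entry<byte[], Long> floor = entries.floorEntry(target);
    return floor == null ? -1L : floor.getValue();
  }

  public static void main(String[] args) {
    DataIndexSketch index = new DataIndexSketch();
    index.add("a".getBytes(StandardCharsets.UTF_8), 0L);
    index.add("m".getBytes(StandardCharsets.UTF_8), 100L);
    System.out.println(index.seekOffset("k".getBytes(StandardCharsets.UTF_8)));
  }
}
```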
The shard index is composed of a variable length integer encoding representing
the number of shard index records followed by that many shard index records. See IsmFormat.IsmShardCoder for further details as to its encoding scheme.
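The count-then-records layout can be sketched as follows. This is an illustration only: it assumes the common base-128 variable length integer scheme (7 payload bits per byte, low-order group first, high bit set on every byte except the last), and it stands in a single 8-byte long for each record. The real record layout is defined by IsmFormat.IsmShardCoder, and `ShardIndexSketch` is a hypothetical class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical illustration of a count-prefixed record list; not part of IsmFormat. */
class ShardIndexSketch {

  /** Writes value as a base-128 varint, low-order 7-bit group first. */
  static void writeVarInt(int value, ByteArrayOutputStream out) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80); // continuation bit set: more bytes follow
      value >>>= 7;
    }
    out.write(value);
  }

  /** Reads a base-128 varint written by writeVarInt. */
  static int readVarInt(ByteBuffer in) {
    int result = 0;
    for (int shift = 0; shift <= 28; shift += 7) {
      int b = in.get() & 0xFF;
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IllegalArgumentException("varint is longer than 5 bytes");
  }

  /** Encodes the record count followed by each placeholder record (one long per record). */
  static byte[] encode(long[] placeholderRecords) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarInt(placeholderRecords.length, out);
    for (long record : placeholderRecords) {
      byte[] eight = ByteBuffer.allocate(Long.BYTES).putLong(record).array();
      out.write(eight, 0, eight.length);
    }
    return out.toByteArray();
  }

  /** Decodes the record count, then reads that many placeholder records. */
  static long[] decode(byte[] bytes) {
    ByteBuffer in = ByteBuffer.wrap(bytes);
    long[] records = new long[readVarInt(in)];
    for (int i = 0; i < records.length; i++) {
      records[i] = in.getLong();
    }
    return records;
  }

  public static void main(String[] args) {
    byte[] encoded = encode(new long[] {10L, 20L});
    System.out.println(encoded.length + " bytes, " + decode(encoded).length + " records");
  }
}
```

Because the count comes first, a reader knows exactly how many records to consume without needing a terminator.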
Nested Class Summary
Nested Classes (Modifier and Type: Description)
- static class: The footer stores the relevant information required to locate the index and bloom filter.
- static final class: A Coder for IsmFormat.Footer.
- static class: A record containing a composite key and either a value or metadata.
- static class: A Coder for IsmFormat.IsmRecords.
- static class: A shard descriptor containing shard id, the data block offset, and the index offset for the given shard.
- static class: A coder for IsmFormat.IsmShards.
- static class: The prefix used before each key which contains the number of shared and unshared bytes from the previous key that was read.
- static final class: A Coder for IsmFormat.KeyPrefix.
- static class: A coder for metadata key component.
Field Summary
Fields (Modifier and Type, Field: Description)
- static final Coder<List<IsmFormat.IsmShard>> ISM_SHARD_INDEX_CODER: A ListCoder wrapping an IsmFormat.IsmShardCoder used to encode the shard index.
- static final int SHARD_BITS
Constructor Summary
Constructors
- IsmFormat()
Method Summary
Modifier and Type, Method: Description
- static Object getMetadataKey: An object representing a wild card for a key component.
- static boolean isMetadataKey(List<?> keyComponents): Returns true if and only if any of the passed in key components represent a metadata key.
- static void validateCoderIsCompatible: Validates that the key portion of the given coder is deterministic.
Field Details
SHARD_BITS
public static final int SHARD_BITS
ISM_SHARD_INDEX_CODER
A ListCoder wrapping an IsmFormat.IsmShardCoder used to encode the shard index. See ListCoder for its encoding specification and IsmFormat.IsmShardCoder for its encoding specification.
Constructor Details
IsmFormat
public IsmFormat()
Method Details
validateCoderIsCompatible
Validates that the key portion of the given coder is deterministic.
isMetadataKey
Returns true if and only if any of the passed in key components represent a metadata key.
getMetadataKey
An object representing a wild card for a key component. Encoded using IsmFormat.MetadataKeyCoder.