Class IsmFormat
java.lang.Object
org.apache.beam.runners.dataflow.internal.IsmFormat
An Ism file is a prefix-encoded composite key-value file broken into shards. Each composite key
is composed of a fixed number of component keys, and a fixed number of those component keys form
the shard key portion; see IsmFormat.IsmRecord and IsmFormat.IsmRecordCoder for further details
on the data format. In addition to the data, the file contains a bloom filter and multiple
indices to allow for efficient retrieval.
An Ism file is composed of these high level sections (in order):
- shard block
- bloom filter (see ScalableBloomFilter for details on encoding format)
- shard index
- footer (see IsmFormat.Footer for details on encoding format)
The shard block is composed of multiple copies of the following:
- data block
- data index
The data block is composed of multiple copies of the following:
- key prefix (see IsmFormat.KeyPrefix for details on encoding format)
- unshared key bytes
- value bytes
- optional 0x00 0x00 bytes followed by metadata bytes (if the 0x00 0x00 bytes are not present, then there are no metadata bytes)
1225801234
as the seed value.
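The key prefix and unshared key bytes described above can be illustrated with a minimal sketch. It assumes keys are written in sorted order and that each prefix simply records how many leading bytes are shared with the previous key and how many bytes follow. The class PrefixEncodingSketch and its nested Prefix type are hypothetical names used for illustration only; they are not the IsmFormat.KeyPrefix implementation, and its actual byte encoding is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative sketch of prefix encoding; not the Beam implementation. */
final class PrefixEncodingSketch {

  /** Hypothetical stand-in for a key prefix: shared and unshared byte counts. */
  static final class Prefix {
    final int sharedBytes;
    final int unsharedBytes;

    Prefix(int sharedBytes, int unsharedBytes) {
      this.sharedBytes = sharedBytes;
      this.unsharedBytes = unsharedBytes;
    }
  }

  /** Counts the leading bytes that current has in common with previous. */
  static int sharedPrefixLength(byte[] previous, byte[] current) {
    int limit = Math.min(previous.length, current.length);
    int i = 0;
    while (i < limit && previous[i] == current[i]) {
      i++;
    }
    return i;
  }

  /** Computes the prefix that would precede the unshared bytes of current. */
  static Prefix prefixFor(byte[] previous, byte[] current) {
    int shared = sharedPrefixLength(previous, current);
    return new Prefix(shared, current.length - shared);
  }

  /** Rebuilds a full key from the previous key, the prefix, and the unshared bytes. */
  static byte[] decode(byte[] previous, Prefix prefix, byte[] unshared) {
    byte[] key = new byte[prefix.sharedBytes + prefix.unsharedBytes];
    System.arraycopy(previous, 0, key, 0, prefix.sharedBytes);
    System.arraycopy(unshared, 0, key, prefix.sharedBytes, prefix.unsharedBytes);
    return key;
  }

  public static void main(String[] args) {
    byte[] previous = "apple".getBytes(StandardCharsets.UTF_8);
    byte[] current = "apply".getBytes(StandardCharsets.UTF_8);

    Prefix prefix = prefixFor(previous, current);
    byte[] unshared = Arrays.copyOfRange(current, prefix.sharedBytes, current.length);
    byte[] roundTrip = decode(previous, prefix, unshared);

    // Prints "4 shared, 1 unshared" and then "apply".
    System.out.println(prefix.sharedBytes + " shared, " + prefix.unsharedBytes + " unshared");
    System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
  }
}
```

Because only the unshared bytes are stored, a reader has to decode records in order, carrying the previous key forward, which is one reason a separate data index with byte offsets is useful for seeking.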
The data index is composed of N copies of the following:
- key prefix (see IsmFormat.KeyPrefix for details on encoding format)
- unshared key bytes
- byte offset to key prefix in data block (variable length long coding)
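As an illustration of what a variable length long coding can look like, the following sketch encodes a long as base-128 groups of 7 bits, low-order group first, with the high bit of each byte marking that more bytes follow. This layout is an assumption chosen for illustration; the exact coding used for the offsets in an Ism file is defined by the Beam coders, and VarLongSketch is a hypothetical class name.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative base-128 variable length long coding; an assumed layout, not Beam's code. */
final class VarLongSketch {

  /** Encodes value using 7 payload bits per byte, low-order group first. */
  static byte[] encode(long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // While more than 7 significant bits remain, emit a byte with the continuation bit set.
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
    return out.toByteArray();
  }

  /** Decodes a single value from the start of bytes. */
  static long decode(byte[] bytes) {
    long result = 0;
    int shift = 0;
    for (byte b : bytes) {
      if (shift >= 64) {
        throw new IllegalArgumentException("varint too long");
      }
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result; // The continuation bit is clear, so this was the last byte.
      }
      shift += 7;
    }
    throw new IllegalArgumentException("truncated varint");
  }

  public static void main(String[] args) {
    byte[] encoded = encode(300L);
    // 300 fits in two bytes under this scheme: 0xAC 0x02.
    System.out.println(encoded.length + " bytes, decoded " + decode(encoded));
  }
}
```

Small offsets occupy a single byte under a scheme like this, which keeps an index of many nearby offsets compact.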
The shard index is composed of a variable length integer encoding representing the number of
shard index records, followed by that many shard index records. See IsmFormat.IsmShardCoder
for further details on its encoding scheme.
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
- static class IsmFormat.Footer: The footer stores the relevant information required to locate the index and bloom filter.
- static final class IsmFormat.FooterCoder: A Coder for IsmFormat.Footer.
- static class IsmFormat.IsmRecord: A record containing a composite key and either a value or metadata.
- static class IsmFormat.IsmRecordCoder: A Coder for IsmFormat.IsmRecords.
- static class IsmFormat.IsmShard: A shard descriptor containing shard id, the data block offset, and the index offset for the given shard.
- static class IsmFormat.IsmShardCoder: A coder for IsmFormat.IsmShards.
- static class IsmFormat.KeyPrefix: The prefix used before each key which contains the number of shared and unshared bytes from the previous key that was read.
- static final class IsmFormat.KeyPrefixCoder: A Coder for IsmFormat.KeyPrefix.
- static class IsmFormat.MetadataKeyCoder: A coder for metadata key component.
Field Summary
Fields (Modifier and Type, Field: Description)
- static final Coder<List<IsmFormat.IsmShard>> ISM_SHARD_INDEX_CODER: A ListCoder wrapping an IsmFormat.IsmShardCoder, used to encode the shard index.
- static final int SHARD_BITS
Constructor Summary
Constructors
- IsmFormat()
Method Summary
Modifier and Type, Method: Description
- static Object getMetadataKey: An object representing a wild card for a key component.
- static boolean isMetadataKey(List<?> keyComponents): Returns true if and only if any of the passed-in key components represents a metadata key.
- static void validateCoderIsCompatible: Validates that the key portion of the given coder is deterministic.
Field Details
- SHARD_BITS
  public static final int SHARD_BITS
- ISM_SHARD_INDEX_CODER
  A ListCoder wrapping an IsmFormat.IsmShardCoder, used to encode the shard index. See ListCoder for the encoding of the list and IsmFormat.IsmShardCoder for the encoding of each shard record.
Constructor Details
- IsmFormat
  public IsmFormat()
Method Details
- validateCoderIsCompatible
  Validates that the key portion of the given coder is deterministic.
- isMetadataKey
  Returns true if and only if any of the passed-in key components represents a metadata key.
- getMetadataKey
  An object representing a wild card for a key component. Encoded using IsmFormat.MetadataKeyCoder.
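The relationship between getMetadataKey and isMetadataKey can be pictured as a sentinel-object check. The sketch below is a guess at the general pattern, not the actual Beam code: it assumes the wildcard is a single shared object and that a key counts as a metadata key when any of its components is that object. MetadataKeySketch and its private sentinel are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sentinel pattern for a metadata key component; not the Beam implementation. */
final class MetadataKeySketch {

  /** Hypothetical sentinel standing in for the metadata wild card. */
  private static final Object METADATA_KEY =
      new Object() {
        @Override
        public String toString() {
          return "META";
        }
      };

  /** Returns the shared wild card object. */
  static Object getMetadataKey() {
    return METADATA_KEY;
  }

  /** Returns true if and only if any component is the wild card (identity comparison). */
  static boolean isMetadataKey(List<?> keyComponents) {
    for (Object component : keyComponents) {
      if (component == METADATA_KEY) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Object> metadataKey = Arrays.asList("shardKey", getMetadataKey());
    List<Object> dataKey = Arrays.asList("shardKey", "window");
    // Prints "true" and then "false".
    System.out.println(isMetadataKey(metadataKey));
    System.out.println(isMetadataKey(dataKey));
  }
}
```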