Class GroupByEncryptedKey<K,V>

java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PCollection<KV<K,V>>,PCollection<KV<K,Iterable<V>>>>
org.apache.beam.sdk.transforms.GroupByEncryptedKey<K,V>
All Implemented Interfaces:
Serializable, HasDisplayData

public class GroupByEncryptedKey<K,V> extends PTransform<PCollection<KV<K,V>>,PCollection<KV<K,Iterable<V>>>>
A PTransform that provides a secure alternative to GroupByKey.

This transform encrypts the keys of the input PCollection, performs a GroupByKey on the encrypted keys, and then decrypts the keys in the output. This is useful when the keys contain sensitive data that should not be stored at rest by the runner.
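The encrypt, group, decrypt flow can be illustrated outside Beam with plain JDK classes. The sketch below is an in-memory model, not Beam's implementation. It assumes an HMAC-SHA256 tag of each key serves as the grouping key (deterministic, so equal keys still land in the same group) and assumes AES-GCM with a random IV as the cipher for the key itself (randomized, so equal keys do not produce equal ciphertexts). The class and method names are hypothetical. For brevity, one 32-byte secret serves both primitives; a production design would more likely derive separate subkeys.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical in-memory model of encrypt -> group -> decrypt; not Beam's code.
public class EncryptedGroupingSketch {
  private static final SecureRandom RANDOM = new SecureRandom();
  private static final int IV_BYTES = 12;

  // Deterministic: equal keys always yield equal tags, so they meet in the same group.
  static byte[] hmac(byte[] secret, byte[] key) throws GeneralSecurityException {
    Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(secret, "HmacSHA256"));
    return mac.doFinal(key);
  }

  // Randomized AES-GCM (an assumed cipher choice); returns IV followed by ciphertext.
  static byte[] encrypt(byte[] secret, byte[] plaintext) throws GeneralSecurityException {
    byte[] iv = new byte[IV_BYTES];
    RANDOM.nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(secret, "AES"), new GCMParameterSpec(128, iv));
    byte[] ciphertext = cipher.doFinal(plaintext);
    return ByteBuffer.allocate(IV_BYTES + ciphertext.length).put(iv).put(ciphertext).array();
  }

  static byte[] decrypt(byte[] secret, byte[] ivAndCiphertext) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(secret, "AES"),
        new GCMParameterSpec(128, ivAndCiphertext, 0, IV_BYTES));
    return cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
  }

  static Map<String, List<String>> groupByEncryptedKey(
      byte[] secret, List<Map.Entry<String, String>> input) throws GeneralSecurityException {
    // Only the tag and the key ciphertext cross the simulated shuffle, never the plaintext key.
    Map<ByteBuffer, List<Map.Entry<byte[], String>>> shuffled = new LinkedHashMap<>();
    for (Map.Entry<String, String> element : input) {
      byte[] key = element.getKey().getBytes(StandardCharsets.UTF_8);
      ByteBuffer tag = ByteBuffer.wrap(hmac(secret, key));
      byte[] encryptedKey = encrypt(secret, key);
      shuffled.computeIfAbsent(tag, t -> new ArrayList<>())
          .add(new SimpleImmutableEntry<>(encryptedKey, element.getValue()));
    }
    Map<String, List<String>> output = new LinkedHashMap<>();
    for (List<Map.Entry<byte[], String>> group : shuffled.values()) {
      // Every element in a group carries a ciphertext of the same key; decrypt the first one.
      String key = new String(decrypt(secret, group.get(0).getKey()), StandardCharsets.UTF_8);
      List<String> values = new ArrayList<>();
      for (Map.Entry<byte[], String> kv : group) {
        values.add(kv.getValue());
      }
      output.put(key, values);
    }
    return output;
  }

  public static void main(String[] args) throws GeneralSecurityException {
    byte[] secret = new byte[32];
    RANDOM.nextBytes(secret);
    List<Map.Entry<String, String>> input =
        List.of(Map.entry("alice", "1"), Map.entry("bob", "2"), Map.entry("alice", "3"));
    System.out.println(groupByEncryptedKey(secret, input)); // prints {alice=[1, 3], bob=[2]}
  }
}
```

Grouping on the tag rather than on a ciphertext is the key design point in this sketch: a randomized cipher would scatter equal keys across groups, while a deterministic tag keeps them together without exposing the plaintext key to the shuffle.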

The transform requires a Secret that returns a base64-encoded 32-byte secret, which can be used to generate a SecretKeySpec object for the HmacSHA256 algorithm.
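The key requirement can be checked with plain JDK classes. The sketch below decodes a base64 string, verifies that it holds exactly 32 bytes, and wraps it in a SecretKeySpec for HmacSHA256. It stands in for whatever Secret implementation supplies the bytes; the class and the helpers toHmacKey and tag are hypothetical, not part of Beam.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helpers illustrating the "base64-encoded 32-byte secret" requirement.
public class HmacKeySketch {
  // Decodes the base64 secret and rejects anything that is not exactly 32 bytes.
  static SecretKeySpec toHmacKey(String base64Secret) {
    byte[] raw = Base64.getDecoder().decode(base64Secret);
    if (raw.length != 32) {
      throw new IllegalArgumentException("expected 32 secret bytes, got " + raw.length);
    }
    return new SecretKeySpec(raw, "HmacSHA256");
  }

  // Computes an HMAC-SHA256 tag over the given bytes.
  static byte[] tag(SecretKeySpec key, byte[] data) throws GeneralSecurityException {
    Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(key);
    return mac.doFinal(data);
  }

  public static void main(String[] args) throws GeneralSecurityException {
    // 32 zero bytes, base64-encoded, for illustration only; never use a constant key in practice.
    String demoSecret = Base64.getEncoder().encodeToString(new byte[32]);
    SecretKeySpec key = toHmacKey(demoSecret);
    byte[] t = tag(key, "user@example.com".getBytes(StandardCharsets.UTF_8));
    System.out.println(t.length); // prints 32
  }
}
```

In a pipeline the bytes would come from the Secret passed to create; the all-zero key here only keeps the example deterministic.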

Note the following caveats:
1) Runners can implement arbitrary materialization steps, so this transform by itself does not guarantee that the pipeline never stores unencrypted data at rest.
2) In streaming mode, this transform may not correctly handle update-compatibility checks on coders. An improper update could therefore produce invalid coders, causing pipeline failure or data corruption. If you need to update the pipeline, make sure that the input type passed into this transform does not change.

  • Method Details

    • create

      public static <K, V> GroupByEncryptedKey<K,V> create(org.apache.beam.sdk.util.Secret hmacKey)
      Creates a GroupByEncryptedKey transform.
      Type Parameters:
      K - The type of the keys in the input PCollection.
      V - The type of the values in the input PCollection.
      Parameters:
      hmacKey - The Secret key to use for encryption.
      Returns:
      A GroupByEncryptedKey transform.
    • createWithCustomGbk

      public static <K, V> GroupByEncryptedKey<K,V> createWithCustomGbk(org.apache.beam.sdk.util.Secret hmacKey, PTransform<PCollection<KV<byte[],KV<byte[],byte[]>>>,PCollection<KV<byte[],Iterable<KV<byte[],byte[]>>>>> gbk)
      Creates a GroupByEncryptedKey transform that uses the supplied gbk transform, instead of the default GroupByKey, to group the encrypted elements.
      Type Parameters:
      K - The type of the keys in the input PCollection.
      V - The type of the values in the input PCollection.
      Parameters:
      hmacKey - The Secret key to use for encryption.
      gbk - The custom GroupByKey transform to apply to the encrypted elements, between the encryption and decryption steps.
      Returns:
      A GroupByEncryptedKey transform.
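A custom gbk receives elements of type KV<byte[], KV<byte[], byte[]>>, so it must group on the contents of the byte[] keys. In Java, arrays use identity equals and hashCode, which is an easy pitfall when writing an in-memory or test stand-in. The sketch below models only that grouping contract with JDK collections (Map.Entry in place of Beam's KV); it is not a Beam PTransform, and the names ByteKeyGroupingSketch, group, and kv are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the grouping contract of a custom gbk.
public class ByteKeyGroupingSketch {
  static <K, V> Map.Entry<K, V> kv(K key, V value) {
    return new SimpleImmutableEntry<>(key, value);
  }

  // Input: (byte[] key, (byte[], byte[]) payload). Output: one entry per distinct key contents.
  static List<Map.Entry<byte[], List<Map.Entry<byte[], byte[]>>>> group(
      List<Map.Entry<byte[], Map.Entry<byte[], byte[]>>> input) {
    // byte[] uses identity equals/hashCode, so wrap keys in ByteBuffer to compare by content.
    Map<ByteBuffer, List<Map.Entry<byte[], byte[]>>> groups = new LinkedHashMap<>();
    for (Map.Entry<byte[], Map.Entry<byte[], byte[]>> element : input) {
      groups.computeIfAbsent(ByteBuffer.wrap(element.getKey()), k -> new ArrayList<>())
          .add(element.getValue());
    }
    List<Map.Entry<byte[], List<Map.Entry<byte[], byte[]>>>> output = new ArrayList<>();
    for (Map.Entry<ByteBuffer, List<Map.Entry<byte[], byte[]>>> g : groups.entrySet()) {
      // array() returns the first-seen key array that backs the wrapping buffer.
      output.add(kv(g.getKey().array(), g.getValue()));
    }
    return output;
  }

  public static void main(String[] args) {
    byte[] tagA1 = {1, 2};
    byte[] tagA2 = {1, 2}; // equal contents, different array instance
    byte[] tagB = {3};
    List<Map.Entry<byte[], Map.Entry<byte[], byte[]>>> input = List.of(
        kv(tagA1, kv(new byte[] {10}, new byte[] {11})),
        kv(tagB, kv(new byte[] {20}, new byte[] {21})),
        kv(tagA2, kv(new byte[] {30}, new byte[] {31})));
    System.out.println(group(input).size()); // prints 2
  }
}
```

Keying a HashMap directly on byte[] would have produced three groups here, because tagA1 and tagA2 are distinct array objects.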
    • expand

      public PCollection<KV<K,Iterable<V>>> expand(PCollection<KV<K,V>> input)
      Description copied from class: PTransform
      Override this method to specify how this PTransform should be expanded on the given InputT.

      NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.

      Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

      Specified by:
      expand in class PTransform<PCollection<KV<K,V>>,PCollection<KV<K,Iterable<V>>>>