Class DynamicDestinations<T,DestinationT>

java.lang.Object
org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations<T,DestinationT>
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
PortableBigQueryDestinations, StorageApiDynamicDestinationsTableRow

public abstract class DynamicDestinations<T,DestinationT> extends Object implements Serializable
This class provides the most general way of specifying dynamic BigQuery table destinations. Destinations can be extracted from the input element, and stored as a custom type. Mappings are provided to convert the destination into a BigQuery table reference and a BigQuery schema. The class can read side inputs while performing these mappings.

For example, consider a PCollection of events, each containing a user-id field. You want to write each user's events to a separate table with a separate schema per user. Since the user-id field is a string, you will represent the destination as a string.


 events.apply(BigQueryIO.<UserEvent>write()
  .to(new DynamicDestinations<UserEvent, String>() {
        public String getDestination(ValueInSingleWindow<UserEvent> element) {
          return element.getValue().getUserId();
        }
        public TableDestination getTable(String user) {
          return new TableDestination(tableForUser(user), "Table for user " + user);
        }
        public TableSchema getSchema(String user) {
          return tableSchemaForUser(user);
        }
      })
  .withFormatFunction(new SerializableFunction<UserEvent, TableRow>() {
     public TableRow apply(UserEvent event) {
       return convertUserEventToTableRow(event);
     }
   }));
 

An instance of DynamicDestinations can also read side inputs by calling sideInput(PCollectionView). Every view read this way must be returned by getSideInputs(). Side inputs are accessed in the global window, so they must be globally windowed.
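As a rough sketch of the side-input pattern described above, the subclass below looks up each user's table spec in a map-valued side input. The class name SideInputDestinations, the "user_id" and "event" field names, and the shape of the map (user id to table spec) are assumptions made for illustration and do not come from the Beam documentation.

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.ValueInSingleWindow;

// Hypothetical example class: routes rows by a "user_id" field and resolves
// the table spec for each user from a globally windowed side input.
class SideInputDestinations extends DynamicDestinations<TableRow, String> {
  // Assumed shape: user id -> table spec such as "project:dataset.table".
  private final PCollectionView<Map<String, String>> tableSpecByUser;

  SideInputDestinations(PCollectionView<Map<String, String>> tableSpecByUser) {
    this.tableSpecByUser = tableSpecByUser;
  }

  // Declare the view so that sideInput(...) is allowed to read it.
  @Override
  public List<PCollectionView<?>> getSideInputs() {
    return Collections.singletonList(tableSpecByUser);
  }

  @Override
  public String getDestination(ValueInSingleWindow<TableRow> element) {
    return (String) element.getValue().get("user_id");
  }

  @Override
  public TableDestination getTable(String user) {
    String tableSpec = sideInput(tableSpecByUser).get(user);
    if (tableSpec == null) {
      throw new IllegalStateException("No table configured for user " + user);
    }
    return new TableDestination(tableSpec, "Table for user " + user);
  }

  @Override
  public TableSchema getSchema(String user) {
    // The same two-column schema for every user, to keep the sketch short.
    return new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("user_id").setType("STRING"),
        new TableFieldSchema().setName("event").setType("STRING")));
  }
}
```

The view passed to the constructor would typically be produced by applying View.asMap() to a globally windowed PCollection of key-value pairs, and the instance would then be handed to BigQueryIO's to(...) method.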

DestinationT is expected to provide proper hash and equality members. Ideally it will be a compact type with an efficient coder, as these objects may be used as a key in a GroupByKey.
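When a plain String is not enough, a small immutable value class can serve as DestinationT. The following is a minimal sketch of such a type with value-based equals and hashCode. The class name UserRegion and its two fields are hypothetical. A suitable deterministic coder would still need to be available for it, either through the coder registry or through getDestinationCoder().

```java
import java.util.Objects;

// Hypothetical compact destination type: two short strings, compared by value.
final class UserRegion {
  private final String userId;
  private final String region;

  UserRegion(String userId, String region) {
    this.userId = Objects.requireNonNull(userId);
    this.region = Objects.requireNonNull(region);
  }

  String getUserId() {
    return userId;
  }

  String getRegion() {
    return region;
  }

  // Two destinations are equal exactly when both fields are equal.
  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof UserRegion)) {
      return false;
    }
    UserRegion that = (UserRegion) other;
    return userId.equals(that.userId) && region.equals(that.region);
  }

  // Must be consistent with equals so that equal destinations hash alike.
  @Override
  public int hashCode() {
    return Objects.hash(userId, region);
  }
}
```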

  • Constructor Details

    • DynamicDestinations

      public DynamicDestinations()
  • Method Details

    • getSideInputs

      public List<PCollectionView<?>> getSideInputs()
      Specifies that this object needs access to one or more side inputs. These side inputs must be globally windowed, as they will be accessed from the global window.
    • sideInput

      protected final <SideInputT> SideInputT sideInput(PCollectionView<SideInputT> view)
      Returns the value of a given side input. The view must be present in getSideInputs().
    • getDestination

      public abstract DestinationT getDestination(@Nullable ValueInSingleWindow<T> element)
      Returns an object that represents at a high level which table is being written to. May not return null.

      The method must return a distinct object for each distinct destination table, across all BigQueryIO write transforms in the same pipeline. See https://github.com/apache/beam/issues/32335 for details.

    • getDestinationCoder

      public @Nullable Coder<DestinationT> getDestinationCoder()
      Returns the coder for DestinationT. If this is not overridden, then BigQueryIO will look in the coder registry for a suitable coder. This must be a deterministic coder, as DestinationT will be used as a key type in a GroupByKey.
    • getTable

      public abstract TableDestination getTable(DestinationT destination)
      Returns a TableDestination object for the destination. May not return null. The return value must be unique to each destination: the method may not return the same TableDestination for different destinations.
    • getSchema

      public abstract @Nullable TableSchema getSchema(DestinationT destination)
      Returns the table schema for the destination.
    • getTableConstraints

      public @Nullable TableConstraints getTableConstraints(DestinationT destination)
      Returns TableConstraints (including primary and foreign keys) to be used when creating the table. Note: this is not currently supported when using FILE_LOADS.
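
To tie the method details together, here is a minimal sketch of a named subclass that overrides getDestinationCoder() with a deterministic coder and derives a distinct table for each destination. The class name PerUserDestinations, the "my-project:events" project and dataset, and the "user_id" and "event" field names are placeholders chosen for illustration.

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.ValueInSingleWindow;

// Hypothetical example class: one table per user, keyed by the user id string.
class PerUserDestinations extends DynamicDestinations<TableRow, String> {

  @Override
  public String getDestination(ValueInSingleWindow<TableRow> element) {
    return (String) element.getValue().get("user_id");
  }

  // StringUtf8Coder is deterministic, so the destination can safely be used
  // as a GroupByKey key without relying on the coder registry.
  @Override
  public Coder<String> getDestinationCoder() {
    return StringUtf8Coder.of();
  }

  // Each user id maps to its own table spec, so different destinations
  // never share a TableDestination.
  @Override
  public TableDestination getTable(String user) {
    return new TableDestination(
        "my-project:events.user_" + user, "Table for user " + user);
  }

  @Override
  public TableSchema getSchema(String user) {
    return new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("user_id").setType("STRING"),
        new TableFieldSchema().setName("event").setType("STRING")));
  }
}
```

This sketch assumes that user ids are valid as table-name suffixes. A real pipeline would likely need to sanitize or validate them before building the table spec.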