public abstract class DynamicDestinations<T,DestinationT>
extends java.lang.Object
implements java.io.Serializable
For example, consider a PCollection of events, each containing a user-id field. You want to write each user's events to a separate table with a separate schema per user. Since the user-id field is a string, you will represent the destination as a string.
events.apply(BigQueryIO.<UserEvent>write()
.to(new DynamicDestinations<UserEvent, String>() {
public String getDestination(ValueInSingleWindow<UserEvent> element) {
return element.getValue().getUserId();
}
public TableDestination getTable(String user) {
return new TableDestination(tableForUser(user), "Table for user " + user);
}
public TableSchema getSchema(String user) {
return tableSchemaForUser(user);
}
})
.withFormatFunction(new SerializableFunction<UserEvent, TableRow>() {
public TableRow apply(UserEvent event) {
return convertUserEventToTableRow(event);
}
}));
An instance of DynamicDestinations
can also use side inputs using sideInput(PCollectionView)
. The side inputs must be present in getSideInputs()
. Side
inputs are accessed in the global window, so they must be globally windowed.
DestinationT
is expected to provide proper hash and equality members. Ideally it will
be a compact type with an efficient coder, as these objects may be used as a key in a GroupByKey
.
Constructor and Description |
---|
DynamicDestinations() |
Modifier and Type | Method and Description |
---|---|
abstract DestinationT |
getDestination(@Nullable ValueInSingleWindow<T> element)
Returns an object that represents at a high level which table is being written to.
|
@Nullable Coder<DestinationT> |
getDestinationCoder()
Returns the coder for
DestinationT . |
abstract @Nullable TableSchema |
getSchema(DestinationT destination)
Returns the table schema for the destination.
|
java.util.List<PCollectionView<?>> |
getSideInputs()
Specifies that this object needs access to one or more side inputs.
|
abstract TableDestination |
getTable(DestinationT destination)
Returns a
TableDestination object for the destination. |
@Nullable TableConstraints |
getTableConstraints(DestinationT destination)
Returns TableConstraints (including primary and foreign key) to be used when creating the
table.
|
protected <SideInputT> |
sideInput(PCollectionView<SideInputT> view)
Returns the value of a given side input.
|
public java.util.List<PCollectionView<?>> getSideInputs()
protected final <SideInputT> SideInputT sideInput(PCollectionView<SideInputT> view)
getSideInputs()
.public abstract DestinationT getDestination(@Nullable ValueInSingleWindow<T> element)
The method must return a unique object for different destination tables involved over all BigQueryIO write transforms in the same pipeline. See https://github.com/apache/beam/issues/32335 for details.
public @Nullable Coder<DestinationT> getDestinationCoder()
DestinationT
. If this is not overridden, then BigQueryIO
will look in the coder registry for a suitable coder. This must be a deterministic coder, as
DestinationT
will be used as a key type in a GroupByKey
.public abstract TableDestination getTable(DestinationT destination)
TableDestination
object for the destination. May not return null. Return
value needs to be unique to each destination: may not return the same TableDestination
for different destinations.public abstract @Nullable TableSchema getSchema(DestinationT destination)
public @Nullable TableConstraints getTableConstraints(DestinationT destination)