@Experimental(value=SCHEMAS) public class Select<T> extends PTransform<PCollection<T>,PCollection<Row>>
A PTransform for selecting a subset of fields from a schema type.
This transform allows projecting out a subset of fields from a schema type. The output of
this transform is of type Row, though that can be converted into any other type with a
matching schema using the Convert transform.
For example, consider the following POJO type:
@DefaultSchema(JavaFieldSchema.class)
public class UserEvent {
  public String userId;
  public String eventId;
  public int eventType;
  public Location location;
}

@DefaultSchema(JavaFieldSchema.class)
public class Location {
  public double latitude;
  public double longitude;
}
To select just the userId and eventId fields from each element, you would write the
following:
PCollection<UserEvent> events = readUserEvents();
PCollection<Row> rows = events.apply(Select.fieldNames("userId", "eventId"));
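To make the effect of a top-level projection concrete, here is a minimal plain-Java sketch that does not use Beam at all. It models an element as an ordered map from field name to value; the class `TopLevelProjection` and its helpers `userEvent` and `selectFieldNames` are hypothetical names invented for this illustration, and the choice to reject unknown field names is an assumption of the sketch rather than documented behavior of Select.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TopLevelProjection {
    // Hypothetical stand-in for a schema'd element: field name -> value, in schema order.
    static Map<String, Object> userEvent() {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("userId", "user-1");
        event.put("eventId", "event-42");
        event.put("eventType", 3);
        return event;
    }

    // Keep only the requested top-level fields; every other field is dropped.
    static Map<String, Object> selectFieldNames(Map<String, Object> element, String... names) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String name : names) {
            if (!element.containsKey(name)) {
                // Assumption of this sketch: asking for a missing field is an error.
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            projected.put(name, element.get(name));
        }
        return projected;
    }

    public static void main(String[] args) {
        Map<String, Object> row = selectFieldNames(userEvent(), "userId", "eventId");
        System.out.println(row); // prints {userId=user-1, eventId=event-42}
    }
}
```

The projected map plays the role of the output Row: it carries only the selected fields, so eventType is no longer present.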
It's possible to select a nested field as well. For example, if you want just the location
information from each element:
PCollection<UserEvent> events = readUserEvents();
PCollection<Row> rows = events.apply(Select.fieldAccess(FieldAccessDescriptor.create()
    .withNestedField("location",
        FieldAccessDescriptor.withAllFields())));
Modifier and Type | Method and Description |
---|---|
PCollection<Row> | expand(PCollection<T> input) Override this method to specify how this PTransform should be expanded on the given InputT. |
static <T> Select<T> | fieldAccess(FieldAccessDescriptor fieldAccessDescriptor) Select a set of fields described in a FieldAccessDescriptor. |
static <T> Select<T> | fieldIds(java.lang.Integer... ids) Select a set of top-level field ids from the row. |
static <T> Select<T> | fieldNames(java.lang.String... names) Select a set of top-level field names from the row. |
compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <T> Select<T> fieldIds(java.lang.Integer... ids)
public static <T> Select<T> fieldNames(java.lang.String... names)
public static <T> Select<T> fieldAccess(FieldAccessDescriptor fieldAccessDescriptor)
Select a set of fields described in a FieldAccessDescriptor.
This allows for nested fields to be selected as well.
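The idea of a descriptor that can reach into nested fields can be sketched as a small tree structure. The following plain-Java model is not Beam's FieldAccessDescriptor: the class `NestedProjection`, its inner `Descriptor`, and the `withField` helper are hypothetical, only the method names `create`, `withAllFields`, and `withNestedField` are borrowed from the example earlier on this page, and the sketch assumes that the nested structure is kept in the output, which this page does not state.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class NestedProjection {
    // Hypothetical, simplified descriptor: a tree of field selections.
    static final class Descriptor {
        final boolean allFields;
        final Set<String> fields = new LinkedHashSet<>();
        final Map<String, Descriptor> nested = new LinkedHashMap<>();

        private Descriptor(boolean allFields) { this.allFields = allFields; }

        static Descriptor create() { return new Descriptor(false); }
        static Descriptor withAllFields() { return new Descriptor(true); }

        Descriptor withField(String name) { fields.add(name); return this; }
        Descriptor withNestedField(String name, Descriptor child) {
            nested.put(name, child);
            return this;
        }
    }

    // Apply the descriptor recursively; nested elements are modeled as nested maps.
    @SuppressWarnings("unchecked")
    static Map<String, Object> select(Map<String, Object> element, Descriptor descriptor) {
        if (descriptor.allFields) {
            return new LinkedHashMap<>(element);
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (String name : descriptor.fields) {
            out.put(name, element.get(name));
        }
        for (Map.Entry<String, Descriptor> entry : descriptor.nested.entrySet()) {
            Map<String, Object> child = (Map<String, Object>) element.get(entry.getKey());
            // Assumption of this sketch: the nesting is preserved in the result.
            out.put(entry.getKey(), select(child, entry.getValue()));
        }
        return out;
    }

    // Hypothetical sample element shaped like the UserEvent example.
    static Map<String, Object> sampleEvent() {
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("latitude", 47.6);
        location.put("longitude", -122.3);
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("userId", "user-1");
        event.put("eventId", "event-42");
        event.put("location", location);
        return event;
    }

    public static void main(String[] args) {
        Descriptor locationOnly =
            Descriptor.create().withNestedField("location", Descriptor.withAllFields());
        System.out.println(select(sampleEvent(), locationOnly));
        // prints {location={latitude=47.6, longitude=-122.3}}
    }
}
```

Because each descriptor can hold further descriptors, the same recursion handles selections at any depth.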
public PCollection<Row> expand(PCollection<T> input)
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied
to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
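The relationship between apply, expand, and composite transforms can be illustrated with a toy model in plain Java. Nothing here is Beam's API: the `Transform` interface, the static `apply` helper, and the `Trim`, `UpperCase`, and `Normalize` classes are all hypothetical, and lists stand in for collections. The sketch only shows the shape of the contract described above, where callers use apply and a composite returns the output of one of the transforms it composes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CompositeExpandSketch {
    // Hypothetical miniature of the transform contract; not Beam's PTransform.
    interface Transform<I, O> {
        O expand(I input);
    }

    // Callers go through apply(); in this toy it simply delegates to expand().
    static <I, O> O apply(I input, Transform<I, O> transform) {
        return transform.expand(input);
    }

    // Leaf transform: strips surrounding whitespace from each element.
    static final class Trim implements Transform<List<String>, List<String>> {
        @Override
        public List<String> expand(List<String> input) {
            List<String> out = new ArrayList<>();
            for (String s : input) {
                out.add(s.trim());
            }
            return out;
        }
    }

    // Leaf transform: upper-cases each element.
    static final class UpperCase implements Transform<List<String>, List<String>> {
        @Override
        public List<String> expand(List<String> input) {
            List<String> out = new ArrayList<>();
            for (String s : input) {
                out.add(s.toUpperCase(Locale.ROOT));
            }
            return out;
        }
    }

    // Composite transform: defined in terms of other transforms, and returns
    // the output of the last transform it applies.
    static final class Normalize implements Transform<List<String>, List<String>> {
        @Override
        public List<String> expand(List<String> input) {
            List<String> trimmed = apply(input, new Trim());
            return apply(trimmed, new UpperCase());
        }
    }

    public static void main(String[] args) {
        List<String> result = apply(Arrays.asList(" a ", "b "), new Normalize());
        System.out.println(result); // prints [A, B]
    }
}
```

Routing every call through a single apply entry point gives a framework one place to add naming, validation, or bookkeeping around expand, which is one plausible reason for the note above.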
Specified by: expand in class PTransform<PCollection<T>,PCollection<Row>>