@Experimental(value=SCHEMAS) public class JsonToRow extends java.lang.Object
Creates a PTransform that converts input JSON objects to Rows with a given
Schema.
Currently supported Schema field types are:
Schema.TypeName.BYTE
Schema.TypeName.INT16
Schema.TypeName.INT32
Schema.TypeName.INT64
Schema.TypeName.FLOAT
Schema.TypeName.DOUBLE
Schema.TypeName.BOOLEAN
Schema.TypeName.STRING
For specifics of JSON deserialization see RowJson.RowJsonDeserializer.
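A minimal usage sketch follows, assuming the Beam core SDK and a runner (for example the direct runner) are on the classpath. The class name, the PERSON_SCHEMA fields, and the sample JSON line are hypothetical and chosen only to exercise a few of the supported field types.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

class JsonToRowBasicSketch {
  // Hypothetical schema that uses only supported field types.
  static final Schema PERSON_SCHEMA =
      Schema.builder()
          .addStringField("name")
          .addInt32Field("age")
          .addBooleanField("active")
          .build();

  // Applies JsonToRow.withSchema to a small in-memory collection of JSON strings.
  static PCollection<Row> parse(Pipeline pipeline) {
    PCollection<String> jsonPersons =
        pipeline.apply(Create.of("{\"name\": \"Ada\", \"age\": 36, \"active\": true}"));
    return jsonPersons.apply(JsonToRow.withSchema(PERSON_SCHEMA));
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    parse(pipeline);
    pipeline.run().waitUntilFinish();
  }
}
```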
Conversion is strict, with minimal type coercion:
Booleans are parsed only from the literals true or false, not from the strings "true"
or "false" or from any other value; an exception is thrown in those cases.
If a JSON number doesn't fit into the corresponding schema field type, an exception is thrown. Strings are not auto-converted to numbers, and floating point numbers are not auto-converted to integral numbers. Precision loss also causes an exception.
Only JSON string values can be parsed into Schema.TypeName.STRING. Numbers and booleans are not
automatically converted; an exception is thrown in these cases.
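The flavor of these rules can be illustrated with a small standalone sketch for an INT32 field. This is not Beam's implementation (see RowJson.RowJsonDeserializer for that); the class and method names are hypothetical, and the sketch works on a raw JSON token using only the Java standard library.

```java
import java.math.BigInteger;

class StrictInt32Sketch {
  // Accepts only integral tokens that fit in a 32-bit int.
  // Quoted strings, booleans, and floating point tokens are rejected, not coerced.
  static int parseInt32(String jsonToken) {
    final BigInteger value;
    try {
      // BigInteger rejects "\"42\"", "true", "1.0", and "1e2" alike.
      value = new BigInteger(jsonToken);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("not an integral JSON number: " + jsonToken, e);
    }
    try {
      // intValueExact throws instead of silently wrapping on overflow.
      return value.intValueExact();
    } catch (ArithmeticException e) {
      throw new IllegalArgumentException("does not fit in INT32: " + jsonToken, e);
    }
  }
}
```

Calling parseInt32("42") returns 42, while "2147483648", "1.0", and "\"42\"" each raise an IllegalArgumentException rather than being narrowed or converted.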
If a schema field is missing from the JSON value, by default the field is assumed to have
a null value and is converted into a null in the row, provided the schema declares the field as
nullable. This behavior can be changed by setting the RowJson.RowJsonDeserializer.NullBehavior with withSchemaAndNullBehavior(org.apache.beam.sdk.schemas.Schema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior). For example, RowJson.RowJsonDeserializer.NullBehavior.REQUIRE_NULL means that a JSON value must be explicitly null to be parsed as null; otherwise an
error is thrown, as in previous versions of Beam.
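The two configurations described above might be set up as in the following sketch. It assumes the Beam core SDK is on the classpath; the class name, method names, and schema fields are hypothetical.

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

class JsonToRowNullBehaviorSketch {
  // Hypothetical schema with one required and one nullable field.
  static final Schema SCHEMA =
      Schema.builder()
          .addStringField("name")
          .addNullableField("nickname", FieldType.STRING)
          .build();

  // Default behavior: {"name": "Ada"} parses, and the missing
  // nullable field "nickname" becomes null in the Row.
  static PTransform<PCollection<String>, PCollection<Row>> defaultBehavior() {
    return JsonToRow.withSchema(SCHEMA);
  }

  // REQUIRE_NULL: {"name": "Ada", "nickname": null} parses, while
  // {"name": "Ada"} is an error because "nickname" is not explicitly null.
  static PTransform<PCollection<String>, PCollection<Row>> requireNull() {
    return JsonToRow.withSchemaAndNullBehavior(SCHEMA, NullBehavior.REQUIRE_NULL);
  }
}
```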
| Modifier and Type | Class and Description |
|---|---|
| static class | JsonToRow.JsonToRowWithErrFn |
| static class | JsonToRow.ParseResult: The result of a withExceptionReporting(Schema) transform. |

| Constructor and Description |
|---|
| JsonToRow() |

| Modifier and Type | Method and Description |
|---|---|
| static JsonToRow.JsonToRowWithErrFn | withExceptionReporting(Schema rowSchema): Enable Exception Reporting support. |
| static PTransform<PCollection<java.lang.String>,PCollection<Row>> | withSchema(Schema rowSchema) |
| static PTransform<PCollection<java.lang.String>,PCollection<Row>> | withSchemaAndNullBehavior(Schema rowSchema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior nullBehavior) |

public static PTransform<PCollection<java.lang.String>,PCollection<Row>> withSchema(Schema rowSchema)
public static PTransform<PCollection<java.lang.String>,PCollection<Row>> withSchemaAndNullBehavior(Schema rowSchema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior nullBehavior)
@Experimental(value=SCHEMAS) public static JsonToRow.JsonToRowWithErrFn withExceptionReporting(Schema rowSchema)
With exception reporting enabled, the transform returns a JsonToRow.ParseResult.
You can access the results by using withExceptionReporting(Schema):
ParseResult results = jsonPersons.apply(JsonToRow.withExceptionReporting(PERSON_SCHEMA));
Then access the parsed results via JsonToRow.ParseResult.getResults()
and the lines that failed to parse via JsonToRow.ParseResult.getFailedToParseLines().
The latter produces Rows with Schema JsonToRow.JsonToRowWithErrFn.ERROR_ROW_SCHEMA.
To access the reason for a failure, first enable extended error reporting with
JsonToRow.JsonToRowWithErrFn.withExtendedErrorInfo().
JsonToRow.ParseResult.getFailedToParseLines() then produces Rows with Schema JsonToRow.JsonToRowWithErrFn.ERROR_ROW_WITH_ERR_MSG_SCHEMA, which includes the reason for the parse failure.
Returns: JsonToRow.JsonToRowWithErrFn
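Put together, the exception reporting flow might look like the sketch below. It assumes the Beam core SDK and a runner such as the direct runner are on the classpath; the class name, the PERSON_SCHEMA fields, and the sample input lines are hypothetical.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.transforms.JsonToRow.ParseResult;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

class JsonToRowErrorReportingSketch {
  // Hypothetical schema for the sample input.
  static final Schema PERSON_SCHEMA =
      Schema.builder().addStringField("name").addInt32Field("age").build();

  // Parses two sample lines; the second has a string where INT32 is expected,
  // so it should end up among the failed lines rather than the parsed rows.
  static ParseResult parseWithErrors(Pipeline pipeline) {
    PCollection<String> jsonPersons =
        pipeline.apply(
            Create.of(
                "{\"name\": \"Ada\", \"age\": 36}",
                "{\"name\": \"Bob\", \"age\": \"unknown\"}"));
    return jsonPersons.apply(
        JsonToRow.withExceptionReporting(PERSON_SCHEMA).withExtendedErrorInfo());
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    ParseResult results = parseWithErrors(pipeline);

    // Rows that matched PERSON_SCHEMA.
    PCollection<Row> parsed = results.getResults();
    // Rows with ERROR_ROW_WITH_ERR_MSG_SCHEMA, since extended error info is enabled.
    PCollection<Row> failed = results.getFailedToParseLines();

    pipeline.run().waitUntilFinish();
  }
}
```

Without the withExtendedErrorInfo() call, the failed rows would carry JsonToRow.JsonToRowWithErrFn.ERROR_ROW_SCHEMA instead.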