Class JsonToRow
PTransform to convert input JSON objects to Rows with given Schema.
Currently supported Schema field types are:
Schema.TypeName.BYTE
Schema.TypeName.INT16
Schema.TypeName.INT32
Schema.TypeName.INT64
Schema.TypeName.FLOAT
Schema.TypeName.DOUBLE
Schema.TypeName.BOOLEAN
Schema.TypeName.STRING
For specifics of JSON deserialization, see RowJson.RowJsonDeserializer.
Conversion is strict, with minimal type coercion:
Booleans are only parsed from true or false literals, not from "true" or "false" strings or any other values (an exception is thrown in these cases).
If a JSON number doesn't fit into the corresponding schema field type, an exception is thrown. Strings are not auto-converted to numbers. Floating point numbers are not auto-converted to integral numbers. Precision loss also causes exceptions.
Only JSON string values can be parsed into Schema.TypeName.STRING. Numbers and booleans are not automatically converted; exceptions are thrown in these cases.
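The numeric rules above (no silent narrowing, no fraction dropping) can be illustrated without Beam. The following is a minimal sketch that uses java.math.BigDecimal from the JDK; it is not the code path used by RowJson.RowJsonDeserializer, and the class and method names are hypothetical. It only demonstrates the general idea of an exact conversion that throws instead of losing information.

```java
import java.math.BigDecimal;

// Illustration only: NOT Beam's implementation. Shows "exact or throw" narrowing.
public class StrictNumberSketch {

    // Converts a JSON number literal to a byte; throws ArithmeticException
    // if the value is out of range or has a nonzero fractional part.
    public static byte toByteStrict(String jsonNumber) {
        return new BigDecimal(jsonNumber).byteValueExact();
    }

    // Same idea for a 32-bit integer.
    public static int toIntStrict(String jsonNumber) {
        return new BigDecimal(jsonNumber).intValueExact();
    }

    // Returns true if the conversion was rejected with an ArithmeticException.
    public static boolean rejects(Runnable conversion) {
        try {
            conversion.run();
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(toByteStrict("12"));                 // fits in a byte
        System.out.println(rejects(() -> toByteStrict("300"))); // out of range for a byte
        System.out.println(rejects(() -> toIntStrict("1.5")));  // fractional part would be lost
    }
}
```

Beam's own checks may differ in detail from this sketch; RowJson.RowJsonDeserializer remains the authority on what is accepted.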
If a schema field is missing from the JSON value, by default the field is assumed to have a null value, and is converted into a null in the row if the schema declares this field as nullable. This behavior can be changed by setting the RowJson.RowJsonDeserializer.NullBehavior using withSchemaAndNullBehavior(org.apache.beam.sdk.schemas.Schema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior). For example, setting it to RowJson.RowJsonDeserializer.NullBehavior.REQUIRE_NULL means that JSON values must be null to be parsed as null; otherwise an error is thrown, as with previous versions of Beam.
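As a rough sketch of the two entry points described above, the following pipeline applies withSchema (default null handling) and withSchemaAndNullBehavior with REQUIRE_NULL. The schema, field names, JSON inputs, and class name are assumptions made up for illustration, and running it assumes the Beam Java SDK and a runner such as the direct runner are on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class JsonToRowNullBehaviorSketch {

    // Hypothetical schema: a required STRING field and a nullable INT32 field.
    public static Schema personSchema() {
        return Schema.builder()
            .addStringField("name")
            .addNullableField("age", Schema.FieldType.INT32)
            .build();
    }

    public static void main(String[] args) {
        Schema schema = personSchema();
        Pipeline pipeline = Pipeline.create();

        // "age" is missing here; with the default behavior it becomes null in the Row.
        PCollection<String> missingField =
            pipeline.apply("MissingField", Create.of("{\"name\": \"Ada\"}"));
        PCollection<Row> lenient =
            missingField.apply("Lenient", JsonToRow.withSchema(schema));

        // With REQUIRE_NULL the JSON must carry an explicit null for a null field.
        PCollection<String> explicitNull =
            pipeline.apply("ExplicitNull", Create.of("{\"name\": \"Bob\", \"age\": null}"));
        PCollection<Row> strict =
            explicitNull.apply(
                "Strict",
                JsonToRow.withSchemaAndNullBehavior(schema, NullBehavior.REQUIRE_NULL));

        pipeline.run().waitUntilFinish();
    }
}
```

Feeding the first input (with "age" missing) into the REQUIRE_NULL branch would be expected to fail, per the description above.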
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | JsonToRow.JsonToRowWithErrFn |
static class | JsonToRow.ParseResult | The result of a withExceptionReporting(Schema) transform.
-
Constructor Summary
Constructors
Constructor | Description
JsonToRow() |
-
Method Summary
Modifier and Type | Method | Description
static JsonToRow.JsonToRowWithErrFn | withExceptionReporting(Schema rowSchema) | Enable Exception Reporting support.
static PTransform<PCollection<String>, PCollection<Row>> | withSchema(Schema rowSchema) |
static PTransform<PCollection<String>, PCollection<Row>> | withSchemaAndNullBehavior(Schema rowSchema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior nullBehavior) |
-
Constructor Details
-
JsonToRow
public JsonToRow()
-
-
Method Details
-
withSchema
public static PTransform<PCollection<String>,PCollection<Row>> withSchema(Schema rowSchema)
-
withSchemaAndNullBehavior
public static PTransform<PCollection<String>,PCollection<Row>> withSchemaAndNullBehavior(Schema rowSchema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior nullBehavior)
-
withExceptionReporting
public static JsonToRow.JsonToRowWithErrFn withExceptionReporting(Schema rowSchema)
Enable Exception Reporting support. If this value is set, errors in the parsing layer are returned as Row objects within a JsonToRow.ParseResult.
You can access the results by using withExceptionReporting(Schema):
ParseResult results = jsonPersons.apply(JsonToRow.withExceptionReporting(PERSON_SCHEMA));
Then access the parsed results via JsonToRow.ParseResult.getResults(), and the lines that failed to parse via JsonToRow.ParseResult.getFailedToParseLines(). Each failed line is produced as a Row with Schema JsonToRow.JsonToRowWithErrFn.ERROR_ROW_SCHEMA.
To access the reason for the failure, first enable extended error reporting with JsonToRow.JsonToRowWithErrFn.withExtendedErrorInfo(). The call to JsonToRow.ParseResult.getFailedToParseLines() will then produce a Row with Schema JsonToRow.JsonToRowWithErrFn.ERROR_ROW_WITH_ERR_MSG_SCHEMA, which includes the reason for the parse failure.
- Returns:
JsonToRow.JsonToRowWithErrFn
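The steps described for withExceptionReporting can be put together roughly as follows. This is a sketch under assumptions: PERSON_SCHEMA, its field names, the sample JSON lines, and the class name are invented for illustration, and running it assumes the Beam Java SDK and a runner are on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class JsonToRowErrorsSketch {

    // Hypothetical schema used only for this example.
    public static final Schema PERSON_SCHEMA =
        Schema.builder().addStringField("name").addInt32Field("age").build();

    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        PCollection<String> jsonPersons =
            pipeline.apply(
                Create.of(
                    "{\"name\": \"Ada\", \"age\": 36}",
                    // A string where INT32 is expected: this line should fail to parse.
                    "{\"name\": \"Bob\", \"age\": \"unknown\"}"));

        // Enable exception reporting, plus extended error info so that
        // failed rows also carry the reason for the failure.
        JsonToRow.ParseResult results =
            jsonPersons.apply(
                JsonToRow.withExceptionReporting(PERSON_SCHEMA).withExtendedErrorInfo());

        // Successfully parsed rows, using PERSON_SCHEMA.
        PCollection<Row> parsed = results.getResults();

        // Lines that failed to parse; with extended error info these use
        // JsonToRow.JsonToRowWithErrFn.ERROR_ROW_WITH_ERR_MSG_SCHEMA.
        PCollection<Row> failed = results.getFailedToParseLines();

        pipeline.run().waitUntilFinish();
    }
}
```

Without the call to withExtendedErrorInfo(), the failed rows would instead use ERROR_ROW_SCHEMA, as described above. A typical design is to route the failed collection to a dead-letter sink while the parsed collection continues through the pipeline.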
-