Class JsonToRow
PTransform to convert input JSON objects to Rows with given Schema.
Currently supported Schema field types are:
Schema.TypeName.BYTE
Schema.TypeName.INT16
Schema.TypeName.INT32
Schema.TypeName.INT64
Schema.TypeName.FLOAT
Schema.TypeName.DOUBLE
Schema.TypeName.BOOLEAN
Schema.TypeName.STRING
For specifics of JSON deserialization see RowJson.RowJsonDeserializer.
Conversion is strict, with minimal type coercion:
Booleans are only parsed from true or false literals, not from "true"
or "false" strings or any other values (an exception is thrown in these cases).
If a JSON number doesn't fit into the corresponding schema field type, an exception is thrown. Strings are not auto-converted to numbers. Floating point numbers are not auto-converted to integral numbers. Precision loss also causes exceptions.
Only JSON string values can be parsed into Schema.TypeName.STRING. Numbers and booleans are not
automatically converted; an exception is thrown in these cases.
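As a rough illustration of the strict rules above, the following sketch parses one well-typed JSON object with withSchema(Schema). The class name, PERSON_SCHEMA, its field names, and the sample input are hypothetical, and running it assumes the Beam Java SDK and a runner such as the direct runner are on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class JsonToRowBasicSketch {
  // Hypothetical schema for illustration; it uses only supported field types.
  static final Schema PERSON_SCHEMA =
      Schema.builder()
          .addStringField("name")
          .addInt32Field("age")
          .addBooleanField("active")
          .build();

  // Converts a single JSON string into a Row with PERSON_SCHEMA.
  static PCollection<Row> parse(Pipeline pipeline) {
    PCollection<String> json =
        pipeline.apply(Create.of("{\"name\": \"Ada\", \"age\": 36, \"active\": true}"));
    // Per the rules above, "age": "36" or "active": "true" would be rejected
    // with an exception, because strings are not coerced to numbers or booleans.
    return json.apply(JsonToRow.withSchema(PERSON_SCHEMA));
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    parse(pipeline);
    pipeline.run().waitUntilFinish();
  }
}
```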
If a schema field is missing from the JSON value, by default the field is assumed to have
a null value, and is converted into a null in the row if the schema declares the field as
nullable. This behavior can be changed by setting the RowJson.RowJsonDeserializer.NullBehavior through withSchemaAndNullBehavior(org.apache.beam.sdk.schemas.Schema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior). For example, setting it to RowJson.RowJsonDeserializer.NullBehavior.REQUIRE_NULL means that JSON values must be explicitly null to be parsed as null; otherwise an
error is thrown, as in previous versions of Beam.
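The difference between the default and REQUIRE_NULL can be sketched by building both transforms against the same schema. The class name, SCHEMA, and its field names are hypothetical; only the two factory methods and the NullBehavior constant come from this page.

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class NullBehaviorSketch {
  // Hypothetical schema: "nickname" is nullable, "name" is not.
  static final Schema SCHEMA =
      Schema.builder()
          .addStringField("name")
          .addNullableField("nickname", Schema.FieldType.STRING)
          .build();

  // Default behavior: {"name": "Ada"} yields a Row whose nickname is null.
  static PTransform<PCollection<String>, PCollection<Row>> lenient() {
    return JsonToRow.withSchema(SCHEMA);
  }

  // REQUIRE_NULL: the input must spell out {"name": "Ada", "nickname": null};
  // per the description above, omitting "nickname" results in an error.
  static PTransform<PCollection<String>, PCollection<Row>> strict() {
    return JsonToRow.withSchemaAndNullBehavior(SCHEMA, NullBehavior.REQUIRE_NULL);
  }
}
```

Either transform is applied to a PCollection<String> in the same way as withSchema(Schema).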
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | JsonToRow.JsonToRowWithErrFn |
static class | JsonToRow.ParseResult | The result of a withExceptionReporting(Schema) transform.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
static JsonToRow.JsonToRowWithErrFn | withExceptionReporting(Schema rowSchema) | Enable Exception Reporting support.
static PTransform<PCollection<String>, PCollection<Row>> | withSchema(Schema rowSchema) |
static PTransform<PCollection<String>, PCollection<Row>> | withSchemaAndNullBehavior(Schema rowSchema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior nullBehavior) |
-
Constructor Details
-
JsonToRow
public JsonToRow()
-
-
Method Details
-
withSchema
public static PTransform<PCollection<String>,PCollection<Row>> withSchema(Schema rowSchema)
-
withSchemaAndNullBehavior
public static PTransform<PCollection<String>,PCollection<Row>> withSchemaAndNullBehavior(Schema rowSchema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior nullBehavior)
-
withExceptionReporting
public static JsonToRow.JsonToRowWithErrFn withExceptionReporting(Schema rowSchema)
Enable Exception Reporting support. If this is enabled, errors in the parsing layer are returned as Row objects within a JsonToRow.ParseResult.
You can access the results by using withExceptionReporting(Schema):
ParseResult results = jsonPersons.apply(JsonToRow.withExceptionReporting(PERSON_SCHEMA));
Then access the parsed results via JsonToRow.ParseResult.getResults(), and access the lines that failed to parse via JsonToRow.ParseResult.getFailedToParseLines(). The latter produces Rows with the schema JsonToRow.JsonToRowWithErrFn.ERROR_ROW_SCHEMA.
To access the reason for a failure, first enable extended error reporting with JsonToRow.JsonToRowWithErrFn.withExtendedErrorInfo(). The call to JsonToRow.ParseResult.getFailedToParseLines() will then produce Rows with the schema JsonToRow.JsonToRowWithErrFn.ERROR_ROW_WITH_ERR_MSG_SCHEMA.
Returns:
JsonToRow.JsonToRowWithErrFn
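A minimal end-to-end sketch of exception reporting with extended error info might look like the following. The class name, PERSON_SCHEMA, its fields, and the two sample inputs are hypothetical, and running it assumes the Beam Java SDK and a runner such as the direct runner are on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.transforms.JsonToRow.ParseResult;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class ExceptionReportingSketch {
  // Hypothetical schema for illustration.
  static final Schema PERSON_SCHEMA =
      Schema.builder().addStringField("name").addInt32Field("age").build();

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // One well-formed line and one whose "age" is a string, which the
    // strict conversion rules reject.
    PCollection<String> jsonPersons =
        pipeline.apply(
            Create.of(
                "{\"name\": \"Ada\", \"age\": 36}",
                "{\"name\": \"Bob\", \"age\": \"unknown\"}"));

    // Parse failures are collected in the ParseResult instead of failing the pipeline.
    ParseResult results =
        jsonPersons.apply(
            JsonToRow.withExceptionReporting(PERSON_SCHEMA).withExtendedErrorInfo());

    // Rows that conform to PERSON_SCHEMA.
    PCollection<Row> parsed = results.getResults();
    // Rows with ERROR_ROW_WITH_ERR_MSG_SCHEMA, since extended error info is enabled.
    PCollection<Row> failed = results.getFailedToParseLines();

    pipeline.run().waitUntilFinish();
  }
}
```

Keeping the failed lines as a separate PCollection lets a pipeline route them to a dead-letter sink of its choosing while the valid rows continue downstream.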
-