@Experimental(value=SCHEMAS) public class JsonToRow extends java.lang.Object
Creates a PTransform
to convert input JSON objects to Rows
with given
Schema
.
Currently supported Schema
field types are:
Schema.TypeName#BYTE
Schema.TypeName#INT16
Schema.TypeName#INT32
Schema.TypeName#INT64
Schema.TypeName#FLOAT
Schema.TypeName#DOUBLE
Schema.TypeName#BOOLEAN
Schema.TypeName#STRING
For specifics of JSON deserialization see RowJson.RowJsonDeserializer
.
Conversion is strict, with minimal type coercion:
Booleans are only parsed from true
or false
literals, not from "true"
or "false"
strings or any other values (exception is thrown in these cases).
If a JSON number doesn't fit into the corresponding schema field type, an exception is be thrown. Strings are not auto-converted to numbers. Floating point numbers are not auto-converted to integral numbers. Precision loss also causes exceptions.
Only JSON string values can be parsed into Schema.TypeName.STRING
. Numbers, booleans are not
automatically converted, exceptions are thrown in these cases.
If a schema field is missing from the JSON value, by default the field will be assumed to have
a null value, and will be converted into a null in the row if the schema has this field being
nullable. This behavior can be changed by setting the RowJson.RowJsonDeserializer.NullBehavior
using the withSchemaAndNullBehavior(org.apache.beam.sdk.schemas.Schema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior)
. For example, setting it with RowJson.RowJsonDeserializer.NullBehavior.REQUIRE_NULL
means that JSON values must be null to be parsed as null, otherwise an
error will be thrown, as with previous versions of Beam.
Modifier and Type | Class and Description |
---|---|
static class |
JsonToRow.JsonToRowWithErrFn |
static class |
JsonToRow.ParseResult
The result of a
withExceptionReporting(Schema) transform. |
Constructor and Description |
---|
JsonToRow() |
Modifier and Type | Method and Description |
---|---|
static JsonToRow.JsonToRowWithErrFn |
withExceptionReporting(Schema rowSchema)
Enable Exception Reporting support.
|
static PTransform<PCollection<java.lang.String>,PCollection<Row>> |
withSchema(Schema rowSchema) |
static PTransform<PCollection<java.lang.String>,PCollection<Row>> |
withSchemaAndNullBehavior(Schema rowSchema,
org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior nullBehavior) |
public static PTransform<PCollection<java.lang.String>,PCollection<Row>> withSchema(Schema rowSchema)
public static PTransform<PCollection<java.lang.String>,PCollection<Row>> withSchemaAndNullBehavior(Schema rowSchema, org.apache.beam.sdk.util.RowJson.RowJsonDeserializer.NullBehavior nullBehavior)
@Experimental(value=SCHEMAS) public static JsonToRow.JsonToRowWithErrFn withExceptionReporting(Schema rowSchema)
JsonToRow.ParseResult
You can access the results by using withExceptionReporting(Schema)
:
ParseResult results = jsonPersons.apply(JsonToRow.withExceptionReporting(PERSON_SCHEMA));
Then access the parsed results via, JsonToRow.ParseResult.getResults()
And access the failed to parse results via, JsonToRow.ParseResult.getFailedToParseLines()
This will produce a Row with Schema JsonToRow.JsonToRowWithErrFn.ERROR_ROW_SCHEMA
To access the reason for the failure you will need to first enable extended error reporting.
JsonToRow.JsonToRowWithErrFn.withExtendedErrorInfo()
This will provide access to the reason for the Parse failure. The call to JsonToRow.ParseResult.getFailedToParseLines()
will produce a Row with Schema JsonToRow.JsonToRowWithErrFn.ERROR_ROW_WITH_ERR_MSG_SCHEMA
JsonToRow.JsonToRowWithErrFn