public class BoundedDatasetFactory
extends java.lang.Object
| Modifier and Type | Method and Description |
|---|---|
| `static <T> org.apache.spark.sql.Dataset<org.apache.beam.sdk.util.WindowedValue<T>>` | `createDatasetFromRDD(org.apache.spark.sql.SparkSession session, BoundedSource<T> source, java.util.function.Supplier<PipelineOptions> options, org.apache.spark.sql.Encoder<org.apache.beam.sdk.util.WindowedValue<T>> encoder)` |
| `static <T> org.apache.spark.sql.Dataset<org.apache.beam.sdk.util.WindowedValue<T>>` | `createDatasetFromRows(org.apache.spark.sql.SparkSession session, BoundedSource<T> source, java.util.function.Supplier<PipelineOptions> options, org.apache.spark.sql.Encoder<org.apache.beam.sdk.util.WindowedValue<T>> encoder)` |
public static <T> org.apache.spark.sql.Dataset<org.apache.beam.sdk.util.WindowedValue<T>> createDatasetFromRows(org.apache.spark.sql.SparkSession session,
                                                                                                                BoundedSource<T> source,
                                                                                                                java.util.function.Supplier<PipelineOptions> options,
                                                                                                                org.apache.spark.sql.Encoder<org.apache.beam.sdk.util.WindowedValue<T>> encoder)
Creates a Dataset for a BoundedSource via a Spark Table.
 Unfortunately, tables are expected to return an InternalRow, which requires serialization.
 For the time being, this makes this approach significantly less performant than creating a
 Dataset from an RDD.
public static <T> org.apache.spark.sql.Dataset<org.apache.beam.sdk.util.WindowedValue<T>> createDatasetFromRDD(org.apache.spark.sql.SparkSession session,
                                                                                                               BoundedSource<T> source,
                                                                                                               java.util.function.Supplier<PipelineOptions> options,
                                                                                                               org.apache.spark.sql.Encoder<org.apache.beam.sdk.util.WindowedValue<T>> encoder)
Creates a Dataset for a BoundedSource via a Spark RDD.
 This is currently the most efficient approach, as it avoids any serialization overhead.
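A minimal usage sketch of the preferred RDD-backed path. The `source` and `encoder` placeholders are assumptions: obtaining a concrete `BoundedSource` (e.g. from an IO connector) and building an `Encoder<WindowedValue<T>>` depend on runner internals not shown on this page, so they are left as null placeholders with comments rather than filled in.

```java
import java.util.function.Supplier;

import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.SparkSession;

public class BoundedDatasetFactoryExample {
  public static void main(String[] args) {
    SparkSession session = SparkSession.builder()
        .master("local[2]")
        .appName("bounded-dataset-factory-example")
        .getOrCreate();

    // Options are passed as a Supplier so they can be materialized lazily,
    // e.g. when the source is read on executors.
    Supplier<PipelineOptions> options = PipelineOptionsFactory::create;

    // Placeholders: supply a real BoundedSource (e.g. from an IO connector)
    // and a matching Spark Encoder for WindowedValue<String>.
    BoundedSource<String> source = null;   // assumption: provided by the caller
    Encoder<WindowedValue<String>> encoder = null;  // assumption: runner-specific

    // RDD-backed path: avoids the InternalRow serialization that the
    // table-based createDatasetFromRows variant incurs.
    Dataset<WindowedValue<String>> dataset =
        BoundedDatasetFactory.createDatasetFromRDD(session, source, options, encoder);
    dataset.show();

    session.stop();
  }
}
```

With a real source and encoder in place, swapping `createDatasetFromRDD` for `createDatasetFromRows` yields the same Dataset contents via the table path, at the cost of the serialization overhead described above.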