Type Parameters:
T - Type of elements read by the source.

@Experimental(value=SOURCE_SINK)
public abstract class Source<T>
extends java.lang.Object
implements java.io.Serializable, HasDisplayData
Base class for defining input formats and creating a Source for reading the input.
This class is not intended to be subclassed directly. Instead, to define a bounded source (a source which produces a finite amount of input), subclass BoundedSource; to define an unbounded source, subclass UnboundedSource.
A Source passed to a Read transform must be Serializable. This allows the Source instance created in this "main program" to be sent (in serialized form) to remote worker machines and reconstituted for each batch of elements of the input PCollection being processed or for each source splitting operation. A Source can have instance variable state, and non-transient instance variable state will be serialized in the main program and then deserialized on remote worker machines.
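The round trip described above can be sketched with plain Java serialization. This is a minimal illustration, not Beam code: `OffsetRangeSourceSketch` and `SerializationRoundTrip` are hypothetical names, and the sketch assumes Java object serialization as the shipping mechanism purely for demonstration. It shows that non-transient instance state written in the main program comes back intact in an independent copy, as it would on a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a Source subclass. It models only the
// serialization contract, not Beam's Source API.
final class OffsetRangeSourceSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // Non-transient instance state: written out in the "main program"
  // and restored on each worker.
  private final String path;
  private final long startOffset;
  private final long endOffset;

  OffsetRangeSourceSketch(String path, long startOffset, long endOffset) {
    this.path = path;
    this.startOffset = startOffset;
    this.endOffset = endOffset;
  }

  String path() { return path; }
  long startOffset() { return startOffset; }
  long endOffset() { return endOffset; }
}

// Plain Java serialization, standing in (as an assumption) for whatever
// mechanism a runner uses to ship the source to workers.
final class SerializationRoundTrip {
  static byte[] serialize(Serializable obj) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    return bytes.toByteArray();
  }

  static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    OffsetRangeSourceSketch original =
        new OffsetRangeSourceSketch("input-00000.txt", 0L, 1024L);
    // The deserialized object is a separate instance with the same state.
    OffsetRangeSourceSketch copy =
        (OffsetRangeSourceSketch) deserialize(serialize(original));
    System.out.println(copy.path() + " [" + copy.startOffset() + ", " + copy.endOffset() + ")");
  }
}
```

Running `main` prints `input-00000.txt [0, 1024)`: every field survived the trip because none of them is transient.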
Source classes MUST be effectively immutable. The only acceptable use of mutable fields is to cache the results of expensive operations, and such fields MUST be marked transient.
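A minimal sketch of the one permitted kind of mutable field, assuming a hypothetical class `CachedBundlesSourceSketch` (not part of Beam) whose "expensive operation" is simply computing bundle start offsets. All defining state is `final`; the cache is `transient`, so a deserialized copy starts without it and recomputes it on first use. Marking the cache `volatile` is a design choice of this sketch for safe publication across threads, not something the text above requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical source-like class: every field that defines the source is
// final, and the only mutable field is a transient cache.
final class CachedBundlesSourceSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  private final long totalBytes;
  private final long bundleBytes;

  // Cache of an "expensive" result. transient: never serialized, so each
  // deserialized copy starts with null and recomputes lazily.
  private transient volatile List<Long> cachedBundleStarts;

  CachedBundlesSourceSketch(long totalBytes, long bundleBytes) {
    if (bundleBytes <= 0) {
      throw new IllegalArgumentException("bundleBytes must be positive");
    }
    this.totalBytes = totalBytes;
    this.bundleBytes = bundleBytes;
  }

  // Stands in for an expensive operation whose result is worth caching.
  List<Long> bundleStarts() {
    List<Long> starts = cachedBundleStarts;
    if (starts == null) {
      List<Long> computed = new ArrayList<>();
      for (long start = 0; start < totalBytes; start += bundleBytes) {
        computed.add(start);
      }
      starts = Collections.unmodifiableList(computed);
      cachedBundleStarts = starts;
    }
    return starts;
  }

  boolean isCached() {
    return cachedBundleStarts != null;
  }

  // Serializes and deserializes, as happens when a source is sent to a worker.
  static CachedBundlesSourceSketch roundTrip(CachedBundlesSourceSketch source)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(source);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (CachedBundlesSourceSketch) in.readObject();
    }
  }
}
```

Because the cache is derived entirely from the final fields, dropping it during serialization loses nothing: the copy's `bundleStarts()` returns the same values as the original's.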
Source objects should override Object.toString(), as it will be used in important error and debugging messages.
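As a small illustration, a hypothetical `TextFileSourceSketch` (not a Beam class) overrides `toString()` to include the state that identifies the input, so a message built from it names the actual file pattern instead of the default `ClassName@hash` form. The field names are assumptions chosen for the example.

```java
import java.util.Objects;

// Hypothetical source-like class showing a descriptive toString().
final class TextFileSourceSketch {
  private final String filePattern;
  private final long minBundleSizeBytes;

  TextFileSourceSketch(String filePattern, long minBundleSizeBytes) {
    this.filePattern = Objects.requireNonNull(filePattern, "filePattern");
    this.minBundleSizeBytes = minBundleSizeBytes;
  }

  @Override
  public String toString() {
    // Include the identifying state so an error message points at the
    // exact input being read.
    return "TextFileSourceSketch{filePattern=" + filePattern
        + ", minBundleSizeBytes=" + minBundleSizeBytes + "}";
  }
}
```

For example, `new TextFileSourceSketch("logs/*.txt", 1024)` renders as `TextFileSourceSketch{filePattern=logs/*.txt, minBundleSizeBytes=1024}`.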
Modifier and Type | Class and Description |
---|---|
static class | Source.Reader<T> - The interface that readers of custom input sources must implement. |
Constructor and Description |
---|
Source() |
Modifier and Type | Method and Description |
---|---|
abstract Coder<T> | getDefaultOutputCoder() - Returns the default Coder to use for the data read from this source. |
void | populateDisplayData(DisplayData.Builder builder) - Register display data for the given transform or component. |
abstract void | validate() - Checks that this source is valid before it can be used in a pipeline. |
public abstract void validate()
Checks that this source is valid before it can be used in a pipeline.
It is recommended to use Preconditions for implementing this method.
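The Preconditions recommended above is most likely Guava's `com.google.common.base.Preconditions`, whose `checkNotNull` and `checkArgument` throw `NullPointerException` and `IllegalArgumentException` respectively. To keep this sketch runnable without Guava, it uses `java.util.Objects.requireNonNull` and a small local `checkArgument` stand-in with the same `%s` message-template style. `RangeSourceValidationSketch` and its fields are hypothetical; the point is the shape of `validate()`: fail early, with a clear message, before the pipeline runs.

```java
import java.util.Objects;

// Hypothetical source-like class showing a Preconditions-style validate().
final class RangeSourceValidationSketch {
  private final String table;
  private final long start;
  private final long end;

  RangeSourceValidationSketch(String table, long start, long end) {
    this.table = table;
    this.start = start;
    this.end = end;
  }

  // Same role as Source.validate(): reject a misconfigured source up front.
  void validate() {
    // Guava equivalent: Preconditions.checkNotNull(table, "table must be set")
    Objects.requireNonNull(table, "table must be set");
    // Guava equivalent: Preconditions.checkArgument(start >= 0, ...)
    checkArgument(start >= 0, "start must be non-negative, got %s", start);
    checkArgument(start <= end, "start (%s) must not exceed end (%s)", start, end);
  }

  // Local stand-in for Guava's Preconditions.checkArgument so the sketch
  // runs without Guava on the classpath.
  private static void checkArgument(boolean condition, String template, Object... args) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(template, args));
    }
  }
}
```

With Guava available, the two helpers collapse into direct `Preconditions.checkNotNull` and `Preconditions.checkArgument` calls.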
public abstract Coder<T> getDefaultOutputCoder()
Returns the default Coder to use for the data read from this source.

public void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by:
populateDisplayData in interface HasDisplayData
Parameters:
builder - The builder to populate with display data.
See Also:
HasDisplayData
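The namespacing rule for populateDisplayData can be modeled with a deliberately simplified toy. Everything here is hypothetical and is NOT Beam's API: `ToyDisplayBuilder` merely records `namespace:key` pairs, and a component's namespace is assumed to be its runtime class name. Under that assumption, items a superclass adds via `super.populateDisplayData(builder)` land in the current (subclass) namespace, while calling a subcomponent's `populateDisplayData` registers its items under the subcomponent's own namespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for DisplayData.Builder: records "namespace:key" -> value.
final class ToyDisplayBuilder {
  private final Map<String, String> items = new LinkedHashMap<>();

  ToyDisplayBuilder add(String namespace, String key, Object value) {
    items.put(namespace + ":" + key, String.valueOf(value));
    return this;
  }

  Map<String, String> items() { return items; }
}

// Toy stand-in for HasDisplayData. The namespace is the runtime class, so
// items added by superclass code still use the subclass's namespace.
abstract class ToyComponent {
  protected final String namespace() { return getClass().getSimpleName(); }

  // By default, registers no display data.
  void populateDisplayData(ToyDisplayBuilder builder) {}
}

class ToyFileBasedSource extends ToyComponent {
  private final String filePattern;

  ToyFileBasedSource(String filePattern) { this.filePattern = filePattern; }

  @Override
  void populateDisplayData(ToyDisplayBuilder builder) {
    super.populateDisplayData(builder);
    builder.add(namespace(), "filePattern", filePattern);
  }
}

// A subcomponent with its own display data.
final class ToyGzipCodec extends ToyComponent {
  @Override
  void populateDisplayData(ToyDisplayBuilder builder) {
    builder.add(namespace(), "level", 6);
  }
}

final class ToyTextSource extends ToyFileBasedSource {
  private final ToyGzipCodec codec = new ToyGzipCodec();

  ToyTextSource(String filePattern) { super(filePattern); }

  @Override
  void populateDisplayData(ToyDisplayBuilder builder) {
    super.populateDisplayData(builder);      // filePattern, in ToyTextSource's namespace
    builder.add(namespace(), "charset", "UTF-8");
    codec.populateDisplayData(builder);      // level, in ToyGzipCodec's namespace
  }
}
```

Populating a `ToyTextSource("logs/*.txt")` yields `ToyTextSource:filePattern`, `ToyTextSource:charset`, and `ToyGzipCodec:level`, which mirrors the current-namespace versus subcomponent-namespace distinction described above.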