Package org.apache.beam.sdk.io
@DefaultAnnotation(org.checkerframework.checker.nullness.qual.NonNull.class)
package org.apache.beam.sdk.io
Defines transforms for reading and writing common storage formats, including
org.apache.beam.sdk.io.AvroIO and TextIO.
The classes in this package provide Read
transforms that create PCollections from
existing storage:
PCollection<TableRow> inputData = pipeline.apply(
    BigQueryIO.readTableRows().from("apache-beam-testing.samples.weather_stations"));
and Write
transforms that persist PCollections to external storage:
// TextIO.write() consumes a PCollection<String>.
PCollection<String> numbers = ...;
numbers.apply(TextIO.write().to("gs://my_bucket/path/to/numbers"));
Class: Description
A BlockBasedSource is a FileBasedSource where a file consists of blocks of records.
A Block represents a block of records that can be read.
A Reader that reads records from a BlockBasedSource.
PTransform that reads a bounded amount of data from an UnboundedSource, specified as one or both of a maximum number of elements or a maximum period of time to read.
A Source that reads a finite amount of input and, because of that, supports some additional operations.
A Reader that reads a bounded amount of input and supports some additional operations, such as progress estimation and dynamic work rebalancing.
A read-only FileSystem implementation looking up resources using a ClassLoader.
AutoService registrar for the ClassLoaderFileSystem.
A Source that reads from compressed files.
Reader for a CompressedSource.
Deprecated. Use Compression instead.
Factory interface for creating channels that decompress the content of an underlying channel.
Various compression types for reading/writing files.
Most users should use GenerateSequence instead.
The checkpoint for an unbounded CountingSource is simply the last value produced.
A custom coder for CounterMark.
A default FileBasedSink.FilenamePolicy for windowed and unwindowed files.
Encapsulates constructor parameters to DefaultFilenamePolicy.
A Coder for DefaultFilenamePolicy.Params.
Some helper classes that derive from FileBasedSink.DynamicDestinations.
FileBasedSink<UserT, DestinationT, OutputT>: Abstract class for file-based output.
Deprecated. Use Compression.
FileBasedSink.DynamicDestinations<UserT, DestinationT, OutputT>: A class that allows value-dependent writes in FileBasedSink.
A naming policy for output files.
FileBasedSink.FileResult<DestinationT>: Result of a single bundle write.
FileBasedSink.FileResultCoder<DestinationT>: A coder for FileBasedSink.FileResult objects.
Provides hints about how to generate output files, such as a suggested filename suffix (e.g.
Implementations create instances of WritableByteChannel used by FileBasedSink and related classes to allow decorating, or otherwise transforming, the raw data that would normally be written directly to the WritableByteChannel passed into FileBasedSink.WritableByteChannelFactory.create(WritableByteChannel).
FileBasedSink.WriteOperation<DestinationT, OutputT>: Abstract operation that manages the process of writing to FileBasedSink.
FileBasedSink.Writer<DestinationT, OutputT>: Abstract writer that writes a bundle to a FileBasedSink.
A common base class for all file-based Sources.
A reader that implements code common to readers of FileBasedSources.
A given FileBasedSource represents a file resource of one of these types.
General-purpose transforms for working with files: listing files (matching), reading and writing.
Implementation of FileIO.match().
Implementation of FileIO.matchAll().
Describes configuration for matching filepatterns, such as EmptyMatchTreatment and continuous watching for matching files.
A utility class for accessing a potentially compressed file.
Implementation of FileIO.readMatches().
Enum to control how directories are handled.
FileIO.Sink<ElementT>: Specifies how to write elements to individual files in FileIO.write() and FileIO.writeDynamic().
FileIO.Write<DestinationT, UserT>: Implementation of FileIO.write() and FileIO.writeDynamic().
A policy for generating names for shard files.
FileSystem<ResourceIdT extends ResourceId>: File system interface in Beam.
A registrar that creates FileSystem instances from PipelineOptions.
Client-facing FileSystem utility.
A PTransform that produces longs starting from the given value, and either up to the given limit or until Long.MAX_VALUE / until the given time elapses.
Exposes GenerateSequence as an external transform for cross-language usage.
Parameters class to expose the transform to an external SDK.
AutoService registrar for the LocalFileSystem.
Helper functions for producing a ResourceId that references a local file or directory.
A BoundedSource that uses offsets to define starting and ending positions.
A Source.Reader that implements code common to readers of all OffsetBasedSources.
A PTransform for reading from a Source.
Read.Bounded<T>: PTransform that reads from a BoundedSource.
Helper class for building Read transforms.
PTransform that reads from an UnboundedSource.
A Coder for FileIO.ReadableFile.
Reads each file in the input PCollection of FileIO.ReadableFile using given parameters for splitting files into offset ranges and for creating a FileBasedSource for a file.
A class to handle errors which occur during file reads.
Reads each file of the input PCollection and outputs each element as the value of a KV, where the key is the filename from which that value came.
ShardingFunction<UserT, DestinationT>: Function for assigning ShardedKeys to input elements for sharded WriteFiles.
Standard shard naming templates.
Source<T>: Base class for defining input formats and creating a Source for reading the input.
The interface that readers of custom input sources must implement.
PTransforms for reading and writing text files.
Deprecated. Use Compression.
Implementation of TextIO.read().
Deprecated. See TextIO.readAll() for details.
Implementation of TextIO.readFiles().
Implementation of TextIO.sink().
TextIO.TypedWrite<UserT, DestinationT>: Implementation of TextIO.write().
This class is used as the default return value of TextIO.write().
Returns a row count estimate for files associated with a file pattern.
Builder for TextRowCountEstimator.
This strategy stops sampling once enough bytes have been sampled.
This strategy stops sampling when the total number of sampled bytes exceeds some threshold.
An exception that is thrown if the estimator cannot produce an estimate of the number of lines.
This strategy samples all the files.
A sampling strategy determines when to stop reading further files.
Implementation detail of TextIO.Read.
PTransforms for reading and writing TensorFlow TFRecord files.
Deprecated. Use Compression.
Implementation of TFRecordIO.read().
Implementation of TFRecordIO.readFiles().
Implementation of TFRecordIO.write().
Configuration for reading from TFRecord.
Builder for TFRecordReadSchemaTransformConfiguration.
Configuration for reading from TFRecord.
UnboundedSource<OutputT, CheckpointMarkT extends UnboundedSource.CheckpointMark>: A Source that reads an unbounded amount of input and, because of that, supports some additional operations such as checkpointing, watermarks, and record ids.
A marker representing the progress and state of an UnboundedSource.UnboundedReader.
A checkpoint mark that does nothing when finalized.
UnboundedSource.UnboundedReader<OutputT>: A Reader that reads an unbounded amount of input.
WriteFiles<UserT, DestinationT, OutputT>: A PTransform that writes to a FileBasedSink.
WriteFilesResult<DestinationT>: The result of a WriteFiles transform.
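
The table mentions shard naming templates and filename policies without showing what a template expands to. The sketch below is a plain-Java illustration, not Beam's implementation. It assumes the common convention that a run of S characters is replaced by the zero-padded shard index and a run of N characters by the zero-padded shard count, and it ignores windowing and pane placeholders. The class name ShardNameSketch and the method expand are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Beam SDK.
final class ShardNameSketch {
  // A run of 'S' stands for the shard index, a run of 'N' for the shard count.
  private static final Pattern PLACEHOLDER = Pattern.compile("S+|N+");

  /** Expands a shard template and wraps it with the given prefix and suffix. */
  static String expand(String prefix, String template, String suffix, int shard, int numShards) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuilder out = new StringBuilder(prefix);
    int last = 0;
    while (m.find()) {
      // Copy the literal text between placeholders unchanged.
      out.append(template, last, m.start());
      int value = m.group().charAt(0) == 'S' ? shard : numShards;
      // Zero-pad the number to the length of the placeholder run.
      out.append(String.format("%0" + m.group().length() + "d", value));
      last = m.end();
    }
    out.append(template, last, template.length());
    return out.append(suffix).toString();
  }

  public static void main(String[] args) {
    // Assumed template string; prints one file name per shard.
    for (int i = 0; i < 3; i++) {
      System.out.println(expand("numbers", "-SSSSS-of-NNNNN", ".txt", i, 3));
    }
  }
}
```

Under these assumptions, shard 0 of 3 with prefix "numbers" and suffix ".txt" yields numbers-00000-of-00003.txt. The placeholders Beam actually supports are documented with ShardNameTemplate and DefaultFilenamePolicy.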