Package org.apache.beam.sdk.io
@DefaultAnnotation(org.checkerframework.checker.nullness.qual.NonNull.class)
package org.apache.beam.sdk.io
Defines transforms for reading and writing common storage formats, including
, and
invalid reference
org.apache.beam.sdk.io.AvroIO
TextIO.
The classes in this package provide Read transforms that create PCollections from
existing storage:
PCollection<TableRow> inputData = pipeline.apply(
BigQueryIO.readTableRows().from("apache-beam-testing.samples.weather_stations"));
and Write transforms that persist PCollections to external storage:
PCollection<Integer> numbers = ...;
numbers.apply(TextIO.write().to("gs://my_bucket/path/to/numbers"));
-
ClassDescriptionA
BlockBasedSourceis aFileBasedSourcewhere a file consists of blocks of records.ABlockrepresents a block of records that can be read.AReaderthat reads records from aBlockBasedSource.PTransformthat reads a bounded amount of data from anUnboundedSource, specified as one or both of a maximum number of elements or a maximum period of time to read.ASourcethat reads a finite amount of input and, because of that, supports some additional operations.AReaderthat reads a bounded amount of input and supports some additional operations, such as progress estimation and dynamic work rebalancing.A read-onlyFileSystemimplementation looking up resources using a ClassLoader.AutoServiceregistrar for theClassLoaderFileSystem.A Source that reads from compressed files.Reader for aCompressedSource.Deprecated.Factory interface for creating channels that decompress the content of an underlying channel.Various compression types for reading/writing files.Most users should useGenerateSequenceinstead.The checkpoint for an unboundedCountingSourceis simply the last value produced.A custom coder forCounterMark.A defaultFileBasedSink.FilenamePolicyfor windowed and unwindowed files.Encapsulates constructor parameters toDefaultFilenamePolicy.A Coder forDefaultFilenamePolicy.Params.Some helper classes that derive fromFileBasedSink.DynamicDestinations.FileBasedSink<UserT,DestinationT, OutputT> Abstract class for file-based output.Deprecated.useCompression.FileBasedSink.DynamicDestinations<UserT,DestinationT, OutputT> A class that allows value-dependent writes inFileBasedSink.A naming policy for output files.FileBasedSink.FileResult<DestinationT>Result of a single bundle write.FileBasedSink.FileResultCoder<DestinationT>A coder forFileBasedSink.FileResultobjects.Provides hints about how to generate output files, such as a suggested filename suffix (e.g.Implementations create instances ofWritableByteChannelused byFileBasedSinkand related classes to allow decorating, or otherwise transforming, the raw data that would normally be written directly to theWritableByteChannelpassed intoFileBasedSink.WritableByteChannelFactory.create(WritableByteChannel).FileBasedSink.WriteOperation<DestinationT,OutputT> Abstract operation that manages the process of writing toFileBasedSink.FileBasedSink.Writer<DestinationT,OutputT> Abstract writer that writes a bundle to aFileBasedSink.A common base class for all file-basedSources.Areaderthat implements code common to readers ofFileBasedSources.A givenFileBasedSourcerepresents a file resource of one of these types.General-purpose transforms for working with files: listing files (matching), reading and writing.Implementation ofFileIO.match().Implementation ofFileIO.matchAll().Describes configuration for matching filepatterns, such asEmptyMatchTreatmentand continuous watching for matching files.A utility class for accessing a potentially compressed file.Implementation ofFileIO.readMatches().Enum to control how directories are handled.FileIO.Sink<ElementT>Specifies how to write elements to individual files inFileIO.write()andFileIO.writeDynamic().FileIO.Write<DestinationT,UserT> Implementation ofFileIO.write()andFileIO.writeDynamic().A policy for generating names for shard files.FileSystem<ResourceIdT extends ResourceId>File system interface in Beam.A registrar that createsFileSysteminstances fromPipelineOptions.Clients facingFileSystemutility.APTransformthat produces longs starting from the given value, and either up to the given limit or untilLong.MAX_VALUE/ until the given time elapses.Exposes GenerateSequence as an external transform for cross-language usage.Parameters class to expose the transform to an external SDK.AutoServiceregistrar for theLocalFileSystem.Helper functions for producing aResourceIdthat references a local file or directory.ABoundedSourcethat uses offsets to define starting and ending positions.ASource.Readerthat implements code common to readers of allOffsetBasedSources.APTransformfor reading from aSource.Read.Bounded<T>PTransformthat reads from aBoundedSource.Helper class for buildingReadtransforms.PTransformthat reads from aUnboundedSource.ACoderforFileIO.ReadableFile.Reads each file in the inputPCollectionofFileIO.ReadableFileusing given parameters for splitting files into offset ranges and for creating aFileBasedSourcefor a file.A class to handle errors which occur during file reads.Reads each file of the inputPCollectionand outputs each element as the value of aKV, where the key is the filename from which that value came.ShardingFunction<UserT,DestinationT> Function for assigningShardedKeys to input elements for shardedWriteFiles.Standard shard naming templates.Source<T>Base class for defining input formats and creating aSourcefor reading the input.The interface that readers of custom input sources must implement.PTransforms for reading and writing text files.Deprecated.UseCompression.Implementation ofTextIO.read().Deprecated.SeeTextIO.readAll()for details.Implementation ofTextIO.readFiles().Implementation ofTextIO.sink().TextIO.TypedWrite<UserT,DestinationT> Implementation ofTextIO.write().This class is used as the default return value ofTextIO.write().This returns a row count estimation for files associated with a file pattern.Builder forTextRowCountEstimator.This strategy stops sampling if we sample enough number of bytes.This strategy stops sampling when total number of sampled bytes are more than some threshold.An exception that will be thrown if the estimator cannot get an estimation of the number of lines.This strategy samples all the files.Sampling Strategy shows us when should we stop reading further files.Implementation detail ofTextIO.Read.PTransforms for reading and writing TensorFlow TFRecord files.Deprecated.UseCompression.Implementation ofTFRecordIO.read().Implementation ofTFRecordIO.readFiles().Implementation ofTFRecordIO.write().Configuration for reading from TFRecord.Builder forTFRecordReadSchemaTransformConfiguration.Configuration for reading from TFRecord.UnboundedSource<OutputT,CheckpointMarkT extends UnboundedSource.CheckpointMark> ASourcethat reads an unbounded amount of input and, because of that, supports some additional operations such as checkpointing, watermarks, and record ids.A marker representing the progress and state of anUnboundedSource.UnboundedReader.A checkpoint mark that does nothing when finalized.UnboundedSource.UnboundedReader<OutputT>AReaderthat reads an unbounded amount of input.WriteFiles<UserT,DestinationT, OutputT> APTransformthat writes to aFileBasedSink.WriteFilesResult<DestinationT>The result of aWriteFilestransform.
Compressioninstead