Class FileSystem<ResourceIdT extends ResourceId>

java.lang.Object
org.apache.beam.sdk.io.FileSystem<ResourceIdT>
Direct Known Subclasses:
ClassLoaderFileSystem

public abstract class FileSystem<ResourceIdT extends ResourceId> extends Object
File system interface in Beam.

It defines APIs for writing file-system-agnostic code.

All methods are protected and are intended for file system providers to implement. Clients should use the FileSystems utility instead.

  • Constructor Details

    • FileSystem

      public FileSystem()
  • Method Details

    • match

      protected abstract List<MatchResult> match(List<String> specs) throws IOException
      This is the entry point to convert user-provided specs to ResourceIds. Callers should use match(java.util.List<java.lang.String>) to resolve ambiguities in user specs before calling other methods.

      Implementation should handle the following ambiguities of a user-provided spec:

      1. The spec could be a glob or a URI. match(java.util.List<java.lang.String>) should be able to tell which it is and choose an efficient implementation.
      2. The user-provided spec might refer to files or directories. Users who wish to indicate a directory commonly omit the trailing /, as in the spec "/tmp/dir". The FileSystem should recognize a directory with the trailing / omitted, but should always return a correct ResourceId (e.g., "/tmp/dir/") inside the returned MatchResult.
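The two ambiguities above can be illustrated with a small JDK-only sketch. It is not Beam's implementation: the class and both helpers are hypothetical, they only handle local POSIX-style paths, and the glob test is a simple wildcard check chosen for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

final class SpecAmbiguitySketch {
  // Hypothetical helper: treats a spec as a glob if it contains a wildcard character.
  static boolean looksLikeGlob(String spec) {
    return spec.indexOf('*') >= 0
        || spec.indexOf('?') >= 0
        || spec.indexOf('[') >= 0
        || spec.indexOf('{') >= 0;
  }

  // Hypothetical helper: appends the trailing "/" when a non-glob spec names an
  // existing local directory, so the returned spec is unambiguous.
  static String canonicalize(String spec) {
    if (looksLikeGlob(spec)) {
      return spec; // globs are expanded elsewhere; leave them untouched
    }
    if (Files.isDirectory(Paths.get(spec)) && !spec.endsWith("/")) {
      return spec + "/";
    }
    return spec;
  }
}
```

Under these assumptions, a spec such as "/tmp/dir" that names an existing directory comes back as "/tmp/dir/", while "/tmp/*.txt" is passed through for glob expansion.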

      All FileSystem implementations should support globs in the final hierarchical path component of a ResourceId. This allows SDK libraries to construct file-system-agnostic specs. FileSystems can support additional patterns for user-provided specs.
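As a rough illustration of a glob in the final path component only, the sketch below splits a local spec at its last "/" and lets the JDK match the remainder against the entries of the parent directory. The class and method names are hypothetical, and a real provider would return MatchResult objects rather than file names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FinalComponentGlobSketch {
  // Expands a glob that appears only in the last component of a local, "/"-separated spec.
  static List<String> matchFinalComponent(String spec) throws IOException {
    int lastSlash = spec.lastIndexOf('/');
    if (lastSlash < 0) {
      throw new IllegalArgumentException("Expected a spec with a parent directory: " + spec);
    }
    Path parent = Paths.get(spec.substring(0, lastSlash + 1));
    String pattern = spec.substring(lastSlash + 1);

    List<String> names = new ArrayList<>();
    // Files.newDirectoryStream(Path, String) applies the glob to each entry's file name.
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent, pattern)) {
      for (Path entry : entries) {
        names.add(entry.getFileName().toString());
      }
    }
    Collections.sort(names); // directory iteration order is unspecified
    return names;
  }
}
```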

      Returns:
      List<MatchResult> in the same order as the input specs.
      Throws:
      IllegalArgumentException - if specs are invalid.
      IOException - if all specs failed to match due to issues such as network connection or authorization failures. Exceptions for individual specs must be deferred until callers retrieve metadata with MatchResult.metadata().
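The deferral described above can be sketched with a small stand-in type. DeferredMatchSketch is hypothetical and is not Beam's MatchResult; it only shows the idea of holding a per-spec failure and rethrowing it when the caller asks for metadata.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DeferredMatchSketch {
  private final List<String> metadata; // stand-in for per-file metadata
  private final IOException failure;   // null when the spec matched successfully

  private DeferredMatchSketch(List<String> metadata, IOException failure) {
    this.metadata = metadata;
    this.failure = failure;
  }

  static DeferredMatchSketch ok(List<String> metadata) {
    return new DeferredMatchSketch(
        Collections.unmodifiableList(new ArrayList<>(metadata)), null);
  }

  static DeferredMatchSketch failed(IOException failure) {
    return new DeferredMatchSketch(Collections.<String>emptyList(), failure);
  }

  // The failure for this one spec surfaces here, not when the batch was matched.
  List<String> metadata() throws IOException {
    if (failure != null) {
      throw failure;
    }
    return metadata;
  }
}
```

With this shape, one failing spec does not prevent the caller from reading the results of the other specs in the same batch.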
    • create

      protected abstract WritableByteChannel create(ResourceIdT resourceId, CreateOptions createOptions) throws IOException
      Returns a write channel for the given resource.

      The resource is not expanded; it is used verbatim.

      Parameters:
      resourceId - the reference of the file-like resource to create
      createOptions - the configuration of the create operation
      Throws:
      IOException
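For a local file, a provider's create could be approximated with the JDK as in the sketch below. The class is hypothetical, the create options are reduced to "create or truncate", and the path is used exactly as given, without expansion.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class CreateChannelSketch {
  // Opens a write channel for exactly the given path; existing content is truncated.
  static WritableByteChannel create(Path path) throws IOException {
    return Files.newByteChannel(
        path,
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE);
  }

  // Usage example: writes the text as UTF-8 and closes the channel.
  static void writeUtf8(Path path, String text) throws IOException {
    ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    try (WritableByteChannel channel = create(path)) {
      while (buffer.hasRemaining()) {
        channel.write(buffer); // a single write may not drain the buffer
      }
    }
  }
}
```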
    • open

      protected abstract ReadableByteChannel open(ResourceIdT resourceId) throws IOException
      Returns a read channel for the given resource.

      The resource is not expanded; it is used verbatim.

      If seeking is supported, then this returns a SeekableByteChannel.

      Parameters:
      resourceId - the reference of the file-like resource to open
      Throws:
      IOException
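A caller that wants to seek can check the runtime type of the returned channel. The sketch below uses a local file and hypothetical names; it assumes the provider hands back a SeekableByteChannel when seeking is supported, as the description above states.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class OpenChannelSketch {
  // Stand-in for a provider's open(): local file channels are seekable.
  static ReadableByteChannel open(Path path) throws IOException {
    return Files.newByteChannel(path, StandardOpenOption.READ);
  }

  // Reads the file as UTF-8 starting at the given byte offset.
  static String readFrom(Path path, long offset) throws IOException {
    try (ReadableByteChannel channel = open(path)) {
      if (!(channel instanceof SeekableByteChannel)) {
        throw new IOException("Channel does not support seeking: " + path);
      }
      ((SeekableByteChannel) channel).position(offset);

      ByteArrayOutputStream out = new ByteArrayOutputStream();
      ByteBuffer buffer = ByteBuffer.allocate(8192);
      while (channel.read(buffer) != -1) {
        buffer.flip();
        out.write(buffer.array(), 0, buffer.limit());
        buffer.clear();
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }
}
```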
    • copy

      protected abstract void copy(List<ResourceIdT> srcResourceIds, List<ResourceIdT> destResourceIds) throws IOException
      Copies a List of file-like resources from one location to another.

      The number of source resources must equal the number of destination resources. Destination resources will be created recursively.

      Parameters:
      srcResourceIds - the references of the source resources
      destResourceIds - the references of the destination resources
      Throws:
      FileNotFoundException - if the source resources are missing. When copy throws, each resource might or might not be copied. In such scenarios, callers can use match() to determine the state of the resources.
      IOException
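The contract above (equal list sizes, destinations created recursively, FileNotFoundException for a missing source) could look roughly like this for local files. The class is hypothetical and overwriting an existing destination is an assumption of the sketch, not something the contract states.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

final class CopySketch {
  static void copy(List<Path> sources, List<Path> destinations) throws IOException {
    if (sources.size() != destinations.size()) {
      throw new IllegalArgumentException(
          "Expected " + sources.size() + " destinations but got " + destinations.size());
    }
    for (int i = 0; i < sources.size(); i++) {
      Path source = sources.get(i);
      Path destination = destinations.get(i);
      if (!Files.exists(source)) {
        // Earlier pairs may already have been copied at this point.
        throw new FileNotFoundException(source.toString());
      }
      Path parent = destination.getParent();
      if (parent != null) {
        Files.createDirectories(parent); // create missing destination directories
      }
      // Assumption of this sketch: an existing destination is overwritten.
      Files.copy(source, destination, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

Because the loop stops at the first failure, some pairs may be copied and others not, which is why callers are pointed at match() to inspect the outcome.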
    • rename

      protected abstract void rename(List<ResourceIdT> srcResourceIds, List<ResourceIdT> destResourceIds, MoveOptions... moveOptions) throws IOException
      Renames a List of file-like resources from one location to another.

      The number of source resources must equal the number of destination resources. Destination resources will be created recursively.

      Parameters:
      srcResourceIds - the references of the source resources
      destResourceIds - the references of the destination resources
      moveOptions - move options specifying handling of error conditions
      Throws:
      UnsupportedOperationException - if move options are specified and not supported by the FileSystem
      FileNotFoundException - if the source resources are missing. When rename throws, the state of the resources is unknown but safe: for every (source, destination) pair of resources, the following are possible: a) source exists, b) destination exists, c) source and destination both exist. Thus no data is lost, but duplicated resources are possible. In such scenarios, callers can use match() to determine the state of the resources.
      IOException
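After a failed rename, a caller might classify each (source, destination) pair before retrying. The sketch below does this for local paths with plain existence checks instead of match(); the class, the enum, and its NEITHER value (which the contract above rules out) are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class RenameStateSketch {
  enum PairState {
    SOURCE_ONLY,      // a) rename of this pair has not happened yet
    DESTINATION_ONLY, // b) rename of this pair completed
    BOTH,             // c) copied but source not yet removed: a duplicate
    NEITHER           // not expected under the documented contract
  }

  static PairState stateOf(Path source, Path destination) {
    boolean hasSource = Files.exists(source);
    boolean hasDestination = Files.exists(destination);
    if (hasSource && hasDestination) {
      return PairState.BOTH;
    }
    if (hasSource) {
      return PairState.SOURCE_ONLY;
    }
    if (hasDestination) {
      return PairState.DESTINATION_ONLY;
    }
    return PairState.NEITHER;
  }
}
```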
    • delete

      protected abstract void delete(Collection<ResourceIdT> resourceIds) throws IOException
      Deletes a collection of resources.
      Parameters:
      resourceIds - the references of the resources to delete.
      Throws:
      FileNotFoundException - if resources are missing. When delete throws, each resource might or might not be deleted. In such scenarios, callers can use match() to determine the state of the resources.
      IOException
    • matchNewResource

      protected abstract ResourceIdT matchNewResource(String singleResourceSpec, boolean isDirectory)
      Returns a new ResourceId for this filesystem that represents the named resource. The user supplies both the resource spec and whether it is a directory.

      The supplied singleResourceSpec is expected to be in a proper format, including any necessary escaping, for this FileSystem.

      This function may throw an IllegalArgumentException if given an invalid argument, such as when the specified singleResourceSpec is not a valid resource name.
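One way a provider might use the isDirectory flag is sketched below for "/"-separated specs. The class and method are hypothetical, and the specific validation rules (rejecting empty specs and file specs that end in "/") are assumptions of the sketch rather than requirements stated above.

```java
final class NewResourceSketch {
  // Returns a normalized spec string; a real provider would return its ResourceId type.
  static String newResourceSpec(String singleResourceSpec, boolean isDirectory) {
    if (singleResourceSpec == null || singleResourceSpec.isEmpty()) {
      throw new IllegalArgumentException("Expected a non-empty resource spec");
    }
    if (isDirectory) {
      // Directories always carry a trailing "/" so they cannot be mistaken for files.
      return singleResourceSpec.endsWith("/") ? singleResourceSpec : singleResourceSpec + "/";
    }
    if (singleResourceSpec.endsWith("/")) {
      throw new IllegalArgumentException(
          "Expected a file spec but received a directory spec: " + singleResourceSpec);
    }
    return singleResourceSpec;
  }
}
```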

    • getScheme

      protected abstract String getScheme()
      Get the URI scheme which defines the namespace of the FileSystem.
    • reportLineage

      protected void reportLineage(ResourceIdT resourceId, Lineage lineage)
      Reports Lineage metrics for the resource id at file level.
    • reportLineage

      protected void reportLineage(ResourceIdT unusedId, Lineage unusedLineage, FileSystem.LineageLevel level)
      Reports Lineage metrics for the resource id at the given level.

      Unless overridden by FileSystem implementations, this defaults to a no-op.
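The "no-op unless overridden" pattern can be sketched with plain Java types. Everything here is hypothetical: a list of strings stands in for Beam's Lineage object, and the provider classes are illustrative only.

```java
import java.util.List;

final class LineageHookSketch {
  abstract static class BaseProvider {
    // Default hook: does nothing, so providers without lineage support need no code.
    protected void reportLineage(String resourceId, List<String> lineageSink) {}
  }

  // Inherits the no-op default.
  static final class SilentProvider extends BaseProvider {}

  // Overrides the hook to record the resource it touched.
  static final class ReportingProvider extends BaseProvider {
    @Override
    protected void reportLineage(String resourceId, List<String> lineageSink) {
      lineageSink.add(resourceId);
    }
  }
}
```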