Class FileSystems

java.lang.Object
org.apache.beam.sdk.io.FileSystems

public class FileSystems extends Object
Client-facing FileSystem utility.
  • Field Details

  • Constructor Details

    • FileSystems

      public FileSystems()
  • Method Details

    • hasGlobWildcard

      public static boolean hasGlobWildcard(String spec)
      Checks whether the given spec contains a glob wildcard character.
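
      As a rough illustration of what such a check involves, the sketch below scans a spec for glob metacharacters using only the JDK. The class name GlobCheckSketch is hypothetical, and the choice of '*', '?' and '[' as the wildcard characters is an assumption, not a statement of which characters Beam inspects.

```java
class GlobCheckSketch {
    // Assumption: '*', '?' and '[' are the characters that introduce a glob construct.
    static boolean hasGlobWildcard(String spec) {
        for (int i = 0; i < spec.length(); i++) {
            char c = spec.charAt(i);
            if (c == '*' || c == '?' || c == '[') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasGlobWildcard("/tmp/logs/*.txt")); // true
        System.out.println(hasGlobWildcard("/tmp/logs/a.txt")); // false
    }
}
```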
    • match

      public static List<MatchResult> match(List<String> specs) throws IOException
      This is the entry point for converting user-provided specs to ResourceIds. Callers should use match(List) to resolve ambiguities in user-provided specs before calling other methods.

      The implementation handles the following ambiguities of a user-provided spec:

      1. A spec could be a glob or a URI. match(List) should be able to tell them apart and choose an efficient implementation.
      2. A user-provided spec might refer to a file or a directory. Users who wish to indicate a directory commonly omit the trailing path delimiter, as in "/tmp/dir" on Linux. The FileSystem should recognize a directory even when the trailing path delimiter is omitted, but should always return a correct ResourceId (e.g., "/tmp/dir/") inside the returned MatchResult.

      All FileSystem implementations should support globs in the final hierarchical path component of a ResourceId. This allows SDK libraries to construct file-system-agnostic specs. Individual FileSystems can support additional patterns for user-provided specs.
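
      To make "glob in the final path component" concrete, the following sketch applies a glob to the entries of a single local directory with the JDK's DirectoryStream. It is a local-filesystem analogy only. FinalComponentGlobSketch, matchInDirectory and demo are hypothetical names, and the JDK glob syntax is not necessarily identical to what a given Beam FileSystem accepts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FinalComponentGlobSketch {
    /** Returns the names of the entries directly under dir that match glob, sorted by name. */
    static List<String> matchInDirectory(Path dir, String glob) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    /** Creates a scratch directory holding three files and matches "*.txt" against it. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("glob-sketch");
            Files.createFile(dir.resolve("a.txt"));
            Files.createFile(dir.resolve("b.txt"));
            Files.createFile(dir.resolve("c.log"));
            return matchInDirectory(dir, "*.txt");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

      Only the last component carries the pattern here; the directory part is taken verbatim, which is the portable subset described above.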

      If the scheme of the specs does not match any known FileSystem implementation, FileSystems attempts to use LocalFileSystem to resolve the path.

      Specs that do not match any resources are treated according to EmptyMatchTreatment.DISALLOW.

      Returns:
      List<MatchResult> in the same order as the input specs.
      Throws:
      IllegalArgumentException - if the specs are invalid, that is, empty or with different schemes.
      IOException - if all specs fail to match because of issues such as network connectivity or authorization. An exception for an individual spec is deferred until the caller retrieves its metadata with MatchResult.metadata().
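
      The scheme rules above (one scheme per call, local paths when no scheme is present) can be modelled with a small JDK-only sketch. Everything in it is an assumption made for illustration: the class SchemeCheckSketch, the regular expression used to recognise a scheme, the "file" default, and the decision to reject an empty list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SchemeCheckSketch {
    // Assumption: a scheme is a letter followed by letters, digits, '+', '-' or '.', then ":/".
    private static final Pattern SCHEME = Pattern.compile("([a-zA-Z][-a-zA-Z0-9+.]*):/.*");

    /** Returns the scheme of spec, or "file" when the spec carries no scheme. */
    static String parseScheme(String spec) {
        Matcher m = SCHEME.matcher(spec);
        return m.matches() ? m.group(1) : "file";
    }

    /** Returns the one scheme shared by all specs, or throws if the specs disagree. */
    static String commonScheme(List<String> specs) {
        // Assumption of this sketch: an empty list is rejected as invalid input.
        if (specs.isEmpty()) {
            throw new IllegalArgumentException("specs must not be empty");
        }
        Set<String> schemes = new HashSet<>();
        for (String spec : specs) {
            schemes.add(parseScheme(spec));
        }
        if (schemes.size() != 1) {
            throw new IllegalArgumentException("specs have different schemes: " + schemes);
        }
        return schemes.iterator().next();
    }

    public static void main(String[] args) {
        System.out.println(commonScheme(Arrays.asList("gs://bucket/a", "gs://bucket/b")));
        System.out.println(parseScheme("/tmp/dir"));
    }
}
```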
    • match

      public static List<MatchResult> match(List<String> specs, EmptyMatchTreatment emptyMatchTreatment) throws IOException
      Like match(List), but with a configurable EmptyMatchTreatment.
      Throws:
      IOException
    • match

      public static MatchResult match(String spec) throws IOException
      Like match(List), but for a single resource specification.

      The function match(List) is preferred when matching multiple patterns, as it allows for bulk API calls to remote filesystems.

      Throws:
      IOException
    • match

      public static MatchResult match(String spec, EmptyMatchTreatment emptyMatchTreatment) throws IOException
      Like match(String), but with a configurable EmptyMatchTreatment.
      Throws:
      IOException
    • matchSingleFileSpec

      public static MatchResult.Metadata matchSingleFileSpec(String spec) throws IOException
      Returns the MatchResult.Metadata for a single file resource. Expects a resource specification spec that matches a single result.
      Parameters:
      spec - a resource specification that matches exactly one result.
      Returns:
      the MatchResult.Metadata for the specified resource.
      Throws:
      FileNotFoundException - if the file resource is not found.
      IOException - in the event of an error in the inner call to match(java.util.List<java.lang.String>), or if the given spec does not match exactly 1 result.
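
      The exception contract above (no result, exactly one result, more than one result) can be sketched independently of any file system. SingleMatchSketch and exactlyOne are hypothetical names, and the list of matches stands in for whatever the underlying match call returned.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

class SingleMatchSketch {
    /** Returns the only element of matches, mirroring the documented exception contract. */
    static <T> T exactlyOne(String spec, List<T> matches) throws IOException {
        if (matches.isEmpty()) {
            throw new FileNotFoundException("No resource matched spec: " + spec);
        }
        if (matches.size() > 1) {
            throw new IOException(
                "Expected exactly one match for " + spec + " but found " + matches.size());
        }
        return matches.get(0);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(exactlyOne("/tmp/a.txt", Collections.singletonList("/tmp/a.txt")));
    }
}
```

      Because FileNotFoundException is a subclass of IOException, callers that want to treat a missing file differently need to catch it first.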
    • matchResources

      public static List<MatchResult> matchResources(List<ResourceId> resourceIds) throws IOException
      Returns MatchResults for the given resourceIds.
      Parameters:
      resourceIds - resourceIds that might be derived from match(java.util.List<java.lang.String>), ResourceId.resolve(java.lang.String, org.apache.beam.sdk.io.fs.ResolveOptions), or ResourceId.getCurrentDirectory().
      Throws:
      IOException - if all resourceIds fail to match because of issues such as network connectivity or authorization. An exception for an individual ResourceId is deferred until the caller retrieves its metadata with MatchResult.metadata().
    • create

      public static WritableByteChannel create(ResourceId resourceId, String mimeType) throws IOException
      Returns a write channel for the given ResourceId.

      The resource is not expanded; it is used verbatim.

      Parameters:
      resourceId - the reference of the file-like resource to create
      mimeType - the MIME type of the file-like resource to create
      Throws:
      IOException
    • create

      public static WritableByteChannel create(ResourceId resourceId, CreateOptions createOptions) throws IOException
      Returns a write channel for the given ResourceId with CreateOptions.

      The resource is not expanded; it is used verbatim.

      Parameters:
      resourceId - the reference of the file-like resource to create
      createOptions - the configuration of the create operation
      Throws:
      IOException
    • open

      public static ReadableByteChannel open(ResourceId resourceId) throws IOException
      Returns a read channel for the given ResourceId.

      The resource is not expanded; it is used verbatim.

      If seeking is supported, then this returns a SeekableByteChannel.

      Parameters:
      resourceId - the reference of the file-like resource to open
      Throws:
      IOException
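
      A caller that receives a ReadableByteChannel can test whether it is also a SeekableByteChannel before trying to seek. The sketch below shows that pattern with JDK channels only; SeekableReadSketch and its methods are hypothetical names, and the channels come from the JDK rather than from open(ResourceId).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekableReadSketch {
    /** Reads everything from offset to the end, seeking if possible and skipping otherwise. */
    static String readFrom(ReadableByteChannel channel, long offset) throws IOException {
        if (channel instanceof SeekableByteChannel) {
            ((SeekableByteChannel) channel).position(offset);
        } else {
            // No seeking available: read and discard the first offset bytes.
            ByteBuffer skip = ByteBuffer.allocate(4096);
            long remaining = offset;
            while (remaining > 0) {
                skip.clear();
                skip.limit((int) Math.min(skip.capacity(), remaining));
                int n = channel.read(skip);
                if (n < 0) {
                    break;
                }
                remaining -= n;
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buf = ByteBuffer.allocate(4096);
        while (channel.read(buf) >= 0) {
            buf.flip();
            out.write(buf.array(), 0, buf.limit());
            buf.clear();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Non-seekable case: a channel wrapped around an in-memory stream. */
    static String demoStream() {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data))) {
            return readFrom(ch, 6);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Seekable case: a channel opened on a temporary file. */
    static String demoFile() {
        try {
            Path file = Files.createTempFile("seek-sketch", ".txt");
            Files.write(file, "hello world".getBytes(StandardCharsets.UTF_8));
            try (ReadableByteChannel ch = Files.newByteChannel(file)) {
                return readFrom(ch, 6);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoStream());
        System.out.println(demoFile());
    }
}
```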
    • copy

      public static void copy(List<ResourceId> srcResourceIds, List<ResourceId> destResourceIds, MoveOptions... moveOptions) throws IOException
      Copies a List of file-like resources from one location to another.

      The number of source resources must equal the number of destination resources. Destination resources will be created recursively.

      srcResourceIds and destResourceIds must have the same scheme.

      Copying globs is not supported.

      Parameters:
      srcResourceIds - the references of the source resources
      destResourceIds - the references of the destination resources
      Throws:
      IOException
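
      The pairing rule (the i-th source goes to the i-th destination, and both lists have the same length) can be sketched on the local file system with java.nio. PairwiseCopySketch is a hypothetical name. Reading "created recursively" as "missing parent directories are created" and signalling a length mismatch with IllegalArgumentException are both assumptions of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

class PairwiseCopySketch {
    /** Copies sources.get(i) to destinations.get(i), creating missing parent directories. */
    static void copyAll(List<Path> sources, List<Path> destinations) throws IOException {
        if (sources.size() != destinations.size()) {
            // Assumption: a length mismatch is treated as an invalid argument.
            throw new IllegalArgumentException(
                "Got " + sources.size() + " sources but " + destinations.size() + " destinations");
        }
        for (int i = 0; i < sources.size(); i++) {
            Path dest = destinations.get(i);
            if (dest.getParent() != null) {
                Files.createDirectories(dest.getParent());
            }
            Files.copy(sources.get(i), dest);
        }
    }

    /** Copies one file into a nested directory that does not exist yet and reads it back. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("copy-sketch");
            Path src = dir.resolve("a.txt");
            Files.write(src, "payload".getBytes(StandardCharsets.UTF_8));
            Path dest = dir.resolve("out").resolve("nested").resolve("a.txt");
            copyAll(Collections.singletonList(src), Collections.singletonList(dest));
            return new String(Files.readAllBytes(dest), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```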
    • rename

      public static void rename(List<ResourceId> srcResourceIds, List<ResourceId> destResourceIds, MoveOptions... moveOptions) throws IOException
      Renames a List of file-like resources from one location to another.

      The number of source resources must equal the number of destination resources. Destination resources will be created recursively.

      srcResourceIds and destResourceIds must have the same scheme.

      Renaming globs is not supported.

      Source files will be removed, even if the copy is skipped due to the specified move options.

      Parameters:
      srcResourceIds - the references of the source resources
      destResourceIds - the references of the destination resources
      Throws:
      IOException
    • delete

      public static void delete(Collection<ResourceId> resourceIds, MoveOptions... moveOptions) throws IOException
      Deletes a collection of resources.

      resourceIds must have the same scheme.

      Parameters:
      resourceIds - the references of the resources to delete.
      Throws:
      IOException
    • reportSourceLineage

      public static void reportSourceLineage(ResourceId resourceId)
      Reports source Lineage metrics for the given resource ID.
    • reportSinkLineage

      public static void reportSinkLineage(ResourceId resourceId)
      Reports sink Lineage metrics for the given resource ID.
    • reportSourceLineage

      public static void reportSourceLineage(ResourceId resourceId, FileSystem.LineageLevel level)
      Reports source Lineage metrics for the given resource ID at the given level.

      Internal API, no backward compatibility guaranteed.

    • reportSinkLineage

      public static void reportSinkLineage(ResourceId resourceId, FileSystem.LineageLevel level)
      Reports sink Lineage metrics for the given resource ID at the given level.

      Internal API, no backward compatibility guaranteed.

    • setDefaultPipelineOptions

      @Internal public static void setDefaultPipelineOptions(PipelineOptions options)
      Sets the default configuration in workers.

      It will be used in FileSystemRegistrars for all schemes.

      Outside of workers, where the Beam FileSystem API is also used (e.g. test methods, user code executed during pipeline submission), consider using registerFileSystemsOnce(PipelineOptions) if the main goal is to initialize the FileSystems for the supported schemes.

    • registerFileSystemsOnce

      @Internal public static void registerFileSystemsOnce(PipelineOptions options)
      Registers file systems once, if this has not been done before.

      This method executes setDefaultPipelineOptions(org.apache.beam.sdk.options.PipelineOptions) only if it has never been run, otherwise it returns immediately.

      It is used internally by test setup to avoid repeated filesystem registration (which involves expensive ServiceLoader calls) when multiple Pipeline and PipelineOptions objects are initialized, as is common in test execution.
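
      The "run only if never run before" behaviour is a standard once-guard. A minimal sketch with an AtomicBoolean follows; RegisterOnceSketch, registerAll and the call counter are hypothetical stand-ins and do not reflect how Beam implements the guard.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class RegisterOnceSketch {
    private static final AtomicBoolean REGISTERED = new AtomicBoolean(false);

    // Counts how often the expensive step actually ran; for demonstration only.
    static final AtomicInteger EXPENSIVE_CALLS = new AtomicInteger();

    /** Hypothetical stand-in for the expensive registration work. */
    private static void registerAll() {
        EXPENSIVE_CALLS.incrementAndGet();
    }

    /** Runs registerAll() on the first call and returns immediately on later calls. */
    static void registerOnce() {
        if (REGISTERED.compareAndSet(false, true)) {
            registerAll();
        }
    }

    public static void main(String[] args) {
        registerOnce();
        registerOnce();
        System.out.println(EXPENSIVE_CALLS.get());
    }
}
```

      compareAndSet makes the check and the flag update a single atomic step, so concurrent callers cannot both pass the guard.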

    • matchNewResource

      public static ResourceId matchNewResource(String singleResourceSpec, boolean isDirectory)
      Returns a new ResourceId that represents the named resource, typed as a directory or a file according to isDirectory.

      The supplied singleResourceSpec is expected to be in a proper format, including any necessary escaping, for the underlying FileSystem.

      This function may throw an IllegalArgumentException if given an invalid argument, such as when the specified singleResourceSpec is not a valid resource name.

    • matchNewDirectory

      public static ResourceId matchNewDirectory(String singleResourceSpec, String... baseNames)
      Returns a new ResourceId that represents the named directory resource.
      Parameters:
      singleResourceSpec - the root directory, for example "/abc"
      baseNames - a list of directory names, for example ["d", "e", "f"]
      Returns:
      the ResourceId for the resolved directory. In the example above, it corresponds to "/abc/d/e/f".
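
      The example in the parameter descriptions amounts to appending each base name to the root in order. The sketch below reproduces that on plain strings. DirectoryResolveSketch is a hypothetical name and "/" is assumed as the path delimiter; a real ResourceId is resolved by its FileSystem rather than by string concatenation.

```java
class DirectoryResolveSketch {
    /** Appends each base name to root, inserting "/" where one is not already present. */
    static String resolveDirectory(String root, String... baseNames) {
        StringBuilder path = new StringBuilder(root);
        for (String name : baseNames) {
            if (path.length() == 0 || path.charAt(path.length() - 1) != '/') {
                path.append('/');
            }
            path.append(name);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveDirectory("/abc", "d", "e", "f"));
    }
}
```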