apache_beam.io.hadoopfilesystem module¶
FileSystem implementation for accessing Hadoop Distributed File System files.
class apache_beam.io.hadoopfilesystem.HadoopFileSystem(pipeline_options)[source]¶
Bases: apache_beam.io.filesystem.FileSystem
FileSystem implementation that supports HDFS.
URL arguments to methods expect strings starting with hdfs://.
Initializes a connection to HDFS. Connection configuration is done by passing pipeline options. See HadoopFileSystemOptions.
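A minimal construction sketch, assuming the standard HadoopFileSystemOptions flags (--hdfs_host, --hdfs_port, --hdfs_user); the host, port, and user values below are placeholders to adapt to your cluster's WebHDFS endpoint:

from apache_beam.io.hadoopfilesystem import HadoopFileSystem
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder connection settings for illustration only.
options = PipelineOptions([
    '--hdfs_host=namenode.example.com',
    '--hdfs_port=50070',
    '--hdfs_user=hadoop',
])
fs = HadoopFileSystem(options)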
join(base_url, *paths)[source]¶
Join two or more pathname components.
Parameters:
- base_url – string path of the first component of the path. Must start with hdfs://.
- paths – path components to be added.
Returns: Full url after combining all the passed components.
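A usage sketch, assuming fs is the HadoopFileSystem instance from the constructor example above and the URL is illustrative:

# Components are appended to the hdfs:// base URL.
full_url = fs.join('hdfs://namenode/data', 'logs', 'part-00000')
# Expected result: 'hdfs://namenode/data/logs/part-00000'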
create(url, mime_type='application/octet-stream', compression_type='auto')[source]¶
Returns: A Python File-like object.
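A write sketch under the same assumptions (fs from the constructor example; the URL is a placeholder):

# create() returns a writable file-like object; close() completes the write.
writer = fs.create('hdfs://namenode/data/output.txt')
writer.write(b'hello hdfs\n')
writer.close()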
open(url, mime_type='application/octet-stream', compression_type='auto')[source]¶
Returns: A Python File-like object.
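A matching read sketch, again assuming fs and the placeholder URL from above:

# open() returns a readable file-like object supporting the usual read calls.
reader = fs.open('hdfs://namenode/data/output.txt')
contents = reader.read()
reader.close()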
copy(source_file_names, destination_file_names)[source]¶
It is an error if any file to copy already exists at the destination.
Raises BeamIOError if any error occurred.
Parameters:
- source_file_names – iterable of URLs.
- destination_file_names – iterable of URLs.
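A copy sketch; the two iterables are matched element-wise, and both URLs below are placeholders:

# Copies each source URL to the destination URL at the same position.
fs.copy(
    ['hdfs://namenode/data/output.txt'],
    ['hdfs://namenode/backup/output.txt'])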
exists(url)[source]¶
Checks existence of url in HDFS.
Parameters: url – String in the form hdfs://…
Returns: True if url exists as a file or directory in HDFS.
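An existence-check sketch under the same assumptions:

# Returns True if the URL names an existing file or directory in HDFS.
if fs.exists('hdfs://namenode/data/output.txt'):
    reader = fs.open('hdfs://namenode/data/output.txt')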