apache_beam.io.hadoopfilesystem module
FileSystem implementation for accessing
Hadoop Distributed File System files.
- 
class apache_beam.io.hadoopfilesystem.HadoopFileSystem(pipeline_options)
Bases: apache_beam.io.filesystem.FileSystem
FileSystem implementation that supports HDFS.
URL arguments to methods expect strings starting with hdfs://.
Uses client library hdfs3.core.HDFileSystem.
Initializes a connection to HDFS.
Connection configuration is done using HDFS Configuration.
- 
join(base_url, *paths)
Join two or more pathname components.
Parameters:
 - base_url – string path of the first component of the path. Must start with hdfs://.
 - paths – path components to be added
Returns: Full URL after combining all the passed components.
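The contract described for join can be illustrated with a standalone sketch. The helper name join_hdfs_url is hypothetical and this is not Beam's implementation. It assumes components are joined with POSIX-style "/" separators, which the description above does not state explicitly.

```python
import posixpath

HDFS_PREFIX = "hdfs://"


def join_hdfs_url(base_url, *paths):
    """Join path components onto an hdfs:// base URL (illustrative only)."""
    if not base_url.startswith(HDFS_PREFIX):
        raise ValueError(
            "base_url must start with %r, got %r" % (HDFS_PREFIX, base_url))
    # Strip the scheme, join with '/' separators, then restore the scheme.
    base_path = base_url[len(HDFS_PREFIX):]
    return HDFS_PREFIX + posixpath.join(base_path, *paths)


joined = join_hdfs_url("hdfs://tmp/output", "2018", "part-00000")
print(joined)
```

With a real HadoopFileSystem instance, the equivalent call would be its join method with the same arguments.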
- 
create(url, mime_type='application/octet-stream', compression_type='auto')
Returns: A Python file-like object.
Return type: hdfs3.core.HDFile
- 
open(url, mime_type='application/octet-stream', compression_type='auto')
Returns: A Python file-like object.
Return type: hdfs3.core.HDFile
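Because create and open both return file-like objects, a write followed by a read can be sketched against any object that exposes those two methods. In the sketch below, round_trip and InMemoryFileSystem are hypothetical names introduced for illustration. The in-memory class is only a stand-in so the example runs without an HDFS cluster; it does not reproduce hdfs3 behaviour, and the use of a bytes payload is an assumption.

```python
import io


def round_trip(fs, url, payload):
    """Write payload via fs.create(url), then read it back via fs.open(url)."""
    writer = fs.create(url)
    try:
        writer.write(payload)
    finally:
        writer.close()  # in the stand-in, closing publishes the data
    reader = fs.open(url)
    try:
        return reader.read()
    finally:
        reader.close()


class _CapturingWriter(io.BytesIO):
    """BytesIO that stores its contents in a dict when closed."""

    def __init__(self, store, url):
        super().__init__()
        self._store = store
        self._url = url

    def close(self):
        if not self.closed:
            self._store[self._url] = self.getvalue()
        super().close()


class InMemoryFileSystem:
    """Hypothetical stand-in that mimics only the create/open signatures."""

    def __init__(self):
        self._files = {}

    def create(self, url, mime_type="application/octet-stream",
               compression_type="auto"):
        return _CapturingWriter(self._files, url)

    def open(self, url, mime_type="application/octet-stream",
             compression_type="auto"):
        return io.BytesIO(self._files[url])


data = round_trip(InMemoryFileSystem(), "hdfs://tmp/example.bin", b"hello hdfs")
```

With a configured HadoopFileSystem, the same helper would be passed that instance in place of the stand-in.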
- 
copy(source_file_names, destination_file_names)
Will overwrite files and directories in destination_file_names.
Raises BeamIOError if any error occurred.
Parameters:
 - source_file_names – iterable of URLs.
 - destination_file_names – iterable of URLs.
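Since copy overwrites its destinations, a caller may want to sanity-check the two iterables before invoking it. The sketch below assumes the iterables are paired by position (the first source is copied to the first destination), which the description above implies but does not state. pair_copy_urls is a hypothetical helper, not part of Beam.

```python
HDFS_PREFIX = "hdfs://"


def pair_copy_urls(source_file_names, destination_file_names):
    """Return (source, destination) pairs after basic sanity checks."""
    sources = list(source_file_names)
    destinations = list(destination_file_names)
    if len(sources) != len(destinations):
        raise ValueError(
            "expected equal counts, got %d sources and %d destinations"
            % (len(sources), len(destinations)))
    # Every URL on either side should carry the hdfs:// scheme.
    for url in sources + destinations:
        if not url.startswith(HDFS_PREFIX):
            raise ValueError("not an hdfs:// URL: %r" % url)
    return list(zip(sources, destinations))


pairs = pair_copy_urls(
    ["hdfs://tmp/a.txt", "hdfs://tmp/b.txt"],
    ["hdfs://backup/a.txt", "hdfs://backup/b.txt"],
)
```

Reviewing the resulting pairs before calling copy makes it easier to spot a destination that would be overwritten unintentionally.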
 
-