apache_beam.io.hadoopfilesystem module
FileSystem implementation for accessing Hadoop Distributed File System files.
- class apache_beam.io.hadoopfilesystem.HadoopFileSystem(pipeline_options)
  Bases: apache_beam.io.filesystem.FileSystem
  FileSystem implementation that supports HDFS.
  URL arguments to methods expect strings starting with hdfs://.
  Uses client library hdfs3.core.HDFileSystem.
  Initializes a connection to HDFS. Connection configuration is done using HDFS Configuration.
- join(base_url, *paths)
  Join two or more pathname components.
  Parameters:
    - base_url – string path of the first component of the path. Must start with hdfs://.
    - paths – path components to be added.
  Returns: Full URL after combining all the passed components.
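The join contract described above can be illustrated with a small standalone sketch. This is not the Beam implementation: `join_hdfs_url` is a hypothetical helper, and using `posixpath.join` for the path part is an assumption made purely for illustration.

```python
import posixpath

HDFS_PREFIX = "hdfs://"


def join_hdfs_url(base_url, *paths):
    """Hypothetical helper mirroring the documented join() contract.

    Assumption: the part after hdfs:// is treated as a POSIX-style path.
    """
    if not base_url.startswith(HDFS_PREFIX):
        raise ValueError("base_url must start with hdfs://, got %r" % base_url)
    # Strip the scheme, join the remaining path components, then restore it.
    base_path = base_url[len(HDFS_PREFIX):]
    return HDFS_PREFIX + posixpath.join(base_path, *paths)


full_url = join_hdfs_url("hdfs://data/logs", "2018", "part-0.txt")
```

Note that `posixpath.join` discards earlier components when a later one starts with `/`, so this sketch expects relative components in `paths`.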
- create(url, mime_type='application/octet-stream', compression_type='auto')
  Returns: A Python file-like object.
  Return type: hdfs3.core.HDFile
- open(url, mime_type='application/octet-stream', compression_type='auto')
  Returns: A Python file-like object.
  Return type: hdfs3.core.HDFile
- copy(source_file_names, destination_file_names)
  Will overwrite files and directories in destination_file_names.
  Raises BeamIOError if any error occurs.
  Parameters:
    - source_file_names – iterable of URLs.
    - destination_file_names – iterable of URLs.
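The copy semantics above (paired sources and destinations, overwriting, a single error raised on failure) can be sketched against the local filesystem. Everything here is an illustrative assumption rather than Beam code: `copy_all` and `CopyError` are hypothetical names, `CopyError` only stands in for BeamIOError, and the equal-length check and per-pair error collection are assumed behavior, not taken from the source.

```python
import os
import shutil
import tempfile


class CopyError(Exception):
    """Hypothetical stand-in for BeamIOError; records per-pair failures."""

    def __init__(self, msg, exception_details):
        super().__init__(msg)
        self.exception_details = exception_details


def copy_all(source_file_names, destination_file_names):
    """Copy each source to the destination at the same position."""
    sources = list(source_file_names)
    destinations = list(destination_file_names)
    # Assumption: the two iterables must pair up one-to-one.
    if len(sources) != len(destinations):
        raise ValueError("source and destination lists must have equal length")
    failures = {}
    for src, dst in zip(sources, destinations):
        try:
            shutil.copyfile(src, dst)  # overwrites dst if it already exists
        except OSError as exc:
            failures[(src, dst)] = exc
    if failures:
        raise CopyError("Copy operation failed", failures)


# Usage: an existing destination file is overwritten by the copy.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")
    dst = os.path.join(tmp, "b.txt")
    with open(src, "w") as f:
        f.write("new")
    with open(dst, "w") as f:
        f.write("old")
    copy_all([src], [dst])
    with open(dst) as f:
        copied = f.read()
```

Collecting failures before raising lets a caller see every pair that failed instead of only the first one.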
-