apache_beam.io.hadoopfilesystem module¶
FileSystem
implementation for accessing
Hadoop Distributed File System files.
-
class
apache_beam.io.hadoopfilesystem.
HadoopFileSystem
(pipeline_options)[source]¶ Bases:
apache_beam.io.filesystem.FileSystem
FileSystem
implementation that supports HDFS.URL arguments to methods expect strings starting with
hdfs://
.Initializes a connection to HDFS.
Connection configuration is done by passing pipeline options. See
HadoopFileSystemOptions
.-
join
(base_url, *paths)[source]¶ Join two or more pathname components.
Parameters: - base_url – string path of the first component of the path. Must start with hdfs://.
- paths – path components to be added
Returns: Full url after combining all the passed components.
-
create
(url, mime_type='application/octet-stream', compression_type='auto')[source]¶ Returns: A Python File-like object.
-
open
(url, mime_type='application/octet-stream', compression_type='auto')[source]¶ Returns: A Python File-like object.
-
copy
(source_file_names, destination_file_names)[source]¶ It is an error if any file to copy already exists at the destination.
Raises
BeamIOError
if any error occurred.Parameters: - source_file_names – iterable of URLs.
- destination_file_names – iterable of URLs.
-
exists
(url)[source]¶ Checks existence of url in HDFS.
Parameters: url – String in the form hdfs://… Returns: True if url exists as a file or directory in HDFS.
-