apache_beam.io.hadoopfilesystem module¶
FileSystem implementation for accessing
Hadoop Distributed File System files.
- 
class apache_beam.io.hadoopfilesystem.HadoopFileSystem(pipeline_options)[source]¶
- Bases: - apache_beam.io.filesystem.FileSystem- FileSystemimplementation that supports HDFS.- URL arguments to methods expect strings starting with - hdfs://.- Initializes a connection to HDFS. - Connection configuration is done by passing pipeline options. See - HadoopFileSystemOptions.- 
join(base_url, *paths)[source]¶
- Join two or more pathname components. - Parameters: - base_url – string path of the first component of the path. Must start with hdfs://.
- paths – path components to be added
 - Returns: - Full url after combining all the passed components. 
 - 
create(url, mime_type='application/octet-stream', compression_type='auto')[source]¶
- Returns: - A Python File-like object. 
 - 
open(url, mime_type='application/octet-stream', compression_type='auto')[source]¶
- Returns: - A Python File-like object. 
 - 
copy(source_file_names, destination_file_names)[source]¶
- It is an error if any file to copy already exists at the destination. - Raises - BeamIOErrorif any error occurred.- Parameters: - source_file_names – iterable of URLs.
- destination_file_names – iterable of URLs.
 
 - 
exists(url)[source]¶
- Checks existence of url in HDFS. - Parameters: - url – String in the form hdfs://… - Returns: - True if url exists as a file or directory in HDFS. 
 - 
size(url)[source]¶
- Fetches file size for a URL. - Returns: - int size of path according to the FileSystem. - Raises: - BeamIOError– if url doesn’t exist.
 - 
last_updated(url)[source]¶
- Fetches last updated time for a URL. - Parameters: - url – string url of file. - Returns: float UNIX Epoch time - Raises: - BeamIOError– if path doesn’t exist.
 - 
checksum(url)[source]¶
- Fetches a checksum description for a URL. - Returns: - String describing the checksum. - Raises: - BeamIOError– if url doesn’t exist.
 - 
metadata(url)[source]¶
- Fetch metadata fields of a file on the FileSystem. - Parameters: - url – string url of a file. - Returns: - FileMetadata.- Raises: - BeamIOError– if url doesn’t exist.
 
-