apache_beam.dataframe.io module
apache_beam.dataframe.io.read_csv(path, *args, splittable=False, **kwargs)
    Emulates pd.read_csv from Pandas, but as a Beam PTransform.

    Use this as

        df = p | beam.dataframe.io.read_csv(…)

    to get a deferred Beam dataframe representing the contents of the file.

    If your files are large and records do not contain quoted newlines, you may pass the extra argument splittable=True to enable dynamic splitting for this read on newlines. Using this option for records that do contain quoted newlines may result in partial records and data corruption.
apache_beam.dataframe.io.read_excel(path, *args, **kwargs)

apache_beam.dataframe.io.read_feather(path, *args, **kwargs)

apache_beam.dataframe.io.read_parquet(path, *args, **kwargs)

apache_beam.dataframe.io.read_sas(path, *args, **kwargs)

apache_beam.dataframe.io.read_spss(path, *args, **kwargs)

apache_beam.dataframe.io.read_stata(path, *args, **kwargs)

apache_beam.dataframe.io.to_excel(df, path, *args, **kwargs)

apache_beam.dataframe.io.to_feather(df, path, *args, **kwargs)

apache_beam.dataframe.io.to_parquet(df, path, *args, **kwargs)
apache_beam.dataframe.io.to_stata(df, path, *args, **kwargs)