apache_beam.dataframe.frames module
Analogs for pandas.DataFrame and pandas.Series: DeferredDataFrame and DeferredSeries.
These classes are effectively wrappers around a schema-aware PCollection that provide a set of operations compatible with the pandas API.
Note that we aim for the Beam DataFrame API to be completely compatible with the pandas API, but there are some features that are currently unimplemented for various reasons. Pay particular attention to the ‘Differences from pandas’ section for each operation to understand where we diverge.
-
class apache_beam.dataframe.frames.DeferredSeries(expr)
Bases: apache_beam.dataframe.frames.DeferredDataFrameOrSeries
-
name
¶
-
dtype
¶
-
dtypes
¶
-
align
(other, join, axis, level, method, **kwargs)[source]¶ Align two objects on their axes with the specified join method.
Join method is specified for each axis Index.
Parameters: - other (DeferredDataFrame or DeferredSeries) –
- join ({'outer', 'inner', 'left', 'right'}, default 'outer') –
- axis (allowed axis of the other object, default None) – Align on index (0), columns (1), or both (None).
- level (int or level name, default None) – Broadcast across a level, matching Index values on the passed MultiIndex level.
- copy (bool, default True) – Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
- fill_value (scalar, default np.NaN) – Value to use for missing values. Defaults to NaN, but can be any “compatible” value.
- method ({'backfill', 'bfill', 'pad', 'ffill', None}, default None) –
Method to use for filling holes in reindexed DeferredSeries:
- pad / ffill: propagate last valid observation forward to next valid.
- backfill / bfill: use NEXT valid observation to fill gap.
- limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
- fill_axis ({0 or 'index'}, default 0) – Filling axis, method and limit.
- broadcast_axis ({0 or 'index'}, default None) – Broadcast values along this axis, if aligning two objects of different dimensions.
Returns: (left, right) – Aligned objects.
Return type: (DeferredSeries, type of other)
Differences from pandas
Aligning per-level is not yet supported. Only the default, level=None, is allowed.
Filling NaN values via method is not supported, because it is sensitive to the order of the data (see https://s.apache.org/dataframe-order-sensitive-operations). Only the default, method=None, is allowed.
-
array
¶ pandas.Series.array is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
ravel
(**kwargs)¶ pandas.Series.ravel is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
rename
(**kwargs)¶ Alter Series index labels or name.
Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.
Alternatively, change Series.name with a scalar value.
See the user guide for more.
Parameters: - axis ({0 or "index"}) – Unused. Accepted for compatibility with DeferredDataFrame method only.
- index (scalar, hashable sequence, dict-like or function, optional) – Functions or dict-like are transformations to apply to the index. Scalar or hashable sequence-like will alter the DeferredSeries.name attribute.
- **kwargs – Additional keyword arguments passed to the function. Only the “inplace” keyword is used.
Returns: DeferredSeries with index labels or name altered, or None if inplace=True.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.rename()
- Corresponding DeferredDataFrame method.
DeferredSeries.rename_axis()
- Set the name of the axis.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64
>>> s.rename("my_name")  # scalar, changes Series.name
0    1
1    2
2    3
Name: my_name, dtype: int64
>>> s.rename(lambda x: x ** 2)  # function, changes labels
0    1
1    2
4    3
dtype: int64
>>> s.rename({1: 3, 2: 5})  # mapping, changes labels
0    1
3    2
5    3
dtype: int64
-
between
(**kwargs)¶ Return boolean Series equivalent to left <= series <= right.
This function returns a boolean vector containing True wherever the corresponding Series element is between the boundary values left and right. NA values are treated as False.
Parameters: - left (scalar or list-like) – Left boundary.
- right (scalar or list-like) – Right boundary.
- inclusive (bool, default True) – Include boundaries.
Returns: DeferredSeries representing whether each element is between left and right (inclusive).
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.gt()
- Greater than of series and other.
DeferredSeries.lt()
- Less than of series and other.
Notes
This function is equivalent to
(left <= ser) & (ser <= right)
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([2, 0, 4, 8, np.nan])

Boundary values are included by default:

>>> s.between(1, 4)
0     True
1    False
2     True
3    False
4    False
dtype: bool

With `inclusive` set to ``False`` boundary values are excluded:

>>> s.between(1, 4, inclusive=False)
0     True
1    False
2    False
3    False
4    False
dtype: bool

`left` and `right` can be any scalar value:

>>> s = pd.Series(['Alice', 'Bob', 'Carol', 'Eve'])
>>> s.between('Anna', 'Daniel')
0    False
1     True
2     True
3    False
dtype: bool
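The equivalence stated in the Notes, together with the rule that NA values are treated as False, can be sketched in plain Python. This is only an illustration and uses neither pandas nor Beam: the helpers `is_na` and `between` are hypothetical names introduced here, and the NA check is simplified to None and float NaN.

```python
import math

def is_na(value):
    """Treat None and float NaN as missing (a simplification of pandas' NA rules)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def between(values, left, right, inclusive=True):
    """Hypothetical helper mirroring (left <= ser) & (ser <= right) element-wise."""
    result = []
    for v in values:
        if is_na(v):
            result.append(False)  # NA values are treated as False
        elif inclusive:
            result.append(left <= v <= right)
        else:
            result.append(left < v < right)
    return result

values = [2, 0, 4, 8, float("nan")]
print(between(values, 1, 4))                   # [True, False, True, False, False]
print(between(values, 1, 4, inclusive=False))  # [True, False, False, False, False]
```

The two printed lists match the boolean columns in the pandas examples for the same input.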
-
add_suffix
(**kwargs)¶ Suffix labels with string suffix.
For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.
Parameters: suffix (str) – The string to add after each label.
Returns: New DeferredSeries or DeferredDataFrame with updated labels.
Return type: DeferredSeries or DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.add_prefix()
- Prefix row labels with string prefix.
DeferredDataFrame.add_prefix()
- Prefix column labels with string prefix.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.add_suffix('_item')
0_item    1
1_item    2
2_item    3
3_item    4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
>>> df.add_suffix('_col')
   A_col  B_col
0      1      3
1      2      4
2      3      5
3      4      6
-
add_prefix
(**kwargs)¶ Prefix labels with string prefix.
For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.
Parameters: prefix (str) – The string to add before each label.
Returns: New DeferredSeries or DeferredDataFrame with updated labels.
Return type: DeferredSeries or DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.add_suffix()
- Suffix row labels with string suffix.
DeferredDataFrame.add_suffix()
- Suffix column labels with string suffix.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.add_prefix('item_')
item_0    1
item_1    2
item_2    3
item_3    4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
>>> df.add_prefix('col_')
   col_A  col_B
0      1      3
1      2      4
2      3      5
3      4      6
-
std
(*args, **kwargs)[source]¶ Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument
Parameters: - axis ({index (0)}) –
- skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar.
- ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for DeferredSeries.
Return type: scalar or DeferredSeries (if level specified)
Differences from pandas
This operation has no known divergences from the pandas API.
Notes
To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).
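The effect of ddof can be checked with a small standard-library sketch. It does not use pandas, NumPy or Beam; the helper `std` is a hypothetical name introduced here to show the N - ddof divisor, and the sample data is made up for the example.

```python
import math
import statistics

def std(values, ddof=1):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_std = std(data)              # ddof=1: divide by N - 1
population_std = std(data, ddof=0)  # ddof=0: divide by N

# The standard library exposes the same two conventions.
assert math.isclose(sample_std, statistics.stdev(data))
assert math.isclose(population_std, statistics.pstdev(data))
print(population_std)  # 2.0
```

With ddof=1 the divisor is N - 1 (the sample estimate); with ddof=0 it is N, which is the convention the note above attributes to numpy.std.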
-
var
(axis, skipna, level, ddof, **kwargs)[source]¶ Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument
Parameters: - axis ({index (0)}) –
- skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- level (int or level name, default None) – If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar.
- ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_only (bool, default None) – Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for DeferredSeries.
Return type: scalar or DeferredSeries (if level specified)
Differences from pandas
Per-level aggregation is not yet supported (BEAM-11777). Only the default, level=None, is allowed.
Notes
To have the same behaviour as numpy.var, use ddof=0 (instead of the default ddof=1).
-
isnull
(**kwargs)¶ Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
Returns: Mask of bool values for each element in DeferredSeries that indicates whether an element is an NA value.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.isnull()
- Alias of isna.
DeferredSeries.notna()
- Boolean inverse of isna.
DeferredSeries.dropna()
- Omit axes labels with missing values.
isna()
- Top-level isna.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Show which entries in a DataFrame are NA. >>> df = pd.DataFrame(dict(age=[5, 6, np.NaN], ... born=[pd.NaT, pd.Timestamp('1939-05-27'), ... pd.Timestamp('1940-04-25')], ... name=['Alfred', 'Batman', ''], ... toy=[None, 'Batmobile', 'Joker'])) >>> df age born name toy 0 5.0 NaT Alfred None 1 6.0 1939-05-27 Batman Batmobile 2 NaN 1940-04-25 Joker >>> df.isna() age born name toy 0 False True False True 1 False False False False 2 True False False False Show which entries in a Series are NA. >>> ser = pd.Series([5, 6, np.NaN]) >>> ser 0 5.0 1 6.0 2 NaN dtype: float64 >>> ser.isna() 0 False 1 False 2 True dtype: bool
-
isna
(**kwargs)¶ Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
Returns: Mask of bool values for each element in DeferredSeries that indicates whether an element is an NA value.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.isnull()
- Alias of isna.
DeferredSeries.notna()
- Boolean inverse of isna.
DeferredSeries.dropna()
- Omit axes labels with missing values.
isna()
- Top-level isna.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Show which entries in a DataFrame are NA. >>> df = pd.DataFrame(dict(age=[5, 6, np.NaN], ... born=[pd.NaT, pd.Timestamp('1939-05-27'), ... pd.Timestamp('1940-04-25')], ... name=['Alfred', 'Batman', ''], ... toy=[None, 'Batmobile', 'Joker'])) >>> df age born name toy 0 5.0 NaT Alfred None 1 6.0 1939-05-27 Batman Batmobile 2 NaN 1940-04-25 Joker >>> df.isna() age born name toy 0 False True False True 1 False False False False 2 True False False False Show which entries in a Series are NA. >>> ser = pd.Series([5, 6, np.NaN]) >>> ser 0 5.0 1 6.0 2 NaN dtype: float64 >>> ser.isna() 0 False 1 False 2 True dtype: bool
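The NA rules described for isna and notna can be illustrated without pandas. The sketch below is a simplification: the helpers `isna` and `notna` are hypothetical list-based stand-ins, they only recognise None and float NaN, and they ignore pandas-specific values such as pd.NaT and the use_inf_as_na option.

```python
import math

def isna(values):
    """Hypothetical helper: True where a value is None or a float NaN."""
    return [v is None or (isinstance(v, float) and math.isnan(v)) for v in values]

def notna(values):
    """Boolean inverse of isna."""
    return [not flag for flag in isna(values)]

# Empty strings and infinity are ordinary values, not NA.
values = [5.0, None, float("nan"), "", float("inf")]
print(isna(values))   # [False, True, True, False, False]
print(notna(values))  # [True, False, False, True, True]
```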
-
notnull
(**kwargs)¶ Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.
Returns: Mask of bool values for each element in DeferredSeries that indicates whether an element is not an NA value.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.notnull()
- Alias of notna.
DeferredSeries.isna()
- Boolean inverse of notna.
DeferredSeries.dropna()
- Omit axes labels with missing values.
notna()
- Top-level notna.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Show which entries in a DataFrame are not NA. >>> df = pd.DataFrame(dict(age=[5, 6, np.NaN], ... born=[pd.NaT, pd.Timestamp('1939-05-27'), ... pd.Timestamp('1940-04-25')], ... name=['Alfred', 'Batman', ''], ... toy=[None, 'Batmobile', 'Joker'])) >>> df age born name toy 0 5.0 NaT Alfred None 1 6.0 1939-05-27 Batman Batmobile 2 NaN 1940-04-25 Joker >>> df.notna() age born name toy 0 True False True False 1 True True True True 2 False True True True Show which entries in a Series are not NA. >>> ser = pd.Series([5, 6, np.NaN]) >>> ser 0 5.0 1 6.0 2 NaN dtype: float64 >>> ser.notna() 0 True 1 True 2 False dtype: bool
-
notna
(**kwargs)¶ Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.
Returns: Mask of bool values for each element in DeferredSeries that indicates whether an element is not an NA value.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.notnull()
- Alias of notna.
DeferredSeries.isna()
- Boolean inverse of notna.
DeferredSeries.dropna()
- Omit axes labels with missing values.
notna()
- Top-level notna.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Show which entries in a DataFrame are not NA. >>> df = pd.DataFrame(dict(age=[5, 6, np.NaN], ... born=[pd.NaT, pd.Timestamp('1939-05-27'), ... pd.Timestamp('1940-04-25')], ... name=['Alfred', 'Batman', ''], ... toy=[None, 'Batmobile', 'Joker'])) >>> df age born name toy 0 5.0 NaT Alfred None 1 6.0 1939-05-27 Batman Batmobile 2 NaN 1940-04-25 Joker >>> df.notna() age born name toy 0 True False True False 1 True True True True 2 False True True True Show which entries in a Series are not NA. >>> ser = pd.Series([5, 6, np.NaN]) >>> ser 0 5.0 1 6.0 2 NaN dtype: float64 >>> ser.notna() 0 True 1 True 2 False dtype: bool
-
items
(**kwargs)¶ pandas.Series.items is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
iteritems
(**kwargs)¶ pandas.Series.iteritems is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
tolist
(**kwargs)¶ pandas.Series.tolist is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
to_numpy
(**kwargs)¶ pandas.Series.to_numpy is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
to_string
(**kwargs)¶ pandas.Series.to_string is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
agg
(func, axis=0, *args, **kwargs)¶
-
axes
¶
-
clip
(**kwargs)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis.
Parameters: - lower (float or array_like, default None) – Minimum threshold value. All values below this threshold will be set to it.
- upper (float or array_like, default None) – Maximum threshold value. All values above this threshold will be set to it.
- axis (int or str axis name, optional) – Align object with lower and upper along the given axis.
- inplace (bool, default False) – Whether to perform the operation in place on the data.
- *args, **kwargs – Additional keywords have no effect but might be accepted for compatibility with numpy.
Returns: Same type as calling object with the values outside the clip boundaries replaced, or None if inplace=True.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.clip()
- Trim values at input threshold in series.
DeferredDataFrame.clip()
- Trim values at input threshold in dataframe.
numpy.clip()
- Clip (limit) the values in an array.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
>>> df = pd.DataFrame(data)
>>> df
   col_0  col_1
0      9     -2
1     -3     -7
2      0      6
3     -1      8
4      5     -5

Clips per column using lower and upper thresholds:

>>> df.clip(-4, 6)
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4

Clips using specific lower and upper thresholds per column element:

>>> t = pd.Series([2, -4, -1, 6, 3])
>>> t
0    2
1   -4
2   -1
3    6
4    3
dtype: int64

>>> df.clip(t, t + 4, axis=0)
   col_0  col_1
0      6      2
1     -3     -4
2      0      3
3      6      8
4      5      3
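The clipping rule itself is simple enough to sketch in plain Python. The helpers `clip` and `clip_elementwise` below are hypothetical, operate on lists rather than on a DeferredSeries, and are meant only to show scalar versus element-wise thresholds.

```python
def clip(values, lower=None, upper=None):
    """Hypothetical helper: trim each value to the closed range [lower, upper]."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        clipped.append(v)
    return clipped

def clip_elementwise(values, lowers, uppers):
    """Hypothetical helper: apply a separate (lower, upper) pair to each value."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(values, lowers, uppers)]

col_0 = [9, -3, 0, -1, 5]
print(clip(col_0, -4, 6))  # [6, -3, 0, -1, 5]

t = [2, -4, -1, 6, 3]
print(clip_elementwise(col_0, t, [x + 4 for x in t]))  # [6, -3, 0, 6, 5]
```

Both results agree with the col_0 column of the corresponding pandas examples.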
-
all
(*args, **kwargs)¶
-
any
(*args, **kwargs)¶
-
count
(*args, **kwargs)¶
-
min
(*args, **kwargs)¶
-
max
(*args, **kwargs)¶
-
prod
(*args, **kwargs)¶
-
product
(*args, **kwargs)¶
-
sum
(*args, **kwargs)¶
-
mean
(*args, **kwargs)¶
-
median
(*args, **kwargs)¶
-
argmax
(**kwargs)¶ pandas.Series.argmax is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
argmin
(**kwargs)¶ pandas.Series.argmin is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
cummax
(**kwargs)¶ pandas.Series.cummax is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
cummin
(**kwargs)¶ pandas.Series.cummin is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
cumprod
(**kwargs)¶ pandas.Series.cumprod is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
cumsum
(**kwargs)¶ pandas.Series.cumsum is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
diff
(**kwargs)¶ pandas.Series.diff is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
first
(**kwargs)¶ pandas.Series.first is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
head
(**kwargs)¶ pandas.Series.head is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
interpolate
(**kwargs)¶ pandas.Series.interpolate is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
last
(**kwargs)¶ pandas.Series.last is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
searchsorted
(**kwargs)¶ pandas.Series.searchsorted is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
shift
(**kwargs)¶ pandas.Series.shift is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
-
tail
(**kwargs)¶ pandas.Series.tail is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
For more information see https://s.apache.org/dataframe-order-sensitive-operations.
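Why the operations above are grouped as "sensitive to the order of the data" can be seen with a small standard-library sketch. It assumes, as is the case for a PCollection, that the elements of a dataset may arrive in any order; the two lists stand in for the same data seen in two different orders and are made up for the example.

```python
from itertools import accumulate

# The same elements, as they might arrive in two differently ordered runs.
run_a = [3, 1, 2]
run_b = [2, 3, 1]

# Order-insensitive aggregations agree regardless of arrival order.
assert sum(run_a) == sum(run_b) == 6
assert max(run_a) == max(run_b) == 3

# Order-sensitive operations do not.
cumsum_a = list(accumulate(run_a))  # [3, 4, 6]
cumsum_b = list(accumulate(run_b))  # [2, 5, 6]
head_a, head_b = run_a[:1], run_b[:1]

print(cumsum_a != cumsum_b)  # True
print(head_a != head_b)      # True
```

Aggregations such as sum, min and max give one well-defined answer for unordered data, whereas cumsum, head, tail, shift and similar operations would return different results from run to run.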
-
filter
(**kwargs)¶ Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
Parameters: - items (list-like) – Keep labels from axis which are in items.
- like (str) – Keep labels from axis for which “like in label == True”.
- regex (str (regular expression)) – Keep labels from axis for which re.search(regex, label) == True.
- axis ({0 or ‘index’, 1 or ‘columns’, None}, default None) – The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, ‘index’ for DeferredSeries, ‘columns’ for DeferredDataFrame.
Return type: same type as input object
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.loc()
- Access a group of rows and columns by label(s) or a boolean array.
Notes
The items, like, and regex parameters are enforced to be mutually exclusive.
axis defaults to the info axis that is used when indexing with [].
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6

>>> # select columns by name
>>> df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6

>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6

>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6
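Because filter works on labels rather than on contents, its three selection modes can be sketched on a plain list of labels. The helper `filter_labels` is a hypothetical name, and the choice of TypeError for conflicting arguments is an assumption made for this sketch.

```python
import re

def filter_labels(labels, items=None, like=None, regex=None):
    """Hypothetical helper: select labels using exactly one of items/like/regex."""
    provided = [arg for arg in (items, like, regex) if arg is not None]
    if len(provided) != 1:
        raise TypeError("Pass exactly one of items, like, or regex")
    if items is not None:
        # Keep requested labels that actually exist, in the requested order.
        return [label for label in items if label in labels]
    if like is not None:
        # Substring match: "like in label".
        return [label for label in labels if like in label]
    pattern = re.compile(regex)
    return [label for label in labels if pattern.search(label)]

columns = ["one", "two", "three"]
print(filter_labels(columns, items=["one", "three"]))  # ['one', 'three']
print(filter_labels(columns, regex="e$"))              # ['one', 'three']
print(filter_labels(["mouse", "rabbit"], like="bbi"))  # ['rabbit']
```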
-
memory_usage
(**kwargs)¶ pandas.Series.memory_usage is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
is_unique
¶
-
plot
(**kwargs)¶ pandas.Series.plot is not supported in the Beam DataFrame API because it is a plotting tool.
For more information see {reason_data[‘url’]}.
-
pop
(**kwargs)¶ pandas.Series.pop is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
rename_axis
(**kwargs)¶ Set the name of the axis for the index or columns.
Parameters: - mapper (scalar, list-like, optional) – Value to set the axis name attribute.
- index, columns – A scalar, list-like, dict-like or function transformation to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a DeferredSeries. This parameter only applies to DeferredDataFrame type objects.
Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.
Changed in version 0.24.0.
- axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to rename.
- copy (bool, default True) – Also copy underlying data.
- inplace (bool, default False) – Modifies the object directly, instead of creating a new DeferredSeries or DeferredDataFrame.
Returns: The same type as the caller, or None if inplace=True.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.rename()
- Alter DeferredSeries index labels or name.
DeferredDataFrame.rename()
- Alter DeferredDataFrame index labels or name.
Index.rename()
- Set new names on index.
Notes
DeferredDataFrame.rename_axis supports two calling conventions:
(index=index_mapper, columns=columns_mapper, ...)
(mapper, axis={'index', 'columns'}, ...)
The first calling convention will only modify the names of the index and/or the names of the Index object that is the columns. In this case, the parameter copy is ignored.
The second calling convention will modify the names of the corresponding index if mapper is a list or a scalar. However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.
We highly recommend using keyword arguments to clarify your intent.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
**Series** >>> s = pd.Series(["dog", "cat", "monkey"]) >>> s 0 dog 1 cat 2 monkey dtype: object >>> s.rename_axis("animal") animal 0 dog 1 cat 2 monkey dtype: object **DataFrame** >>> df = pd.DataFrame({"num_legs": [4, 4, 2], ... "num_arms": [0, 0, 2]}, ... ["dog", "cat", "monkey"]) >>> df num_legs num_arms dog 4 0 cat 4 0 monkey 2 2 >>> df = df.rename_axis("animal") >>> df num_legs num_arms animal dog 4 0 cat 4 0 monkey 2 2 >>> df = df.rename_axis("limbs", axis="columns") >>> df limbs num_legs num_arms animal dog 4 0 cat 4 0 monkey 2 2 **MultiIndex** >>> df.index = pd.MultiIndex.from_product([['mammal'], ... ['dog', 'cat', 'monkey']], ... names=['type', 'name']) >>> df limbs num_legs num_arms type name mammal dog 4 0 cat 4 0 monkey 2 2 >>> df.rename_axis(index={'type': 'class'}) limbs num_legs num_arms class name mammal dog 4 0 cat 4 0 monkey 2 2 >>> df.rename_axis(columns=str.upper) LIMBS num_legs num_arms type name mammal dog 4 0 cat 4 0 monkey 2 2
-
round
(**kwargs)¶ Round each value in a Series to the given number of decimals.
Parameters: - decimals (int, default 0) – Number of decimal places to round to. If decimals is negative, it specifies the number of positions to the left of the decimal point.
- *args, **kwargs – Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.
Returns: Rounded values of the DeferredSeries.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
numpy.around()
- Round values of an np.array.
DeferredDataFrame.round()
- Round values of a DeferredDataFrame.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([0.1, 1.3, 2.7])
>>> s.round()
0    0.0
1    1.0
2    3.0
dtype: float64
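The meaning of a negative decimals value is easiest to see with concrete numbers. The sketch below uses Python's built-in round on a plain list as a stand-in; it is not the pandas or Beam implementation, and the sample values are chosen only for illustration.

```python
values = [1234.567, 0.1, 1.3, 2.7]

# decimals=0 (the default) rounds to the nearest whole number.
to_whole = [round(v, 0) for v in values]
print(to_whole)     # [1235.0, 0.0, 1.0, 3.0]

# A positive value keeps that many digits after the decimal point.
to_tenths = [round(v, 1) for v in values]
print(to_tenths)    # [1234.6, 0.1, 1.3, 2.7]

# A negative value rounds to positions left of the decimal point.
to_hundreds = [round(v, -2) for v in values]
print(to_hundreds)  # [1200.0, 0.0, 0.0, 0.0]
```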
-
take
(**kwargs)¶ pandas.Series.take is not supported in the Beam DataFrame API because it is deprecated in pandas.
-
to_dict
(**kwargs)¶ pandas.Series.to_dict is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
to_frame
(**kwargs)¶ Convert Series to DataFrame.
Parameters: name (object, default None) – The passed name should substitute for the series name (if it has one).
Returns: DeferredDataFrame representation of DeferredSeries.
Return type: DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series(["a", "b", "c"],
...               name="vals")
>>> s.to_frame()
  vals
0    a
1    b
2    c
-
unstack
(**kwargs)¶ pandas.Series.unstack is not supported in the Beam DataFrame API because the columns in the output DataFrame depend on the data.
For more information see {reason_data[‘url’]}.
-
values
¶ pandas.Series.values is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
For more information see {reason_data[‘url’]}.
-
view
(**kwargs)¶ pandas.Series.view is not supported in the Beam DataFrame API because it relies on memory-sharing semantics that are not compatible with the Beam model.
-
str
¶
-
apply
(**kwargs)¶ Invoke function on values of Series.
Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.
Parameters: - func (function) – Python function or NumPy ufunc to apply.
- convert_dtype (bool, default True) – Try to find better dtype for elementwise function results. If False, leave as dtype=object.
- args (tuple) – Positional arguments passed to func after the series value.
- **kwds – Additional keyword arguments passed to func.
Returns: If func returns a DeferredSeries object the result will be a DeferredDataFrame.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.map()
- For element-wise operations.
DeferredSeries.agg()
- Only perform aggregating type operations.
DeferredSeries.transform()
- Only perform transforming type operations.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Create a series with typical summer temperatures for each city.

>>> s = pd.Series([20, 21, 12],
...               index=['London', 'New York', 'Helsinki'])
>>> s
London      20
New York    21
Helsinki    12
dtype: int64

Square the values by defining a function and passing it as an
argument to ``apply()``.

>>> def square(x):
...     return x ** 2
>>> s.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64

Square the values by passing an anonymous function as an
argument to ``apply()``.

>>> s.apply(lambda x: x ** 2)
London      400
New York    441
Helsinki    144
dtype: int64

Define a custom function that needs additional positional
arguments and pass these additional arguments using the
``args`` keyword.

>>> def subtract_custom_value(x, custom_value):
...     return x - custom_value

>>> s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64

Define a custom function that takes keyword arguments
and pass these arguments to ``apply``.

>>> def add_custom_values(x, **kwargs):
...     for month in kwargs:
...         x += kwargs[month]
...     return x

>>> s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64

Use a function from the Numpy library.

>>> s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
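How args and extra keyword arguments reach func can be sketched with a list-based stand-in. The helper `apply` below is hypothetical and only models the element-wise case; it does not cover ufuncs, convert_dtype, or the deferred execution used by Beam.

```python
def apply(values, func, args=(), **kwds):
    """Hypothetical helper: call func(value, *args, **kwds) for every value."""
    return [func(v, *args, **kwds) for v in values]

def subtract_custom_value(x, custom_value):
    return x - custom_value

def add_custom_values(x, **kwargs):
    for month in kwargs:
        x += kwargs[month]
    return x

temperatures = [20, 21, 12]

# Positional extras are passed after the element itself.
print(apply(temperatures, subtract_custom_value, args=(5,)))  # [15, 16, 7]

# Keyword extras are forwarded unchanged.
print(apply(temperatures, add_custom_values, june=30, july=20, august=25))  # [95, 96, 87]
```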
-
map
(**kwargs)¶ Map values of Series according to input correspondence.
Used for substituting each value in a Series with another value, which may be derived from a function, a dict or a Series.
Parameters: - arg (function, collections.abc.Mapping subclass or DeferredSeries) – Mapping correspondence.
- na_action ({None, 'ignore'}, default None) – If ‘ignore’, propagate NaN values, without passing them to the mapping correspondence.
Returns: Same index as caller.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.apply()
- For applying more complex functions on a DeferredSeries.
DeferredDataFrame.apply()
- Apply a function row-/column-wise.
DeferredDataFrame.applymap()
- Apply a function elementwise on a whole DeferredDataFrame.
Notes
When arg is a dictionary, values in DeferredSeries that are not in the dictionary (as keys) are converted to NaN. However, if the dictionary is a dict subclass that defines __missing__ (i.e. provides a method for default values), then this default is used rather than NaN.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
>>> s
0       cat
1       dog
2       NaN
3    rabbit
dtype: object

``map`` accepts a ``dict`` or a ``Series``. Values that are not found in the ``dict`` are converted to ``NaN``, unless the dict has a default value (e.g. ``defaultdict``):

>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0    kitten
1     puppy
2       NaN
3       NaN
dtype: object

It also accepts a function:

>>> s.map('I am a {}'.format)
0       I am a cat
1       I am a dog
2       I am a nan
3    I am a rabbit
dtype: object

To avoid applying the function to missing values (and keep them as ``NaN``) ``na_action='ignore'`` can be used:

>>> s.map('I am a {}'.format, na_action='ignore')
0       I am a cat
1       I am a dog
2              NaN
3    I am a rabbit
dtype: object
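The Notes for map describe a special case for dict subclasses that define `__missing__`, which the examples above do not exercise. A small pandas sketch of that case, using `collections.defaultdict` and an arbitrary default value `'unknown'` chosen for illustration:

```python
import collections

import pandas as pd

s = pd.Series(['cat', 'dog', 'rabbit'])
names = {'cat': 'kitten', 'dog': 'puppy'}

# A plain dict has no fallback, so 'rabbit' becomes NaN.
plain = s.map(names)

# defaultdict defines __missing__, so its default replaces the NaN.
with_default = s.map(collections.defaultdict(lambda: 'unknown', names))

print(plain.tolist())
print(with_default.tolist())
```

This only demonstrates the pandas behaviour the Notes describe; it makes no claim about how a Beam pipeline evaluates the mapping.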
-
T
¶
-
abs
(**kwargs)¶ Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
Returns: DeferredSeries/DeferredDataFrame containing the absolute value of each element.
Return type: DeferredSeries or DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
numpy.absolute()
- Calculate the absolute value element-wise.
Notes
For complex inputs such as 1.2 + 1j, the absolute value is \(\sqrt{a^2 + b^2}\), where a and b are the real and imaginary parts.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Absolute numeric values in a Series.

>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64

Absolute numeric values in a Series with complex numbers.

>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0    1.56205
dtype: float64

Absolute numeric values in a Series with a Timedelta element.

>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0   1 days
dtype: timedelta64[ns]

Select rows with data closest to certain value using argsort (from `StackOverflow <https://stackoverflow.com/a/17758115>`__).

>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })
>>> df
   a   b    c
0  4  10  100
1  5  20   50
2  6  30  -30
3  7  40  -50
>>> df.loc[(df.c - 43).abs().argsort()]
   a   b    c
1  5  20   50
0  4  10  100
2  6  30  -30
3  7  40  -50
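The formula in the Notes can be checked directly against the complex example. This is a minimal sketch in plain pandas, where `a` and `b` in the formula are the real and imaginary parts of the value:

```python
import math

import pandas as pd

z = 1.2 + 1j

# abs() on a complex series returns the modulus of each element.
modulus = pd.Series([z]).abs().iloc[0]

# The same value computed from the formula sqrt(a^2 + b^2).
by_formula = math.sqrt(z.real ** 2 + z.imag ** 2)

print(modulus, by_formula)
```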
-
add
(**kwargs)¶
-
argsort
(**kwargs)¶
-
asfreq
(**kwargs)¶
-
asof
(**kwargs)¶
-
astype
(**kwargs)¶ Cast a pandas object to a specified dtype dtype.
Parameters: - dtype (data type, or dict of column name -> data type) – Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DeferredDataFrame’s columns to column-specific types.
- copy (bool, default True) – Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).
- errors ({'raise', 'ignore'}, default 'raise') –
Control raising of exceptions on invalid data for provided dtype.
- raise : allow exceptions to be raised
- ignore : suppress exceptions. On error return original object.
Returns: casted
Return type: same type as caller
Differences from pandas
This operation has no known divergences from the pandas API.
See also
to_datetime()
- Convert argument to datetime.
to_timedelta()
- Convert argument to timedelta.
to_numeric()
- Convert argument to a numeric type.
numpy.ndarray.astype()
- Cast a numpy array to a specified type.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Create a DataFrame:

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object

Cast all columns to int32:

>>> df.astype('int32').dtypes
col1    int32
col2    int32
dtype: object

Cast col1 to int32 using a dictionary:

>>> df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object

Create a series:

>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64

Convert to categorical type:

>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> cat_dtype = pd.api.types.CategoricalDtype(
...     categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Note that using ``copy=False`` and changing data on a new pandas object may propagate changes:

>>> s1 = pd.Series([1, 2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1  # note that s1[0] has changed too
0    10
1     2
dtype: int64

Create a series of dates:

>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[ns]

Datetimes are localized to UTC first before converting to the specified timezone:

>>> ser_date.astype('datetime64[ns, US/Eastern]')
0   2019-12-31 19:00:00-05:00
1   2020-01-01 19:00:00-05:00
2   2020-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]
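The ordered categorical example shows the repr `[2 < 1]` but not what that ordering changes in practice. As a rough pandas sketch, once the custom order is in place, order-based results follow the category order rather than the numeric order:

```python
import pandas as pd

ser = pd.Series([1, 2], dtype='int32')

# Categories are listed from lowest to highest, so here 2 ranks below 1.
cat_dtype = pd.api.types.CategoricalDtype(categories=[2, 1], ordered=True)
ordered = ser.astype(cat_dtype)

smallest = ordered.min()
largest = ordered.max()
print(smallest, largest)
```

Note that sorting operations on the result remain subject to the order-sensitivity restrictions described elsewhere in this module.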
-
at
¶
-
at_time
(**kwargs)¶
-
attrs
¶ pandas.DataFrame.attrs is not supported in the Beam DataFrame API because it is experimental in pandas.
-
autocorr
(**kwargs)¶
-
backfill
(**kwargs)¶
-
between_time
(**kwargs)¶
-
bfill
(**kwargs)¶
-
bool
()¶
-
cat
¶
-
combine
(**kwargs)¶
-
combine_first
(**kwargs)¶
-
compare
(**kwargs)¶
-
convert_dtypes
(**kwargs)¶
-
copy
(**kwargs)¶ Make a copy of this object’s indices and data.
When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).
When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).
Parameters: deep (bool, default True) – Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.
Returns: copy – Object type matches caller.
Return type: DeferredSeries or DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
Notes
When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).
While Index objects are copied when deep=True, the underlying numpy array is not copied for performance reasons. Since Index is immutable, the underlying data can be safely shared and a copy is not needed.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64

>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64

**Shallow copy versus default (deep) copy:**

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)

Shallow copy shares data and index with original.

>>> s is shallow
False
>>> s.values is shallow.values and s.index is shallow.index
True

Deep copy has own copy of data and index.

>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False

Updates to the data shared by shallow copy and original is reflected in both; deep copy remains unchanged.

>>> s[0] = 3
>>> shallow[1] = 4
>>> s
a    3
b    4
dtype: int64
>>> shallow
a    3
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64

Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.

>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0    [10, 2]
1     [3, 4]
dtype: object
>>> deep
0    [10, 2]
1     [3, 4]
dtype: object
-
describe
(**kwargs)¶
-
div
(**kwargs)¶
-
divide
(**kwargs)¶
-
divmod
(**kwargs)¶
-
drop
(labels, axis, index, columns, errors, **kwargs)¶
-
drop_duplicates
(**kwargs)¶
-
droplevel
(level, axis)¶
-
dt
¶
-
duplicated
(**kwargs)¶
-
empty
¶
-
eq
(**kwargs)¶ Return Equal to of series and other, element-wise (binary operator eq).
Equivalent to series == other, but with support to substitute a fill_value for missing data in either one of the inputs.
Parameters: - other (DeferredSeries or scalar value) –
- fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful DeferredSeries alignment, with this value before computation. If data in both corresponding DeferredSeries locations is missing the result of filling (at that location) will be missing.
- level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: The result of the operation.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.eq(b, fill_value=0)
a     True
b    False
c    False
d    False
e    False
dtype: bool
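The fill_value rule is easier to follow when spelled out step by step. The sketch below reproduces `a.eq(b, fill_value=0)` by hand in pandas: align both series on the union of their labels, fill a missing value only where the other side has data, then compare. The intermediate names (`left`, `right`, `both_missing`) are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Step 1: outer-align so both series share the labels a..e.
left, right = a.align(b)

# Step 2: fill with 0, except where both sides are missing.
both_missing = left.isna() & right.isna()
left_filled = left.fillna(0).mask(both_missing)
right_filled = right.fillna(0).mask(both_missing)

# Step 3: compare element-wise; NaN never compares equal.
manual = left_filled == right_filled
builtin = a.eq(b, fill_value=0)
print(manual.tolist(), builtin.tolist())
```

Label `e` is missing on both sides, so it stays missing after filling and the comparison there is False.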
-
equals
(other)¶
-
ewm
(**kwargs)¶
-
expanding
(**kwargs)¶
-
explode
(**kwargs)¶
-
factorize
(**kwargs)¶
-
ffill
(**kwargs)¶
-
fillna
(value, method, axis, limit, **kwargs)¶
-
first_valid_index
(**kwargs)¶
-
flags
¶
-
floordiv
(**kwargs)¶
-
ge
(**kwargs)¶ Return Greater than or equal to of series and other, element-wise (binary operator ge).
Equivalent to series >= other, but with support to substitute a fill_value for missing data in either one of the inputs.
Parameters: - other (DeferredSeries or scalar value) –
- fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful DeferredSeries alignment, with this value before computation. If data in both corresponding DeferredSeries locations is missing the result of filling (at that location) will be missing.
- level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: The result of the operation.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a    0.0
b    1.0
c    2.0
d    NaN
f    1.0
dtype: float64
>>> a.ge(b, fill_value=0)
a     True
b     True
c    False
d    False
e     True
f    False
dtype: bool
-
get
(**kwargs)¶
-
groupby
(by, level, axis, as_index, group_keys, **kwargs)¶
-
gt
(**kwargs)¶ Return Greater than of series and other, element-wise (binary operator gt).
Equivalent to series > other, but with support to substitute a fill_value for missing data in either one of the inputs.
Parameters: - other (DeferredSeries or scalar value) –
- fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful DeferredSeries alignment, with this value before computation. If data in both corresponding DeferredSeries locations is missing the result of filling (at that location) will be missing.
- level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: The result of the operation.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a    0.0
b    1.0
c    2.0
d    NaN
f    1.0
dtype: float64
>>> a.gt(b, fill_value=0)
a     True
b    False
c    False
d    False
e     True
f    False
dtype: bool
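The comparison methods relate to each other in the way one would expect, with one caveat for labels that are missing on both sides. A short pandas sketch using the same data as the ge and gt examples:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])

ge = a.ge(b, fill_value=0)
gt = a.gt(b, fill_value=0)
eq = a.eq(b, fill_value=0)
le = a.le(b, fill_value=0)

# "greater than or equal" is the union of "greater than" and "equal".
combined = gt | eq

# Label 'd' is NaN in both inputs, so every comparison is False there;
# as a result, le is not simply the negation of gt.
print(combined.equals(ge), bool(gt['d']), bool(le['d']))
```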
-
hasnans
¶
-
hist
(**kwargs)¶ pandas.DataFrame.hist is not supported in the Beam DataFrame API because it is a plotting tool.
-
iat
¶
-
idxmax
(**kwargs)¶
-
idxmin
(**kwargs)¶
-
iloc
¶
-
index
¶
-
infer_objects
(**kwargs)¶
-
is_monotonic
¶
-
is_monotonic_decreasing
¶
-
is_monotonic_increasing
¶
-
isin
(**kwargs)¶ Whether each element in the DataFrame is contained in values.
Parameters: values (iterable, DeferredSeries, DeferredDataFrame or dict) – The result will only be true at a location if all the labels match. If values is a DeferredSeries, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DeferredDataFrame, then both the index and column labels must match.
Returns: DeferredDataFrame of booleans showing whether each element in the DeferredDataFrame is contained in values.
Return type: DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.eq()
- Equality test for DeferredDataFrame.
DeferredSeries.isin()
- Equivalent method on DeferredSeries.
DeferredSeries.str.contains()
- Test if pattern or regex is contained within a string of a DeferredSeries or Index.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                   index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0

When ``values`` is a list check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings)

>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True

When ``values`` is a dict, we can pass values to check for each column separately:

>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True

When ``values`` is a Series or DataFrame the index and column must match. Note that 'falcon' does not match based on the number of legs in df2.

>>> other = pd.DataFrame({'num_legs': [8, 2], 'num_wings': [0, 2]},
...                      index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon      True       True
dog        False      False
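The text and examples above are written for frames, although this entry sits under DeferredSeries. For a series the pandas call is simpler: it returns one boolean per element. A minimal sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['llama', 'cow', 'llama', 'beetle'], name='animal')

# True where the element is one of the listed values.
mask = s.isin(['cow', 'llama'])

# The boolean series can then be used to filter the original.
kept = s[mask]
print(mask.tolist(), kept.tolist())
```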
-
item
(**kwargs)¶
-
kurt
(**kwargs)¶
-
kurtosis
(**kwargs)¶
-
last_valid_index
(**kwargs)¶
-
le
(**kwargs)¶ Return Less than or equal to of series and other, element-wise (binary operator le).
Equivalent to
series <= other, but with support to substitute a fill_value for missing data in either one of the inputs.
Parameters: - other (DeferredSeries or scalar value) –
- fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful DeferredSeries alignment, with this value before computation. If data in both corresponding DeferredSeries locations is missing the result of filling (at that location) will be missing.
- level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: The result of the operation.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a    0.0
b    1.0
c    2.0
d    NaN
f    1.0
dtype: float64
>>> a.le(b, fill_value=0)
a    False
b     True
c     True
d    False
e    False
f     True
dtype: bool
-
loc
¶
-
lt
(**kwargs)¶ Return Less than of series and other, element-wise (binary operator lt).
Equivalent to
series < other, but with support to substitute a fill_value for missing data in either one of the inputs.
Parameters: - other (DeferredSeries or scalar value) –
- fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful DeferredSeries alignment, with this value before computation. If data in both corresponding DeferredSeries locations is missing the result of filling (at that location) will be missing.
- level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: The result of the operation.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a    0.0
b    1.0
c    2.0
d    NaN
f    1.0
dtype: float64
>>> a.lt(b, fill_value=0)
a    False
b    False
c     True
d    False
e    False
f     True
dtype: bool
-
mad
(**kwargs)¶
-
mask
(cond, **kwargs)¶
-
mod
(**kwargs)¶
-
mode
(**kwargs)¶
-
mul
(**kwargs)¶
-
multiply
(**kwargs)¶
-
nbytes
¶
-
ndim
¶
-
ne
(**kwargs)¶ Return Not equal to of series and other, element-wise (binary operator ne).
Equivalent to
series != other, but with support to substitute a fill_value for missing data in either one of the inputs.
Parameters: - other (DeferredSeries or scalar value) –
- fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful DeferredSeries alignment, with this value before computation. If data in both corresponding DeferredSeries locations is missing the result of filling (at that location) will be missing.
- level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: The result of the operation.
Return type: DeferredSeries
Differences from pandas
This operation has no known divergences from the pandas API.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.ne(b, fill_value=0)
a    False
b     True
c     True
d     True
e     True
dtype: bool
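Unlike the ordering comparisons, `ne` is the exact element-wise complement of `eq`, including at labels that are missing on both sides (where `eq` is False and `ne` is True). A brief pandas sketch with the same data as above:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

not_equal = a.ne(b, fill_value=0)
equal = a.eq(b, fill_value=0)

# Negating eq gives the same series as ne.
print(not_equal.equals(~equal))
```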
-
nunique
(**kwargs)¶
-
pad
(**kwargs)¶
-
pct_change
(**kwargs)¶
-
pipe
(**kwargs)¶
-
pow
(**kwargs)¶
-
quantile
(**kwargs)¶
-
radd
(**kwargs)¶
-
rank
(**kwargs)¶
-
rdiv
(**kwargs)¶
-
rdivmod
(**kwargs)¶
-
reindex
(**kwargs)¶
-
reindex_like
(**kwargs)¶
-
reorder_levels
(**kwargs)¶ Rearrange index levels using input order. May not drop or duplicate levels.
Parameters: - order (list of int or list of str) – List representing new level order. Reference level by number (position) or by key (label).
- axis ({0 or 'index', 1 or 'columns'}, default 0) – Where to reorder levels.
Returns: A new object of the same type as the caller.
Differences from pandas
This operation has no known divergences from the pandas API.
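The reorder_levels entry has no example, so here is a small pandas sketch of the call. The level names `letter` and `number` and the data are invented for illustration; levels can be referenced either by name or by position.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1)], names=['letter', 'number'])
s = pd.Series([10, 20, 30], index=index)

# Swap the two levels, first by name and then by position.
by_name = s.reorder_levels(['number', 'letter'])
by_position = s.reorder_levels([1, 0])

print(by_name.index.tolist())
```

Only the order of the levels within each label changes; the rows stay in their original order and no level is dropped or duplicated.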
-
repeat
(**kwargs)¶
-
resample
(**kwargs)¶
-
reset_index
(**kwargs)¶
-
rfloordiv
(**kwargs)¶
-
rmod
(**kwargs)¶
-
rmul
(**kwargs)¶
-
rolling
(**kwargs)¶
-
rpow
(**kwargs)¶
-
rsub
(**kwargs)¶
-
rtruediv
(**kwargs)¶
-
sample
(**kwargs)¶
-
sem
(**kwargs)¶
-
set_axis
(**kwargs)¶
-
set_flags
(**kwargs)¶
-
shape
¶
-
size
¶
-
skew
(**kwargs)¶
-
slice_shift
(**kwargs)¶
-
sort_index
(axis, **kwargs)¶ Sort object by labels (along an axis).
Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.
Parameters: - axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
- level (int or level name or list of ints or list of level names) – If not None, sort on values in specified index level(s).
- ascending (bool or list-like of bools, default True) – Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.
- inplace (bool, default False) – If True, perform operation in-place.
- kind ({'quicksort', 'mergesort', 'heapsort'}, default 'quicksort') – Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DeferredDataFrames, this option is only applied when sorting on a single column or label.
- na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.
- sort_remaining (bool, default True) – If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
- ignore_index (bool, default False) –
If True, the resulting axis will be labeled 0, 1, …, n - 1.
New in version 1.0.0.
- key (callable, optional) –
If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.
New in version 1.1.0.
Returns: The original DeferredDataFrame sorted by the labels or None if inplace=True.
Return type: DeferredDataFrame or None
Differences from pandas
axis=index is not allowed because it imposes an ordering on the dataset, and we cannot guarantee it will be maintained (see https://s.apache.org/dataframe-order-sensitive-operations). Only axis=columns is allowed.
See also
DeferredSeries.sort_index()
- Sort DeferredSeries by the index.
DeferredDataFrame.sort_values()
- Sort DeferredDataFrame by the value.
DeferredSeries.sort_values()
- Sort DeferredSeries by the value.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API. In addition, some arguments shown here may not be supported, see ‘Differences from pandas’ for details.
>>> df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150],
...                   columns=['A'])
>>> df.sort_index()
     A
1    4
29   2
100  1
150  5
234  3

By default, it sorts in ascending order, to sort in descending order, use ``ascending=False``

>>> df.sort_index(ascending=False)
     A
234  3
150  5
100  1
29   2
1    4

A key function can be specified which is applied to the index before sorting. For a ``MultiIndex`` this is applied to each level separately.

>>> df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
>>> df.sort_index(key=lambda x: x.str.lower())
   a
A  1
b  2
C  3
d  4
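The pandas examples above all sort along the index, which the 'Differences from pandas' section says is not allowed here. The form that section does allow is a sort along the columns. A minimal pandas sketch of that call, with invented column names:

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 2], 'a': [3, 4], 'c': [5, 6]})

# Sorting along the columns reorders the column labels only; the rows are
# untouched, so no ordering is imposed on the dataset itself.
sorted_cols = df.sort_index(axis='columns')

print(list(sorted_cols.columns))
```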
-
sort_values
(axis, **kwargs)¶ sort_values is not implemented.
It is not implemented for axis=index because it imposes an ordering on the dataset, and we cannot guarantee it will be maintained (see https://s.apache.org/dataframe-order-sensitive-operations).
It is not implemented for axis=columns because it makes the order of the columns depend on the data (see https://s.apache.org/dataframe-non-deferred-column-names).
-
sparse
¶
-
squeeze
(**kwargs)¶
-
sub
(**kwargs)¶
-
subtract
(**kwargs)¶
-
swapaxes
(**kwargs)¶
-
swaplevel
(**kwargs)¶
-
to_clipboard
(**kwargs)¶
-
to_csv
(path, *args, **kwargs)¶
-
to_excel
(path, *args, **kwargs)¶
-
to_feather
(path, *args, **kwargs)¶
-
to_hdf
(**kwargs)¶ pandas.DataFrame.to_hdf is not supported in the Beam DataFrame API because HDF5 is a random access file format.
-
to_html
(path, *args, **kwargs)¶
-
to_json
(path, orient=None, *args, **kwargs)¶
-
to_latex
(**kwargs)¶
-
to_list
(**kwargs)¶
-
to_markdown
(**kwargs)¶
-
to_msgpack
(**kwargs)¶ pandas.DataFrame.to_msgpack is not supported in the Beam DataFrame API because it is deprecated in pandas.
-
to_parquet
(path, *args, **kwargs)¶
-
to_period
(**kwargs)¶
-
to_pickle
(**kwargs)¶
-
to_sql
(**kwargs)¶
-
to_stata
(path, *args, **kwargs)¶
-
to_timestamp
(**kwargs)¶
-
to_xarray
(**kwargs)¶
-
transform
(**kwargs)¶
-
transpose
(**kwargs)¶
-
truediv
(**kwargs)¶
-
truncate
(**kwargs)¶
-
tshift
(**kwargs)¶
-
tz_convert
(**kwargs)¶
-
tz_localize
(ambiguous, **kwargs)¶
-
value_counts
(**kwargs)¶
-
where
(cond, other, errors, **kwargs)¶
-
classmethod
wrap
(expr, split_tuples=True)¶
-
xs
(**kwargs)¶
-
-
class
apache_beam.dataframe.frames.
DeferredDataFrame
(expr)[source]¶ Bases:
apache_beam.dataframe.frames.DeferredDataFrameOrSeries
-
T
¶
-
columns
¶
-
loc
¶
-
iloc
¶
-
axes
¶
-
dtypes
¶
-
agg
(func, axis=0, *args, **kwargs)¶
-
applymap
(**kwargs)¶ Apply a function to a DataFrame elementwise.
This method applies a function that accepts and returns a scalar to every element of a DataFrame.
Parameters: - func (callable) – Python function, returns a single value from a single value.
- na_action ({None, 'ignore'}, default None) –
If ‘ignore’, propagate NaN values, without passing them to func.
New in version 1.2.
Returns: Transformed DeferredDataFrame.
Return type: DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.apply()
- Apply a function along input axis of DeferredDataFrame.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567

>>> df.applymap(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5

Like Series.map, NA values can be ignored:

>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = pd.NA
>>> df_copy.applymap(lambda x: len(str(x)), na_action='ignore')
      0  1
0  <NA>  4
1     5  5

Note that a vectorized version of `func` often exists, which will be much faster. You could square each number elementwise.

>>> df.applymap(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

But it's better to avoid applymap in that case.

>>> df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
-
add_prefix
(**kwargs)¶ Prefix labels with string prefix.
For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.
Parameters: prefix (str) – The string to add before each label.
Returns: New DeferredSeries or DeferredDataFrame with updated labels.
Return type: DeferredSeries or DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.add_suffix()
- Suffix row labels with string suffix.
DeferredDataFrame.add_suffix()
- Suffix column labels with string suffix.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64

>>> s.add_prefix('item_')
item_0    1
item_1    2
item_2    3
item_3    4
dtype: int64

>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6

>>> df.add_prefix('col_')
   col_A  col_B
0      1      3
1      2      4
2      3      5
3      4      6
-
add_suffix
(**kwargs)¶ Suffix labels with string suffix.
For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.
Parameters: suffix (str) – The string to add after each label.
Returns: New DeferredSeries or DeferredDataFrame with updated labels.
Return type: DeferredSeries or DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.add_prefix()
- Prefix row labels with string prefix.
DeferredDataFrame.add_prefix()
- Prefix column labels with string prefix.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64

>>> s.add_suffix('_item')
0_item    1
1_item    2
2_item    3
3_item    4
dtype: int64

>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6

>>> df.add_suffix('_col')
   A_col  B_col
0      1      3
1      2      4
2      3      5
3      4      6
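One way to read add_prefix and add_suffix is as shorthand for a label rename: columns for a frame, row labels for a series. The pandas sketch below checks that reading against an explicit `rename` call; it is meant as an illustration of the semantics, not a statement about how either method is implemented.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# For a frame, the prefix is applied to the column labels.
prefixed = df.add_prefix('col_')
renamed = df.rename(columns=lambda label: f'col_{label}')

# For a series, the suffix is applied to the row labels instead.
s = pd.Series([1, 2])
suffixed = s.add_suffix('_item')

print(list(prefixed.columns), suffixed.index.tolist())
```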
-
memory_usage
(**kwargs)¶ pandas.DataFrame.memory_usage is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
info
(**kwargs)¶ pandas.DataFrame.info is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
clip
(**kwargs)¶
-
corr
(method, min_periods)[source]¶ Compute pairwise correlation of columns, excluding NA/null values.
Parameters: - method ({'pearson', 'kendall', 'spearman'} or callable) –
Method of correlation:
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- callable: callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
New in version 0.24.0.
- min_periods (int, optional) – Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.
Returns: Correlation matrix.
Return type: DeferredDataFrame
Differences from pandas
Only method="pearson" can be parallelized. Other methods require collecting all data on a single worker (see https://s.apache.org/dataframe-non-parallelizable-operations for details).
See also
DeferredDataFrame.corrwith()
- Compute pairwise correlation with another DeferredDataFrame or DeferredSeries.
DeferredSeries.corr()
- Compute the correlation between two DeferredSeries.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API. In addition, some arguments shown here may not be supported, see ‘Differences from pandas’ for details.
>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
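The pandas example above uses a callable, which per the 'Differences from pandas' section is not one of the parallelizable methods. For the Pearson case, the coefficient can be written out from its textbook definition, covariance divided by the product of the standard deviations. The sketch below checks that definition against `corr(method='pearson')` in plain pandas, reusing the dogs and cats data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
                  columns=['dogs', 'cats'])

pearson = df.corr(method='pearson')

# Pearson r from its definition, using deviations from each column mean.
dx = df['dogs'] - df['dogs'].mean()
dy = df['cats'] - df['cats'].mean()
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(pearson.loc['dogs', 'cats'], manual)
```

The definition only needs sums over the rows, which is consistent with the note that this method can be parallelized.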
-
cummax
(**kwargs)¶ pandas.DataFrame.cummax is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
cummin
(**kwargs)¶ pandas.DataFrame.cummin is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
cumprod
(**kwargs)¶ pandas.DataFrame.cumprod is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
cumsum
(**kwargs)¶ pandas.DataFrame.cumsum is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
diff
(**kwargs)¶ pandas.DataFrame.diff is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
first
(**kwargs)¶ pandas.DataFrame.first is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
head
(**kwargs)¶ pandas.DataFrame.head is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
interpolate
(**kwargs)¶ pandas.DataFrame.interpolate is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
last
(**kwargs)¶ pandas.DataFrame.last is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
-
tail
(**kwargs)¶ pandas.DataFrame.tail is not supported in the Beam DataFrame API because it is sensitive to the order of the data.
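The order-sensitive operations listed above (cummax through tail) share one property: their result changes when the same elements arrive in a different order, and a PCollection does not guarantee element order. A minimal plain-Python sketch of that property, using `itertools.accumulate` as a stand-in for a cumulative sum (the two orderings are made up for illustration):

```python
from itertools import accumulate

# The same three elements in two possible arrival orders.
order_a = [3, 1, 2]
order_b = [1, 2, 3]

# A cumulative sum depends on the order in which elements are seen.
cumsum_a = list(accumulate(order_a))
cumsum_b = list(accumulate(order_b))

# A plain sum does not, which is why it is safe on unordered data.
total_a = sum(order_a)
total_b = sum(order_b)
```

Here `cumsum_a` and `cumsum_b` differ even though `total_a` and `total_b` agree.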
-
isnull
(**kwargs)¶ Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
Returns: Mask of bool values for each element in DeferredDataFrame that indicates whether an element is an NA value.
Return type: DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.isnull()
- Alias of isna.
DeferredDataFrame.notna()
- Boolean inverse of isna.
DeferredDataFrame.dropna()
- Omit axes labels with missing values.
isna()
- Top-level isna.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Show which entries in a DataFrame are NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Show which entries in a Series are NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
-
isna
(**kwargs)¶ Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
Returns: Mask of bool values for each element in DeferredDataFrame that indicates whether an element is an NA value.
Return type: DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.isnull()
- Alias of isna.
DeferredDataFrame.notna()
- Boolean inverse of isna.
DeferredDataFrame.dropna()
- Omit axes labels with missing values.
isna()
- Top-level isna.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Show which entries in a DataFrame are NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Show which entries in a Series are NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
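The NA rules described for isna and isnull can be restated as a small plain-Python predicate. This is only an illustration of the rules in the text, not how pandas or Beam implements them; `is_na` is a hypothetical helper, and it deliberately ignores pd.NaT and the use_inf_as_na option.

```python
import math

def is_na(value):
    """Toy NA test: None and float NaN count as NA, nothing else does."""
    return value is None or (isinstance(value, float) and math.isnan(value))

samples = [5, None, float("nan"), "", float("inf")]

# Empty strings and infinity are not NA under the default rules.
mask = [is_na(v) for v in samples]

# notna / notnull is the element-wise boolean inverse.
inverse = [not m for m in mask]
```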
-
notnull
(**kwargs)¶ Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.
Returns: Mask of bool values for each element in DeferredDataFrame that indicates whether an element is not an NA value.
Return type: DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.notnull()
- Alias of notna.
DeferredDataFrame.isna()
- Boolean inverse of notna.
DeferredDataFrame.dropna()
- Omit axes labels with missing values.
notna()
- Top-level notna.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Show which entries in a DataFrame are not NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
Show which entries in a Series are not NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool
-
notna
(**kwargs)¶ Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.
Returns: Mask of bool values for each element in DeferredDataFrame that indicates whether an element is not an NA value.
Return type: DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.notnull()
- Alias of notna.
DeferredDataFrame.isna()
- Boolean inverse of notna.
DeferredDataFrame.dropna()
- Omit axes labels with missing values.
notna()
- Top-level notna.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Show which entries in a DataFrame are not NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                    born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                          pd.Timestamp('1940-04-25')],
...                    name=['Alfred', 'Batman', ''],
...                    toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
Show which entries in a Series are not NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool
-
items
(**kwargs)¶ pandas.DataFrame.items is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
itertuples
(**kwargs)¶ pandas.DataFrame.itertuples is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
iterrows
(**kwargs)¶ pandas.DataFrame.iterrows is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
iteritems
(**kwargs)¶ pandas.DataFrame.iteritems is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
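The iteration methods above (items, itertuples, iterrows, iteritems) would have to hand back concrete Python values while the pipeline is still being built, which is what "an output type that is not deferred" refers to. The toy model below sketches the idea of a deferred object in plain Python. It is not Beam's expression machinery; the `Deferred` class and its `map` and `run` methods are hypothetical names for illustration only.

```python
class Deferred:
    """Hypothetical deferred frame: it records work and holds no data."""

    def __init__(self, plan=()):
        self.plan = tuple(plan)

    def map(self, fn):
        # Returns another deferred object; nothing is computed here.
        return Deferred(self.plan + (fn,))

    def run(self, rows):
        # Concrete values exist only once real data is supplied.
        for fn in self.plan:
            rows = [fn(r) for r in rows]
        return rows

# Building the plan touches no data, so there is nothing to iterate over yet.
doubled = Deferred().map(lambda x: x * 2).map(lambda x: x + 1)

# Values appear only at execution time.
result = doubled.run([1, 2, 3])
```

In this model an operation fits the deferred style when it can return another `Deferred`; an operation that must return an iterator, a tuple, or a string at construction time cannot.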
-
plot
(**kwargs)¶ pandas.DataFrame.plot is not supported in the Beam DataFrame API because it is a plotting tool.
-
rename_axis
(**kwargs)¶ Set the name of the axis for the index or columns.
Parameters: - mapper (scalar, list-like, optional) – Value to set the axis name attribute.
- index, columns – A scalar, list-like, dict-like or function transformation to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a DeferredSeries. This parameter only applies to DeferredDataFrame objects.
Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.
Changed in version 0.24.0.
- axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to rename.
- copy (bool, default True) – Also copy underlying data.
- inplace (bool, default False) – Modifies the object directly, instead of creating a new DeferredSeries or DeferredDataFrame.
Returns: The same type as the caller or None if inplace=True.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredSeries.rename()
- Alter DeferredSeries index labels or name.
DeferredDataFrame.rename()
- Alter DeferredDataFrame index labels or name.
Index.rename()
- Set new names on index.
Notes
DeferredDataFrame.rename_axis supports two calling conventions:
- (index=index_mapper, columns=columns_mapper, ...)
- (mapper, axis={'index', 'columns'}, ...)
The first calling convention will only modify the names of the index and/or the names of the Index object that is the columns. In this case, the parameter copy is ignored.
The second calling convention will modify the names of the corresponding index if mapper is a list or a scalar. However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.
We highly recommend using keyword arguments to clarify your intent.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
**Series**
>>> s = pd.Series(["dog", "cat", "monkey"])
>>> s
0       dog
1       cat
2    monkey
dtype: object
>>> s.rename_axis("animal")
animal
0       dog
1       cat
2    monkey
dtype: object
**DataFrame**
>>> df = pd.DataFrame({"num_legs": [4, 4, 2],
...                    "num_arms": [0, 0, 2]},
...                   ["dog", "cat", "monkey"])
>>> df
        num_legs  num_arms
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("animal")
>>> df
        num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("limbs", axis="columns")
>>> df
limbs   num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2
**MultiIndex**
>>> df.index = pd.MultiIndex.from_product([['mammal'],
...                                        ['dog', 'cat', 'monkey']],
...                                       names=['type', 'name'])
>>> df
limbs          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
>>> df.rename_axis(index={'type': 'class'})
limbs          num_legs  num_arms
class  name
mammal dog            4         0
       cat            4         0
       monkey         2         2
>>> df.rename_axis(columns=str.upper)
LIMBS          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
-
select_dtypes
(**kwargs)¶ Return a subset of the DataFrame’s columns based on the column dtypes.
Parameters: - include, exclude – A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.
Returns: The subset of the frame including the dtypes in include and excluding the dtypes in exclude.
Return type: DeferredDataFrame
Raises: ValueError –
- If both of include and exclude are empty
- If include and exclude have overlapping elements
- If any kind of string dtype is passed in.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.dtypes()
- Return DeferredSeries with the data type of each column.
Notes
- To select all numeric types, use np.number or 'number'
- To select strings you must use the object dtype, but note that this will return all object dtype columns
- See the numpy dtype hierarchy
- To select datetimes, use np.datetime64, 'datetime' or 'datetime64'
- To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'
- To select Pandas categorical dtypes, use 'category'
- To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'a': [1, 2] * 3,
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
   a      b    c
0  1   True  1.0
1  2  False  2.0
2  1   True  1.0
3  2  False  2.0
4  1   True  1.0
5  2  False  2.0
>>> df.select_dtypes(include='bool')
       b
0   True
1  False
2   True
3  False
4   True
5  False
>>> df.select_dtypes(include=['float64'])
     c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0
>>> df.select_dtypes(exclude=['int64'])
       b    c
0   True  1.0
1  False  2.0
2   True  1.0
3  False  2.0
4   True  1.0
5  False  2.0
-
shape
¶ pandas.DataFrame.shape is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
stack
(**kwargs)¶ Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:
- if the columns have a single level, the output is a Series;
- if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.
Parameters: - level (int, str, list, default -1) – Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.
- dropna (bool, default True) – Whether to drop rows in the resulting Frame/DeferredSeries with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.
Returns: Stacked dataframe or series.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.unstack()
- Unstack prescribed level(s) from index axis onto column axis.
DeferredDataFrame.pivot()
- Reshape dataframe from long format to wide format.
DeferredDataFrame.pivot_table()
- Create a spreadsheet-style pivot table as a DeferredDataFrame.
Notes
The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
**Single level columns**
>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
...                                     index=['cat', 'dog'],
...                                     columns=['weight', 'height'])
Stacking a dataframe with a single level column axis returns a Series:
>>> df_single_level_cols
     weight  height
cat       0       1
dog       2       3
>>> df_single_level_cols.stack()
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64
**Multi level columns: simple case**
>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('weight', 'pounds')])
>>> df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol1)
Stacking a dataframe with a multi-level column axis:
>>> df_multi_level_cols1
    weight
        kg pounds
cat      1      2
dog      2      4
>>> df_multi_level_cols1.stack()
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4
**Missing values**
>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('height', 'm')])
>>> df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)
It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:
>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>> df_multi_level_cols2.stack()
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN
**Prescribing the level(s) to be stacked**
The first parameter controls which level or levels are stacked:
>>> df_multi_level_cols2.stack(0)
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN
>>> df_multi_level_cols2.stack([0, 1])
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64
**Dropping missing values**
>>> df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)
Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:
>>> df_multi_level_cols3
    weight height
        kg      m
cat    NaN    1.0
dog    2.0    3.0
>>> df_multi_level_cols3.stack(dropna=False)
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
>>> df_multi_level_cols3.stack(dropna=True)
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
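The single-level case of stack can be pictured as moving each column label into the row key. The sketch below does that on nested dictionaries in plain Python; it is a toy model, not the pandas or Beam implementation. The `stack` function here is a hypothetical helper, and it drops individual None values, which is simpler than the row-wise dropna behaviour described above.

```python
def stack(frame):
    """Toy stack: {row: {col: value}} becomes {(row, col): value}."""
    return {
        (row, col): value
        for row, cols in frame.items()
        for col, value in cols.items()
        if value is not None  # simplified stand-in for dropna=True
    }

# Mirrors the single-level example: two rows, two columns.
frame = {
    "cat": {"weight": 0, "height": 1},
    "dog": {"weight": 2, "height": 3},
}
stacked = stack(frame)
```

Each entry of `stacked` is keyed by an (index, column) pair, which corresponds to the two-level index of the stacked Series.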
-
all
(*args, **kwargs)¶
-
any
(*args, **kwargs)¶
-
count
(*args, **kwargs)¶
-
max
(*args, **kwargs)¶
-
min
(*args, **kwargs)¶
-
prod
(*args, **kwargs)¶
-
product
(*args, **kwargs)¶
-
sum
(*args, **kwargs)¶
-
mean
(*args, **kwargs)¶
-
median
(*args, **kwargs)¶
-
take
(**kwargs)¶ pandas.DataFrame.take is not supported in the Beam DataFrame API because it is deprecated in pandas.
-
to_records
(**kwargs)¶ pandas.DataFrame.to_records is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
to_dict
(**kwargs)¶ pandas.DataFrame.to_dict is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
to_numpy
(**kwargs)¶ pandas.DataFrame.to_numpy is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
to_string
(**kwargs)¶ pandas.DataFrame.to_string is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
to_sparse
(**kwargs)¶ pandas.DataFrame.to_sparse is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
transpose
(**kwargs)¶ pandas.DataFrame.transpose is not supported in the Beam DataFrame API because the columns in the output DataFrame depend on the data.
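To illustrate why the output columns of transpose depend on the data: after transposing, the column labels are the input's index values, which are known only once the data has been read. A plain-Python sketch on nested dictionaries (the `transpose` helper and the sample rows are hypothetical, for illustration only):

```python
def transpose(frame):
    """Toy transpose: {row: {col: value}} becomes {col: {row: value}}."""
    out = {}
    for row_label, cols in frame.items():
        for col_label, value in cols.items():
            out.setdefault(col_label, {})[row_label] = value
    return out

rows = {
    "cat": {"weight": 0, "height": 1},
    "dog": {"weight": 2, "height": 3},
}
flipped = transpose(rows)

# The new column labels are whatever index values the data contained.
new_columns = sorted({c for cols in flipped.values() for c in cols})
```

Adding a row labelled "monkey" to the input would add a "monkey" column to the output, so the result's columns cannot be declared before the data is seen.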
-
update
(**kwargs)¶ Modify in place using non-NA values from another DataFrame.
Aligns on indices. There is no return value.
Parameters: - other (DeferredDataFrame, or object coercible into a DeferredDataFrame) – Should have at least one matching index/column label with the original DeferredDataFrame. If a DeferredSeries is passed, its name attribute must be set, and that will be used as the column name to align with the original DeferredDataFrame.
- join ({'left'}, default 'left') – Only left join is implemented, keeping the index and columns of the original object.
- overwrite (bool, default True) –
How to handle non-NA values for overlapping keys:
- True: overwrite original DeferredDataFrame’s values with values from other.
- False: only update values that are NA in the original DeferredDataFrame.
- filter_func (callable(1d-array) -> bool 1d-array, optional) – Can choose to replace values other than NA. Return True for values that should be updated.
- errors ({'raise', 'ignore'}, default 'ignore') –
If ‘raise’, will raise a ValueError if the DeferredDataFrame and other both contain non-NA data in the same place.
Changed in version 0.24.0: Changed from raise_conflict=False|True to errors=’ignore’|’raise’.
Returns: None
Return type: method directly changes calling object
Raises: - ValueError –
  - When errors=’raise’ and there’s overlapping non-NA data.
  - When errors is not either ‘ignore’ or ‘raise’.
- NotImplementedError – If join != ‘left’.
Differences from pandas
This operation has no known divergences from the pandas API.
See also
dict.update()
- Similar method for dictionaries.
DeferredDataFrame.merge()
- For column(s)-on-column(s) operations.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6
The DataFrame's length does not increase as a result of the update, only values at matching index/column labels are updated.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f
For Series, its name attribute must be set.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  y
2  c  e
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e']}, index=[1, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  x
1  b  d
2  c  e
If `other` contains NaNs the corresponding values are not updated in the original dataframe.
>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
   A      B
0  1    4.0
1  2  500.0
2  3    6.0
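The update rules above (left join only, NA values in other never overwrite, overwrite=False fills only NA cells) can be sketched with nested dictionaries. This is a plain-Python illustration of the documented semantics, not the pandas or Beam code path; `update` and `is_na` are hypothetical helpers, and filter_func and errors are left out.

```python
import math

def is_na(value):
    """Toy NA test: None or float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def update(original, other, overwrite=True):
    """Toy in-place update on {column: {index: value}} dictionaries."""
    for col, new_values in other.items():
        if col not in original:
            continue  # left join: columns missing from original are ignored
        for idx, new in new_values.items():
            if idx not in original[col] or is_na(new):
                continue  # no new rows; NA in other never overwrites
            if overwrite or is_na(original[col][idx]):
                original[col][idx] = new

df = {"A": {0: 1, 1: 2, 2: 3}, "B": {0: 400, 1: 500, 2: 600}}
update(df, {"B": {0: 4, 1: float("nan"), 2: 6}, "C": {0: 7}})

# With overwrite=False only cells that are NA in the original are filled.
sparse = {"B": {0: None, 1: 5}}
update(sparse, {"B": {0: 9, 1: 9}}, overwrite=False)
```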
-
values
¶ pandas.DataFrame.values is not supported in the Beam DataFrame API because it produces an output type that is not deferred.
-
abs
(**kwargs)¶ Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
Returns: DeferredSeries/DeferredDataFrame containing the absolute value of each element.
Return type: abs
Differences from pandas
This operation has no known divergences from the pandas API.
See also
numpy.absolute()
- Calculate the absolute value element-wise.
Notes
For a complex input a + bj, such as 1.2 + 1j, the absolute value is \(\sqrt{a^2 + b^2}\).
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Absolute numeric values in a Series.
>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64
Absolute numeric values in a Series with complex numbers.
>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0    1.56205
dtype: float64
Absolute numeric values in a Series with a Timedelta element.
>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0   1 days
dtype: timedelta64[ns]
Select rows with data closest to certain value using argsort (from StackOverflow, https://stackoverflow.com/a/17758115).
>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })
>>> df
   a   b    c
0  4  10  100
1  5  20   50
2  6  30  -30
3  7  40  -50
>>> df.loc[(df.c - 43).abs().argsort()]
   a   b    c
1  5  20   50
0  4  10  100
2  6  30  -30
3  7  40  -50
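The complex-number rule from the Notes can be checked with the Python standard library alone. The snippet below evaluates the formula for the same input used in the example, 1.2 + 1j, and compares it with the built-in `abs`.

```python
import math

z = 1.2 + 1j

# Built-in abs on a complex number returns its magnitude.
magnitude = abs(z)

# The same value from the formula sqrt(a^2 + b^2) with a = 1.2, b = 1.
by_formula = math.sqrt(z.real ** 2 + z.imag ** 2)

# Rounded to five decimals this matches the 1.56205 shown in the example.
rounded = round(magnitude, 5)
```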
-
add
(**kwargs)¶
-
apply
(**kwargs)¶
-
asfreq
(**kwargs)¶
-
asof
(**kwargs)¶
-
astype
(**kwargs)¶ Cast a pandas object to a specified dtype dtype.
Parameters: - dtype (data type, or dict of column name -> data type) – Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DeferredDataFrame’s columns to column-specific types.
- copy (bool, default True) – Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).
- errors ({'raise', 'ignore'}, default 'raise') –
Control raising of exceptions on invalid data for provided dtype.
- raise : allow exceptions to be raised
- ignore : suppress exceptions. On error return original object.
Returns: casted
Return type: same type as caller
Differences from pandas
This operation has no known divergences from the pandas API.
See also
to_datetime()
- Convert argument to datetime.
to_timedelta()
- Convert argument to timedelta.
to_numeric()
- Convert argument to a numeric type.
numpy.ndarray.astype()
- Cast a numpy array to a specified type.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
Create a DataFrame:
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object
Cast all columns to int32:
>>> df.astype('int32').dtypes
col1    int32
col2    int32
dtype: object
Cast col1 to int32 using a dictionary:
>>> df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object
Create a series:
>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64
Convert to categorical type:
>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]
Convert to ordered categorical type with custom ordering:
>>> cat_dtype = pd.api.types.CategoricalDtype(
...     categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]
Note that using copy=False and changing data on a new pandas object may propagate changes:
>>> s1 = pd.Series([1, 2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1  # note that s1[0] has changed too
0    10
1     2
dtype: int64
Create a series of dates:
>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[ns]
Datetimes are localized to UTC first before converting to the specified timezone:
>>> ser_date.astype('datetime64[ns, US/Eastern]')
0   2019-12-31 19:00:00-05:00
1   2020-01-01 19:00:00-05:00
2   2020-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]
-
at
¶
-
at_time
(**kwargs)¶
-
attrs
¶ pandas.DataFrame.attrs is not supported in the Beam DataFrame API because it is experimental in pandas.
-
backfill
(**kwargs)¶
-
between_time
(**kwargs)¶
-
bfill
(**kwargs)¶
-
bool
()¶
-
boxplot
(**kwargs)¶
-
combine
(**kwargs)¶
-
combine_first
(**kwargs)¶
-
compare
(**kwargs)¶
-
convert_dtypes
(**kwargs)¶
-
copy
(**kwargs)¶ Make a copy of this object’s indices and data.
When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).
When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).
Parameters: deep (bool, default True) – Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.
Returns: copy – Object type matches caller.
Return type: DeferredSeries or DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
Notes
When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).
While Index objects are copied when deep=True, the underlying numpy array is not copied for performance reasons. Since Index is immutable, the underlying data can be safely shared and a copy is not needed.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64
**Shallow copy versus default (deep) copy:**
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)
Shallow copy shares data and index with original.
>>> s is shallow
False
>>> s.values is shallow.values and s.index is shallow.index
True
Deep copy has own copy of data and index.
>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False
Updates to the data shared by shallow copy and original are reflected in both; deep copy remains unchanged.
>>> s[0] = 3
>>> shallow[1] = 4
>>> s
a    3
b    4
dtype: int64
>>> shallow
a    3
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64
Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.
>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0    [10, 2]
1     [3, 4]
dtype: object
>>> deep
0    [10, 2]
1     [3, 4]
dtype: object
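The Notes contrast this behaviour with copy.deepcopy from the Standard Library. The snippet below shows that contrast on a nested list using only the standard `copy` module: a non-recursive copy still shares the inner objects, which is the situation the last example describes for object-dtype data, while `copy.deepcopy` does not.

```python
import copy

nested = [[1, 2], [3, 4]]

# New outer list, but the inner lists are shared with the original.
shallow = copy.copy(nested)

# Inner lists are copied recursively, so nothing is shared.
deep = copy.deepcopy(nested)

# Mutating a nested object is visible through the non-recursive copy only.
nested[0][0] = 10
```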
-
describe
(**kwargs)¶
-
div
(**kwargs)¶
-
divide
(**kwargs)¶
-
drop
(labels, axis, index, columns, errors, **kwargs)¶
-
drop_duplicates
(**kwargs)¶
-
droplevel
(level, axis)¶
-
dtype
¶
-
duplicated
(**kwargs)¶
-
empty
¶
-
eq
(**kwargs)¶ Get Equal to of dataframe and other, element-wise (binary operator eq).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
Parameters: - other (scalar, sequence, DeferredSeries, or DeferredDataFrame) – Any single or multiple element data structure, or list-like object.
- axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: Result of the comparison.
Return type: DeferredDataFrame of bool
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.eq()
- Compare DeferredDataFrames for equality elementwise.
DeferredDataFrame.ne()
- Compare DeferredDataFrames for inequality elementwise.
DeferredDataFrame.le()
- Compare DeferredDataFrames for less than inequality or equality elementwise.
DeferredDataFrame.lt()
- Compare DeferredDataFrames for strictly less than inequality elementwise.
DeferredDataFrame.ge()
- Compare DeferredDataFrames for greater than inequality or equality elementwise.
DeferredDataFrame.gt()
- Compare DeferredDataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must
match the number of elements in `other`:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
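The two rules in the Notes (mismatched indices are unioned, and NaN never equals NaN) can be sketched with plain dictionaries. This is an illustrative stand-in using only the standard library; `eq_aligned` is a hypothetical helper, not part of pandas or Beam.

```python
def eq_aligned(left, right):
    """Element-wise equality of two label->value dicts over the union of labels."""
    nan = float("nan")
    labels = sorted(set(left) | set(right))
    # A label missing on either side is treated as NaN, which compares unequal
    # to everything, including another NaN.
    return {label: left.get(label, nan) == right.get(label, nan) for label in labels}


left = {"A": 250, "B": 150, "C": float("nan")}
right = {"A": 250, "C": float("nan"), "D": 150}
result = eq_aligned(left, right)
print(result)
```

Only label `A` compares equal: `B` and `D` are each missing on one side, and `C` holds NaN on both sides.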
-
equals
(other)¶
-
ewm
(**kwargs)¶
-
expanding
(**kwargs)¶
-
ffill
(**kwargs)¶
-
fillna
(value, method, axis, limit, **kwargs)¶
-
filter
(**kwargs)¶ Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
Parameters: - items (list-like) – Keep labels from axis which are in items.
- like (str) – Keep labels from axis for which “like in label == True”.
- regex (str (regular expression)) – Keep labels from axis for which re.search(regex, label) == True.
- axis ({0 or ‘index’, 1 or ‘columns’, None}, default None) – The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, ‘index’ for DeferredSeries, ‘columns’ for DeferredDataFrame.
Returns: Return type: same type as input object
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.loc()
- Access a group of rows and columns by label(s) or a boolean array.
Notes
The items, like, and regex parameters are enforced to be mutually exclusive.
axis defaults to the info axis that is used when indexing with [].
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6

>>> # select columns by name
>>> df.filter(items=['one', 'three'])
         one  three
mouse     1      3
rabbit    4      6

>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
         one  three
mouse     1      3
rabbit    4      6

>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
         one  two  three
rabbit    4    5      6
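Because `filter` looks only at labels, its selection logic can be sketched over a plain list of label strings. The helper below, `filter_labels`, is hypothetical and uses only the standard library; it mirrors the three documented selectors and the mutual-exclusivity rule, and is not the pandas or Beam implementation.

```python
import re


def filter_labels(labels, items=None, like=None, regex=None):
    """Select labels by exactly one of: explicit items, substring, or regex."""
    given = [arg for arg in (items, like, regex) if arg is not None]
    if len(given) != 1:
        raise TypeError("pass exactly one of items, like, or regex")
    if items is not None:
        # Keep requested items that actually exist among the labels.
        return [item for item in items if item in labels]
    if like is not None:
        return [label for label in labels if like in label]
    pattern = re.compile(regex)
    return [label for label in labels if pattern.search(label)]


columns = ["one", "two", "three"]
rows = ["mouse", "rabbit"]
by_items = filter_labels(columns, items=["one", "three"])
by_regex = filter_labels(columns, regex="e$")
by_like = filter_labels(rows, like="bbi")
print(by_items, by_regex, by_like)
```

The three calls reproduce the label selections in the examples above; the cell values play no part in any of them.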
-
first_valid_index
(**kwargs)¶
-
flags
¶
-
floordiv
(**kwargs)¶
-
from_dict
(**kwargs)¶
-
from_records
(**kwargs)¶
-
ge
(**kwargs)¶ Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
Parameters: - other (scalar, sequence, DeferredSeries, or DeferredDataFrame) – Any single or multiple element data structure, or list-like object.
- axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: Result of the comparison.
Return type: DeferredDataFrame of bool
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.eq()
- Compare DeferredDataFrames for equality elementwise.
DeferredDataFrame.ne()
- Compare DeferredDataFrames for inequality elementwise.
DeferredDataFrame.le()
- Compare DeferredDataFrames for less than inequality or equality elementwise.
DeferredDataFrame.lt()
- Compare DeferredDataFrames for strictly less than inequality elementwise.
DeferredDataFrame.ge()
- Compare DeferredDataFrames for greater than inequality or equality elementwise.
DeferredDataFrame.gt()
- Compare DeferredDataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must
match the number of elements in `other`:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
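Comparing against an arbitrary sequence pairs each column with one element of the sequence, which is why the lengths must agree. A rough sketch of that rule for `ge`, with a dict of columns standing in for a DataFrame; `ge_sequence` is a hypothetical helper and the choice of `ValueError` is an assumption of this sketch.

```python
def ge_sequence(frame, sequence):
    """Compare each column of a {column: {row: value}} frame to one bound."""
    columns = list(frame)
    if len(sequence) != len(columns):
        raise ValueError(
            f"expected {len(columns)} values, one per column, got {len(sequence)}"
        )
    return {
        col: {row: value >= bound for row, value in frame[col].items()}
        for col, bound in zip(columns, sequence)
    }


df = {
    "cost": {"A": 250, "B": 150, "C": 100},
    "revenue": {"A": 100, "B": 250, "C": 300},
}
result = ge_sequence(df, [250, 100])
print(result)
```

Here `cost` is compared against 250 and `revenue` against 100, so every `revenue` cell passes while only row `A` of `cost` does.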
-
get
(**kwargs)¶
-
groupby
(by, level, axis, as_index, group_keys, **kwargs)¶
-
gt
(**kwargs)¶ Get Greater than of dataframe and other, element-wise (binary operator gt).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
Parameters: - other (scalar, sequence, DeferredSeries, or DeferredDataFrame) – Any single or multiple element data structure, or list-like object.
- axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: Result of the comparison.
Return type: DeferredDataFrame of bool
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.eq()
- Compare DeferredDataFrames for equality elementwise.
DeferredDataFrame.ne()
- Compare DeferredDataFrames for inequality elementwise.
DeferredDataFrame.le()
- Compare DeferredDataFrames for less than inequality or equality elementwise.
DeferredDataFrame.lt()
- Compare DeferredDataFrames for strictly less than inequality elementwise.
DeferredDataFrame.ge()
- Compare DeferredDataFrames for greater than inequality or equality elementwise.
DeferredDataFrame.gt()
- Compare DeferredDataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must
match the number of elements in `other`:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
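The "different shape" case above follows from unioning both row and column labels and treating every missing cell as NaN, which makes any ordered comparison false. A small sketch under that reading, with nested dicts standing in for DataFrames; `gt_frames` is a hypothetical helper, not library code.

```python
NAN = float("nan")


def gt_frames(left, right):
    """Element-wise left > right over the union of row and column labels."""
    columns = sorted(set(left) | set(right))
    rows = sorted({row for frame in (left, right) for col in frame.values() for row in col})
    return {
        col: {
            # Missing cells become NaN; any ordered comparison with NaN is False.
            row: left.get(col, {}).get(row, NAN) > right.get(col, {}).get(row, NAN)
            for row in rows
        }
        for col in columns
    }


df = {
    "cost": {"A": 250, "B": 150, "C": 100},
    "revenue": {"A": 100, "B": 250, "C": 300},
}
other = {"revenue": {"A": 300, "B": 250, "C": 100, "D": 150}}
result = gt_frames(df, other)
print(result)
```

The result has rows `A` through `D` for both columns. The whole `cost` column is false because `other` has no `cost` data, and row `D` is false because `df` has no such row.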
-
hist
(**kwargs)¶ pandas.DataFrame.hist is not supported in the Beam DataFrame API because it is a plotting tool.
-
iat
¶
-
idxmax
(**kwargs)¶
-
idxmin
(**kwargs)¶
-
index
¶
-
infer_objects
(**kwargs)¶
-
insert
(**kwargs)¶
-
isin
(**kwargs)¶ Whether each element in the DataFrame is contained in values.
Parameters: values (iterable, DeferredSeries, DeferredDataFrame or dict) – The result will only be true at a location if all the labels match. If values is a DeferredSeries, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DeferredDataFrame, then both the index and column labels must match.
Returns: DeferredDataFrame of booleans showing whether each element in the DeferredDataFrame is contained in values.
Return type: DeferredDataFrame
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.eq()
- Equality test for DeferredDataFrame.
DeferredSeries.isin()
- Equivalent method on DeferredSeries.
DeferredSeries.str.contains()
- Test if pattern or regex is contained within a string of a DeferredSeries or Index.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                   index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0

When ``values`` is a list check whether every value in the DataFrame
is present in the list (which animals have 0 or 2 legs or wings)

>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True

When ``values`` is a dict, we can pass values to check for each
column separately:

>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True

When ``values`` is a Series or DataFrame the index and column must
match. Note that 'dog' does not appear in the index of other, so its
row is False in every column.

>>> other = pd.DataFrame({'num_legs': [8, 2], 'num_wings': [0, 2]},
...                      index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon      True       True
dog        False      False
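The list and dict forms of `values` differ only in which candidates apply to each column. A minimal standard-library sketch of that distinction follows; the `isin` function here is a hypothetical stand-in operating on nested dicts and does not cover the Series or DataFrame forms.

```python
def isin(frame, values):
    """Membership test for a {column: {row: value}} frame.

    A list of values applies to every column; a dict applies per column,
    and columns absent from the dict match nothing.
    """
    result = {}
    for col, cells in frame.items():
        if isinstance(values, dict):
            allowed = set(values.get(col, ()))
        else:
            allowed = set(values)
        result[col] = {row: cell in allowed for row, cell in cells.items()}
    return result


df = {
    "num_legs": {"falcon": 2, "dog": 4},
    "num_wings": {"falcon": 2, "dog": 0},
}
from_list = isin(df, [0, 2])
from_dict = isin(df, {"num_wings": [0, 3]})
print(from_list)
print(from_dict)
```

Both results agree with the first two outputs in the examples above.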
-
kurt
(**kwargs)¶
-
kurtosis
(**kwargs)¶
-
last_valid_index
(**kwargs)¶
-
le
(**kwargs)¶ Get Less than or equal to of dataframe and other, element-wise (binary operator le).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
Parameters: - other (scalar, sequence, DeferredSeries, or DeferredDataFrame) – Any single or multiple element data structure, or list-like object.
- axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: Result of the comparison.
Return type: DeferredDataFrame of bool
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.eq()
- Compare DeferredDataFrames for equality elementwise.
DeferredDataFrame.ne()
- Compare DeferredDataFrames for inequality elementwise.
DeferredDataFrame.le()
- Compare DeferredDataFrames for less than inequality or equality elementwise.
DeferredDataFrame.lt()
- Compare DeferredDataFrames for strictly less than inequality elementwise.
DeferredDataFrame.ge()
- Compare DeferredDataFrames for greater than inequality or equality elementwise.
DeferredDataFrame.gt()
- Compare DeferredDataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must
match the number of elements in `other`:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
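The `level` parameter broadcasts a singly-indexed operand across one level of a MultiIndex. The sketch below shows that matching for a single column, using tuples as MultiIndex keys; `le_on_level` is a hypothetical helper and assumes every key at the chosen level exists in the single-level operand.

```python
def le_on_level(single, multi, level):
    """Compare single[label] <= multi[key], matching label to key[level]."""
    return {key: single[key[level]] <= value for key, value in multi.items()}


cost = {"A": 250, "B": 150, "C": 100}
cost_multi = {
    ("Q1", "A"): 250, ("Q1", "B"): 150, ("Q1", "C"): 100,
    ("Q2", "A"): 150, ("Q2", "B"): 300, ("Q2", "C"): 220,
}
result = le_on_level(cost, cost_multi, level=1)
print(result)
```

Each value of `cost` is reused once per quarter, which reproduces the `cost` column of the `df.le(df_multindex, level=1)` output above.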
-
lookup
(**kwargs)¶
-
lt
(**kwargs)¶ Get Less than of dataframe and other, element-wise (binary operator lt).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
Parameters: - other (scalar, sequence, DeferredSeries, or DeferredDataFrame) – Any single or multiple element data structure, or list-like object.
- axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: Result of the comparison.
Return type: DeferredDataFrame of bool
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.eq()
- Compare DeferredDataFrames for equality elementwise.
DeferredDataFrame.ne()
- Compare DeferredDataFrames for inequality elementwise.
DeferredDataFrame.le()
- Compare DeferredDataFrames for less than inequality or equality elementwise.
DeferredDataFrame.lt()
- Compare DeferredDataFrames for strictly less than inequality elementwise.
DeferredDataFrame.ge()
- Compare DeferredDataFrames for greater than inequality or equality elementwise.
DeferredDataFrame.gt()
- Compare DeferredDataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must
match the number of elements in `other`:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
-
mad
(**kwargs)¶
-
mask
(cond, **kwargs)¶
-
melt
(**kwargs)¶
-
mod
(**kwargs)¶
-
mul
(**kwargs)¶
-
multiply
(**kwargs)¶
-
ndim
¶
-
ne
(**kwargs)¶ Get Not equal to of dataframe and other, element-wise (binary operator ne).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
Parameters: - other (scalar, sequence, DeferredSeries, or DeferredDataFrame) – Any single or multiple element data structure, or list-like object.
- axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
Returns: Result of the comparison.
Return type: DeferredDataFrame of bool
Differences from pandas
This operation has no known divergences from the pandas API.
See also
DeferredDataFrame.eq()
- Compare DeferredDataFrames for equality elementwise.
DeferredDataFrame.ne()
- Compare DeferredDataFrames for inequality elementwise.
DeferredDataFrame.le()
- Compare DeferredDataFrames for less than inequality or equality elementwise.
DeferredDataFrame.lt()
- Compare DeferredDataFrames for strictly less than inequality elementwise.
DeferredDataFrame.ge()
- Compare DeferredDataFrames for greater than inequality or equality elementwise.
DeferredDataFrame.gt()
- Compare DeferredDataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API.
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must
match the number of elements in `other`:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
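With `axis='index'`, a Series operand is matched to row labels and reused for every column, and rows missing on either side become NaN. Since NaN is unequal to everything, `ne` returns True for those rows. A sketch with plain dicts, where `ne_along_index` is a hypothetical helper:

```python
NAN = float("nan")


def ne_along_index(frame, series):
    """Element-wise frame != series, aligning the series on row labels."""
    rows = sorted({row for col in frame.values() for row in col} | set(series))
    return {
        col: {row: values.get(row, NAN) != series.get(row, NAN) for row in rows}
        for col, values in frame.items()
    }


df = {
    "cost": {"A": 250, "B": 150, "C": 100},
    "revenue": {"A": 100, "B": 250, "C": 300},
}
result = ne_along_index(df, {"A": 100, "D": 300})
print(result)
```

The only False cell is `revenue` at row `A`, where both sides hold 100, matching the `df.ne(..., axis='index')` output above.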
-
pad
(**kwargs)¶
-
pct_change
(**kwargs)¶
-
pipe
(**kwargs)¶
-
pivot
(**kwargs)¶
-
pivot_table
(**kwargs)¶
-
pow
(**kwargs)¶
-
radd
(**kwargs)¶
-
rank
(**kwargs)¶
-
rdiv
(**kwargs)¶
-
reindex
(**kwargs)¶
-
reindex_like
(**kwargs)¶
-
reorder_levels
(**kwargs)¶ Rearrange index levels using input order. May not drop or duplicate levels.
Parameters: - order (list of int or list of str) – List representing new level order. Reference level by number (position) or by key (label).
- axis ({0 or 'index', 1 or 'columns'}, default 0) – Where to reorder levels.
Returns: Return type:
Differences from pandas
This operation has no known divergences from the pandas API.
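Reordering levels is a permutation of the positions inside each index key, which is why levels may be neither dropped nor duplicated. A minimal sketch over a list of tuples, handling positional references only; `reorder_levels` here is a hypothetical stand-in and does not resolve level names.

```python
def reorder_levels(index, order):
    """Permute the positions of every tuple key in a MultiIndex-like list."""
    n_levels = len(index[0]) if index else len(order)
    # A valid order mentions every level exactly once.
    if sorted(order) != list(range(n_levels)):
        raise ValueError(f"order must be a permutation of 0..{n_levels - 1}")
    return [tuple(key[pos] for pos in order) for key in index]


index = [("Q1", "A"), ("Q1", "B"), ("Q2", "A")]
swapped = reorder_levels(index, [1, 0])
print(swapped)
```

The rows keep their original sequence; only the arrangement of levels within each key changes.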
-
resample
(**kwargs)¶
-
rfloordiv
(**kwargs)¶
-
rmod
(**kwargs)¶
-
rmul
(**kwargs)¶
-
rolling
(**kwargs)¶
-
rpow
(**kwargs)¶
-
rsub
(**kwargs)¶
-
rtruediv
(**kwargs)¶
-
sample
(**kwargs)¶
-
sem
(**kwargs)¶
-
set_axis
(**kwargs)¶
-
set_flags
(**kwargs)¶
-
size
¶
-
skew
(**kwargs)¶
-
slice_shift
(**kwargs)¶
-
sort_index
(axis, **kwargs)¶ Sort object by labels (along an axis).
Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.
Parameters: - axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
- level (int or level name or list of ints or list of level names) – If not None, sort on values in specified index level(s).
- ascending (bool or list-like of bools, default True) – Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.
- inplace (bool, default False) – If True, perform operation in-place.
- kind ({'quicksort', 'mergesort', 'heapsort'}, default 'quicksort') – Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DeferredDataFrames, this option is only applied when sorting on a single column or label.
- na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.
- sort_remaining (bool, default True) – If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
- ignore_index (bool, default False) –
If True, the resulting axis will be labeled 0, 1, …, n - 1.
New in version 1.0.0.
- key (callable, optional) –
If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.
New in version 1.1.0.
Returns: The original DeferredDataFrame sorted by the labels or None if inplace=True.
Return type:
Differences from pandas
axis=index is not allowed because it imposes an ordering on the dataset, and we cannot guarantee it will be maintained (see https://s.apache.org/dataframe-order-sensitive-operations). Only axis=columns is allowed.
See also
DeferredSeries.sort_index()
- Sort DeferredSeries by the index.
DeferredDataFrame.sort_values()
- Sort DeferredDataFrame by the value.
DeferredSeries.sort_values()
- Sort DeferredSeries by the value.
Examples
NOTE: These examples are pulled directly from the pandas documentation for convenience. Usage of the Beam DataFrame API will look different because it is a deferred API. In addition, some arguments shown here may not be supported, see ‘Differences from pandas’ for details.
>>> df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150],
...                   columns=['A'])
>>> df.sort_index()
     A
1    4
29   2
100  1
150  5
234  3

By default, it sorts in ascending order; to sort in descending order,
use ``ascending=False``

>>> df.sort_index(ascending=False)
     A
234  3
150  5
100  1
29   2
1    4

A key function can be specified which is applied to the index before
sorting. For a ``MultiIndex`` this is applied to each level separately.

>>> df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
>>> df.sort_index(key=lambda x: x.str.lower())
   a
A  1
b  2
C  3
d  4
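The `key` argument differs from the one accepted by the builtin `sorted()` in that it receives the whole index at once and must return sort keys of the same length. A standard-library sketch of that contract, with a list standing in for an Index; `sort_labels` is a hypothetical helper.

```python
def sort_labels(labels, key=None, ascending=True):
    """Sort labels, optionally via a vectorized key applied to all labels at once."""
    sort_keys = list(labels) if key is None else list(key(list(labels)))
    if len(sort_keys) != len(labels):
        raise ValueError("key must return one sort key per label")
    order = sorted(range(len(labels)), key=lambda i: sort_keys[i], reverse=not ascending)
    return [labels[i] for i in order]


labels = ["b", "A", "d", "C"]
plain = sort_labels(labels)
folded = sort_labels(labels, key=lambda index: [label.lower() for label in index])
print(plain, folded)
```

Without a key, uppercase letters sort before lowercase ones; the case-folding key interleaves them, as in the last example above.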
-
sort_values
(axis, **kwargs)¶ sort_values is not implemented.
It is not implemented for axis=index because it imposes an ordering on the dataset, and we cannot guarantee it will be maintained (see https://s.apache.org/dataframe-order-sensitive-operations).
It is not implemented for axis=columns because it makes the order of the columns depend on the data (see https://s.apache.org/dataframe-non-deferred-column-names).
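The axis=columns restriction can be illustrated with a small sketch: when columns are ordered by the values in a row, two inputs with identical column names can come out in different column orders. The order of the result is therefore unknown until the data has been read. The helper `sort_columns_by_row` is hypothetical and uses nested dicts in place of DataFrames.

```python
def sort_columns_by_row(frame, row):
    """Return column names ordered by each column's value in the given row."""
    return sorted(frame, key=lambda col: frame[col][row])


# Same column names in both frames, different data.
frame_a = {"x": {"r": 1}, "y": {"r": 2}}
frame_b = {"x": {"r": 5}, "y": {"r": 2}}

order_a = sort_columns_by_row(frame_a, "r")
order_b = sort_columns_by_row(frame_b, "r")
print(order_a, order_b)
```

A deferred API has to describe the columns of a result before any element is processed, which is not possible when their order depends on values like these.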
-
sparse
¶
-
squeeze
(**kwargs)¶
-
std
(**kwargs)¶
-
style
¶
-
sub
(**kwargs)¶
-
subtract
(**kwargs)¶
-
swapaxes
(**kwargs)¶
-
swaplevel
(**kwargs)¶
-
to_clipboard
(**kwargs)¶
-
to_csv
(path, *args, **kwargs)¶
-
to_excel
(path, *args, **kwargs)¶
-
to_feather
(path, *args, **kwargs)¶
-
to_gbq
(**kwargs)¶
-
to_hdf
(**kwargs)¶ pandas.DataFrame.to_hdf is not supported in the Beam DataFrame API because HDF5 is a random access file format.
-
to_html
(path, *args, **kwargs)¶
-
to_json
(path, orient=None, *args, **kwargs)¶
-
to_latex
(**kwargs)¶
-
to_markdown
(**kwargs)¶
-
to_msgpack
(**kwargs)¶ pandas.DataFrame.to_msgpack is not supported in the Beam DataFrame API because it is deprecated in pandas.
-
to_parquet
(path, *args, **kwargs)¶
-
to_period
(**kwargs)¶
-
to_pickle
(**kwargs)¶
-
to_sql
(**kwargs)¶
-
to_stata
(path, *args, **kwargs)¶
-
to_timestamp
(**kwargs)¶
-
to_xarray
(**kwargs)¶
-
transform
(**kwargs)¶
-
truediv
(**kwargs)¶
-
truncate
(**kwargs)¶
-
tshift
(**kwargs)¶
-
tz_convert
(**kwargs)¶
-
tz_localize
(ambiguous, **kwargs)¶
-
value_counts
(**kwargs)¶
-
var
(**kwargs)¶
-
where
(cond, other, errors, **kwargs)¶
-
classmethod
wrap
(expr, split_tuples=True)¶
-
xs
(**kwargs)¶
-