rwskit.pandas
Utilities for working with pandas.
Attributes
Functions
flatten_data_frame(df[, string_fill, in_place]) | Converts columns containing lists into (new) individual columns in the DataFrame.
Module Contents
- rwskit.pandas.flatten_data_frame(df: pandas.DataFrame, string_fill: str = '[UNK]', in_place: bool = False) → pandas.DataFrame
Converts columns containing lists into (new) individual columns in the DataFrame.
If one or more columns in a DataFrame consist of lists, this method will remove the original column and replace it with N columns, where N is the maximum length of the lists in the original column.
If the lists are of unequal length, shorter lists are padded on the right so that every row fills all N columns. Lists of strings will be padded using the given string_fill value. All others will be padded with np.nan. Note that most numpy types will convert np.nan into an appropriate missing value for that type. For example, when used to fill np.datetime64 objects, the resulting object will be np.datetime64('NaT').
If the lists are numeric (including boolean) and they do not have equal lengths, the new columns will have dtype=np.float64 regardless of the original dtype.
Note
Nested lists within a column are not supported and will not be flattened.
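The padding rules above can be sketched in plain pandas. This is an illustration under stated assumptions, not the actual rwskit implementation: the helper name expand_list_column is hypothetical, it handles one column per call, and the `<name>__<i>` column naming is inferred from the output shown in the Examples.

```python
import numpy as np
import pandas as pd


def expand_list_column(df: pd.DataFrame, column: str, string_fill: str = "[UNK]") -> pd.DataFrame:
    """Hypothetical helper: replace one list column with N right-padded columns."""
    lists = df[column].tolist()
    width = max(len(row) for row in lists)
    values = [v for row in lists for v in row]

    # String lists are padded with string_fill, everything else with np.nan.
    is_string = all(isinstance(v, str) for v in values)
    is_numeric = all(isinstance(v, (bool, int, float, np.number, np.bool_)) for v in values)
    fill = string_fill if is_string else np.nan
    padded = [list(row) + [fill] * (width - len(row)) for row in lists]

    # Assumed naming scheme, taken from the Examples output: "<name>__<i>".
    names = [f"{column}__{i}" for i in range(width)]
    expanded = pd.DataFrame(padded, columns=names, index=df.index)

    # Ragged numeric (or boolean) lists end up as float64 in every new column.
    ragged = any(len(row) != width for row in lists)
    if ragged and is_numeric and not is_string:
        expanded = expanded.astype(np.float64)

    # Splice the new columns in where the original column used to be.
    position = df.columns.get_loc(column)
    return pd.concat([df.iloc[:, :position], expanded, df.iloc[:, position + 1:]], axis=1)


df = pd.DataFrame({"A": [["1"], ["2", "3"]], "C": [[1], [2, 3]], "D": [True, False]})
result = expand_list_column(expand_list_column(df, "A"), "C")
print(result)
```

The sketch always returns a new DataFrame, so it corresponds to the in_place=False behavior only. Like the documented function, it makes no attempt to flatten nested lists.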
- Parameters:
df (pandas.DataFrame) – The input DataFrame to flatten.
string_fill (str, default = '[UNK]') – Use this value to pad string lists. All other data types will use np.nan.
in_place (bool, default = False) – Whether to modify the DataFrame in place or return a copy.
- Returns:
df – The modified DataFrame
- Return type:
pandas.DataFrame
Examples
>>> input_df = pd.DataFrame({
...     "A": [["1"], ["2", "3"]],
...     "B": [["4", "5"], ["6", "7", "8"]],
...     "C": [[1], [2, 3]],
...     "D": [True, False]
... })
>>> print(input_df)
        A          B       C      D
0     [1]     [4, 5]     [1]   True
1  [2, 3]  [6, 7, 8]  [2, 3]  False
>>> flatten_data_frame(input_df)
  A__0   A__1 B__0 B__1   B__2  C__0  C__1      D
0    1  [UNK]    4    5  [UNK]   1.0   NaN   True
1    2      3    6    7      8   2.0   3.0  False
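The dtype effects of padding with np.nan, visible in the C__0 and C__1 columns above, can be reproduced with plain pandas. The snippet below is a small sketch that does not call flatten_data_frame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Integers padded with np.nan cannot stay int64, so pandas falls back to float64.
ints = pd.Series([1, np.nan])

# Timestamps padded with np.nan keep a datetime dtype; the gap is stored as NaT.
dates = pd.Series([pd.Timestamp("2024-01-01"), np.nan])

print(ints.dtype)
print(dates.dtype, pd.isna(dates.iloc[1]))
```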