rwskit.pandas
=============

.. py:module:: rwskit.pandas

.. autoapi-nested-parse::

   Utilities for working with pandas.


Attributes
----------

.. autoapisummary::

   rwskit.pandas.log


Functions
---------

.. autoapisummary::

   rwskit.pandas.flatten_data_frame


Module Contents
---------------

.. py:data:: log

.. py:function:: flatten_data_frame(df: pandas.DataFrame, string_fill: str = '[UNK]', in_place: bool = False) -> pandas.DataFrame

   Converts columns containing lists into (new) individual columns in the ``DataFrame``.

   If one or more columns in a DataFrame consist of lists, this function
   removes each such column and replaces it with ``N`` new columns, where
   ``N`` is the maximum length of the lists in the original column.
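
   A minimal sketch of the general idea for a single column of string lists,
   using only plain pandas calls. It is not the ``rwskit`` implementation:
   the helper name ``expand_list_column`` and its signature are hypothetical,
   and unlike the function documented here it appends the new columns at the
   end of the frame.

   .. code-block:: python

       import pandas as pd


       def expand_list_column(df, column, fill="[UNK]"):
           """Hypothetical helper: replace one list column with N scalar columns."""
           # The DataFrame constructor pads ragged rows with missing values.
           expanded = pd.DataFrame(df[column].tolist(), index=df.index)
           # Name the new columns <column>__0, <column>__1, ...
           expanded = expanded.add_prefix(f"{column}__")
           # Replace the padding with the fill value (string lists assumed).
           expanded = expanded.fillna(fill)
           # Drop the original column and attach the expanded ones at the end.
           return df.drop(columns=[column]).join(expanded)


       frame = pd.DataFrame({"A": [["1"], ["2", "3"]], "D": [True, False]})
       result = expand_list_column(frame, "A")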

   If the lists are of unequal length, shorter lists are padded on the right
   so that every row has ``N`` values. Lists of strings are padded with the
   given ``string_fill`` value; all other lists are padded with ``np.nan``.
   Note that most numpy types convert ``np.nan`` into the appropriate
   missing value for that type. For example, padding a column of
   ``np.datetime64`` values yields ``np.datetime64('NaT')``.
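
   The conversion of ``np.nan`` into a type-specific missing value can be
   observed with pandas alone. The snippet below is an illustrative sketch
   and does not call ``flatten_data_frame``; the variable names are
   arbitrary.

   .. code-block:: python

       import numpy as np
       import pandas as pd

       # A datetime column with two valid entries.
       dates = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"]))

       # Writing np.nan into a datetime column stores NaT instead of a float,
       # so the column keeps its datetime dtype.
       dates.iloc[1] = np.nan

       is_datetime = dates.dtype.kind == "M"
       is_missing = pd.isna(dates.iloc[1])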

   If the lists are numeric (including boolean) and they do not have equal
   lengths, the new columns will have ``dtype=np.float64`` regardless of
   the original dtype.
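
   The change of dtype follows from how ``np.nan`` works: it is a float, so
   any integer or boolean array that has to hold it is promoted to
   ``float64``. A small numpy-only sketch of that promotion (the padding
   logic here is an assumption for illustration, not the ``rwskit`` code):

   .. code-block:: python

       import numpy as np

       ragged = [[1], [2, 3]]
       width = max(len(row) for row in ragged)

       # Pad every row on the right with np.nan up to the longest row.
       padded = [row + [np.nan] * (width - len(row)) for row in ragged]

       # Integers mixed with np.nan are promoted to float64.
       ints = np.array(padded)

       # Booleans are promoted as well: True becomes 1.0 and False becomes 0.0.
       bools = np.array([[True, np.nan], [False, True]])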

   .. note::
       Nested lists within a column are not supported and will not be
       flattened.

   :param df: The input DataFrame to flatten.
   :type df: pandas.DataFrame
   :param string_fill: Use this value to pad string lists. All other data types will use
                       ``np.nan``.
   :type string_fill: str, default = '[UNK]'
   :param in_place: Whether to modify the DataFrame in place or return a copy.
   :type in_place: bool, default = False

   :returns: **df** -- The modified DataFrame
   :rtype: pandas.DataFrame

   .. rubric:: Examples

   .. code-block:: python

       >>> import pandas as pd
       >>> from rwskit.pandas import flatten_data_frame
       >>> input_df = pd.DataFrame({
       ...     "A": [["1"], ["2", "3"]],
       ...     "B": [["4", "5"], ["6", "7", "8"]],
       ...     "C": [[1], [2, 3]],
       ...     "D": [True, False]
       ... })
       >>> print(input_df)
               A          B       C      D
       0     [1]     [4, 5]     [1]   True
       1  [2, 3]  [6, 7, 8]  [2, 3]  False

       >>> flatten_data_frame(input_df)
         A__0   A__1 B__0 B__1   B__2  C__0  C__1      D
       0    1  [UNK]    4    5  [UNK]   1.0   NaN   True
       1    2      3    6    7      8   2.0   3.0  False