rwskit.benchmarking
===================

.. py:module:: rwskit.benchmarking

.. autoapi-nested-parse::

   Benchmarking tools.


Attributes
----------

.. autoapisummary::

   rwskit.benchmarking.TimeUnit
   rwskit.benchmarking.AggregationFunctionName
   rwskit.benchmarking.AggregationFunction
   rwskit.benchmarking.BenchmarkSortValue


Classes
-------

.. autoapisummary::

   rwskit.benchmarking.BenchmarkRunner
   rwskit.benchmarking.BenchmarkResult


Functions
---------

.. autoapisummary::

   rwskit.benchmarking.get_time_unit_abbreviation
   rwskit.benchmarking.change_time_unit
   rwskit.benchmarking.validate_call_signature


Module Contents
---------------

.. py:data:: TimeUnit

   The supported units of time.

.. py:data:: AggregationFunctionName

   The names of the supported aggregation functions.

.. py:data:: AggregationFunction

   An aggregation function is a callable that takes an array-like object and
   returns a single float.

.. py:data:: BenchmarkSortValue

   The supported values for sorting a ``BenchmarkResult`` when it is represented
   as a string.

.. py:function:: get_time_unit_abbreviation(name: TimeUnit) -> TimeUnit

   Get the time unit abbreviation from the given string.

   :param name: The name or abbreviation of a supported time unit.
   :type name: str

   :returns: The abbreviation of the time unit specified by ``name``.
   :rtype: str

   :raises icontract.errors.ViolationError: If the ``name`` is not a supported :data:`TimeUnit`.


.. py:function:: change_time_unit(value: int | float, from_unit: TimeUnit, to_unit: TimeUnit) -> float

   Change the unit of time of a given ``value`` currently in the ``from_unit``
   unit to a value in the ``to_unit`` unit.

   :param value: The current time value.
   :type value: int or float
   :param from_unit: The unit of the current value.
   :type from_unit: TimeUnit
   :param to_unit: The unit to change the value into.
   :type to_unit: TimeUnit

   :returns: The equivalent value in the new time unit ``to_unit``.
   :rtype: float
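
   As a rough illustration of the conversion described above, the sketch below
   routes every value through nanoseconds. It is not the library's
   implementation: ``convert_time`` and ``NS_PER_UNIT`` are hypothetical names,
   and the ``"ns"`` and ``"us"`` keys are assumed abbreviations that may differ
   from the actual members of :data:`TimeUnit`.

   .. code-block:: python

       # Hypothetical lookup table: how many nanoseconds fit into one unit.
       NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


       def convert_time(value, from_unit, to_unit):
           """Convert ``value`` from ``from_unit`` to ``to_unit`` via nanoseconds."""
           return float(value) * NS_PER_UNIT[from_unit] / NS_PER_UNIT[to_unit]


       print(convert_time(1.5, "s", "ms"))   # 1.5 seconds expressed in milliseconds
       print(convert_time(250, "ms", "s"))   # 250 milliseconds expressed in seconds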


.. py:function:: validate_call_signature(fn1: Callable[Ellipsis, Any], fn2: Callable[Ellipsis, Any], strict: bool = False) -> bool

   Check that the two functions take the same parameters.

   :param fn1: The first function to compare.
   :type fn1: Callable[..., Any]
   :param fn2: The second function to compare.
   :type fn2: Callable[..., Any]
   :param strict: If ``True`` the signatures must match exactly, including whether
                  defaults are present and their values. Otherwise, they are considered
                  equal if the number and types of all parameters are the same.
   :type strict: bool, default = False

   :returns: ``True`` if the functions take the same number and type of parameters.
   :rtype: bool
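
   The difference between the strict and non-strict comparison can be sketched
   with the standard library's :mod:`inspect` module. This is only an
   approximation under stated assumptions, not the library's code:
   ``same_call_signature`` is a hypothetical name, and "types" is interpreted
   here as the parameter annotations.

   .. code-block:: python

       import inspect


       def same_call_signature(fn1, fn2, strict=False):
           """Compare the parameters of two callables (illustrative sketch)."""
           p1 = list(inspect.signature(fn1).parameters.values())
           p2 = list(inspect.signature(fn2).parameters.values())
           if strict:
               # Parameter equality covers name, kind, default and annotation.
               return p1 == p2
           # Non-strict: same number of parameters with matching annotations.
           return len(p1) == len(p2) and all(
               a.annotation == b.annotation for a, b in zip(p1, p2)
           )


       def with_default(a: int, b: str = "x"): ...
       def without_default(a: int, b: str): ...

       print(same_call_signature(with_default, without_default))
       print(same_call_signature(with_default, without_default, strict=True))

   Under these assumptions the two demo functions match in non-strict mode and
   differ in strict mode, because only one of them declares a default for ``b``.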


.. py:class:: BenchmarkRunner(functions: Iterable[Callable] | dict[str, Callable], benchmark_space: dict[str, list[T]], setup_fn: Optional[Callable] = None, use_single_setup: bool = True, n_runs: int = 10, n_tests: int = 2, n_warm_ups: int = 1, time_unit: TimeUnit = cast(TimeUnit, 's'), test_agg_fn: AggregationFunctionName = 'min', run_label: str = 'run', show_progress: bool = False, verbose: bool = True, float_fmt: str = '0.4e', sort_by: BenchmarkSortValue = 'min', test_significance: bool = True)

   A class for profiling a set of functions based on one or more criteria.

   At a high level, the benchmarking process works as follows.
   A sub-benchmark is run for every combination of parameters in the
   ``benchmark_space``.
   Each sub-benchmark has two nested execution loops.
   The inner loop runs each function on the current data ``n_tests``
   times and aggregates the timings using ``test_agg_fn``.
   The same data is always used within this loop. The outer
   loop repeats this process ``n_runs`` times. If ``use_single_setup``
   is ``True``, the setup function is called only once and its data
   is reused for all the runs. If it is ``False``, the setup
   function is called for every run. The execution times for all
   runs are stored in a pandas ``DataFrame`` that can be retrieved after
   running the benchmark.
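
   The loop structure described above can be approximated with the standard
   library alone. The sketch below is a simplified illustration, not the
   actual implementation: ``run_benchmark_sketch`` is a hypothetical name, the
   order in which functions are visited inside a run is an assumption, and
   warm-ups, time unit conversion, and the ``DataFrame`` output are left out.

   .. code-block:: python

       import itertools
       import time


       def run_benchmark_sketch(functions, benchmark_space, setup_fn,
                                n_runs=3, n_tests=2, use_single_setup=True):
           """Time each function for every parameter combination."""
           rows = []
           keys = list(benchmark_space)
           # One sub-benchmark per element of the Cartesian product.
           for values in itertools.product(*(benchmark_space[k] for k in keys)):
               params = dict(zip(keys, values))
               data = setup_fn(**params) if use_single_setup else None
               for run in range(n_runs):  # outer loop
                   if not use_single_setup:
                       data = setup_fn(**params)  # fresh data for every run
                   for name, fn in functions.items():
                       timings = []
                       for _ in range(n_tests):  # inner loop, same data
                           start = time.perf_counter()
                           fn(data)
                           timings.append(time.perf_counter() - start)
                       # Aggregate the tests of this run, here with ``min``.
                       rows.append({**params, "run": run, "function": name,
                                    "time": min(timings)})
           return rows


       rows = run_benchmark_sketch(
           functions={"sorted": sorted, "sum": sum},
           benchmark_space={"size": [10, 100], "step": [1, 2]},
           setup_fn=lambda size, step: list(range(0, size * step, step)),
       )
       print(len(rows))  # 4 combinations * 3 runs * 2 functions = 24 rows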

   .. note::
       ``min``, ``max``, ``mean``, ``std`` cannot be used as keys in
       the ``benchmark_space``.

   .. note::
       ``functions`` must be an iterable, but cannot be a generator.

   .. note::
       The first parameter in the ``benchmark_space`` is always used
       as the x-axis for :meth:`BenchmarkResult.plot`.

   :param functions: A list of functions to benchmark or a dictionary that maps a label
                     to a benchmark function.
   :type functions: Iterable[Callable] or dict[str, Callable]
   :param benchmark_space: The space of values to benchmark over. A benchmark will be
                           executed for each combination of values obtained from the
                           dictionary. The combinations are formed by taking the Cartesian
                           product of the lists in the dictionary, one value from each list.
                           The names of the keys of this dictionary must either be the names
                           of keyword arguments of the ``setup_fn``, or keyword arguments of
                           the benchmark functions if no ``setup_fn`` is provided. Only
                           ``bool``, ``int``, ``float``, and ``str`` values are supported.
   :type benchmark_space: dict[str, list[T]]
   :param setup_fn: A function that initializes data to be passed to the benchmark
                    ``functions``. If ``None``, the values from ``benchmark_space`` will
                    be passed directly to each function in ``functions``.
   :type setup_fn: SetupFunction
   :param use_single_setup: For functions that are guaranteed to be deterministic no matter
                            what the input is, this should be ``True``. However, if the
                            function is non-deterministic or the performance might depend
                            on how the data is initialized, this should be ``False``.
   :type use_single_setup: bool, default = True
   :param n_runs: The number of runs to execute (iterations of the outer loop).
   :type n_runs: int
   :param n_tests: The number of times to run each function within a single run.
   :type n_tests: int
   :param n_warm_ups: The number of tests to run before recording the timing data.
   :type n_warm_ups: int
   :param test_agg_fn: The function to use for aggregating individual test results within
                       a run.
   :type test_agg_fn: {'min', 'max', 'mean', 'median', 'sum'}
   :param run_label: The column label in the resulting Pandas ``DataFrame`` that
                     indicates the run number for the given execution times.
   :type run_label: str
   :param show_progress: Show progress bars while running the benchmark.
   :type show_progress: bool, default = False
   :param verbose: Print the full results and summary statistics to ``stdout``
                   when complete.
   :type verbose: bool, default = True
   :param float_fmt: The format used to print floating point values to a string.
   :type float_fmt: str
   :param sort_by: When ``verbose=True`` this will determine how the results are
                   sorted (either by the min run time, mean run time or by the
                   function name).
   :type sort_by: str {min, mean, function}
   :param test_significance: If ``True``, test whether the run times differ significantly
                             between all pairs of functions.
   :type test_significance: bool, default = True

   .. rubric:: Notes

   **Deterministic Function and Deterministic Data**

   If your algorithm is deterministic and is not influenced at all
   by the content of the data, only its size, then I would suggest the
   following parameters:

   * ``use_single_setup = True``: Use the same data for all the runs
     on the current setup parameters.
   * ``n_runs > 1``: Run it at least a few times per parameter set
     to make sure there weren't any anomalies biasing the results.
   * ``n_tests = 1``: You should not need to run multiple tests here.

   **Deterministic Function and Non-Deterministic Data**

   If your function is deterministic (the sequence of execution is always
   the same), but could be influenced by the content of the data I would
   suggest the following parameters:

   * ``setup_fn != None``: The setup function should return different
     data each run (of the same size)
   * ``use_single_setup = False``: Run the setup function to generate
     new data on each run.
   * ``n_tests > 1``: Run the function on the same data a few times
     in case there was an anomaly, which could bias the result.
   * ``n_runs > 1``: Run the function on multiple different data
     sets to estimate how much variability is expected due to the
     makeup of the data.
   * ``test_agg_fn = 'min'``: Since the function should execute
     the same way on the same data, the ``min`` should be the most
     informative.

   **Non-Deterministic Function**

   If the function itself is non-deterministic you probably want something
   similar to the deterministic case with non-deterministic data. In
   this case however, it is probably pointless to set ``n_tests > 1``
   and you should just increase ``n_runs`` to get better overall estimates.
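
   The three scenarios above can be written down as keyword arguments for
   :class:`BenchmarkRunner`. The parameter names come from the constructor
   signature, but the preset names and the concrete counts are illustrative
   assumptions; the notes only ask for values of ``1`` or greater than ``1``.

   .. code-block:: python

       # Deterministic function, performance depends only on the data size.
       DETERMINISTIC_FN_AND_DATA = dict(
           use_single_setup=True, n_runs=5, n_tests=1,
       )

       # Deterministic function, performance may depend on the data content.
       # A ``setup_fn`` that returns different data on each call is required.
       DETERMINISTIC_FN_VARYING_DATA = dict(
           use_single_setup=False, n_runs=10, n_tests=3, test_agg_fn="min",
       )

       # Non-deterministic function: skip repeated tests, add more runs.
       NON_DETERMINISTIC_FN = dict(
           use_single_setup=False, n_runs=30, n_tests=1,
       )

       # Hypothetical usage, with your own functions and benchmark space:
       # BenchmarkRunner(functions, benchmark_space, setup_fn=my_setup_fn,
       #                 **DETERMINISTIC_FN_VARYING_DATA)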

   .. rubric:: Examples

   .. code-block:: python

       >>> import time
       >>> import numpy as np
       >>> from rwskit.benchmarking import BenchmarkRunner
       >>> # Build an array of random integers and cast it to the requested dtype.
       >>> sort_setup_fn = (
       ...    lambda array_size, dtype, unique_values:
       ...        np.random.randint(unique_values, size=array_size).astype(dtype)
       ... )

       >>> b = BenchmarkRunner(functions={"fn1": lambda a: time.sleep(0.01),
       ...                                "fn2": lambda a: time.sleep(0.02)},
       ...                     benchmark_space={"array_size": [100, 10000],
       ...                                      "dtype": ["U", "int"],
       ...                                      "unique_values": [10, 100, 1000]},
       ...                     setup_fn=sort_setup_fn,
       ...                     time_unit="ms",
       ...                     float_fmt="0.3f")

       >>> b()
       function  array_size  unique_values    min   mean    std
       --------------------------------------------------------
            fn1         100             10  1.040  1.053  0.008
            fn2         100             10  5.057  5.058  0.001
       --------------------------------------------------------
            fn1         100            100  1.053  1.056  0.002
            fn2         100            100  5.057  5.058  0.001
       --------------------------------------------------------
            fn1         100           1000  1.056  1.057  0.000
            fn2         100           1000  5.058  5.058  0.000
       --------------------------------------------------------
            fn1       10000             10  1.056  1.057  0.000
            fn2       10000             10  5.058  5.059  0.000
       --------------------------------------------------------
            fn1       10000            100  1.056  1.057  0.000
            fn2       10000            100  5.058  5.064  0.009
       --------------------------------------------------------
            fn1       10000           1000  1.056  1.057  0.001
            fn2       10000           1000  5.063  5.066  0.002


   .. py:attribute:: functions


   .. py:attribute:: benchmark_space


   .. py:attribute:: setup_fn


   .. py:attribute:: use_single_setup
      :value: True


   .. py:attribute:: n_runs
      :value: 10


   .. py:attribute:: n_tests
      :value: 2


   .. py:attribute:: n_warm_ups
      :value: 1


   .. py:attribute:: test_agg_fn


   .. py:attribute:: run_label
      :value: 'run'


   .. py:attribute:: show_progress
      :value: False


   .. py:attribute:: verbose
      :value: True


   .. py:attribute:: time_unit
      :type:  TimeUnit


   .. py:attribute:: float_fmt
      :value: '0.4e'


   .. py:attribute:: sort_by
      :value: 'min'


   .. py:attribute:: test_significance
      :value: True


   .. py:method:: __call__() -> BenchmarkResult

      Runs the benchmark.

      :rtype: BenchmarkResult


   .. py:method:: run() -> BenchmarkResult

      Runs the benchmark.

      :rtype: BenchmarkResult


.. py:class:: BenchmarkResult(results: pandas.DataFrame, significance_results: Optional[pandas.DataFrame], benchmark_space: dict[str, list[T]], float_fmt: str = '0.4e', sort_by: BenchmarkSortValue = 'min', run_label: str = 'run', time_unit: TimeUnit = 's')

   A class for managing the results output by a :class:`BenchmarkRunner`.

   .. note::
       This class is not intended to be instantiated directly.

   :param results: The results DataFrame obtained by a :class:`BenchmarkRunner`.
   :type results: DataFrame
   :param significance_results: Pairwise t-test results.
   :type significance_results: DataFrame
   :param benchmark_space: The parameters used to benchmark the functions.
   :type benchmark_space: dict[str, list[T]]
   :param float_fmt: A valid format string to use for floating point numbers.
   :type float_fmt: str
   :param sort_by: The summary statistic to sort the results by when represented
                   as a string.
   :type sort_by: str {min, mean}
   :param run_label: The label used to indicate the run number.
   :type run_label: str
   :param time_unit: The original time unit used to benchmark the results.
   :type time_unit: TimeUnit


   .. py:method:: __repr__() -> str

      Return the full :class:`pandas.DataFrame` containing all the runs as a string.

      :returns: The full benchmark results as a string.
      :rtype: str


   .. py:method:: __str__() -> str

      Returns a table of the summary statistics of the benchmark results as a string.

      :returns: The summary statistics of the benchmark results as a string.
      :rtype: str


   .. py:property:: benchmark_space
      :type: dict[str, list[T]]


      Return the parameters used to benchmark the functions.

      :returns: The parameters used to produce these results.
      :rtype: dict[str, list[T]]


   .. py:method:: results(wide: bool = True, as_time_unit: Optional[TimeUnit] = None) -> pandas.DataFrame

      Get a pandas ``DataFrame`` containing the results.

      :param as_time_unit: Return the results in this time unit instead of the one used
                           during the benchmark.
      :type as_time_unit: TimeUnit, optional
      :param wide: If ``True`` return the results in the default wide format, which
                   is easier to read. Otherwise, return the results in long format,
                   which can be easier to use for plotting.
      :type wide: bool, default = True

      :returns: The DataFrame containing the results, either in wide or long format.
      :rtype: DataFrame


   .. py:property:: significance_results
      :type: pandas.DataFrame


      Return the results of the significance tests as a :class:`pandas.DataFrame`.

      :returns: The results of the significance tests as a :class:`pandas.DataFrame`.
      :rtype: DataFrame


   .. py:method:: summary(wide: bool = True, as_time_unit: Optional[TimeUnit] = None) -> pandas.DataFrame

      The summary statistics of the benchmark results.

      :param wide: If ``True``, return the results in wide format, otherwise, return
                   the results in long format.
      :type wide: bool, default = True
      :param as_time_unit: Return the results in this time unit instead of the one used during
                           the benchmark.
      :type as_time_unit: TimeUnit, optional

      :returns: The DataFrame with the summary statistics.
      :rtype: pd.DataFrame


   .. py:method:: plot(x_label: str = None, use_stat: Literal['min', 'mean'] = 'min', functions: Iterable[str] = None, show_points: bool = False, show_ribbon: bool = False, free_y: bool = False, theme_name: PlotTheme = None, figure_size: Optional[tuple[int, int]] = None, as_time_unit: Optional[TimeUnit] = None) -> plotnine.ggplot

      Create and return a `ggplot <https://plotnine.org/reference/ggplot.html#plotnine.ggplot>`__
      object that visualizes the benchmark results.

      .. note::
          Use ``show`` or ``save`` on the resulting object to render or save it.

      :param x_label: Label the `x-axis` with this value, if given. Otherwise, the first
                      key in the ``benchmark_space`` will be used.
      :type x_label: str, optional
      :param use_stat: Which stat to plot.
      :type use_stat: str {min, mean}
      :param functions: If defined, limit the plot to these functions.
      :type functions: Iterable[str], optional
      :param show_points: Show each individual run as a point on the graph.
      :type show_points: bool, default = False
      :param show_ribbon: If ``True``, include a ribbon that encompasses the minimum and
                          maximum value over all the runs in a group.
      :type show_ribbon: bool, default = False
      :param free_y: If ``True`` and the ``benchmark_space`` includes more than 1
                     parameter, the y-axis is not constrained to be the same for each
                     resulting plot.
      :type free_y: bool, default = False
      :param theme_name: The name of a theme to style your plot with, otherwise, it will
                         use the default theme.
      :type theme_name: PlotTheme, optional
      :param figure_size: Override the size of the resulting figure. If not specified and
                          there is more than one ``benchmark_space`` parameter, a heuristic
                          is used to try to ensure each subgraph will be legible.
      :type figure_size: tuple[int, int], optional
      :param as_time_unit: Display the results in this time unit instead of the one used
                           during the benchmarking.
      :type as_time_unit: TimeUnit, optional

      :returns: The plot object.
      :rtype: plotnine.ggplot