rwskit.benchmarking
===================

.. py:module:: rwskit.benchmarking

.. autoapi-nested-parse::

   Benchmarking tools.


Attributes
----------

.. autoapisummary::

   rwskit.benchmarking.TimeUnit
   rwskit.benchmarking.AggregationFunctionName
   rwskit.benchmarking.AggregationFunction
   rwskit.benchmarking.BenchmarkSortValue


Classes
-------

.. autoapisummary::

   rwskit.benchmarking.BenchmarkRunner
   rwskit.benchmarking.BenchmarkResult


Functions
---------

.. autoapisummary::

   rwskit.benchmarking.get_time_unit_abbreviation
   rwskit.benchmarking.change_time_unit
   rwskit.benchmarking.validate_call_signature


Module Contents
---------------

.. py:data:: TimeUnit

   The supported units of time.

.. py:data:: AggregationFunctionName

   The names of the supported aggregation functions.

.. py:data:: AggregationFunction

   An aggregation function is a callable that takes an array-like object and
   returns a single float.

.. py:data:: BenchmarkSortValue

   The supported values for sorting a :class:`BenchmarkResult` when it is
   represented as a string.

.. py:function:: get_time_unit_abbreviation(name: TimeUnit) -> TimeUnit

   Get the time unit abbreviation from the given string.

   :param name: The name or abbreviation of a supported time unit.
   :type name: str

   :returns: The abbreviation of the time unit specified by ``name``.
   :rtype: str

   :raises icontract.errors.ViolationError: If ``name`` is not a supported
       :data:`TimeUnit`.

.. py:function:: change_time_unit(value: int | float, from_unit: TimeUnit, to_unit: TimeUnit) -> float

   Convert ``value``, currently expressed in ``from_unit``, to the equivalent
   value in ``to_unit``.

   :param value: The current time value.
   :type value: int or float
   :param from_unit: The unit of the current value.
   :type from_unit: TimeUnit
   :param to_unit: The unit to change the value into.
   :type to_unit: TimeUnit

   :returns: The equivalent value in the new time unit ``to_unit``.
   :rtype: float

.. py:function:: validate_call_signature(fn1: Callable[Ellipsis, Any], fn2: Callable[Ellipsis, Any], strict: bool = False) -> bool

   Check that the two functions take the same parameters.

   :param fn1: The first function to compare.
   :type fn1: Callable[..., Any]
   :param fn2: The second function to compare.
   :type fn2: Callable[..., Any]
   :param strict: If ``True``, the signatures must match exactly, including
       whether defaults are present and their values. Otherwise, they are
       considered equal if the number and types of all parameters are the same.
   :type strict: bool, default = False

   :returns: ``True`` if the functions take the same number and type of
       parameters.
   :rtype: bool

.. py:class:: BenchmarkRunner(functions: Iterable[Callable] | dict[str, Callable], benchmark_space: dict[str, list[T]], setup_fn: Optional[Callable] = None, use_single_setup: bool = True, n_runs: int = 10, n_tests: int = 2, n_warm_ups: int = 1, time_unit: TimeUnit = cast(TimeUnit, 's'), test_agg_fn: AggregationFunctionName = 'min', run_label: str = 'run', show_progress: bool = False, verbose: bool = True, float_fmt: str = '0.4e', sort_by: BenchmarkSortValue = 'min', test_significance: bool = True)

   A class for profiling a set of functions based on one or more criteria.

   The high-level view of the benchmarking process is as follows. For every
   combination of parameters in the ``benchmark_space``, a sub-benchmark is
   run. Each sub-benchmark has 2 nested execution loops. The innermost loop
   runs each function on the current data ``n_tests`` times and aggregates the
   results using the ``test_agg_fn``. The same data is always used for this
   loop. The outer loop runs this process ``n_runs`` times. If
   ``use_single_setup`` is ``True``, the setup function is called only once and
   its result is reused for all the runs. If it is ``False``, the setup
   function is called for every run.
   The execution times for all runs are stored in a pandas ``DataFrame`` that
   can be retrieved after running the benchmark.

   .. note::

      ``min``, ``max``, ``mean``, and ``std`` cannot be used as keys in the
      ``benchmark_space``.

   .. note::

      ``functions`` must be an iterable, but cannot be a generator.

   .. note::

      The first parameter in the ``benchmark_space`` is always used as the
      x-axis for :meth:`BenchmarkResult.plot`.

   :param functions: A list of functions to benchmark or a dictionary that maps
       a label to a benchmark function.
   :type functions: list[BenchmarkFunction]
   :param benchmark_space: The space of values to benchmark over. A benchmark
       is executed for each combination of values obtained from the
       dictionary. The combinations are the Cartesian product of the lists,
       taking one value from each list in the dictionary. The keys of this
       dictionary must be the names of keyword arguments of the ``setup_fn``,
       or of the benchmark functions if no ``setup_fn`` is provided. Only
       ``bool``, ``int``, ``float``, and ``str`` values are supported.
   :type benchmark_space: dict[str, list[T]]
   :param setup_fn: A function that initializes data to be passed to the
       benchmark ``functions``. If ``None``, the values from the
       ``benchmark_space`` are passed directly to each function in
       ``functions``.
   :type setup_fn: SetupFunction
   :param use_single_setup: For functions that are guaranteed to be
       deterministic no matter what the input is, this should be ``True``.
       However, if the function is non-deterministic or the performance might
       depend on how the data is initialized, this should be ``False``.
   :type use_single_setup: bool, default = True
   :param n_runs: The number of runs (outer loop iterations) to execute for
       each sub-benchmark.
   :type n_runs: int
   :param n_tests: The number of times to run each function within a single
       run.
   :type n_tests: int
   :param n_warm_ups: The number of tests to run before recording the timing
       data.
   :type n_warm_ups: int
   :param test_agg_fn: The function to use for aggregating individual test
       results within a run.
   :type test_agg_fn: {'min', 'max', 'mean', 'median', 'sum'}
   :param run_label: The column label in the resulting pandas ``DataFrame``
       that indicates the run number for the given execution times.
   :type run_label: str
   :param show_progress: Show progress bars while running the benchmark.
   :type show_progress: bool, default = False
   :param verbose: Print the full results and summary statistics to ``stdout``
       when complete.
   :type verbose: bool, default = True
   :param float_fmt: The format used to print floating point values to a
       string.
   :type float_fmt: str
   :param sort_by: When ``verbose=True``, this determines how the results are
       sorted (by the min run time, the mean run time, or the function name).
   :type sort_by: str {min, mean, function}
   :param test_significance: If ``True``, test whether the run times differ
       significantly between all pairs of functions.
   :type test_significance: bool, default = True

   .. rubric:: Notes

   **Deterministic Function and Deterministic Data**

   If your algorithm is deterministic and is influenced only by the size of
   the data, not its content, then I would suggest the following parameters:

   * ``use_single_setup = True``: Use the same data for all the runs on the
     current setup parameters.
   * ``n_runs > 1``: Run it at least a few times per parameter set to make sure
     there weren't any anomalies biasing the results.
   * ``n_tests = 1``: You should not need to run multiple tests here.

   **Deterministic Function and Non-Deterministic Data**

   If your function is deterministic (the sequence of execution is always the
   same), but could be influenced by the content of the data, I would suggest
   the following parameters:

   * ``setup_fn != None``: The setup function should return different data (of
     the same size) on each run.
   * ``use_single_setup = False``: Run the setup function to generate new data
     on each run.
   * ``n_tests > 1``: Run the function on the same data a few times in case
     there was an anomaly, which could bias the result.
   * ``n_runs > 1``: Run the function on multiple different data sets to
     estimate how much variability is expected due to the makeup of the data.
   * ``test_agg_fn = 'min'``: Since the function should execute the same way on
     the same data, the ``min`` should be the most informative.

   **Non-Deterministic Function**

   If the function itself is non-deterministic, you probably want something
   similar to the deterministic case with non-deterministic data. In this
   case, however, it is probably pointless to set ``n_tests > 1``; just
   increase ``n_runs`` to get better overall estimates.

   .. rubric:: Examples

   .. code-block:: python

      >>> import time
      >>> import numpy as np
      >>> sort_setup_fn = (
      ...     lambda array_size, dtype, unique_values:
      ...         np.random.randint(unique_values, size=array_size).astype(dtype)
      ... )
      >>> b = BenchmarkRunner(functions={"fn1": lambda a: time.sleep(0.01),
      ...                                "fn2": lambda a: time.sleep(0.02)},
      ...                     benchmark_space={"array_size": [100, 10000],
      ...                                      "dtype": ["U", "int"],
      ...                                      "unique_values": [10, 100, 1000]},
      ...                     setup_fn=sort_setup_fn,
      ...                     time_unit="ms",
      ...                     float_fmt="0.3f")
      >>> b()
      function  array_size  unique_values  min    mean   std
      --------------------------------------------------------
      fn1       100         10             1.040  1.053  0.008
      fn2       100         10             5.057  5.058  0.001
      --------------------------------------------------------
      fn1       100         100            1.053  1.056  0.002
      fn2       100         100            5.057  5.058  0.001
      --------------------------------------------------------
      fn1       100         1000           1.056  1.057  0.000
      fn2       100         1000           5.058  5.058  0.000
      --------------------------------------------------------
      fn1       10000       10             1.056  1.057  0.000
      fn2       10000       10             5.058  5.059  0.000
      --------------------------------------------------------
      fn1       10000       100            1.056  1.057  0.000
      fn2       10000       100            5.058  5.064  0.009
      --------------------------------------------------------
      fn1       10000       1000           1.056  1.057  0.001
      fn2       10000       1000           5.063  5.066  0.002

   .. py:attribute:: functions

   .. py:attribute:: benchmark_space

   .. py:attribute:: setup_fn

   .. py:attribute:: use_single_setup
      :value: True

   .. py:attribute:: n_runs
      :value: 10

   .. py:attribute:: n_tests
      :value: 2

   .. py:attribute:: n_warm_ups
      :value: 1

   .. py:attribute:: test_agg_fn

   .. py:attribute:: run_label
      :value: 'run'

   .. py:attribute:: show_progress
      :value: False

   .. py:attribute:: verbose
      :value: True

   .. py:attribute:: time_unit
      :type: TimeUnit

   .. py:attribute:: float_fmt
      :value: '0.4e'

   .. py:attribute:: sort_by
      :value: 'min'

   .. py:attribute:: test_significance
      :value: True

   .. py:method:: __call__() -> BenchmarkResult

      Runs the benchmark.

      :rtype: BenchmarkResult

   .. py:method:: run() -> BenchmarkResult

      Runs the benchmark.

      :rtype: BenchmarkResult

.. py:class:: BenchmarkResult(results: pandas.DataFrame, significance_results: Optional[pandas.DataFrame], benchmark_space: dict[str, list[T]], float_fmt: str = '0.4e', sort_by: BenchmarkSortValue = 'min', run_label: str = 'run', time_unit: TimeUnit = 's')

   A class for managing the results output by a :class:`BenchmarkRunner`.

   .. note::

      This class is not intended to be instantiated directly.

   :param results: The results DataFrame obtained by a
       :class:`BenchmarkRunner`.
   :type results: DataFrame
   :param significance_results: Pairwise t-test results.
   :type significance_results: DataFrame
   :param benchmark_space: The parameters used to benchmark the functions.
   :type benchmark_space: dict[str, list[T]]
   :param float_fmt: A valid format string to use for floating point numbers.
   :type float_fmt: str
   :param sort_by: The summary statistic to sort the results by when
       represented as a string.
   :type sort_by: str {min, mean}
   :param run_label: The label used to indicate the run number.
   :type run_label: str
   :param time_unit: The original time unit used to benchmark the results.
   :type time_unit: TimeUnit

   .. py:method:: __repr__() -> str

      The full :class:`pandas.DataFrame` containing all the runs, as a string.


      :returns: The full benchmark results as a string.
      :rtype: str

   .. py:method:: __str__() -> str

      Returns a table of the summary statistics of the benchmark results as a
      string.

      :returns: The summary statistics of the benchmark results as a string.
      :rtype: str

   .. py:property:: benchmark_space
      :type: dict[str, list[T]]

      Return the parameters used to benchmark the functions.

      :returns: The parameters used to produce these results.
      :rtype: dict[str, list[T]]

   .. py:method:: results(wide: bool = True, as_time_unit: Optional[TimeUnit] = None) -> pandas.DataFrame

      Get a pandas ``DataFrame`` containing the results.

      :param as_time_unit: Return the results in this time unit instead of the
          one used during the benchmark.
      :type as_time_unit: TimeUnit, optional
      :param wide: If ``True``, return the results in the default wide format,
          which is easier to read. Otherwise, return the results in long
          format, which can be easier to use for plotting.
      :type wide: bool, default = True

      :returns: The DataFrame containing the results, either in wide or long
          format.
      :rtype: DataFrame

   .. py:property:: significance_results
      :type: pandas.DataFrame

      Return the results of the significance tests as a
      :class:`pandas.DataFrame`.

      :returns: The results of the significance tests as a
          :class:`pandas.DataFrame`.
      :rtype: DataFrame

   .. py:method:: summary(wide: bool = True, as_time_unit: Optional[TimeUnit] = None) -> pandas.DataFrame

      The summary statistics of the benchmark results.

      :param wide: If ``True``, return the results in wide format; otherwise,
          return the results in long format.
      :type wide: bool, default = True
      :param as_time_unit: Return the results in this time unit instead of the
          one used during the benchmark.
      :type as_time_unit: TimeUnit, optional

      :returns: The DataFrame with the summary statistics.
      :rtype: DataFrame

   .. py:method:: plot(x_label: str = None, use_stat: Literal['min', 'mean'] = 'min', functions: Iterable[str] = None, show_points: bool = False, show_ribbon: bool = False, free_y: bool = False, theme_name: PlotTheme = None, figure_size: Optional[tuple[int, int]] = None, as_time_unit: Optional[TimeUnit] = None) -> plotnine.ggplot

      Create and return a ``plotnine.ggplot`` object that visualizes the
      benchmark results.

      .. note::

         Use ``show`` or ``save`` on the resulting object to render or save it.

      :param x_label: Label the x-axis with this value, if given. Otherwise,
          the first key in the ``benchmark_space`` is used.
      :type x_label: str, optional
      :param use_stat: Which summary statistic to plot.
      :type use_stat: str {min, mean}
      :param functions: If defined, limit the plot to these functions.
      :type functions: Iterable[str], optional
      :param show_points: Show each individual run as a point on the graph.
      :type show_points: bool, default = False
      :param show_ribbon: If ``True``, include a ribbon that encompasses the
          minimum and maximum value over all the runs in a group.
      :type show_ribbon: bool, default = False
      :param free_y: If ``True`` and the ``benchmark_space`` includes more than
          1 parameter, the y-axis is not constrained to be the same for each
          resulting plot.
      :type free_y: bool, default = False
      :param theme_name: The name of a theme to style your plot with.
          Otherwise, the default theme is used.
      :type theme_name: PlotTheme, optional
      :param figure_size: Override the size of the resulting figure. If not
          specified and there is more than one ``benchmark_space`` parameter,
          a heuristic is used to try to ensure each subgraph is legible.
      :type figure_size: tuple[int, int], optional
      :param as_time_unit: Display the results in this time unit instead of
          the one used during the benchmarking.
      :type as_time_unit: TimeUnit, optional

      :returns: The plot object.
      :rtype: plotnine.ggplot
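
The nested-loop process described for :class:`BenchmarkRunner` can be
illustrated with a small standalone sketch. This is not the library's
implementation: ``sketch_benchmark`` is a hypothetical helper, it omits
warm-ups, significance testing, and time-unit conversion, and it assumes that a
single setup means one ``setup_fn`` call per parameter combination.

.. code-block:: python

   import itertools
   import time
   from typing import Any, Callable


   def sketch_benchmark(
       functions: dict[str, Callable[..., Any]],
       benchmark_space: dict[str, list[Any]],
       setup_fn: Callable[..., Any],
       use_single_setup: bool = True,
       n_runs: int = 3,
       n_tests: int = 2,
       test_agg_fn: Callable[[list[float]], float] = min,
   ) -> list[dict[str, Any]]:
       """Hypothetical helper mirroring the documented loop structure."""
       rows = []
       keys = list(benchmark_space)
       # One sub-benchmark per combination in the Cartesian product.
       for values in itertools.product(*(benchmark_space[k] for k in keys)):
           params = dict(zip(keys, values))
           # Assumption: with a single setup, data is built once per combination.
           data = setup_fn(**params) if use_single_setup else None
           for run in range(n_runs):  # outer loop
               if not use_single_setup:
                   data = setup_fn(**params)  # fresh data for every run
               for name, fn in functions.items():
                   timings = []
                   for _ in range(n_tests):  # inner loop, always on the same data
                       start = time.perf_counter()
                       fn(data)
                       timings.append(time.perf_counter() - start)
                   # Aggregate the inner-loop timings into one value per run.
                   rows.append({**params, "function": name, "run": run,
                                "time": test_agg_fn(timings)})
       return rows


   def make_data(size: int, reverse: bool) -> list[int]:
       """Illustrative setup function: an ascending or descending list."""
       data = list(range(size))
       return data[::-1] if reverse else data


   rows = sketch_benchmark(
       functions={"sorted": sorted, "sum": sum},
       benchmark_space={"size": [10, 100], "reverse": [False, True]},
       setup_fn=make_data,
   )

Each entry in ``rows`` holds one aggregated timing per function, run, and
parameter combination, which is roughly the long-format shape that
:meth:`BenchmarkResult.results` is described as returning with ``wide=False``.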