rwskit.benchmarking
Benchmarking tools.
Attributes
TimeUnit | The supported units of time.
AggregationFunctionName | The names of the supported aggregation functions.
AggregationFunction | An aggregation function is a callable that takes an array-like object and returns a single float.
BenchmarkSortValue | The supported values for sorting the BenchmarkResult when represented as a string.
Classes
BenchmarkRunner | A class for profiling a set of functions based on one or more criteria.
BenchmarkResult | A class for managing the results output by a BenchmarkRunner.
Functions
get_time_unit_abbreviation | Get the time unit abbreviation from the given string.
change_time_unit | Change the unit of time of a given value.
validate_call_signature | Check that the two functions take the same parameters.
Module Contents
- rwskit.benchmarking.AggregationFunctionName
The names of the supported aggregation functions.
- rwskit.benchmarking.AggregationFunction
An aggregation function is a callable that takes an array-like object and returns a single float.
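As a rough illustration (not the library's own code), any callable that reduces an array-like to one float fits this description. The sketch below maps the aggregation names listed for `test_agg_fn` to standard-library callables. The `AGGREGATORS` table and the `aggregate` helper are hypothetical names introduced only for this example.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical lookup table: aggregation name -> callable returning one number.
AGGREGATORS: dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "sum": sum,
}


def aggregate(name: str, timings: Sequence[float]) -> float:
    """Reduce a sequence of timings to a single float with the named function."""
    return float(AGGREGATORS[name](timings))


timings = [0.012, 0.010, 0.011]
print(aggregate("min", timings))
print(aggregate("median", timings))
```

A custom aggregation would simply be another callable with the same shape, for example a trimmed mean.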
- rwskit.benchmarking.BenchmarkSortValue
The supported values for sorting the BenchmarkResult when represented as a string.
- rwskit.benchmarking.get_time_unit_abbreviation(name: TimeUnit) → TimeUnit
Get the time unit abbreviation from the given string.
- Parameters:
name (str) – The name or abbreviation of a supported time unit.
- Returns:
The abbreviation of the time unit specified by name.
- Return type:
str
- Raises:
icontract.errors.ViolationError – If name is not a supported TimeUnit.
- rwskit.benchmarking.change_time_unit(value: int | float, from_unit: TimeUnit, to_unit: TimeUnit) → float
Convert value, currently expressed in from_unit, to the equivalent value in to_unit.
- Parameters:
value (int or float) – The current time value.
from_unit (TimeUnit) – The unit of the current value.
to_unit (TimeUnit) – The unit to change the value into.
- Returns:
The equivalent value in the new time unit to_unit.
- Return type:
float
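The conversion described above amounts to scaling through a common base unit. The following is a minimal sketch of that idea, not the library's implementation. The unit abbreviations in `SECONDS_PER_UNIT` and the function name `convert_time` are assumptions made for illustration; the actual set of supported `TimeUnit` values is not listed here.

```python
# Seconds per unit. The set of units is an assumption made for this sketch.
SECONDS_PER_UNIT = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def convert_time(value: float, from_unit: str, to_unit: str) -> float:
    """Convert value from from_unit to to_unit by going through seconds."""
    seconds = value * SECONDS_PER_UNIT[from_unit]
    return seconds / SECONDS_PER_UNIT[to_unit]


print(convert_time(1.5, "s", "ms"))   # seconds to milliseconds
print(convert_time(250, "us", "ms"))  # microseconds to milliseconds
```

Because the factors are floating point values, results should be compared with a tolerance rather than for exact equality.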
- rwskit.benchmarking.validate_call_signature(fn1: Callable[..., Any], fn2: Callable[..., Any], strict: bool = False) → bool
Check that the two functions take the same parameters.
- Parameters:
fn1 (Callable[..., Any]) – The first function to compare.
fn2 (Callable[..., Any]) – The second function to compare.
strict (bool, default = False) – If True, the signatures must match exactly, including whether defaults are present and their values. Otherwise, they are considered equal if the number and types of all parameters are the same.
- Returns:
True if the functions take the same number and type of parameters.
- Return type:
bool
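One way a check like this could be written is with `inspect.signature` from the standard library. The sketch below is an assumption about the general approach, not the library's actual code: `same_parameters` is a hypothetical name, and the relaxed mode here compares only the parameter count and annotations.

```python
import inspect
from typing import Any, Callable


def same_parameters(
    fn1: Callable[..., Any], fn2: Callable[..., Any], strict: bool = False
) -> bool:
    """Compare the parameters of two callables (illustrative only)."""
    params1 = list(inspect.signature(fn1).parameters.values())
    params2 = list(inspect.signature(fn2).parameters.values())
    if strict:
        # Parameter equality covers name, kind, default, and annotation.
        return params1 == params2
    if len(params1) != len(params2):
        return False
    # Relaxed mode: only the count and the annotated types must agree.
    return all(p1.annotation == p2.annotation for p1, p2 in zip(params1, params2))


def f(a: int, b: str = "x") -> None: ...
def g(a: int, b: str) -> None: ...
def h(a: int) -> None: ...


print(same_parameters(f, g))               # same count and types
print(same_parameters(f, g, strict=True))  # defaults differ
print(same_parameters(f, h))               # different parameter count
```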
- class rwskit.benchmarking.BenchmarkRunner(functions: Iterable[Callable] | dict[str, Callable], benchmark_space: dict[str, list[T]], setup_fn: Callable | None = None, use_single_setup: bool = True, n_runs: int = 10, n_tests: int = 2, n_warm_ups: int = 1, time_unit: TimeUnit = cast(TimeUnit, 's'), test_agg_fn: AggregationFunctionName = 'min', run_label: str = 'run', show_progress: bool = False, verbose: bool = True, float_fmt: str = '0.4e', sort_by: BenchmarkSortValue = 'min', test_significance: bool = True)
A class for profiling a set of functions based on one or more criteria.
The high-level view of the benchmarking process is as follows. For every combination of parameters in the benchmark_space, a sub-benchmark is run. Each sub-benchmark has two nested execution loops. The innermost loop runs each function on the current data n_tests times and aggregates the results using test_agg_fn. The same data is always used for this loop. The outer loop repeats this process n_runs times. If use_single_setup is True, the setup function is called only once and its result is used for all the runs. If it is False, the setup function is called for every run. The execution times for all runs are stored in a pandas DataFrame that can be retrieved after calling the benchmark.
Note
min, max, mean, and std cannot be used as keys in the benchmark_space.
Note
functions must be an iterable, but cannot be a generator.
Note
The first parameter in the benchmark_space is always used as the x-axis for BenchmarkResult.plot().
- Parameters:
functions (list[BenchmarkFunction]) – A list of functions to benchmark, or a dictionary that maps a label to a benchmark function.
benchmark_space (dict[str, list[T]]) – The space of values to benchmark over. A benchmark is executed for each combination of values obtained from the dictionary. The combinations are formed by taking the Cartesian product, with one value from each list in the dictionary. The keys of this dictionary must be the names of keyword arguments of the setup_fn, or of the benchmark functions if no setup_fn is provided. Only bool, int, float, and str values are supported.
setup_fn (SetupFunction) – A function that initializes data to be passed to the benchmark functions. If None, the values from setup_args will be passed directly to each function in functions.
use_single_setup (bool, default = True) – For functions that are guaranteed to be deterministic no matter what the input is, this should be True. However, if the function is non-deterministic or the performance might depend on how the data is initialized, this should be False.
n_runs (int) – The number of execution tests to run.
n_tests (int) – The number of times to run each function in a single test.
n_warm_ups (int) – The number of tests to run before recording the timing data.
test_agg_fn ({'min', 'max', 'mean', 'median', 'sum'}) – The function to use for aggregating individual test results within a run.
run_label (str) – The column label in the resulting pandas DataFrame that indicates the run number for the given execution times.
show_progress (bool, default = False) – Show progress bars while running the benchmark.
verbose (bool, default = True) – Print the full results and summary statistics to stdout when complete.
float_fmt (str) – The format used to print floating point values to a string.
sort_by (str {min, mean, function}) – When verbose=True, this determines how the results are sorted (by the min run time, the mean run time, or the function name).
test_significance (bool, default = True) – If True, test whether the run times differ significantly between all pairs of functions.
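The process described for this class (one sub-benchmark per element of the Cartesian product, an outer loop over runs, an inner loop over tests) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the library's implementation: `run_benchmark_sketch` and its row layout are hypothetical, warm-ups and significance testing are left out, and results are returned as plain dictionaries rather than a DataFrame.

```python
import itertools
import time
from typing import Any, Callable


def run_benchmark_sketch(
    functions: dict[str, Callable[..., Any]],
    benchmark_space: dict[str, list[Any]],
    setup_fn: Callable[..., Any],
    n_runs: int = 3,
    n_tests: int = 2,
    use_single_setup: bool = True,
    test_agg_fn: Callable[[list[float]], float] = min,
) -> list[dict[str, Any]]:
    """Return one row per (parameter combination, run, function)."""
    rows = []
    keys = list(benchmark_space)
    # One sub-benchmark per element of the Cartesian product of the value lists.
    for values in itertools.product(*(benchmark_space[k] for k in keys)):
        params = dict(zip(keys, values))
        data = setup_fn(**params) if use_single_setup else None
        for run in range(n_runs):  # outer loop
            if not use_single_setup:
                data = setup_fn(**params)  # fresh data for every run
            for name, fn in functions.items():
                timings = []
                for _ in range(n_tests):  # inner loop, always on the same data
                    start = time.perf_counter()
                    fn(data)
                    timings.append(time.perf_counter() - start)
                rows.append({**params, "run": run, "function": name,
                             "time": test_agg_fn(timings)})
    return rows


rows = run_benchmark_sketch(
    functions={"sorted": sorted, "reverse_copy": lambda a: a[::-1]},
    benchmark_space={"size": [10, 100], "descending": [False, True]},
    setup_fn=lambda size, descending: (
        list(range(size, 0, -1)) if descending else list(range(size))
    ),
)
print(len(rows))  # 4 combinations * 3 runs * 2 functions
```

Note that the keys of `benchmark_space` are passed to the setup function as keyword arguments, which is why they must match its parameter names.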
Notes
Deterministic Function and Deterministic Data
If your algorithm is deterministic and is influenced only by the size of the data, not its content, then I would suggest the following parameters:
use_single_setup = True: Use the same data for all the runs on the current setup parameters.
n_runs > 1: Run it at least a few times per parameter set to make sure there weren't any anomalies biasing the results.
n_tests = 1: You should not need to run multiple tests here.
Deterministic Function and Non-Deterministic Data
If your function is deterministic (the sequence of execution is always the same) but could be influenced by the content of the data, I would suggest the following parameters:
setup_fn != None: The setup function should return different data (of the same size) on each run.
use_single_setup = False: Run the setup function to generate new data on each run.
n_tests > 1: Run the function on the same data a few times in case there was an anomaly, which could bias the result.
n_runs > 1: Run the function on multiple different data sets to estimate how much variability is expected due to the makeup of the data.
test_agg_fn = 'min': Since the function should execute the same way on the same data, the min should be the most informative.
Non-Deterministic Function
If the function itself is non-deterministic, you probably want something similar to the deterministic case with non-deterministic data. In this case, however, it is probably pointless to set n_tests > 1, and you should just increase n_runs to get better overall estimates.
Examples
>>> import time
>>> import numpy as np
>>> from rwskit.benchmarking import BenchmarkRunner
>>> sort_setup_fn = (
...     lambda array_size, dtype, unique_values:
...         np.random.randint(unique_values, size=array_size).astype(dtype)
... )
>>> b = BenchmarkRunner(functions={"fn1": lambda a: time.sleep(0.01),
...                                "fn2": lambda a: time.sleep(0.02)},
...                     benchmark_space={"array_size": [100, 10000],
...                                      "dtype": ["U", "int"],
...                                      "unique_values": [10, 100, 1000]},
...                     setup_fn=sort_setup_fn,
...                     time_unit="ms",
...                     float_fmt="0.3f")
>>> b()
function  array_size  unique_values  min    mean   std
--------------------------------------------------------
fn1       100         10             1.040  1.053  0.008
fn2       100         10             5.057  5.058  0.001
--------------------------------------------------------
fn1       100         100            1.053  1.056  0.002
fn2       100         100            5.057  5.058  0.001
--------------------------------------------------------
fn1       100         1000           1.056  1.057  0.000
fn2       100         1000           5.058  5.058  0.000
--------------------------------------------------------
fn1       10000       10             1.056  1.057  0.000
fn2       10000       10             5.058  5.059  0.000
--------------------------------------------------------
fn1       10000       100            1.056  1.057  0.000
fn2       10000       100            5.058  5.064  0.009
--------------------------------------------------------
fn1       10000       1000           1.056  1.057  0.001
fn2       10000       1000           5.063  5.066  0.002
- __call__() → BenchmarkResult
Runs the benchmark.
- Return type:
BenchmarkResult
- run() → BenchmarkResult
Runs the benchmark.
- Return type:
BenchmarkResult
- class rwskit.benchmarking.BenchmarkResult(results: pandas.DataFrame, significance_results: pandas.DataFrame | None, benchmark_space: dict[str, list[T]], float_fmt: str = '0.4e', sort_by: BenchmarkSortValue = 'min', run_label: str = 'run', time_unit: TimeUnit = 's')
A class for managing the results output by a BenchmarkRunner.
Note
This class is not intended to be instantiated directly.
- Parameters:
results (DataFrame) – The results DataFrame obtained by a BenchmarkRunner.
significance_results (DataFrame) – Pairwise t-test results.
benchmark_space (dict[str, list[T]]) – The parameters used to benchmark the functions.
float_fmt (str) – A valid format string to use for floating point numbers.
sort_by (str {min, mean}) – The summary statistic to sort the results by when represented as a string.
run_label (str) – The label used to indicate the run number.
time_unit (TimeUnit) – The original time unit used to benchmark the results.
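The float_fmt values shown in this reference ('0.4e', '0.3f') look like entries in Python's format specification mini-language. Assuming the string is handed to the built-in formatting machinery (this page does not say how it is applied internally), the effect of the two styles can be previewed as follows. The helper name `format_time` is hypothetical.

```python
def format_time(value: float, float_fmt: str = "0.4e") -> str:
    """Render a timing value with a format-spec string such as '0.4e' or '0.3f'."""
    return format(value, float_fmt)


print(format_time(0.00123456))         # 1.2346e-03
print(format_time(1.23456, "0.3f"))    # 1.235
```

Scientific notation keeps very small timings readable, while a fixed-point format is usually easier to scan once the values have been converted to a coarser unit such as milliseconds.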
- __repr__() → str
The full pandas.DataFrame containing all the runs, as a string.
- Returns:
The full benchmark results as a string.
- Return type:
str
- __str__() → str
Returns a table of the summary statistics of the benchmark results as a string.
- Returns:
The summary statistics of the benchmark results as a string.
- Return type:
str
- property benchmark_space: dict[str, list[T]]
Return the parameters used to benchmark the functions.
- Returns:
The parameters used to produce these results.
- Return type:
dict[str, list[T]]
- results(wide: bool = True, as_time_unit: TimeUnit | None = None) → pandas.DataFrame
Get a pandas DataFrame containing the results.
- Parameters:
as_time_unit (TimeUnit, optional) – Return the results in this time unit instead of the one used during the benchmark.
wide (bool, default = True) – If True, return the results in the default wide format, which is easier to read. Otherwise, return the results in long format, which can be easier to use for plotting.
- Returns:
The DataFrame containing the results, either in wide or long format.
- Return type:
DataFrame
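To make the wide versus long distinction concrete, here is a small standard-library sketch that reshapes rows from one layout to the other. The column layout is an assumption (this page does not document the exact columns of the returned DataFrame), the numbers are made up for illustration, and `wide_to_long` is a hypothetical helper. With pandas, `DataFrame.melt` performs the equivalent reshaping.

```python
from typing import Any


def wide_to_long(
    wide_rows: list[dict[str, Any]],
    id_keys: tuple[str, ...],
    var_name: str = "function",
    value_name: str = "time",
) -> list[dict[str, Any]]:
    """Turn one column per function into one row per (ids, function) pair."""
    long_rows = []
    for row in wide_rows:
        ids = {key: row[key] for key in id_keys}
        for key, value in row.items():
            if key not in id_keys:
                long_rows.append({**ids, var_name: key, value_name: value})
    return long_rows


# Assumed wide layout: one timing column per benchmarked function.
wide = [
    {"array_size": 100, "run": 0, "fn1": 1.04, "fn2": 5.06},
    {"array_size": 100, "run": 1, "fn1": 1.05, "fn2": 5.05},
]
long_rows = wide_to_long(wide, id_keys=("array_size", "run"))
for row in long_rows:
    print(row)
```

The long layout has one observation per row, which is the shape most plotting libraries expect when grouping or colouring by function.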
- property significance_results: pandas.DataFrame
Return the results of the significance tests as a pandas.DataFrame.
- Returns:
The results of the significance tests as a pandas.DataFrame.
- Return type:
DataFrame
- summary(wide: bool = True, as_time_unit: TimeUnit | None = None) → pandas.DataFrame
The summary statistics of the benchmark results.
- Parameters:
wide (bool, default = True) – If True, return the results in wide format, otherwise, return the results in long format.
as_time_unit (TimeUnit, optional) – Return the results in this time unit instead of the one used during the benchmark.
- Returns:
The DataFrame with the summary statistics.
- Return type:
pd.DataFrame
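The summary table printed by the benchmark reports min, mean, and std per function. A minimal sketch of how such statistics can be derived from per-run timings is shown below. It is illustrative only: `summarize` and the row layout are hypothetical, the timings are made up, and the use of the sample standard deviation is an assumption, since this page does not state which estimator is used.

```python
import statistics
from collections import defaultdict
from typing import Any


def summarize(long_rows: list[dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Group timings by function and compute min, mean, and sample std."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for row in long_rows:
        grouped[row["function"]].append(row["time"])
    return {
        name: {
            "min": min(times),
            "mean": statistics.fmean(times),
            "std": statistics.stdev(times),  # needs at least two runs
        }
        for name, times in grouped.items()
    }


runs = [
    {"function": "fn1", "run": 0, "time": 1.0},
    {"function": "fn1", "run": 1, "time": 2.0},
    {"function": "fn1", "run": 2, "time": 3.0},
    {"function": "fn2", "run": 0, "time": 5.0},
    {"function": "fn2", "run": 1, "time": 7.0},
]
summary = summarize(runs)
print(summary["fn1"])
print(summary["fn2"])
```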
- plot(x_label: str = None, use_stat: Literal['min', 'mean'] = 'min', functions: Iterable[str] = None, show_points: bool = False, show_ribbon: bool = False, free_y: bool = False, theme_name: PlotTheme = None, figure_size: tuple[int, int] | None = None, as_time_unit: TimeUnit | None = None) → plotnine.ggplot
Create and return a ggplot object that visualizes the benchmark results.
Note
Use show or save on the resulting object to render or save it.
- Parameters:
x_label (str, optional) – Label the x-axis with this value, if given. Otherwise, the first key in the benchmark_space will be used.
use_stat (str {min, mean}) – Which stat to plot.
functions (Iterable[str], optional) – If defined, limit the plot to these functions.
show_points (bool, default = False) – Show each individual run as a point on the graph.
show_ribbon (bool, default = False) – If True, include a ribbon that encompasses the minimum and maximum value over all the runs in a group.
free_y (bool, default = False) – If True and the benchmark_space includes more than 1 parameter, the y-axis is not constrained to be the same for each resulting plot.
theme_name (PlotTheme, optional) – The name of a theme to style your plot with; otherwise, the default theme is used.
figure_size (tuple[int, int], optional) – Override the size of the resulting figure. If not specified and there is more than one benchmark_space parameter, a heuristic is used to try to ensure each subgraph is legible.
as_time_unit (TimeUnit, optional) – Display the results in this time unit instead of the one used during the benchmarking.
- Returns:
The plot object.
- Return type:
plotnine.ggplot