Basic statistical tools

Moving window statistics


hydrobox.toolbox.moving_window(x, window_size=5, window_type=None, func='nanmean')

Moving window statistics

Applies a moving window function to the input data. Each window of grouped values is aggregated into a single value of the resulting time series.
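The mechanics can be illustrated in plain Python. The sketch below is a hypothetical helper (`moving_window_sketch` is not part of hydrobox). It assumes the default behaviour described here: windows of `window_size` consecutive values, shifted by one value per step, each reduced to one value. Positions without a complete window are filled with `None`, similar to the leading missing values a pandas rolling window produces by default.

```python
import statistics
from typing import Callable, List, Optional, Sequence


def moving_window_sketch(
    values: Sequence[float],
    window_size: int = 5,
    func: Callable[[Sequence[float]], float] = statistics.mean,
) -> List[Optional[float]]:
    """Hypothetical helper: aggregate every full window of consecutive values."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # The first window_size - 1 positions have no complete window yet
    result: List[Optional[float]] = [None] * min(window_size - 1, len(values))
    # Shift the window by one value per step and reduce it to a single value
    for end in range(window_size, len(values) + 1):
        result.append(func(values[end - window_size:end]))
    return result


print(moving_window_sketch([1, 2, 3, 4, 5, 6, 7], window_size=3))
```

Any callable that reduces a sequence to one value can be passed as `func`, for example `max` or `statistics.median`.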

x : pandas.Series, pandas.DataFrame

Input data. The data should have a pandas.DatetimeIndex in order to produce meaningful results. However, this is not required; the function will technically also work on data with a different index.

window_size : int, default=5

Number of consecutive values grouped into each window. The meaning of this parameter may differ if window_type is not None.

window_type : str, default=None

If None, an evenly weighted window is used and shifted by one value for each group. Otherwise, a window type can be specified by name; the possible values are those accepted by pandas.DataFrame.rolling (see its win_type argument).

func : str or callable, default='nanmean'

Aggregating function used to calculate the value of each window. A string has to name a function importable from numpy that accepts an array of values and returns a single value, like numpy.std or numpy.median.
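As a rough illustration of how window_size, window_type and func could map onto pandas, the sketch below calls pandas.Series.rolling directly. It assumes, without checking it against the hydrobox source, that window_type is forwarded as the win_type argument and that a string func is looked up in numpy by name. Weighted window types such as 'triang' require SciPy to be installed.

```python
import numpy as np
import pandas as pd

# Daily example series with a DatetimeIndex
index = pd.date_range("2021-06-01", periods=8, freq="D")
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=index)

window_size = 3
func = getattr(np, "median")  # a func string resolved to numpy.median

# window_type=None: evenly weighted window, shifted by one value per step
plain = x.rolling(window=window_size).apply(func, raw=True)

# window_type given: pandas weighted window; only a few aggregations
# such as mean() or sum() are available on it
weighted = x.rolling(window=window_size, win_type="triang").mean()

print(pd.DataFrame({"plain": plain, "weighted": weighted}))
```

Note that the weighted window is aggregated with `.mean()` rather than with an arbitrary numpy function.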



Be aware that most window types (if window_type is not None) only work with either numpy.sum or numpy.mean.

Furthermore, most window types cannot work with the ‘nan’ versions of the numpy aggregating functions. Therefore, in case window_type is None, any ‘nan’ will be removed from the func string. If you want to keep the ‘nan’ version anyway, wrap the numpy function into a lambda.


This way, you can prevent the replacement of a np.nan* function:

>>> import numpy as np
>>> from hydrobox.toolbox import moving_window
>>> moving_window(x, func=lambda x: np.nanmean(x))
array([NaN, NaN, NaN, 4.7445, 4.784 ... 6.34532])
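The ‘nan’ replacement rule can be sketched as follows. `prepare_func` is a hypothetical helper, not the hydrobox implementation, and it leaves out the window_type condition described above: a string has every ‘nan’ removed before the numpy function is looked up, while a callable such as a lambda is passed through untouched. The difference matters as soon as a window contains missing values.

```python
import numpy as np


def prepare_func(func):
    """Hypothetical helper mirroring the documented 'nan' replacement."""
    if isinstance(func, str):
        # 'nanmean' -> 'mean', 'nanstd' -> 'std', ...
        return getattr(np, func.replace("nan", ""))
    # Callables (e.g. a lambda wrapping np.nanmean) are used as they are
    return func


window = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

stripped = prepare_func("nanmean")               # resolves to np.mean
wrapped = prepare_func(lambda v: np.nanmean(v))  # keeps the nan-aware version

print(stripped(window))  # nan, because np.mean propagates missing values
print(wrapped(window))   # 3.0, the mean of the four valid values
```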

Linear Regression


hydrobox.toolbox.linear_regression(*x, df=None, plot=False, ax=None, notext=False)

Linear Regression tool

This tool can be used for a number of regression-related tasks. It can calculate a linear regression between two observables and also return a scatter plot including the regression parameters and the regression function.

If more than two Series or arrays are passed, they are merged into a DataFrame, and a linear regression is calculated (and plotted, if desired) for every combination of them.
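The pairwise behaviour can be approximated with itertools.combinations, so that every unordered pair of columns gets its own regression. This sketch is an assumption about the approach, not hydrobox's code, and it fits each pair with numpy.polyfit (degree 1) instead of whatever routine hydrobox uses internally.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Three observables merged into one DataFrame
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, 3.0, 5.0, 7.0],
})

fits = {}
for left, right in combinations(df.columns, 2):
    # Regress the right column on the left one: right = slope * left + intercept
    slope, intercept = np.polyfit(df[left].to_numpy(), df[right].to_numpy(), deg=1)
    fits[(left, right)] = (slope, intercept)

for pair, (slope, intercept) in fits.items():
    print(pair, round(slope, 3), round(intercept, 3))
```

Three columns yield three regressions: (a, b), (a, c) and (b, c).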

x : pandas.Series, numpy.ndarray

If df is None, at least two Series or arrays have to be passed. If more than two are passed, a multi-output with one result per combination is produced.

df : pandas.DataFrame, default=None

DataFrame holding the input data used to calculate the linear regression. If df is set, all values passed as x are ignored. This can be useful whenever multiple inputs passed as x do not get merged correctly. Note that linear_regression only uses the underlying array of values and ignores all other structural elements of the DataFrame.

plot : bool, default=False

If True, the function outputs a matplotlib Figure, or plots into an existing Axes instance if ax is given. If False (default), the data used for the plots is returned.

ax : matplotlib.axes.Axes or list of matplotlib.axes.Axes, default=None

Has to be a single matplotlib Axes instance if two data sets are passed, or a list of Axes if more than two data sets are passed.

notext : bool, default=False

If True, the fitting parameters are not written as text into the plot. This setting is ignored if plot is set to False.
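To illustrate what plot, ax and notext control, here is a hypothetical matplotlib helper (`scatter_with_fit` is not part of hydrobox) that draws a scatter plot with the fitted line and, unless notext is set, writes the regression parameters into the Axes. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def scatter_with_fit(x, y, ax=None, notext=False):
    """Hypothetical helper: scatter plot plus least-squares regression line."""
    if ax is None:
        _, ax = plt.subplots()
    slope, intercept = np.polyfit(x, y, deg=1)
    ax.scatter(x, y)
    # Draw the fitted line over the observed x range
    x_line = np.array([np.min(x), np.max(x)])
    ax.plot(x_line, slope * x_line + intercept, color="red")
    if not notext:
        ax.text(0.05, 0.95, f"y = {slope:.2f} x + {intercept:.2f}",
                transform=ax.transAxes, va="top")
    return ax


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.9, 4.1, 6.0, 8.0])
fig, (left, right) = plt.subplots(1, 2)
scatter_with_fit(x, y, ax=left)                # with parameter text
scatter_with_fit(x, y, ax=right, notext=True)  # text suppressed
```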



If plot is True and ax is not None, the number of passed Axes has to match the total number of combinations between the data sets. This is

$$\binom{N}{2} = \frac{N!}{2!\,(N-2)!} = \frac{N(N-1)}{2},$$

where N is the length of x, or the length of df.columns.
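The required number of Axes can be checked before plotting. This is a small standard-library sketch; `required_axes` is a hypothetical name, and it assumes one Axes per unordered pair of data sets.

```python
from itertools import combinations
from math import comb


def required_axes(n_datasets: int) -> int:
    """Hypothetical helper: one Axes per unordered pair of data sets."""
    return comb(n_datasets, 2)


columns = ["precip", "runoff", "temp", "evap"]  # illustrative labels only
n = required_axes(len(columns))
print(n)  # 6
# Cross-check against the explicit list of pairs
assert n == len(list(combinations(columns, 2)))
```

For two data sets this gives 1, which matches the single Axes case described for ax.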


This function only calculates a linear regression. It handles multiple inputs recursively and has some data-wrangling overhead. If you need a fast linear regression tool, use the scipy.stats.linregress function directly.
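For a single pair of observables, calling scipy.stats.linregress directly can look like this; the data below are made up for illustration.

```python
import numpy as np
from scipy.stats import linregress

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

result = linregress(x, y)
# slope, intercept, rvalue, pvalue and stderr are available as attributes
print(f"y = {result.slope:.3f} x + {result.intercept:.3f}, r^2 = {result.rvalue**2:.4f}")
```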