Basic statistical tools

Moving window statistics

moving_window

hydrobox.toolbox.moving_window(x, window_size=5, window_type=None, func='nanmean')

Moving window statistics

Applies a moving window function to the input data. Each of the grouped windows will be aggregated into a resulting time series.

Parameters:
x : pandas.Series, pandas.DataFrame

Input data. The data should have a pandas.DatetimeIndex in order to produce meaningful results. This is not strictly required, however; the function will technically also work on data with a different index.

window_size : int

The number of values grouped into each window. This parameter may behave differently if window_type is not None.

window_type : str, default=None

If None, an evenly spaced window will be used and shifted by one step for each group. Otherwise, a window type can be specified. The possible window types are listed in pandas.DataFrame.rolling.

func : str

Aggregating function for calculating the new window value. It has to be importable from numpy, accept an array of input values and return a single value, like numpy.std or numpy.median.

Returns:
pandas.Series
pandas.DataFrame

Notes

Be aware that most window types (if window_type is not None) only work with either numpy.sum or numpy.mean.

Furthermore, most window types cannot work with the ‘nan’ versions of the numpy aggregating functions. Therefore, in case window_type is not None, any ‘nan’ will be removed from the func string. In case you want to prevent this behaviour, wrap the numpy function into a lambda.
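The handling of the func string can be pictured roughly as follows. This is only a sketch and not the actual hydrobox implementation: resolve_func and its strip_nan flag are hypothetical names introduced here for illustration.

```python
import numpy as np


def resolve_func(func, strip_nan=False):
    """Hypothetical helper: turn a function name into a numpy callable.

    strip_nan is an assumed flag standing in for the condition under
    which the 'nan' prefix is removed from the name.
    """
    # A callable (e.g. a lambda) is passed through untouched, which is
    # why wrapping a numpy function in a lambda avoids the replacement.
    if callable(func):
        return func
    if strip_nan:
        func = func.replace('nan', '')
    # Look the name up in the numpy namespace, e.g. 'nanmean' -> np.nanmean
    return getattr(np, func)


plain = resolve_func('nanmean')                     # np.nanmean
stripped = resolve_func('nanmean', strip_nan=True)  # np.mean
wrapped = resolve_func(lambda w: np.nanmean(w), strip_nan=True)
```

A string such as 'nanmean' can be rewritten because it is just a name, while the lambda is opaque to the helper and is therefore used as given.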

Examples

This way, you can prevent the replacement of a np.nan* function:

>>> moving_window(x, func=lambda x: np.nanmean(x))
array([NaN, NaN, NaN, 4.7445, 4.784 ... 6.34532])
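For comparison, a similar moving window can be written in plain pandas. The sketch below does not call hydrobox at all; it assumes that a window of five values shifted by one step corresponds to pandas rolling with window=5, and the sample data are made up.

```python
import numpy as np
import pandas as pd

# Made-up daily series with a DatetimeIndex
index = pd.date_range('2020-01-01', periods=8, freq='D')
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=index)

# Group five values per window, shift by one step and aggregate each
# window with a numpy function. raw=True passes plain numpy arrays.
smoothed = x.rolling(window=5).apply(np.nanmean, raw=True)

# The first four entries are NaN because their windows are incomplete.
print(smoothed)
```

The result keeps the index of the input, so it can be plotted or joined against the original series directly.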

Linear Regression

linear_regression

hydrobox.toolbox.linear_regression(*x, df=None, plot=False, ax=None, notext=False)

Linear Regression tool

This tool can be used for a number of regression related tasks. It can calculate a linear regression between two observables and also return a scatter plot including the regression parameters and function.

In case more than two Series or arrays are passed, they will be merged into a DataFrame and a linear regression between all combinations will be calculated and plotted if desired.

Parameters:
x : pandas.Series, numpy.ndarray

If df is None, at least two Series or arrays have to be passed. If more are passed, a multi output will be produced.

df : pandas.DataFrame

DataFrame of the input to be used for calculating the linear regression. If df is set, all x occurrences will be ignored. This argument can be useful whenever a multi input to x does not get merged correctly. Note that linear_regression will only use the DataFrame.data array and ignore all other structural elements.

plot : bool

If True, the function will output a matplotlib Figure or plot into an existing instance. If False (default), the data used for the plots will be returned.

ax : matplotlib.axes.AxesSubplot

Has to be a single matplotlib Axes instance if two data sets are passed or a list of Axes if more than two data sets are passed.

notext : bool

If True, the fitting parameters will not be written as text into the plot. This setting is ignored if plot is set to False.

Returns:
matplotlib.Figure
numpy.ndarray

Notes

If plot is True and ax is not None, the number of passed Axes has to match the total combinations between the data sets. This is N^2, where N is the length of x, or the length of df.columns.
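To make the N^2 count concrete, the following sketch fits a line for every ordered pair of columns in a small DataFrame. It does not use hydrobox; numpy.polyfit is swapped in for the regression, and the column names and values are made up.

```python
import itertools

import numpy as np
import pandas as pd

# Made-up data: b = 2 * a + 1 and c = 3 - a
df = pd.DataFrame({
    'a': [0.0, 1.0, 2.0, 3.0],
    'b': [1.0, 3.0, 5.0, 7.0],
    'c': [3.0, 2.0, 1.0, 0.0],
})

# Every ordered pair of columns, including a column with itself,
# gives N * N combinations for N columns.
results = {}
for xcol, ycol in itertools.product(df.columns, repeat=2):
    slope, intercept = np.polyfit(df[xcol].to_numpy(), df[ycol].to_numpy(), deg=1)
    results[(xcol, ycol)] = (slope, intercept)

n_combinations = len(results)  # 9 for the three columns above
```

With three columns this yields nine fits, so nine Axes would be needed to plot all of them.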

Warning

This function only calculates a linear regression. It handles a multi input recursively and has some data wrangling overhead. If you are seeking a fast linear regression tool, use the scipy.stats.linregress function directly.
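A direct call to scipy.stats.linregress might look like the sketch below. The input arrays are made up for illustration and lie exactly on the line y = 2x + 1.

```python
import numpy as np
from scipy.stats import linregress

# Made-up observations on the line y = 2 * x + 1
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# linregress returns slope, intercept, rvalue, pvalue and stderr
result = linregress(x, y)

print(result.slope, result.intercept, result.rvalue)
```

This skips the merging and plotting steps entirely and works on two one-dimensional arrays of equal length.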