Contribution Guide

How to Contribute

There are several ways to contribute to hydrobox. All contributions should use the Fork / Pull request workflow in the GitHub repository. More information on pull requests can be found on the GitHub About pull requests page.

  1. Add new tools to the toolbox
  2. Improve / Add unit tests to increase code coverage
  3. Improve / Add docstrings on existing functions
  4. Add more examples to the documentation

Add Tools to the Toolbox

Important

In a nutshell:
  1. Fork the repository on GitHub
  2. Commit your method to your fork
  3. Add documentation and unit tests for your method
  4. Make sure your fork is building correctly
  5. Pull request your fork back into the main repository

hydrobox is meant to be used on top of numpy, scipy and pandas. This implies using the data types defined in these libraries whenever possible. The main purpose of hydrobox is to save hydrologists from rewriting the same code in every single project. Therefore, a hydrobox tool should:

  1. combine analysis steps belonging together into one function, while
  2. separating preprocessing from analysis
  3. be helpful to other hydrologists
  4. output common python, numpy or pandas datatypes

Important

For this guide, we will add a function from_csv to the io submodule. This should illustrate how you can add your own tools.

Fork and structure

Once you have forked the project, place a new file in the appropriate module or add a new module. Once your function has been added, import it in the hydrobox.toolbox file. Please use a meaningful name for your function; it should be clear what the function does. In some cases tool functions are too specific to be made available at the global hydrobox.toolbox scope. In that case the submodule itself is imported in the toolbox and you do not need to adjust the imports. One example is the io submodule.

Here, we pretend to add a from_csv function to the toolbox. This function will go into a file text.py in the hydrobox.io submodule:

import pandas as pd


def from_csv(path, sep=','):
    """
    numpydoc docstring here
    """
    # pandas has no top-level from_csv function; read_csv is the reader
    return pd.read_csv(path, sep=sep)

Now, import this function in the __init__ of hydrobox.io. If you want your method to be available in the global scope, import it in hydrobox.toolbox as well.

Important

Please use only the numpydoc docstring conventions and make sure to properly style and comment the Parameters section.

Decorating your tool

Hydrobox includes two helpful decorators in the hydrobox.util submodule: accept and enforce. We encourage you to use the accept decorator whenever possible, as it helps to produce much cleaner code. The decorator checks the data type of each input argument and raises a TypeError if the passed data does not have the correct type. If more than one type is accepted, simply pass a tuple. If an argument may be None or a callable, use the two literals 'None' and 'callable' and pass them as strings.

from io import TextIOWrapper

from hydrobox.util import accept


@accept(path=(str, TextIOWrapper), sep=str)
def from_csv(path, sep=','):
    ...

In this example, the path argument can be a string or a file pointer, while sep has to be a string. Thus, there is no need to check the user input in your tool, as the decorator has already done this for you. We especially encourage you to use this decorator to check that input data is of type numpy.ndarray, pandas.DataFrame or pandas.Series.
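To make the behaviour described above concrete, here is a minimal sketch of how a type-checking decorator of this kind could be written. It is not the hydrobox implementation: the names accept_sketch, _matches and describe are hypothetical, and the real accept decorator may differ in its details (for example in how it treats default values, which this sketch does not check).

```python
from functools import wraps
from inspect import signature
from io import StringIO, TextIOBase


def _matches(value, allowed):
    """Return True if value satisfies at least one entry of allowed."""
    for entry in allowed:
        if entry == 'None':
            if value is None:
                return True
        elif entry == 'callable':
            if callable(value):
                return True
        elif isinstance(value, entry):
            return True
    return False


def accept_sketch(**types):
    """Hypothetical stand-in for hydrobox.util.accept (an assumption)."""
    def decorator(func):
        sig = signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to parameter names
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                if name not in types:
                    continue  # no constraint declared for this argument
                allowed = types[name]
                if not isinstance(allowed, tuple):
                    allowed = (allowed,)
                if not _matches(value, allowed):
                    raise TypeError(
                        f"{name} got an unsupported type: {type(value).__name__}"
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@accept_sketch(path=(str, TextIOBase), sep=str, converter=('None', 'callable'))
def describe(path, sep=',', converter=None):
    """Toy function that only reports what it received."""
    return f"{type(path).__name__} / {sep!r}"
```

With this sketch, describe('data.csv') and describe(StringIO('a,b'), sep=';') pass the check, while describe(42) raises a TypeError before the function body runs.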

Test your tool

Although the code coverage of this project is not yet very good, it would be nice not to lower it any further. Good code coverage needs unit tests. Beyond code coverage, unit tests will help us detect when your contribution breaks existing code. And last but not least, unit tests will help you to build more reliable code. In a nutshell, it would be really helpful if you wrote unit tests for your code. More information on unit tests is given in the Add / improve unit tests section. Some useful links to get you started with unit tests in Python can be found below.

Document your tool

To make it possible for others to use your tool, good and comprehensive documentation is needed. As a first step, you should always add a docstring to your function. For hydrobox, please use the numpydoc docstring format. More information can also be found in the Add / improve docstrings section.

Produce examples

Sometimes a docstring is not enough to understand a tool. Although short examples, references and formulas can go into a numpydoc docstring, you might want to offer several examples covering the full range of your tool. In that case you should produce some examples for this documentation. You can refer to the Examples section for more information.

See also

Pull Request

Once you have finished your implementation, create a pull request on GitHub. More information can be found in the Create a Pull Request section.

Add / improve unit tests

Important

If you are not familiar with unit testing in general, please refer to https://en.wikipedia.org/wiki/Unit_testing. If you are not familiar with the unittest module, please refer to https://docs.python.org/3/library/unittest.html

Unit tests are important as they make your code much more reliable and reusable for other users. The basic idea behind a unit test is to test the possible inputs and outputs of your tool against the expected behavior. For this you have to set up a test case, run the scenario and compare the result to what you expected. When modules and packages that you rely on change and break your code, the unit test will notice and fail. I personally use unit tests whenever I try to improve my code. This way I can be sure that I did not optimize any functionality away (and that happens a lot…).

For creating a unit test you need to define a class. Each method of this class represents a test. There are different ways of implementing unit tests, either one test method to test a whole tool or one test method per single check you want to perform. In hydrobox, we decided to use one TestCase class for each method and try to break down each check into a single test method. The example below illustrates this.

import unittest

from hydrobox.io import from_csv         # import your tool here


class TestFromCsv(unittest.TestCase):
    def test_row_count(self):
        df = from_csv('file_of_known_size.txt')
        self.assertEqual(len(df), 450)

    def test_col_count(self):
        df = from_csv('file_of_known_size.txt')
        self.assertEqual(len(df.columns), 5)

    def test_change_sep(self):
        """
        Change the separator to a sign that does not appear
        in the file. Then there should be only one column.
        """
        df = from_csv('file_of_known_size.txt', sep="|")
        self.assertEqual(len(df.columns), 1)


if __name__ == '__main__':
    unittest.main()

This is a very basic example that checks three different things. It uses our new tool to load a file of known content into the variable df.
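A test that reads file_of_known_size.txt only works if that file is shipped alongside the test. One possible alternative, sketched here, is to create the known file in setUp and delete it in tearDown, so that each test method starts from the same content. The from_csv function in this sketch is a stand-in built on the standard csv module (an assumption, so that the snippet runs without hydrobox or pandas installed); it returns a list of rows instead of a DataFrame. In a real test you would import your tool instead.

```python
import csv
import os
import tempfile
import unittest


def from_csv(path, sep=','):
    """Stand-in reader (assumption): returns a list of rows of strings."""
    with open(path, newline='') as f:
        return list(csv.reader(f, delimiter=sep))


class TestFromCsv(unittest.TestCase):
    def setUp(self):
        # Write a file of known size: 1 header row + 3 data rows, 2 columns
        handle, self.path = tempfile.mkstemp(suffix='.csv')
        with os.fdopen(handle, 'w', newline='') as f:
            f.write('date,discharge\n')
            f.write('2020-01-01,1.5\n')
            f.write('2020-01-02,1.7\n')
            f.write('2020-01-03,1.6\n')

    def tearDown(self):
        # Remove the temporary file after every single test method
        os.remove(self.path)

    def test_row_count(self):
        self.assertEqual(len(from_csv(self.path)), 4)

    def test_col_count(self):
        self.assertEqual(len(from_csv(self.path)[0]), 2)

    def test_change_sep(self):
        # '|' does not occur in the file, so every row stays a single field
        self.assertEqual(len(from_csv(self.path, sep='|')[0]), 1)


# Run the test case explicitly and keep the result object for inspection
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFromCsv)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The structure follows the convention described above: one TestCase class for the tool and one test method per check.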

Add / improve docstrings

Important

If you are not familiar with the numpydoc docstring format, please refer to http://numpydoc.readthedocs.io/en/latest/format.html.

The most important parts of a numpydoc docstring are shown in the example below. Please make sure that your docstring always contains the main description, the parameters and the returns.

@accept(path=(str, TextIOWrapper), sep=str)
def from_csv(path, sep=','):
    r"""Short descriptive title

    After a short title, give a few sentences of explanation. What does this
    method do and how is it intended to be used?

    Parameters
    ----------
    path : str, TextIOWrapper
        Each parameter can also get a full description of its meaning and
        possible values. Please be as extensive as necessary here.
        Note the whitespace around the colon between the parameter name and
        the list of accepted types.
    sep : str, optional
        In case an argument is optional, you can indicate this by the
        optional keyword after the type.

    Returns
    -------
    pandas.DataFrame

    Notes
    -----

    The first description at the top should be a rather technical description.
    The optional Notes section can be added and used to inform the user about
    the background of the function or further reading.
    For this purpose you can also include references [1]_ in your Notes.
    In the documentation, these will be rendered in the References section.

    And last but not least you can also include some math:

    .. math:: a^2 + b^2 = c^2

    References
    ----------

    ..  [1] Python, M., Chapman, G., Cleese, J., Gilliam, T., Jones, T.,
        Idle, E., & Palin, M. (2000). the Holy Grail. EMI Records.

    Examples
    --------

    >>> from_csv('file.txt').shape
    (220201, 5)

    """
    ...

Note

You should only add short and descriptive examples to the docstring itself. For anything longer, make use of the Examples section of this documentation.
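Whether a docstring contains the Parameters and Returns sections asked for above can also be checked mechanically. The following is a rough sketch of such a check, not a tool that ships with hydrobox: the helper missing_sections and the two example functions are hypothetical, and the check only looks for section titles underlined with dashes, not for the quality of their content.

```python
import inspect

REQUIRED_SECTIONS = ('Parameters', 'Returns')


def missing_sections(func, required=REQUIRED_SECTIONS):
    """Return the required numpydoc sections that func's docstring lacks."""
    doc = inspect.getdoc(func) or ''
    lines = doc.splitlines()
    found = set()
    # A numpydoc section is a title line followed by a line of dashes
    for title, underline in zip(lines, lines[1:]):
        if title.strip() and set(underline.strip()) == {'-'}:
            found.add(title.strip())
    return [name for name in required if name not in found]


def documented(path, sep=','):
    """Read a delimited text file.

    Parameters
    ----------
    path : str
        Location of the file.
    sep : str, optional
        Field separator.

    Returns
    -------
    pandas.DataFrame
    """


def undocumented(path):
    """Read a file."""
```

Calling missing_sections(documented) gives an empty list, while missing_sections(undocumented) lists both required sections.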

Enhance the Examples

Todo

write this section

Create a Pull Request

Important

If you are not familiar with Pull Requests, please refer to https://help.github.com/articles/about-pull-requests.

The ideal pull request includes the new tool / enhancement, a proper docstring, unit tests and a new example. However, we will also accept a pull request that includes only a tool and a docstring. In this case, please provide a proper description in the pull request message so that others can add the missing content.

Besides a good description, a descriptive title is vital. Please state what you actually want to contribute in the pull request title. For the examples produced in this guide a descriptive title would be something like: Added from_csv tool for reading files.

Note

If your contribution contains only minor changes like PEP8 fixes, typos and small bugfixes, you can of course pull request these changes without examples and unit tests.

And finally, I am really looking forward to your contributions. Thanks in advance!