In this post, we discuss how we can ensure that products in our application repository are ready to use. Just as we include a test phase between build and deployment in our Jenkins pipeline to know whether the applications we build are properly built, the only way to know whether our execution environment will work in the wild is to test it too.
CODE-RADE is designed to deliver products to arbitrary endpoints in a non-invasive way. The removal of this restriction on where the applications will actually run may seem like an invitation to uncertainty, but in practice these environments fall into one of four categories¹:
- Single user laptops
- Shared HPC clusters
- IaaS clouds
- Container platforms
As a platform, we need to provide reliable means to provision CODE-RADE client environments on at least these four. The only way we can know that they are reliable is to actually test in them.
Testing in this case starts with the code which expresses the environment, which is implemented as an Ansible role². This was done for two reasons:
- Make the configuration declarative
- Make the provisioning portable and repeatable
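To give a flavour of what this declarative expression looks like, the role's tasks might resemble the following sketch. The package name, key filename and paths here are illustrative assumptions, not the role's actual contents:

```yaml
# tasks/main.yml (excerpt): a hedged sketch of a declarative client setup.
# Package names, key filenames and paths are assumptions for illustration.
- name: Install the CVMFS client
  package:
    name: cvmfs
    state: present

- name: Deploy the repository public key
  copy:
    src: code-rade.africa-grid.org.pub
    dest: /etc/cvmfs/keys/code-rade.africa-grid.org.pub
    mode: "0644"

- name: Configure the CVMFS client
  template:
    src: default.local.j2
    dest: /etc/cvmfs/default.local
```

Because each task declares a desired state rather than a sequence of commands, the same role can be applied repeatedly and by different provisioning front-ends with the same result.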
These goals come directly from the Twelve-Factor methodology:
The twelve-factor app is a methodology for building software-as-a-service apps that:
- Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
- Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
- Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
- Minimize divergence between development and production, enabling continuous deployment for maximum agility;
- And can scale up without significant changes to tooling, architecture, or development practices.
By expressing the configuration in this way, it’s easier to see where errors arise, to track down bugs across deploys, and to maintain the code in the long run. What is more, it gives infrastructure providers some freedom of choice in provisioning the execution environment.
Bias is a terrible thing, which can convince you of things that aren’t true. To be sure that we are not biasing our configuration, provisioning and deployment methodology towards one or another personal favourite, we should try to write “minimum bias” tests: tests which make as few assumptions about the environment as possible. Previous efforts at testing Ansible roles consisted of applying them to some pre-defined environment, e.g. running them against vanilla Travis instances. This is not sufficient in many cases, as it does not provide the coverage that we need in CODE-RADE. Remember, we need to assure arbitrary execution environments! This means coverage not just along the operating-system axis, but also along the deployment axis - e.g. Docker, virtual machine, IaaS cloud, etc. To do this, we need a testing framework - not just individual tests - such as TestInfra, ServerSpec or our personal favourite, InSpec.
Luckily, the universe is a magical place and someone invented molecule. Molecule allows us to combine builders, provisioners, testing and linting or style checks into one combined workflow, for various scenarios.
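By way of illustration, a Molecule scenario is described by a `molecule.yml` file along these lines. The driver, platform and image shown are assumptions for the sketch, not the project's actual configuration:

```yaml
# molecule/default/molecule.yml: an illustrative scenario definition.
# The Docker driver and CentOS image are assumptions for this sketch.
dependency:
  name: galaxy          # resolve role dependencies from Ansible Galaxy
driver:
  name: docker          # builder: where the test instance is created
platforms:
  - name: centos7
    image: centos:7
provisioner:
  name: ansible         # the role under test is applied with Ansible
verifier:
  name: testinfra       # tests are run against the converged instance
```

A scenario is then exercised end-to-end with `molecule test -s default`, which lints, creates the instance, applies the role, checks idempotence, runs the verifier, and destroys the instance again.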
Function over Form
Separation of testing from provisioning means that we can publish the tests independently of the rest of the workflow. In principle, it is these tests which decide whether the environment is properly deployed, regardless of the method used to achieve it. These are tests of the final state of the deployment scenarios, and seek only to answer the question - with as little bias as possible - “Has the environment for CODE-RADE been deployed correctly?”
While the monitoring of production infrastructure seeks to detect defects in the “steady-state” of a system, testing in continuous integration seeks to detect errors introduced by changes. The testing phase is a crucial part of adding reliability and clarity to the development cycle, giving project contributors and maintainers confidence in developing features or responding to requests. The testing phase seeks to answer questions such as:
- “What will happen to aspect X, if I change aspect Y?”
- “Will change Y have unforeseen side-effects on the system that affect its performance or break it?”
Not all contributions and changes will result in positive outcomes; some of them will inevitably break the system, and we need to detect those breaks before they are propagated to the production environment - indeed, preferably as early as possible³.
To this end, we have introduced molecule into our continuous integration pipeline. In order to run these tests, we need to consider the various scenarios in which the CODE-RADE execution environment could be deployed, and create relevant testing scenarios for them.
In order to test scenarios, we first need to imagine and create them. We can’t imagine every possible environment, but we can consider common aspects of various kinds of environments. As we mentioned above, the deployment should pass the same tests irrespective of the environment: for example, it should have the CVMFS configuration, repository keys, the bash-modules package, and some user environment configured. We have initially created three testing scenarios, using Molecule.
Looking at the `molecule` directory of the role, we can see three subdirectories. These each define a different driver which Molecule uses to provision the environment.
The tests themselves are kept in each scenario’s `test` subdirectory, and are implemented with TestInfra, the default testing framework used with Molecule.
The tests run at the `verify` stage of the testing strategy, as their role is to check that the actual environment created by the playbook is correct, irrespective of how we created it.
These tests include, e.g.:
- Is the `cvmfs2` executable available?
- Is the client package installed in the correct location?
- Is the `code-rade.africa-grid.org` repository mounted?
- Does the repository contain the correct version?
For more details on how these tests are expressed, see the default scenario tests. There are probably other tests that we can imagine, coming both from the Dev (i.e. those contributing to the actual Ansible role) as well as the Ops (i.e. those responsible for using the role at their sites and keeping things running) sides.
One thing is sure: we have a far better understanding of where, when and how this role fails by testing it in realistic scenarios.
Open Source Open Tests
CODE-RADE is an Open Source project - we want to position it firmly in the Commons: a piece of infrastructure that anyone can contribute to and use. This means that we need as much transparency in the process of contributing to it as we can handle, so the tests need to be publicly visible and linked to the codebase. In fact, we have a section of CODE-RADE just for infrastructure. There, you can find the current and historic results for the tests of this role. We also test the role in Travis.
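On the Travis side, the setup can be as simple as a `.travis.yml` that installs Molecule and runs the Docker-backed scenario. This is a sketch under assumptions; the project's actual Travis configuration may differ:

```yaml
# .travis.yml: an illustrative sketch, not the project's actual file.
language: python
services:
  - docker            # Molecule's Docker driver needs the Docker daemon
install:
  - pip install molecule docker
script:
  - molecule test     # full sequence: lint, create, converge, verify, destroy
```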
In this post, we have discussed how we approach testing the role responsible for creating the CODE-RADE user environment. This role is meant to be used by site administrators and users alike, with whatever provisioning system they prefer, to configure their site to use CODE-RADE. CODE-RADE is designed to be non-invasive and community-based, and we place a high premium on the reliability and automation of the provisioning tools, in order to reduce the barrier to entry for all to as near to zero as possible. Using the fantastic tool Molecule, we are able to simulate the expected environments and test the deployment in various scenarios, giving clear and actionable feedback on any changes to the code. The ability to run the same tests over arbitrary environments means that site operations teams can express the expected behaviour of the system independently of the way in which it is provisioned, reducing bias on the development side and increasing our trust in automated deployments.
References and Footnotes
1. We will write up how one can actually use CODE-RADE in these scenarios perhaps in a later post. ↩
3. What is meant by “fail fast” - detect failures as soon as possible, so that they can be addressed, before their impact has wider-ranging consequences. ↩