Nominee’s bio

I'm Zack Williams (zdw@cs.arizona.edu, zdw on Slack), a systems programmer working at the University of Arizona on the OpenCloud project, which uses CORD. My primary skill sets are in automation, operations, and build systems.

How long have you been working in the CORD community?

I started working on CORD at the beginning of 2016.

What contributions have you made in the past to the CORD community?

I've worked on many parts of the CORD system, but my primary focus is the build and deployment system.

I've also participated in various CORD events such as presenting on XOS architecture at the CORD Summit in July of 2016, and assisting with the E-CORD deployment at China Mobile in August of 2017.

For the 4.0 release, I did primary development on the new make-based build system and docker image build/download/versioning tool. These changes unified the development and deployment workflows, allowed for a more modular and incremental build process, provided infrastructure for creating multiple pod topologies, and dramatically reduced the build time for a CORD pod.

What are you actively working on in CORD?

The build system, which cuts across development and deployment tasks, and the OpenCloud profile/scenario.

Why do you feel you would be a good candidate for this position?

I want CORD to be wildly successful and I think we get there by lowering the barriers to develop, deploy, and operate it.

I'm extremely detail-oriented, which is helpful when designing systems and handling technical and design challenges.

While I'm primarily a developer, I also handle the deployment of OpenCloud and will advocate for improving and streamlining the operational processes required to run CORD in a long-running production environment.

Are there any changes you would like to bring to the community if elected into this position?

We have a well-developed, community-driven process for deciding what features to work on through the TST, but our processes to deliver and test our releases could use some work.

I tend to think that this is a process issue: most of our development and testing processes either were inherited from ONOS (which is a dramatically different project in terms of structure and deliverables) or were invented on an ad hoc basis, and may not be optimal for our needs.

To that end, I'd like to re-evaluate the processes we use for testing and release engineering, then apply those findings to the development process. This will help us deliver a more robust CORD in both the formal releases and during ongoing development.

While they are not a perfect match, many best practices from the DevOps and Site Reliability Engineering schools of thought can help us to "close the loop" when problems occur and allow for continuous process improvement. Much of this comes from quality management and lean manufacturing theory: the "Five Whys", root-cause analysis, and blame-free postmortems when problems occur. All of these aim to solve a problem only once, use the opportunity to build knowledge within the development team, and avoid future problems.

One example of how this could play out:

  1. An issue is reported that needs to be resolved
  2. A determination is made of the nature of the issue
  3. We take steps to avoid repeating the issue

A concrete example:

  1. CORD passes all tests in a virtual pod, but fails only in a specific unusual but valid physical configuration.
  2. The problem is triaged, in the process discovering an edge case that occurs only in the physical deployment.
  3. Steps are taken to prevent the failure from recurring:
     • The cause of the problem is resolved.
     • Tests on the virtual pod are updated to trigger the conditions that would cause the failure, so if the flaw is reintroduced later it would be caught in testing.
     • If an issue with our development or testing process is discovered, we evaluate and make both technical and "soft" changes to that process to prevent similar errors in the future. This may be as minor as adding a bit of documentation, or something more substantial if necessary.

I think that adopting this type of process is a beneficial and achievable goal, and see TST participation as one way to move in this direction.