CORD: What Every Developer Should Know

Introduction

This document tries to record all the “assumed knowledge” about CORD, with the goal of helping developers come up to speed and adopt consistent practices. It is not intended as a replacement for other more detailed documentation (links are encouraged). The goal is to identify the “unwritten rules” that all developers should know, with a focus on system-wide practices.

Software Organization

This section describes where to find code at different points in time (e.g., in Gerrit or on the development machine).

Gerrit

CORD source code is organized as a set of repos. The authoritative versions of these repos are maintained as Gerrit projects, with most mirrored on GitHub. The current plan-of-record is to mirror all the projects that are listed in the manifest file.

Instructions for accessing and committing source code:

Build

A subset of those repos collectively define the CORD build process. These include:

  • manifest → Specifies the collection of repos and branches to be included in a release

  • cord → Named `build` when checked out; contains the base Gradle build config and virtual environment bootstrapping

  • maas → Installs MAAS tools for booting compute nodes from bare-metal

  • platform-install → Set of playbooks for installing software for different CORD profiles

  • config → Specifies several global configuration parameters

  • service-profile → Deprecated (profiles for creamy-vegetable and mysterious-decision)

XOS

A second subset of repos collectively define XOS, which implements the CORD Controller. These include:

  • xos → Implements the XOS data model

  • xos-gui → Implements a GUI on top of the XOS data model

  • xos-rest-gw → Binding between Redis Events and Web Socket

  • xos-sample-gui-extension → A sample GUI extension that others can mimic

  • chameleon → Tool used to translate between the native XOS API and REST APIs

Profiles

A third subset of repos specify each of the profiles of CORD that can be built. This set will grow over time, but currently includes:

  • ecord → Configuration files and assorted customizations for building E-CORD

  • mcord → Configuration files and assorted customizations for building M-CORD

  • rcord → Configuration files and assorted customizations for building R-CORD

  • opencloud → Configuration files and assorted customizations for building OpenCloud

Each repo currently includes any profile-specific Models and GUI Extensions. Although not part of Dangerous-Addition, the plan is to move all profile-related specification files to these repositories (e.g., the TOSCA files for each profile) and leave all the playbooks in platform-install.

Tests

A fourth subset of repos provide assorted tests. These include:

  • cord-tester → Set of tests that can be run against a CORD

  • fabric-oftest → Set of tests that can be run against the CORD switching fabric

Services

The remaining repos correspond to services that can be on-boarded into one or more profiles of CORD. These services are at various levels of maturity, with some included in official releases of CORD, and others still in various stages of development.

The ExampleService repo illustrates the structure of each service repo. It includes an xos directory that specifies the service model and routines that implement a Synchronizer.

Build Time Layout

When the source is checked out, the repo tool creates a directory structure (usually under a directory named `cord`) as defined by the manifest.
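As a rough illustration of how a manifest maps repos to checkout paths, the sketch below parses a small, made-up manifest in the XML format used by the repo tool. Only the `cord` → `build` mapping comes from this document; the other entries, their paths, and the helper name `checkout_paths` are assumptions for illustration, not the contents of the real CORD manifest.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest fragment. Only cord -> build is taken from this
# document; the other projects and paths are placeholders.
SAMPLE_MANIFEST = """\
<manifest>
  <project name="cord" path="build"/>
  <project name="platform-install" path="build/platform-install"/>
  <project name="xos"/>
</manifest>
"""


def checkout_paths(manifest_xml):
    """Return {repo name: checkout path relative to the top-level directory}."""
    root = ET.fromstring(manifest_xml)
    paths = {}
    for project in root.iter("project"):
        name = project.get("name")
        # Without a path attribute, repo places the project under its own name.
        paths[name] = project.get("path", name)
    return paths
```

Calling `checkout_paths(SAMPLE_MANIFEST)` returns a dictionary in which `cord` maps to `build`, matching the naming noted in the Build section.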

Container Images

Once an install is complete, a set of containers is instantiated on the head node. The set can change from release to release, so it is best to consult the latest build guides for the current inventory: quickstart and quickstart_physical.

Best Practices

The following identifies practices that we want to adopt uniformly across CORD.

Release Planning

At the beginning of each release period, the TST schedules a set of planning meetings, the end result of which is a prioritized list of Features that the community will work on during the release period.

In addition to the collaborative release planning documents, we distill the plan into a Roadmap page on the Wiki. This Roadmap includes links to various Design Documents that describe how a feature is implemented in more detail.

Sprint Planning

Development within each four-month release cycle occurs as a series of sprints, each averaging about 3 weeks; a four-month cycle results in six sprints of 2-3 weeks each. The schedule for the release should be established at or prior to release planning. Feature freeze occurs on the last day of the next-to-last sprint, so all code intended for the target release must be reviewed and merged by this date. The last sprint is the hardening sprint and should focus on defect fixes. Support branches for the release should be created at the beginning of this sprint, leaving master open for new feature development. Release candidates are generated from the support branch. Once there is agreement that all critical, ship-stopper defects have been addressed and that feature, system, stress, and performance testing has succeeded, an official release is announced.
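The sprint arithmetic above can be sketched as a small date calculation. This is only an illustration: the function name `plan_release`, the start date, and the particular mix of sprint lengths are assumptions, not an official CORD schedule.

```python
from datetime import date, timedelta


def plan_release(start, sprint_weeks):
    """Lay out sprints back to back and derive the key release dates.

    Returns (sprints, feature_freeze, hardening_start), where sprints is a
    list of (number, first_day, last_day) tuples.
    """
    if len(sprint_weeks) < 2:
        raise ValueError("need at least two sprints (the last one is hardening)")
    sprints = []
    first_day = start
    for number, weeks in enumerate(sprint_weeks, start=1):
        last_day = first_day + timedelta(weeks=weeks) - timedelta(days=1)
        sprints.append((number, first_day, last_day))
        first_day = last_day + timedelta(days=1)
    feature_freeze = sprints[-2][2]   # last day of the next-to-last sprint
    hardening_start = sprints[-1][1]  # support branches are created here
    return sprints, feature_freeze, hardening_start
```

For example, `plan_release(date(2017, 1, 2), [3, 3, 3, 3, 3, 2])` lays out six sprints over 17 weeks, with feature freeze at the end of sprint five and the support branch cut on the first day of sprint six.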

Each sprint begins with a one-hour sprint planning session in which the team discusses what it plans to deliver in that sprint. Each sprint ends with a one-hour sprint review session in which the team goes over what was delivered, any work left in progress, and any tasks that were not started. This review should also include a retrospective from the scrum on why stories committed for the sprint were not completed, enabling the team to address this in future sprints and releases and to establish a predictable burndown pattern for their stories.

Jira

Jira (https://jira.opencord.org/secure/Dashboard.jspa) is used to capture and track the work needed to deliver features, as well as any bugs that need to be addressed. Ideally, deliverables for the whole release should be identified up front, at minimum at the Jira epic level with coarse-grained stories that capture the extent of the deliverables for each release, and with time-boxed investigation spikes added as needed to flesh out the tasks for these deliverables.

Epics should be feature-based rather than component-based. Epics and user stories should be prioritized within each release and sprint, in order to provide a list that the team can work on to deliver features in priority order.  

The whole scrum should work together to discuss each story, making sure all members understand the work required to deliver it, and determine by consensus the story points associated with each user story. Deriving this value together helps the team understand the overall design and the work required. It also lets the team deliver in a more swarm-like model in which members of the scrum balance load among themselves, ensuring that features are addressed and completed in priority order. Guidelines for assigning story points:

  • 1 takes less than a day to complete

  • 2 is about a day

  • 3 is a few days

  • 5 is a week

  • 8 or more is usually not in a sprint and is instead broken down into smaller tasks

Stories should also note the Affected Version(s) and, on completion of the story, the Fixed Version(s). This enables Jira to provide some automated tracking of completed stories and features per release.

All commits in Gerrit should be associated with a Jira ticket.  The Jira ticket number should be included in the Gerrit commit to enable tracking between the systems.  
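A commit-message check along these lines could be sketched as follows. The ticket pattern is an assumption (a project key, a hyphen, and a number, such as the illustrative `CORD-1234`), and the function names are hypothetical; this is not an existing CORD or Gerrit hook.

```python
import re

# Assumption: ticket keys look like "<PROJECT>-<number>", e.g. "CORD-1234".
# The project key shown in examples is illustrative only.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def jira_tickets(commit_message):
    """Return the Jira-style ticket keys found in a commit message, in order."""
    return JIRA_KEY.findall(commit_message)


def references_jira_ticket(commit_message):
    """Return True if the message mentions at least one ticket key."""
    return bool(jira_tickets(commit_message))
```

The pattern is deliberately loose, so strings such as `UTF-8` would also match; a real check would probably restrict the project key to the ones actually in use.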

Daily Stand-up Meetings

Daily stand-ups allow the team to sync up on a daily basis to 1) ensure alignment on work effort and 2) raise any issues or concerns.  Stand-ups should aim to last for only 15 minutes and each member of the scrum should discuss what they have worked on since the last stand-up, what they plan to do next, and whether they are blocked on an issue.  The scrum master can then work with the team/team member to unblock them.

Design Documents

As a release period unfolds, there are many conversations about how various features should be designed and implemented. These design decisions are documented in a collection of Google Documents. In addition to giving people a place to record and comment on design questions, these notes serve as a basis for official documentation produced for the release (see next).

The design document should capture the requirements, what the feature is, and how the feature would be designed/developed, noting components in the system that would need to be modified to support this feature.  The design document should ideally also capture the end state of what the “users” of the system would see, or this overall picture/experience can be documented as part of an overall architecture document.

The team should account for time for the design to be reviewed by peers/community (post to dev mailing list) to ensure potential issues are uncovered and addressed.  Note all dependencies on other components, ensuring alignment with other components and members of the team on touch points and needed interfaces between the components.  Ideally work would be prioritized to ensure that interfaces are defined and finalized between the components prior to start of implementation to ensure that development of dependent components can proceed in parallel.

Code Review

Development occurs in iterative sprints, with code committed to Gerrit frequently and in small increments so that team members and subject matter experts (module owners) can easily review each change and give it a +2.

Let’s flesh out the guidelines…

  • Module owners for services and for platform level (i.e., ECORD, RCORD, FABRIC, XOS, GUI - per repo/sub-repo granularity?)

  • Jenkins coverage

  • WIP or draft patches

Unit tests

Unit tests have been ad hoc, but are becoming a focus of attention.  A document describing our plan to improve and measure unit test coverage can be found here.

Documentation

Documentation includes a combination of GitBook, Wiki pages, and Google Docs. Our current practices for documenting CORD are described here.

Build Stages

Building CORD involves a five-stage workflow: Configure → Fetch → Build → Publish → Deploy

The five stages are summarized as follows, where the key is to identify the “end state” for each:

  • Configure: Users configure a particular build by saying which profile they wish to install and what target POD they want to install it on. The end state is that all unresolved variables in the profile are resolved, resulting in fully qualified manifest files that can be “executed” during subsequent stages.

  • Fetch: The end state is that all container images we have chosen to pull rather than build locally (as identified by sha256 or tag) have been downloaded to the development machine. It should also be possible to confirm the viability of the specification at this point (e.g., that the inter-service dependencies have been satisfied).

  • Build: The end state is a set of generic container images to be installed on the target POD. This involves building from source and extending base images. Images are not customized with deployment-specific information like keys and certificates; that happens during the Deploy stage.

  • Publish: The end state is a complete set of generic container images loaded into the local registry (running on the POD). By convention, all published images are tagged :candidate. This step may not be required, depending on the type of deployment.

  • Deploy: Container images required for the targeted build are downloaded from the local registry and the corresponding containers are instantiated on the target POD. Where necessary, images are customized with deployment-specific information (e.g., keys and certificates). The end state is the set of containers required for the targeted build running on the target POD.
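The ordering of the five stages, and the fact that Publish is optional for some deployments, can be modeled in a few lines. This is only a conceptual sketch: the `Stage` enum, the `END_STATE` summaries, and `stages_to_run` are illustrative names and do not correspond to anything in the actual build system.

```python
from enum import IntEnum


class Stage(IntEnum):
    CONFIGURE = 1
    FETCH = 2
    BUILD = 3
    PUBLISH = 4
    DEPLOY = 5


# One-line end states, paraphrased from the list above.
END_STATE = {
    Stage.CONFIGURE: "profile variables resolved into fully qualified manifests",
    Stage.FETCH: "pulled images downloaded to the development machine",
    Stage.BUILD: "generic container images built",
    Stage.PUBLISH: "images loaded into the local registry, tagged :candidate",
    Stage.DEPLOY: "containers running on the target POD",
}


def stages_to_run(target, skip_publish=False):
    """Return, in order, the stages needed to reach the target stage."""
    stages = [stage for stage in Stage if stage <= target]
    if skip_publish:
        # Publish may not be required, depending on the type of deployment.
        stages = [stage for stage in stages if stage is not Stage.PUBLISH]
    return stages
```

For instance, `stages_to_run(Stage.BUILD)` yields Configure, Fetch, and Build, while `stages_to_run(Stage.DEPLOY, skip_publish=True)` yields every stage except Publish.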

Extending the GUI

Todo...

Logging Conventions

Logging information output by each component is collected by the ELK Stack and logged to a local file or (optionally) the console. Components are expected to adopt the following practices…

  • For all Python components, the xos logging module should be used.

  • The xos logging module is to be configured using the configuration system (for example, log levels and log file names). Each component should use a different config file, allowing logging to be individually tailored for that component.

  • Components should log to a unique filename named after that component. For example, /var/log/vsg-synchronizer.log.

  • Multiple log levels are encouraged. Limit verbose debugging log messages to the DEBUG log level. Ensure errors are logged at the ERROR log level. Conditions that are abnormal but do not necessarily rise to the level of an error should be logged at the WARNING log level.

  • Log events should include context. For example, if a particular model is being synced by a synchronizer, include the name of that model and its identifier.

  • Log messages should be human readable and intuitive.

  • If an exception occurs and an error is logged, then the exception traceback should be logged with that error message.
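The practices above can be illustrated with Python's standard `logging` module. Note that this is a stand-in: the API of the xos logging module is not shown in this document, so the helper names below are hypothetical, and in CORD the file name and level would come from the component's config file rather than from function arguments.

```python
import logging


def make_component_logger(component, log_file=None, level=logging.INFO):
    """Build a logger for one component (stand-in for the xos logging module)."""
    logger = logging.getLogger(component)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def sync_model(logger, model_name, model_id, sync_step):
    """Run one hypothetical sync step, logging with context at each level."""
    # Verbose detail stays at DEBUG and names the model being synced.
    logger.debug("syncing model=%s id=%s", model_name, model_id)
    try:
        sync_step()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("sync failed for model=%s id=%s", model_name, model_id)
        return False
    logger.info("synced model=%s id=%s", model_name, model_id)
    return True
```

Under these assumptions, `make_component_logger("vsg-synchronizer", "/var/log/vsg-synchronizer.log")` would follow the file-naming convention given above.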

Jenkins and QA

CORD Jenkins can be accessed from https://jenkins.opencord.org

Currently there are two types of jobs that are run.  

  • CI-based jobs: Builds are triggered by commits on a few repos and execute tests.

  • QA jobs: Run tests by deploying CORD.

A few jobs also check for change sets in the 2.0 and 3.0 branches along with the master branch. Most of the jobs use the Pipeline plugin, which helps to build, test, and deploy changes.

A quick overview of a few of the jobs that run on the CORD Jenkins server:

  • cord-gui-pipeline: Repo tested: `xos-gui`. The job checks that the container (xos-spa-gui) is built properly; the plan is to add E2E tests for the GUI.

  • cord-in-a-box: Builds physical PODs periodically.

  • cordtester-ciab-pipeline: Repo tested: `cord-tester`. Deploys CiaB and runs regression tests (QA job).

  • platform-install: Repo tested: `platform-install`.

  • xos-api-sanity-pipeline: Repos tested: `platform-install`, `xos`, `service-profile`. Tests include deploying the front-end configuration and validating various XOS endpoints.

  • xos-gui: Repo tested: `xos-gui`. Unit tests for the new XOS GUI single-page application, based on Angular 1.6.

Ansible Playbooks

CORD makes heavy use of Ansible (e.g., see platform-install). We adopt the following conventions:

  • Before committing, run ansible-lint to sanity check your playbooks and roles. It points out a variety of best-practice issues and frequently suggests fixes. Jenkins runs this automatically on many of the repos as a commit acceptance test.

  • Use the YAML-style `var: value` syntax rather than the `var=value` syntax.  This fixes various issues with code highlighting and quoting, especially when using editors that can perform syntax highlighting.

  • Avoid using Ansible’s tags to turn behavior on and off. Instead, use conditional role includes and separate roles for different features. A role should perform a nearly identical set of steps whenever it is run.
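To make the difference between the two argument syntaxes concrete, the sketch below converts inline `var=value` arguments into the mapping that the YAML-style form expresses directly. It is only an approximation built on Python's `shlex`, not Ansible's own parser, and the function name and sample arguments are made up for illustration.

```python
import shlex


def inline_args_to_mapping(inline_args):
    """Convert 'key=value ...' module arguments into a dict.

    Each dict entry corresponds to one `key: value` line in the YAML style.
    """
    mapping = {}
    for token in shlex.split(inline_args):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {token!r}")
        mapping[key] = value
    return mapping
```

For example, `inline_args_to_mapping('dest=/etc/motd line="hello world"')` has to rely on shell-like quoting rules to keep `hello world` together, which is the kind of quoting subtlety that the YAML-style syntax avoids.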

Service Inventory & Dependencies

There is an official TST-curated Service Inventory (and list of Service Profiles) on the wiki. This inventory specifies several pieces of information about the services included in a release, including the relationship between CORD-visible services and the underlying code repositories (a mapping that is not necessarily 1-to-1), but it is far from being complete. There are often many assumptions and dependencies. While the plan is to make all dependencies explicit and to run static dependency checks for any service graph a user specifies, for now we depend on service developers documenting their assumptions and dependencies in a README.md file.
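A static dependency check of the kind mentioned above might start out as simply as the following. This is a hypothetical sketch, not an existing CORD tool: the graph representation (a dict from service name to the services it depends on), the function name, and the service names in the example are all assumptions.

```python
def missing_dependencies(service_graph):
    """Return {service: sorted dependencies that are absent from the graph}.

    service_graph maps each service name to the names of the services it
    depends on. An empty result means every declared dependency is present.
    """
    missing = {}
    for service, dependencies in service_graph.items():
        absent = sorted(d for d in dependencies if d not in service_graph)
        if absent:
            missing[service] = absent
    return missing
```

With a made-up graph such as `{"svc-a": ["svc-b", "svc-c"], "svc-b": []}`, the check reports that `svc-a` depends on `svc-c`, which the graph does not include.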