Pending Release: This documentation pertains to a set of features that are under development. This line will be removed when they become available.

CORD comes with the Elastic Stack baked into it. CORD components are instrumented with logging capabilities that collect enough context centrally for a CORD operator to gather information about errors, investigate anomalies, construct a high-level view of system behavior, or simply watch operations unfold in a centralized UI. Wherever possible, the logs carry rich, structured data, so information can be extracted at fine granularity.

Example Queries

Here are examples of queries that can be run using this facility:

  • Get a high-level overview of CORD's operation, such as the phases of the installation
  • Get Ansible logs for a particular object in the data model
  • Get all Ansible logs with failures, in a particular component, e.g. a synchronizer
  • Get logs pertaining to a particular slice, which are also associated with a given controller or a given user
  • Get the versions of a particular package (e.g. docker) that were installed as part of the installation process
  • Get a list of Python exceptions for all of CORD in a given interval of time
  • Stream log messages for a particular Synchronizer
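
For instance, the last query in the list above (Python exceptions in a given interval of time) can also be issued directly against Elasticsearch's REST search API rather than through Kibana. The sketch below is illustrative only: the index pattern (logstash-*), the Elasticsearch port (9200), and the way exceptions are matched (searching the message field for 'Traceback') are assumptions that you should check against the fields actually visible in Kibana.

import requests

ELASTIC = 'http://10.2.2.50:9200'   # address of the elastic VM; substitute your own

query = {
    'query': {
        'bool': {
            'must': [
                {'match': {'message': 'Traceback'}},                        # heuristic for Python exceptions
                {'range': {'@timestamp': {'gte': 'now-1h', 'lte': 'now'}}}  # last hour
            ]
        }
    },
    'size': 50
}

resp = requests.post('%s/logstash-*/_search' % ELASTIC, json=query)
for hit in resp.json()['hits']['hits']:
    print(hit['_source'].get('message'))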

Enable Elastic Stack

To enable Elastic Stack, run the cord-in-a-box script as usual, without the -k option. This brings up a Vagrant VM named elastic, which runs Logstash, Elasticsearch, and Kibana. You can visualize the collected data in the Kibana web interface by connecting to port 5601 in a web browser. To forward that port to your local machine, use the following command:

ssh -L 5601:elastic:5601 <ssh url of machine>

and then browse to localhost:5601

Overview of Kibana

TBA

Logging data in a Synchronizer

To log data within a Synchronizer, you can use the XOS logger. Here's an example:

logger.info('Database initialization succeeded', extra={'username': username})

The above call logs a status message and associates with it the tag 'username', set to a string. A user can later search for messages whose username is set to 'bob'.

If there is a data model object associated with the operation, you can tack that information on by using the object's serializer:

logger.info('Database initialization succeeded', extra=db_object.tologdict())

Doing so will associate all of the fields in the database object with the log line.
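
Putting the two calls together, a synchronizer step might log as in the sketch below. The import path shown for the XOS logger is an assumption that may differ between releases, and initialize_database, db_object, and username are hypothetical names used only for illustration.

from xos.logger import Logger, logging   # adjust the import to match your tree

logger = Logger(level=logging.INFO)

def initialize_database(db_object, username):
    # Free-standing tag: lets a user later filter on username:'bob'
    logger.info('Database initialization started', extra={'username': username})

    # ... perform the initialization ...

    # Serializer-based tags: every field of db_object (e.g. id, name,
    # backend_status) becomes a searchable field of the log record.
    logger.info('Database initialization succeeded', extra=db_object.tologdict())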

Logging data outside of a Synchronizer

There are two ways of logging data from a component that is external to your synchronizer implementation. The first is to send JSON-formatted logs to the IP address of the elastic VM over UDP port 5617; see the Logstash example on this page for details. The second is to use the logstash_tail tool (FIXME), which is part of CORD. logstash_tail can parse the output of Ansible and convert it into structured information that can be used in filters and queries.
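
For the first method, any language that can open a UDP socket will do. The sketch below sends a single JSON-encoded record from Python; it assumes Logstash on the elastic VM is listening for JSON datagrams on UDP port 5617 as described above, and the tag fields (global_tag, model_name, id) are the CORD tags described later on this page.

import json
import socket

ELASTIC_IP = '10.2.2.50'    # IP address of the elastic VM; substitute your own
LOGSTASH_UDP_PORT = 5617

record = {
    'message': 'Database installation started',
    'global_tag': 'Installing DB',
    'model_name': 'DBService',
    'id': 17,
}

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(record).encode('utf-8'), (ELASTIC_IP, LOGSTASH_UDP_PORT))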

You can use logstash_tail as follows:

some_command | logstash_tail -elastic_ip <ip address of elastic>

or set it to watch for updates in a given log file:

logstash_tail -elastic_ip <ip address of elastic> -f <path to log file>

Currently, we support the following log formats:

  • Ansible playbooks
  • apt installation

CORD tags

CORD operations are associated with rich context information that runs from the top level of the data model down to operations on the substrate, such as those performed via Ansible. When you log using the library functions provided by XOS and the Synchronizer framework, this context is automatically incorporated into your logs. For example, if a Synchronizer step logs a warning, the warning is automatically associated with the name of the model being synchronized, as well as the id of the object in question.

If you run external operations, such as commands in a shell script, and have the associated data model information at hand, consider passing it to the shell script and adding it to the JSON data that you send to elastic, for example:

logstash_tail -elastic_ip 10.2.2.50 -f install_db.out -tags "model_name:DBService,object_id:17,global_tag:Installing DB"

From within the Synchronizer, consider adding structured logging information in the log dictionary, instead of writing it in plain text.

I.e. instead of:

logger.info('Length of list: %d First element: %s' % (len(lst), lst[0]))

use:

logger.info('Received list of NICs', extra={'len': len(lst), 'first': lst[0]})

Here is a list of tags you can use, along with a description of each one:

global_tag

Describes what the system is doing at the top level, e.g. titles that summarize the stages of the installation. This is the first thing a user looks at to get a bird's-eye view of operations.

model_name and id

The name of the model and the id of the object with which an operation is associated.

synchronizer_name

The name of the synchronizer.
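
As an illustration, these tags are attached to a log call like any other structured field. The values below reuse the DBService model, object id 17, and the 'Installing DB' global tag from the logstash_tail example above; the synchronizer name is hypothetical.

logger.info('Creating database tables',
            extra={'global_tag': 'Installing DB',
                   'model_name': 'DBService',
                   'id': 17,
                   'synchronizer_name': 'dbservice-synchronizer'})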