Soak and Stability Tests - CORD 5.0

Test Purpose

The purpose of the Soak and Stability Tests is to validate the stability of the CORD platform by keeping the POD up for a couple of months and observing its behavior. During this soak time, the POD is tested by scaling vSGs/vCPEs and by continuously generating/emulating traffic through the vSGs.

Functional Tests - CORD 4.1


(These tests were performed at the start of the 6.0 release cycle, since the scope of this activity is the stable version of the previous CORD release, 5.0.)

R-CORD Scale/Stability Test Scenarios

  • Soak/Stability Testing is performed using CORD 5.0 stable version

  • Scale vCPEs in a single vSG

    • Required test scripts will be achieved for CORD 5.0 release

    • Tests will be performed on the Flex POD

  • Scale vSGs

    • Verify that the vSGs created are ACTIVE

    • Verify that the respective containers are created within the vSGs

    • Verify the stability of CORD (memory utilization, restart checks and any error checks)

  • Stability Tests

    • The POD runs for almost three months

    • The following checks are performed to verify the health of the POD

      • Errors in Containers, ONOS Logs

      • Restart checks on any containers

      • Memory utilization on the containers

      • Out of Memory issues

  • Traffic Emulation tests using Spirent Test Tools

    • Generation of continuous traffic scenarios using multiple vSGs

    • Scaling of vSGs and vCPEs followed by generation of more traffic
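The health checks listed under Stability Tests (errors in logs, container restarts, memory utilization, out-of-memory conditions) could be scripted roughly as follows. This is only a minimal sketch, not the script used for these tests: the log patterns, the input dictionaries and the names `check_health` and `HealthReport` are assumptions made for illustration, and the data would have to be collected from the POD separately.

```python
import re
from dataclasses import dataclass, field

# Assumed patterns for illustration; adjust them to the real log formats.
ERROR_RE = re.compile(r"\b(ERROR|Exception)\b")
OOM_RE = re.compile(r"OutOfMemoryError|Out of memory", re.IGNORECASE)


@dataclass
class HealthReport:
    error_lines: list = field(default_factory=list)    # (container, log line)
    oom_lines: list = field(default_factory=list)      # (container, log line)
    restarted: list = field(default_factory=list)      # container names
    memory_growth: dict = field(default_factory=dict)  # container -> MB delta


def check_health(logs, restart_counts, memory_samples_mb):
    """Summarize the four health checks from already-collected data.

    logs: {container: [log lines]}
    restart_counts: {container: (count_at_start, count_now)}
    memory_samples_mb: {container: [memory usage in MB, oldest first]}
    """
    report = HealthReport()
    for name, lines in logs.items():
        for line in lines:
            # Out-of-memory lines are reported separately from generic errors.
            if OOM_RE.search(line):
                report.oom_lines.append((name, line))
            elif ERROR_RE.search(line):
                report.error_lines.append((name, line))
    for name, (before, after) in restart_counts.items():
        if after > before:
            report.restarted.append(name)
    for name, samples in memory_samples_mb.items():
        if len(samples) >= 2:
            # Growth between the first and the last sample of the soak period.
            report.memory_growth[name] = samples[-1] - samples[0]
    return report
```

A steadily positive `memory_growth` value for a container over the soak period is the kind of signal that would precede an out-of-memory condition, so it is worth tracking alongside the log scan.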

Test Bed/Environment

Infrastructure Details 

Flex POD-1

 

·       Compute (Flex 1U Victoria) x4

o   2x Intel E5-2630v3 8c 2.4GHz, 192GB RAM

o   2x1G, 2x40G dual port Mellanox NIC

o   1x Intel SSD 480GB

·       Switch (Accton AS6712-54X 48x40G, 6x40G) x4

 

Spirent Configuration

 

·       Spirent TestCenter Virtual (STCv) was used.

·       STCv was deployed as an Instance on the underlying infrastructure running ESXi 5.5.

·       It is a lightweight VM.

  • STCv Number of Ports - 2 (Tx - 1 and Rx - 2)

  • vCPUs – 4

  • Memory – 3GB

  • Speed on Ports - 1Gbps

    • Shared among all the vSGs in place.

  • Adapter – VMXNET3

Testing Scenario Details

Scenario 1

Because the R-CORD solution is deployed on OpenStack, Spirent TestCenter Virtual (STCv) VMs can be created and deployed in the same environment. In the current scenario, STCv is deployed on a server running ESXi v5.5.

 

 

 

Detailed Packet flow

·      The test client (STCv) emits double-tagged VLAN packets (S-Tag and C-Tag) onto the “vSG” interface on the compute node.

·       Depending on the functionality configured on the vSG, the test client (STCv) on the receiver end receives the traffic.
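To make the double-tagged format concrete, the sketch below builds and parses the Ethernet header of such a packet. It is an illustration only: the S-Tag TPID is assumed to be the IEEE 802.1ad value 0x88A8 (some deployments use 0x8100 for both tags, and the source does not say which was configured), the tag values in the usage example are hypothetical, and the function names are invented here.

```python
import struct

TPID_S_TAG = 0x88A8  # IEEE 802.1ad service tag (assumption, see lead-in)
TPID_C_TAG = 0x8100  # IEEE 802.1Q customer tag


def vlan_tci(vid, pcp=0, dei=0):
    """Pack priority (3 bits), drop-eligible (1 bit) and VLAN ID (12 bits)."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    return (pcp << 13) | (dei << 12) | vid


def build_double_tagged_header(dst_mac, src_mac, s_tag, c_tag,
                               ethertype=0x0800, s_tpid=TPID_S_TAG):
    """Return the 22-byte header: MACs, outer S-Tag, inner C-Tag, EtherType."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    tags = struct.pack("!HHHHH",
                       s_tpid, vlan_tci(s_tag),
                       TPID_C_TAG, vlan_tci(c_tag),
                       ethertype)
    return dst_mac + src_mac + tags


def parse_tags(header):
    """Extract (s_tag, c_tag, ethertype) from a double-tagged header."""
    _s_tpid, s_tci, _c_tpid, c_tci, ethertype = struct.unpack(
        "!HHHHH", header[12:22])
    return s_tci & 0x0FFF, c_tci & 0x0FFF, ethertype


# Hypothetical tag values, not taken from the test configuration.
header = build_double_tagged_header(
    bytes.fromhex("ffffffffffff"), bytes.fromhex("020000000001"),
    s_tag=222, c_tag=111)
```

The outer (S-Tag) header comes first on the wire, followed by the inner (C-Tag) header and the EtherType of the payload.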

 

Wireshark Capture

 


Scenario 2

 


Detailed Packet flow

  • The test client (STCv) emits double-tagged VLAN packets (S-Tag and C-Tag) onto the “fabric”.

    • The fabric in the picture above is a switch running ONOS.

    • Flows are configured on this switch in ONOS such that when it receives double-tagged traffic, the traffic is forwarded to the corresponding vSGs.

  • All the packets are always forwarded on the “fabric” interface of the compute nodes, where they are forwarded to the corresponding vSG.

  • Depending upon the functionality of the vSG, the test client (STCv) on the receiver end receives the traffic.

  • The Spirent tool generates an analytical report with details such as frame size, latency, throughput and dropped frames.

  • These reports help us analyze the life of the packet in the CORD environment.

 

 

 

Spirent Frame Details –

 

 

Test Results

  • Number of vSGs created : 17

  • Number of vCPEs created : 25

Description

Below are the results of a soak test that was conducted over a weekend.

The configuration and traffic used for this testing scenario were as follows.

 

  • Traffic Type - Continuous

  • Frame Size - 1280

  • Configuration

    • Number of vSGs - 5 (808 & 802, 333 & 888, 805 & 802, 739 & 734 and 655 & 533).
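As a rough sanity check on the figures in a Spirent report, the theoretical maximum frame rate for this configuration can be worked out from the frame size and the 1Gbps port speed. The sketch below assumes that the 1280-byte frame size includes the FCS and that each frame carries 20 bytes of on-wire overhead (preamble, start-of-frame delimiter and inter-frame gap). The even split across the five vSGs is also an assumption, and the function names are illustrative.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 7 preamble + 1 SFD + 12 inter-frame gap


def max_frames_per_second(link_bps, frame_size_bytes):
    """Theoretical line-rate frame rate for a given frame size."""
    bits_on_wire = (frame_size_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def frame_loss_percent(tx_frames, rx_frames):
    """Percentage of transmitted frames that were not received."""
    if tx_frames <= 0:
        raise ValueError("tx_frames must be positive")
    return 100.0 * (tx_frames - rx_frames) / tx_frames


line_rate_fps = max_frames_per_second(1_000_000_000, 1280)
per_vsg_fps = line_rate_fps / 5  # assumes the port is shared evenly
```

Under these assumptions the port tops out at roughly 96,154 frames per second, or about 19,231 frames per second per vSG when shared evenly across five vSGs.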

Spirent Test Center Details  

·       Spirent TestCenter Virtual (STCv) was used.

·       STCv was deployed as an Instance on the underlying infrastructure running ESXi 5.5

·       It is a lightweight VM.

  • STCv Number of Ports - 2 (Tx - 1 and Rx - 2)

  • vCPUs – 4

  • Memory – 3GB

  • Speed on Ports - 1Gbps

    • Shared among all the vSGs in place.

  • Adapter – VMXNET3

Detailed - Spirent Configuration in GUI (along with stream blocks and results)

Results

 

 

Issues Observed


1. Limitation of cross connect

With CORD-5.0 we could only use cross connect to forward double-tagged traffic.

However, cross connect can only connect two ports on the same switch. Since one end of cross connect needs to be the Spirent box, the other end also needs to be connected to the same leaf switch (of:0000cc37abd93769 in our test) and thus the only option is one of the compute nodes. Therefore we could only use vSGs created on that specific compute node for the test. There are also a number of vSGs created on the other compute-nodes but they cannot be utilized since the compute-nodes they reside on are connected to the other leaf switch.

Moving forward, there will be two solutions:

- One is to use pseudowire to create a “tunnel” to the other leaf. This is already in 1.12 but not in CORD-5.0.

- The other is a feature that forwards the double-tagged traffic to the other leaf. This is still a work in progress.


2. Out-of-memory issue on the onosfabric container

We observed an out-of-memory issue on the onosfabric container after keeping the Flex POD running for several weeks.

We’ll be collecting more data the next time we hit this issue.


3. Performance during creation of vSGs/vCPEs


We observed that creation time is not uniform when multiple vSGs/vCPEs are created at the same time. For example, when 4 vSGs are created simultaneously, the first vSG takes less than 1 minute to be created, while each successive instance takes between 10 and 15 minutes.


We’ll re-test the scenario and collect the requested logs.


4. Issue with dangling vSGs


If a vSG fails to be created, or is still waiting to be created (for example, because connectivity to the compute node was lost), it is left dangling. Once connectivity to the compute nodes is restored, new instance creations also hang because the older vSG is still stuck in creation. The only workaround is to delete the dangling vSG before creating any new vSGs.
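One way to apply this workaround consistently is to look for dangling instances before each new batch of vSG creations. The sketch below is illustrative only: it assumes the instance list has already been fetched as dictionaries with `name`, `status` and `created` fields, treats any instance that is not ACTIVE after a threshold as dangling, and uses a hypothetical 20-minute default chosen to sit above the 10 to 15 minute creation times noted in issue 3.

```python
from datetime import datetime, timedelta


def find_dangling(instances, now, max_build_time=timedelta(minutes=20)):
    """Return names of instances that are not ACTIVE after max_build_time.

    instances: iterable of dicts with 'name', 'status' and 'created'
    (a datetime). The field names are assumptions for this sketch.
    """
    return [
        inst["name"]
        for inst in instances
        if inst["status"] != "ACTIVE"
        and now - inst["created"] > max_build_time
    ]


# Hypothetical sample data for illustration.
now = datetime(2018, 1, 1, 12, 0)
instances = [
    {"name": "vsg-a", "status": "BUILD", "created": datetime(2018, 1, 1, 11, 0)},
    {"name": "vsg-b", "status": "BUILD", "created": datetime(2018, 1, 1, 11, 55)},
    {"name": "vsg-c", "status": "ACTIVE", "created": datetime(2018, 1, 1, 10, 0)},
]
dangling = find_dangling(instances, now)
```

Anything returned by `find_dangling` would be a candidate for deletion before further vSGs are created.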