Container-based network monitoring for microservices

Containerization technologies are major enablers of the widespread use of microservices. Yet monitoring remains a challenge that must be addressed to facilitate the adoption of containers in production environments.

Container-based network monitoring features

We present ConMon, a framework for zero-touch container-based monitoring of network performance for microservices.

Most existing monitoring tools for containers provide insights into runtime metrics, such as memory and CPU usage and network I/O. However, lightweight built-in support for other kinds of measurement – such as passively monitoring the network performance between communicating containers – is still missing.

The challenge of monitoring containers

Containers provide lightweight virtualization with much lower overheads and system-resource consumption than virtual machines. They are ideal for realizing microservices, where applications consist of many small and highly optimized processes that run inside isolated containers and communicate with each other via APIs.

Containers can be started, stopped and moved dynamically, and their configuration can change frequently. A monitoring solution must therefore keep up with container lifecycle events and adapt its monitoring to whichever containers are currently communicating with each other. This dynamic nature introduces challenges for zero-touch monitoring of the end-to-end network performance of microservices.
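To make the lifecycle-tracking idea concrete, here is a minimal sketch (our own illustration, not ConMon's actual implementation) of a controller reacting to container lifecycle events. The event shape loosely follows Docker's `start`/`die` event convention, but the `MonitorRegistry` class itself is hypothetical:

```python
# Hypothetical sketch: adapt monitoring to container lifecycle events.
# Event names follow Docker's convention ("start", "die"); the registry
# class itself is illustrative, not part of any real tool.

class MonitorRegistry:
    """Tracks which containers currently have a monitoring function attached."""

    def __init__(self):
        self.active = {}  # container_id -> monitor description

    def handle_event(self, event):
        action = event["Action"]
        cid = event["id"]
        if action == "start":
            # Attach a monitoring function when a container appears.
            self.active[cid] = {"mirror_port": f"mon-{cid[:8]}"}
        elif action == "die":
            # Tear down monitoring when the container exits.
            self.active.pop(cid, None)
        return self.active

registry = MonitorRegistry()
registry.handle_event({"Action": "start", "id": "abc123def456"})
registry.handle_event({"Action": "start", "id": "0123456789ab"})
registry.handle_event({"Action": "die", "id": "abc123def456"})
```

In a real deployment this dispatcher would be fed by the container runtime's event stream, so that monitoring follows containers automatically rather than being configured by hand.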

ConMon framework

ConMon was designed by the cloud management team at Ericsson Research. It enables network performance to be monitored either passively or actively, with automatic instantiation and configuration of monitoring functions. Passive network monitoring requires observing application traffic. In our framework, functions that monitor the network traffic between containers are deployed in dedicated monitoring containers, placed adjacent to the monitored containers and interconnected with them through a virtual switch.

Figure 1: ConMon architecture for passive network measurement.

The monitoring functions inside the monitor container on each host receive a copy of the application packets, for example via port mirroring on the virtual switch, and can filter and sample the packets before using them to measure network performance metrics. In this setup, the sender and receiver monitor functions can also exchange the synchronization and control messages required for monitoring, such as initiating measurement sessions and calculating metrics such as the delay between sender and receiver application containers.
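As a simple illustration of the delay calculation (our own sketch, not ConMon code), the sender-side and receiver-side monitors can match mirrored packets by an identifier and subtract their timestamps, assuming the hosts' clocks are synchronized:

```python
# Illustrative sketch: compute one-way delay between application containers
# from timestamps recorded by the sender-side and receiver-side monitors.
# Assumes synchronized clocks and a per-packet identifier; timestamps in seconds.

def one_way_delays(sender_ts, receiver_ts):
    """Match packets by id and return {packet_id: delay_in_seconds}."""
    return {
        pkt_id: receiver_ts[pkt_id] - sender_ts[pkt_id]
        for pkt_id in sender_ts
        if pkt_id in receiver_ts  # ignore packets lost before the receiver
    }

# Example values (illustrative, not measured data):
sender_ts = {1: 10.000000, 2: 10.001000, 3: 10.002000}
receiver_ts = {1: 10.000450, 2: 10.001520}  # packet 3 was lost

delays = one_way_delays(sender_ts, receiver_ts)
```

The same matching step also reveals packet loss: any packet id present at the sender monitor but absent at the receiver monitor never arrived.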

ConMon: Advantages and Accuracy
Running monitoring functions in a separate container, rather than inside the monitored application container, has several advantages. Chief among them is that monitoring is transparent to and isolated from the application containers: a failure of the monitoring process does not adversely affect the application, and the monitoring functions can be updated, re-programmed or replaced on the fly. It also means a single monitoring container can monitor multiple application containers, in contrast to instrumenting each application container individually.

We have also investigated the accuracy of the measurements in our framework by comparing the timestamps obtained for packets observed inside the application containers with those obtained inside the monitor containers, and computing the resulting timestamping errors.

Figure 2: Time stamping errors on the sender host (left) and the receiver host (right) for different data send rates.

Our experiments on a test bed running Docker containers and Open vSwitch switches have shown consistent and manageable monitoring errors, validating the feasibility of our approach.

The main source of error is the switch processing time on the sender side, which is higher for the first packet of each new flow (100-300 microseconds) but very low and stable for the rest of the packets in the flow (Figure 2). On average, the error is 10-20 microseconds at the sender side, while at the receiver side it is only a few microseconds.
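The timestamping-error computation itself is straightforward. A sketch with illustrative example values (not our measured data): for each packet, the error is the difference between the timestamp taken inside the monitor container and the one taken inside the application container on the same host, so no clock synchronization is involved:

```python
# Illustrative: per-packet timestamping error between the timestamp seen
# inside the application container and the one seen inside the monitor
# container, both on the same host (so no clock-sync issue).

def timestamping_errors(app_ts, mon_ts):
    """Return a list of per-packet errors in microseconds."""
    return [(mon_ts[i] - app_ts[i]) * 1e6 for i in range(len(app_ts))]

# Example values (microsecond-scale offsets, purely illustrative):
app_ts = [5.000000, 5.000100, 5.000200]
mon_ts = [5.000015, 5.000112, 5.000213]

errors = timestamping_errors(app_ts, mon_ts)
mean_error = sum(errors) / len(errors)
```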

For more information about our measurement setup and results, please read our paper presented at IFIP Networking in May 2016.
