Demystifying reinforcement learning and convolutional neural network

Amit Sharma, Electronic System Design Master student at the Indian Institute of Information Technology Design and Manufacturing, Chennai, India, spent the summer at Ericsson Research developing a mobile robot using Artificial Intelligence, Deep Learning and Internet of Things technologies. The robot was tested by performing tasks in an indoor plant-health monitoring system.

Students from many different fields join Ericsson Research for internships or thesis work. You could be next. Follow our blog to learn more about the students at Ericsson Research during the summer of 2018.

A Convolutional Neural Network (CNNs) is a deep learning technique that is being successfully used in most computer vision applications, such as image recognition, due to its capability to correctly identify the object in an image. Reinforcement Learning (RL) is another paradigm of Machine Learning (ML), where an agent learns to solve a non-deterministic problem by interacting with the environment following a policy and getting a reward for every action it takes. This process continues until the model converges to an optimal solution.

The work seeks to develop a robotic system that explores, sees, thinks and acts in an unknown grid world environment. A grid world, as an example, can refer to a simulated 3*3 matrix block created to map into a real-world scenario. The goal is for the robot to learn to reach a pre-defined destination – a grid – on its own, without explicitly giving the path. The robot can navigate from any given grid to a destination grid. We designed and developed a two-wheeled mobile robot with a 3-servo, 3D printed robotic arm mounted on it. We used stepper motors and drivers to get the precise traction. It is controlled by a Raspberry Pi 3 – a credit card sized computer with 1 GB of RAM running at a clock speed of 1 GHz.

RL and CNN for the mobile robot: mobility and object detection

The goal of the project work is to study the feasibility of implementing the RL principle in the development of the mobile robot cognitive system. The designed system performs a finite set of actions for learning an optimal path from any given place to the one specific destination in a grid world environment. It follows a uniform policy throughout the grid. This is illustrated in Fig.1.

Amit had the opportunity to pursue an internship under the supervision of Dr. M. Saravanan and Dr. P. Satheesh Kumar. After his passion for robotics, embedded systems and some related work was noted, he was assigned to develop and program a mobile robot. The objective for the project is to develop an autonomous mobile agent with a robotic arm. A challenge in building an autonomous system is simultaneous localization and mapping (SLAM) due to the computational cost of processing raw input from the sensors. The cost of the hardware required presents another challenge. As an alternative, it is possible to obtain the autonomous characteristics by using ML-based methods. Therefore, in this project, we shifted our focus to design an economic, efficient mobile robot with autonomous characteristics using ML algorithms.

The experimental setup for the localizing and mapping task for the mobile robot is shown in Fig 1. The environment is a 4×4 grid, where each cell is 1 foot square. The robot has three fundamental operations, namely, turn left, turn right, and move forward one foot. With these basic movements, the robot can maneuver in four perpendicular directions in 2-D space. Initially, in every grid, the mobile robot will have four possible actions to be picked with a probability of 0.25. To move from one grid to another, the robot should pick one action from the current grid and perform it.


Figure. 1 Cognitive State of the Mobile Robot Initially and after Learning the Environment.

After interacting with the environment, a sufficient number of times (more times corresponds to more accurate learning), the robot will have a complete idea of the optimal trajectory it should follow to move from the current location to the destination. A Q-learning module, a method in RL, is used to achieve autonomous navigation of the mobile robot. Every action executed by the robot earns it a reasonable reward (reinforcement), and an appropriately high reward is given at the goal. At each grid, the mobile robot will be a little greedy; it will pick an action that will grant it the highest reward in that grid such that the robot’s cumulative reward from start to finish would be the highest possible.

We used a deep convolution neural network (D-CNN) to identify the object at the destination. To train the network, we used the images of three objects taken from various angles and labels. The trained D-CNN model is loaded on the Raspberry Pi where it identifies the objects in the images taken from the camera.

Project work components

The mobile robot components for the project comprise the cognitive system, traction system, vision system and manipulator arm.


Figure 2. Project work components

The mobile robot vision system is a CNN model for a fixed number of classes. The goal is to identify the output (object) of a trained CNN model for a given number of classes. In our case, a model is trained for three classes of objects. The robot checks the presence of the target object and marks its location as the destination on the grid. Once it reaches the object, the cognitive system rewards the mobile robot with the highest reward for that learning phase. This process is repeated until the agent converges to an optimal path.

Thus, our idea of implementing RL for mobile robot SLAM in a grid world environment worked out very well in the given scenario. RL performance is pretty good and it does not take a developer to program the mobile robot for SLAM once the model is created and deployed into the environment. To test and prove the knowledge gained from the above implementations, we tasked the robot with monitoring and managing plant health in an indoor environment.

IoT-based plant monitoring system


The mobile robot developed with the above techniques is deployed and studied for monitoring plant health with a robotic arm mounted on a wheeled chassis. The robotic arm is calibrated for two end-effector points, namely a pick-point and no-pick-point. As the mobile robot reaches the plant pot, the robotic arm comes into action and reads off the parameters affecting plant health using sensors mounted on the end-effector point of the robotic arm. The sensors are used to measure soil moisture, ambient temperature, and humidity. To incorporate the IoT concept, we used Ericsson’s APPIOT platform for registering the device and sensors for real-time monitoring of plant health.

Experience at Ericsson

It’s been a rich experience at Ericsson Research in Chennai. I learned a lot – everything from personal skills development by interacting with people who are adept in so many things to an introduction to machine learning, which I hitherto had little knowledge of, being from an electronics background. I believe that the skills I’ve developed here will help me further advance my career in the near future.

The Ericsson blog

In a world that is increasingly complex, we are on a quest for easy. At the Ericsson blog, we provide insight, news and opinion to help make complex ideas on technology, business and innovation simple. If you want to hear from us directly, please head over to our contact page.

Contact us