Learning to fly
Flying drones with the help of Reinforcement Learning has been the task this summer for Jonas and Sven, two students doing an internship in the Machine Intelligence and Automation research area at Ericsson Research in Kista, Sweden.
Students from many different fields join Ericsson Research for internships or thesis work. You could be next. Follow our blog to learn more about the students at Ericsson Research during the summer of 2018.
The project: flying drones using Machine Learning
The basis for the project is explained in an Ericsson white paper on AI and Machine Learning in next generation systems:
The combination of billions of connected devices, petaflops of computing resources and advanced communication capabilities that enable real-time interactions is leading to the creation of systems on a scale and level of complexity that is beyond the ability of humans to fully comprehend and control. Management and operation of such systems requires an extremely high degree of intelligent automation.
On the upside, the growing scale and number of interactions leading to increasing system complexity also enables vast amounts of data to be gathered from different parts of the system. The data can be fed into models and methods to derive deep insights that can be used to optimize and tailor the behavior of the system.
During the summer, we explored these ideas in a greatly scaled-down setting. We used Crazyflie 2.0 drones and experimented with making them fly autonomously using Machine Learning instead of classical control theory. The objective was to demonstrate the ability to train an agent in a simulation before transferring it to a real-life setting.
Reinforcement Learning is an area of Machine Learning in which a software agent learns how to act in an environment with the goal of maximizing a cumulative future reward. A common approach is to model the environment as a Markov decision process, in which the agent moves between states by taking actions and receives a reward for each transition. The agent then learns by interacting with the environment in order to find the policy that maximizes the expected future reward.
The agent interacting with the environment and receiving a reward for its action.
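To make the interaction loop concrete, here is a minimal sketch in Python. The environment, its reward and both policies are hypothetical toy stand-ins invented for illustration; they are not the Crazyflie simulator or the agent used in the project.

```python
import random


class ToyHoverEnv:
    """Hypothetical 1-D environment: reach a target height and stay there."""

    def __init__(self, target=5, max_height=10, horizon=20):
        self.target = target
        self.max_height = max_height
        self.horizon = horizon
        self.reset()

    def reset(self):
        self.height = 0
        self.steps = 0
        return self.height

    def step(self, action):
        # action is -1 (descend), 0 (hold) or +1 (climb)
        self.height = min(self.max_height, max(0, self.height + action))
        self.steps += 1
        reward = -abs(self.height - self.target)  # closer to target is better
        done = self.steps >= self.horizon
        return self.height, reward, done


def run_episode(env, policy):
    """Run one episode and return the cumulative (undiscounted) reward."""
    state = env.reset()
    total = 0.0
    done = False
    while not done:
        action = policy(state)                   # the agent acts
        state, reward, done = env.step(action)   # the environment responds
        total += reward
    return total


def random_policy(height):
    # An untrained agent: actions are chosen at random.
    return random.choice([-1, 0, 1])


def climb_then_hold(height, target=5):
    # A hand-written policy that happens to be optimal for this toy task.
    return 1 if height < target else 0
```

Calling `run_episode(ToyHoverEnv(), climb_then_hold)` returns `-10.0`, the best score this toy task allows, while `random_policy` never scores higher. Learning a policy means closing that gap from experience instead of writing the rule by hand.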
Given enough time, the agent is sometimes able to reach a state of superhuman performance, where it effectively performs better than any human expert ever could in making decisions. This opens the door to many interesting high-impact improvements in telecommunication systems where operators and end users alike have a lot to gain.
However, one weakness of this method is that the agent must first explore the environment before it can begin making good decisions. Initially, the agent makes decisions that appear to be random, and only later gains enough experience to know how to act. This makes it hard to learn solely by interacting with reality. For example, it is not practical to let a self-flying drone crash in order for it to learn how to fly.
The same challenges also exist in telecommunication networks, except that the stakes are higher. In order to be a viable option, the agent has to perform better than the solution already deployed, right from day one. The agent must also guarantee that the actions it takes do not affect the stability or performance of the network in any negative way, but in order to learn, the agent has to explore actions that affect the network in both good and bad ways.
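The exploration phase can be illustrated with tabular Q-learning and an epsilon-greedy policy. This is a generic textbook technique chosen for the sketch, not necessarily the algorithm used in the project, and the toy task and all constants below are assumptions made for illustration.

```python
import random

ACTIONS = [-1, 0, 1]                      # descend, hold, climb
TARGET, MAX_HEIGHT, HORIZON = 3, 6, 12    # hypothetical toy task


def step(height, action):
    """Deterministic toy dynamics: reward is the negative distance to target."""
    new_height = min(MAX_HEIGHT, max(0, height + action))
    return new_height, -abs(new_height - TARGET)


def greedy_action(q, state):
    return max(ACTIONS, key=lambda a: q[(state, a)])


def epsilon_greedy(q, state, epsilon, rng):
    # With probability epsilon explore at random, otherwise exploit.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return greedy_action(q, state)


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(MAX_HEIGHT + 1) for a in ACTIONS}
    for episode in range(episodes):
        # Start almost fully random, then decay towards mostly greedy.
        epsilon = max(0.05, 1.0 - episode / (episodes * 0.8))
        height = 0
        for _ in range(HORIZON):
            action = epsilon_greedy(q, height, epsilon, rng)
            new_height, reward = step(height, action)
            best_next = max(q[(new_height, a)] for a in ACTIONS)
            # Standard Q-learning update towards the bootstrapped target.
            q[(height, action)] += alpha * (
                reward + gamma * best_next - q[(height, action)]
            )
            height = new_height
    return q
```

In the early episodes epsilon is close to 1, so the agent's behavior is essentially random, which is exactly the phase that would be costly on a physical drone. After training, `greedy_action(train(), 0)` selects the climb action.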
Instead of letting the agent train in the real world, we let it train in a simulation, where nothing severe happens if it makes a bad decision. The computer’s ability to perform thousands of simulations during the time it would take to complete a single run in the real world is an added benefit. However, since no simulation is a perfect model of the real world, moving the agent from simulation to reality can be challenging. In the case of autonomous drones, challenges might include ground effect and turbulence as well as communication delays and noisy or inexact sensor data.
A drone training in a simulator.
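One common way to prepare an agent for such imperfections is to inject them into the simulation itself. The sketch below adds Gaussian sensor noise and a fixed command delay to a toy altitude model. The class, its parameters and the dynamics are hypothetical, and the post does not state that this particular technique was used in the project.

```python
import random
from collections import deque


class NoisyDelayedSim:
    """Hypothetical 1-D altitude simulator with sensor noise and command delay."""

    def __init__(self, noise_std=0.1, delay_steps=2, seed=0):
        self.rng = random.Random(seed)
        self.noise_std = noise_std      # standard deviation of the sensor noise
        self.delay_steps = delay_steps  # steps before a command takes effect
        self.reset()

    def reset(self):
        self.height = 0.0
        # Pre-fill the queue with "do nothing" so early commands arrive late.
        self.pending = deque([0.0] * self.delay_steps)
        return self._observe()

    def _observe(self):
        # The agent never sees the true height, only a noisy reading.
        return self.height + self.rng.gauss(0.0, self.noise_std)

    def step(self, command):
        self.pending.append(command)
        applied = self.pending.popleft()  # command issued delay_steps ago
        self.height = max(0.0, self.height + applied)
        return self._observe()
```

With `delay_steps=2`, a climb command only changes the height on the third call to `step`, so an agent trained against this model has to cope with acting on stale, noisy information, much as it would over a radio link to a real drone.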
During the summer, we experimented with this transition from simulation to reality. The agent was trained in a simulator tailored for Crazyflie 2.0 in order to learn the basic actions. We then worked on transferring the trained model to a real Crazyflie drone in order to demonstrate Reinforcement Learning as a viable method of dealing with complex problems in the real world.
A Crazyflie 2.0 drone from Bitcraze hovering.
Machine Learning is an area where a great deal of research is currently in progress. This can make it hard to keep up to date with the latest changes, and even problems that appear simple on the surface might require intricate solutions. Many of the problems we encountered are themselves open research questions, so it is not always possible to find a documented solution. Instead, we have to rely on our intuition and creativity.
Working at Ericsson Research
Working at Ericsson Research is really stimulating. You’re constantly exposed to new things and ideas you’ve never worked with before, and you never run out of things to do or ideas to work on. Getting exposed to new technologies and working in a smart team makes it a very rewarding way to spend your summer.
We usually spend our days in the lab debugging our machine-learning models and thoughts. Working in a creative, relaxed environment with interesting, inspiring colleagues makes the days fly by.
Jonas and one of the Crazyflie drones
About Jonas and Sven
Jonas and Sven are students from the Royal Institute of Technology in Stockholm, Sweden, pursuing degrees in Computer Science and Engineering. They live in Stockholm and enjoy spending time with their friends and families. When not working, Jonas spends his late evenings bouldering or researching subjects which pique his curiosity. Sven appreciates going for the occasional run or swim, weather permitting.