A day in the life of a Telecom BSS engineer: cloud and culture

Cloud native technologies can reduce IT costs by using hardware more efficiently, allowing applications to self-heal after a failure and auto-scale to respond more quickly to sudden peaks in usage. All of this can be achieved if the application is designed to be truly cloud native, and if the infrastructure allows for it. The technologies evolve in a well-structured ecosystem where web-scale companies like Amazon, Google, Uber and Netflix, together with collaborative open source communities, regularly pitch in and share best practices. As a developer, it also means I have access to a vast amount of knowledge and resources, compared to using home-brewed internal tools that, to be honest, aren’t always very well documented.
Historically, telecom BSS systems tend to run for several years receiving only the most necessary correction packs and security fixes. Consequently, a system upgrade to get new features can be a huge undertaking: it often takes months to prepare and execute and involves a sizable number of people. This is a costly and cumbersome process with many manual steps across the interconnected flows and dependencies that form the fabric of the BSS. This way of working has existed for decades, which means that organizations are built around it, and a culture of “if it works, don’t fix it” is commonplace. This of course stems from the “telecom grade” requirements on service availability, and from the fact that most failures come from change. But the remedy is – paradoxically – to make changes more frequently!
It’s not just about the technology itself, though; an important aspect of cloud technologies is that they enable and encourage a constant flow of features and corrections into the live applications – often multiple deployments per day, directly into the production system, without disrupting service availability. Most cloud technologies have been engineered with exactly that in mind.
This is also leveraged in our daily work: I can deploy a complete test environment on my laptop in a matter of minutes. Before the cloud-native transition I would have needed a handful of big VM instances at my disposal and at least an hour of waiting for installation and configuration to finish.
In my cloud deployment, whenever I want to bring in a new set of changes, it’s just a few commands, a couple of minutes of waiting, and I’m good to go again with the latest code. It’s important to mention that the very same commands would be used to deploy the system at a much larger scale in a production environment at a customer site, which really helps to bridge the gap between development and operations. This technology comes with DevOps built in from the start!
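To make this concrete, here is a minimal sketch of such a deploy step, assuming the application is packaged as a Helm chart and a cluster is already reachable. The chart repository, release name and values files are hypothetical placeholders, not actual Ericsson artifacts.

```python
#!/usr/bin/env python3
"""Sketch: one deploy function serving both the laptop and the customer site.

Assumes the helm CLI is installed and a cluster is reachable; the chart
repository, release name and values files are hypothetical placeholders.
"""
import subprocess

def deploy(environment: str, version: str) -> None:
    """Run `helm upgrade --install`; only the values file differs per environment."""
    subprocess.run(
        [
            "helm", "upgrade", "--install",
            "bss-app",                           # hypothetical release name
            "myrepo/bss-app",                    # hypothetical chart reference
            "--version", version,
            "-f", f"values-{environment}.yaml",  # laptop vs. production sizing
            "--namespace", environment,
            "--create-namespace",
            "--wait",                            # block until the rollout is healthy
        ],
        check=True,
    )

if __name__ == "__main__":
    deploy("dev", "1.2.3")           # a small footprint on my laptop
    # deploy("production", "1.2.3")  # the identical call at a customer site
```

The only thing that differs between my laptop and a production cluster is the values file; the commands, and therefore the behavior, stay the same.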
While this makes my life as an engineer in a development unit a lot easier, it can just as well ease the work of our internal quality assurance (QA) teams, all the way out to operations at customer sites. Today, my team and I often wait months for feedback on how our features perform outside the development unit, but I believe that this new technology can be an enabler of true agility.
Cloud technologies have a completely different speed of change
Take Kubernetes, the de facto standard container orchestrator, as an example: a release normally receives patches for only about 18 months before it approaches end-of-life. That’s an end-of-life after just one and a half years! Kubernetes, of course, supports zero-downtime upgrades if you are coming from a supported version, but in practice it means that you must upgrade the platform your business-critical applications are running on at least every 18 months – even if it is working perfectly fine and you haven’t requested any new functionality. Additionally, the applications running on Kubernetes will track these releases as well, meaning that they too must be upgraded. They might even require a feature from the latest Kubernetes version, so keeping the entire stack as fresh as possible becomes important.
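As a toy illustration of that cadence – the 18-month window comes from the paragraph above, while the release dates are made up for the example – a small script can flag when a platform version is due for replacement:

```python
from datetime import date, timedelta

SUPPORT_WINDOW = timedelta(days=548)  # roughly 18 months of patches, per the text above

# Illustrative release dates; not an authoritative Kubernetes schedule.
releases = {"v1.x": date(2019, 3, 25), "v1.y": date(2019, 6, 19)}

today = date(2020, 9, 1)  # a pretend "now" for the example
for version, released in releases.items():
    eol = released + SUPPORT_WINDOW
    days_left = (eol - today).days
    status = "past end-of-life" if days_left < 0 else f"{days_left} days of patches left"
    print(f"{version}: released {released}, EOL around {eol} -> {status}")
```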
This rate of change might scare a lot of people, and even deter them from embarking on the cloud native journey. But fear not – this is also addressed by the cloud native toolchain: the answer is automation. The rationale is that taking many small steps, often, is a much safer and smoother ride than taking a few giant leaps now and then. To keep pace with the stream of new software versions we need to take some help from the machines and automate everything! Manual intervention must be eliminated or kept to a minimum, and humans should only be involved to make the business-critical decisions. Computers can and should do the hard, repetitive labor – that is what they do best. With tools like Jenkins and Spinnaker, the tasks needed to prepare and deploy new software become consistent, repeatable and safe.
So, how will this work? Let me give an example: As an ICT operator, you pull the latest version of your installed product from Ericsson, for instance Ericsson Charging, and deploy it into a production-like staging environment where you thoroughly examine the system with realistic test suites (upgrade, rollback, performance, functionality, etc.) and any other tools you need – all in an automated fashion. If the new version passes the acceptance test, it can be deployed into production; but regardless of the outcome, the result is fed back to Ericsson for analysis. This pull-test-feedback cycle is what we refer to as a “pipeline”.
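Below is a minimal sketch of that pull-test-feedback cycle. Every helper is a hypothetical stand-in for real tooling (registry pull, deployment job, test harness, feedback channel), included only so the flow can run end to end.

```python
"""Sketch of the pull-test-feedback pipeline described above.

All helpers are hypothetical stand-ins for real tooling, so that the
overall flow is runnable; none of them reflect actual Ericsson APIs.
"""
from typing import Dict

TEST_SUITES = ("upgrade", "rollback", "performance", "functionality")

def pull_latest(product: str) -> str:
    # Stand-in for pulling the newest published version from the vendor.
    return "1.2.3"

def deploy(product: str, version: str, environment: str) -> None:
    # Stand-in for an automated deployment (e.g. a Helm or Spinnaker job).
    print(f"deploying {product} {version} to {environment}")

def run_suite(suite: str) -> bool:
    # Stand-in for a realistic, automated test suite.
    print(f"running {suite} tests")
    return True

def send_feedback(product: str, version: str, results: Dict[str, bool]) -> None:
    # The result is fed back for analysis regardless of the outcome.
    print(f"feedback for {product} {version}: {results}")

def run_pipeline(product: str) -> None:
    version = pull_latest(product)
    deploy(product, version, "staging")         # production-like staging environment
    results = {suite: run_suite(suite) for suite in TEST_SUITES}
    send_feedback(product, version, results)    # sent whether we pass or fail
    if all(results.values()):
        deploy(product, version, "production")  # acceptance passed: promote
    else:
        print("acceptance failed; production stays on the current version")

if __name__ == "__main__":
    run_pipeline("charging")
```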
Before we publish software and make it available for a customer to pull, it has already passed numerous pipelines within Ericsson. The first pipeline is triggered when a developer makes a single change to the software; it in turn triggers another one, and so on. Each new pipeline adds some more context to the system and one or more aspects of quality assurance. If anything fails, it is immediately fed back for troubleshooting and the change is blocked from going any further. Eventually the change has been scrutinized from both functional and non-functional angles, making it a candidate for release. The release decision is taken after every pipeline, and always by a human; maybe several changes need to be delivered together for a new feature to be complete, or some critical correction from a different part of the system must be fast-tracked through the entire chain.
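The chain itself could be sketched like this; the stage names are illustrative, but the two essential properties from above are there: a failure blocks the change, and the final release decision stays with a human.

```python
"""Sketch of chained internal pipelines. Stage names are illustrative;
each stage widens the context in which the change is exercised."""
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]

STAGES: List[Stage] = [
    ("unit and component tests", lambda: True),
    ("integration with neighboring modules", lambda: True),
    ("full product, functional suites", lambda: True),
    ("full product, non-functional suites", lambda: True),
]

def run_chain(change: str) -> bool:
    for name, run_stage in STAGES:
        if not run_stage():
            print(f"{change}: failed '{name}', fed back for troubleshooting")
            return False  # the change is blocked from going any further
        print(f"{change}: passed '{name}'")
    # The change is now a release candidate, but the decision itself is human.
    answer = input(f"Release {change}? [y/N] ")
    return answer.strip().lower() == "y"

if __name__ == "__main__":
    run_chain("change-42")
```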
The use of pipelines requires a completely new mindset and the eradication of silos
Since it is no longer possible to “throw new software over the wall” for some other organization to pick up, this new mindset must be established on the recipients’ side too. The pipelines run horizontally from multiple development units and combine to form complete Ericsson products that are conveyed all the way to the customers. This effectively breaks the silos that the old culture has induced, and to keep the flow of software steady, collaboration and close cooperation are not only needed, they are essential!
I believe that this new way of working is acting as a catalyst for a series of beneficial changes and has already bred a new culture that gives us, as developers, a better end-to-end understanding of Ericsson products and customer needs – a culture that embraces the cloud-native concepts of agility, elasticity and continuous improvement and leverages them to create a world-class experience for everyone.
Want to know more?
Read the latest blog post: The journey to a cloud BSS
Learn more about Telecom BSS
Learn about CI/CD