Democratizing AI with automated machine learning technology
As AI becomes more widely embedded in global systems, notable challenges emerge, particularly around the accessibility of AI and the data science behind it. Here, we discuss why democratizing AI is an important step in AI’s development, and how a citizen data science platform may provide the key.
Today, companies across industries are applying AI to optimize internal processes, improve the quality and performance of their existing products, design new products, and further optimize the workforce. AI has also proven critical for managing and predicting the operations of telecommunication networks. However, most of the time, AI is restricted to data scientists and data analysts, specialists trained specifically in AI. At the same time, it’s the subject matter experts, i.e., experienced engineers and technicians, who have expert knowledge of a specific business or technical area. They generally also own the data.
Why not make AI more accessible to citizen data science users?
One way of bringing AI closer to the subject matter expert (SME) is by democratizing AI. Democratization allows for easy access by making AI available to every person in the organization. In this manner, AI can be used to leverage the skills of experienced engineers and technicians. These SMEs then become citizen data scientists.
To design a machine learning (ML) model that performs a specific task, a data scientist has to prepare the data, perform feature engineering, choose among different algorithms, and tweak the hyperparameters before any training can happen (see Figure 1). These are time-consuming and laborious tasks. One solution is to automate these machine learning processes.
Automated machine learning (AutoML) refers to the automation of the iterative and often time-consuming tasks of machine learning model development. It saves data scientists, analysts, and developers time and effort when building scalable and efficient ML models, while retaining model quality.
The traditional ML model development approach requires not only domain knowledge, but also considerable time to produce and compare candidate models. AutoML, on the other hand, accelerates the creation of production-ready, efficient ML models and allows SMEs to make data-driven decisions. For example, when monitoring assets in supply chain and logistics, or in a radio network, the SME knows how and what every asset is measuring, as well as the critical performance indicator values. AutoML allows SMEs not only to monitor the output of various data sources, but also to employ advanced algorithms and ML to act on real-time insights.
Figure 1 below summarizes a high-level view of the traditional machine learning flow in relation to AutoML.
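To make the idea of automating an iterative tuning task more concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any particular AutoML product works: the k-nearest-neighbour classifier, the leave-one-out scoring, the function names and the toy asset readings are all assumptions chosen to keep the example small.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    # train is a list of (features, label) pairs; distance is squared Euclidean.
    by_distance = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Score one value of k by holding out each row once and predicting it."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)

def search_k(data, candidates):
    """Try every candidate k and return (best_k, scores); ties keep the first k."""
    scores = {k: leave_one_out_accuracy(data, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Made-up readings: (signal_strength, error_rate) -> asset status.
DATA = [
    ((0.9, 0.1), "healthy"), ((0.8, 0.2), "healthy"),
    ((0.85, 0.15), "healthy"), ((0.7, 0.2), "healthy"),
    ((0.2, 0.9), "faulty"), ((0.3, 0.8), "faulty"),
    ((0.25, 0.7), "faulty"), ((0.1, 0.85), "faulty"),
]

best_k, scores = search_k(DATA, candidates=[1, 3, 5, 7])
print(best_k, scores)
```

The loop in `search_k` is the part a person would otherwise repeat by hand: train, score, adjust, repeat. On this toy data, k = 7 fails on every held-out row because the vote is always dominated by the other class, which is exactly the kind of poor setting an automated search filters out.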
The citizen data science platform
AutoML technologies are at the heart of the democratization of AI. Let the expert be the expert and equip them with the necessary tools to become a citizen data scientist.
AutoML solutions bring together the data engineer, the data scientist and the SME: the citizen data science user can upload data into the AutoML platform, build hundreds of models based on existing ML algorithms, and choose the best-performing model. The citizen data scientist then analyzes the results to make informed decisions.
Trained data scientists play a crucial oversight and assurance role: ensuring that the models provide reliable insights, retraining models when needed, maintaining the model lifecycle, and crafting custom algorithms. Recent work provides benchmark scores for current state-of-the-art AutoML frameworks and evaluates their performance against data scientists. Specifically, the authors show that automated frameworks perform better than or equal to the ML community in seven out of 12 OpenML tasks.
One strategy for achieving AI democratization is to build a citizen data science (CDS) platform. The CDS platform enables SMEs and domain experts to work like data scientists, while taking us closer to the data and AI democratization vision. Figure 2 shows the high-level system flow envisioned by the CDS platform.
At the core of the architecture (depicted as blue boxes in Figure 2) we find the AutoML functionality: data preparation, feature engineering, model selection, deployment, and data visualization. Data (raw data or data in a datalake in Figure 2) is gathered from internal processes at Ericsson or from networks, depending on the insights we would like to gain. It then goes through the AutoML machinery and, at the end of the workflow, the citizen data scientist or SME can make data-driven decisions by looking at the modelled data. The CDS platform also offers a user community and includes an academy for learning from previous use cases and models that have been built.
A very important parameter in this equation is the data. Within the context of big data we can classify data as:
- structured, for example, databases, data warehouses, enterprise systems
- unstructured, such as analogue data, GPS tracking data, audio/video streams
- semi-structured, for example, XML and email.
From the perspective of origin, we can also classify data as enterprise data and customer data. Enterprise data is data that we generate as a company. This includes data from the supply chain, purchase orders, and contract signing, as well as data collected by our radio equipment. Customer data includes data collected from our customers’ networks for fault isolation, predictive analytics, and so on. This data can be on-prem or cloud-born. Whatever the data type, it is accessible from the datalake and datapipe to which the CDS platform connects.
Another important aspect is communicating information clearly and efficiently through data visualization (end of the flow in Figure 2). Data needs to be visualized as a graph or visual object that captures the relationships between different parameters in the dataset under analysis, so that we can reason about data and evidence. Data visualization also needs to be functional, allowing the user to tell real-time stories about the business.
More insights into the CDS platform
Zooming in on the core functionalities of the AutoML part of the CDS platform, we find data preparation, feature engineering, model selection, deployment and operation, and data visualization. Data preparation involves cleaning, reformatting, correcting and reshaping raw data before it is processed and analyzed. This is a prerequisite for eliminating erroneous data that could lead to wrong conclusions.
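As a rough illustration of what cleaning, reformatting and correcting can mean in practice, consider the following sketch. The field names (`site`, `latency_ms`), the cleaning rules and the sample rows are hypothetical and only serve to show the idea. A real platform would apply far richer, configurable rules.

```python
def prepare(rows):
    """Clean raw readings: normalize text, parse numbers, drop bad or duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        # Reformat: trim whitespace and use one casing for site identifiers.
        site = str(row.get("site", "")).strip().upper()
        # Correct: accept a decimal comma as well as a decimal point.
        raw_value = str(row.get("latency_ms", "")).strip().replace(",", ".")
        try:
            latency = float(raw_value)
        except ValueError:
            continue  # missing or unparseable measurement
        if not site or latency < 0:
            continue  # no identifier, or a physically impossible value
        key = (site, latency)
        if key in seen:
            continue  # duplicate once the row has been normalized
        seen.add(key)
        cleaned.append({"site": site, "latency_ms": latency})
    return cleaned

# Made-up raw rows showing typical problems.
RAW = [
    {"site": " a1 ", "latency_ms": "12,5"},
    {"site": "A1", "latency_ms": "12.5"},   # duplicate after cleaning
    {"site": "b2", "latency_ms": "n/a"},    # unparseable
    {"site": "c3", "latency_ms": "-4"},     # negative latency
    {"site": "d4", "latency_ms": None},     # missing
    {"site": "e5", "latency_ms": 30},
]

CLEAN = prepare(RAW)
print(CLEAN)
```

Only two of the six rows survive here. Treating two rows with the same site and value as duplicates is itself an assumption; with timestamped data, the timestamp would normally be part of the key.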
In the next step, it’s crucial to identify the features or attributes that any analyses or predictions will be based on. In this step, data mining techniques might be used to highlight the target variables.
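One simple way to shortlist candidate features is to rank each column by how strongly it correlates with the target. The sketch below uses the Pearson correlation coefficient for this. It is a stand-in for whatever technique a given platform applies, it only captures linear relationships, and the column names and values are invented for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def rank_features(columns, target):
    """Return (name, |correlation|) pairs, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical columns from a network dataset.
COLUMNS = {
    "traffic_load": [1, 2, 3, 4, 5],
    "temperature": [3, 1, 4, 1, 5],
    "firmware_major": [2, 2, 2, 2, 2],
}
TARGET = [10, 20, 30, 40, 50]  # e.g., a performance indicator to predict

ranking = rank_features(COLUMNS, TARGET)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A ranking like this is a starting point for discussion rather than a verdict: the SME is the one who can tell whether a highly ranked column is a genuine driver or just a side effect of how the data was collected.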
Once the features and/or cross-features have been identified, a series of ML algorithms is run on the data, producing hundreds of models. In this step, the citizen user can choose from the best-performing models and analyze the data further. In the final stage, the selected model is deployed and put into operation. The model starts adding value once it is in production.
In a nutshell, the data science platform and AutoML functionalities provide many advantages: they offer easy implementations that deliver quick results, and they enable SMEs to work as data scientists. However, while the ML models are implemented using data science platforms, the platform itself mostly remains a black box for its users, i.e., the coding part of the data science job is hidden. This often makes citizen data scientists over-reliant on the automation, at the expense of creative exploration. We should therefore continue to carry out sanity checks and ask tough questions while automating pipelines and flows. This ensures that we get the maximum benefit from data science platforms, as well as from the mathematical expertise of data scientists and SMEs.
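The "run many algorithms, then pick from the best" step can be pictured as a leaderboard. The sketch below trains three small candidate models on the same data and ranks them by error on held-out rows. The model families, the mean absolute error metric, the names and the numbers are all assumptions for illustration. A real platform would compare far more candidates, usually with cross-validation.

```python
def fit_linear(train):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_knn(k):
    """Return a trainer for a k-nearest-neighbour regressor."""
    def trainer(train):
        def predict(x):
            nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
            return sum(y for _, y in nearest) / k
        return predict
    return trainer

def mean_abs_error(model, rows):
    """Average absolute difference between predictions and true values."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

def leaderboard(trainers, train, valid):
    """Train every candidate and rank by validation error, lowest first."""
    scored = [(name, mean_abs_error(fit(train), valid))
              for name, fit in trainers.items()]
    return sorted(scored, key=lambda item: item[1])

# Made-up (input, output) pairs; the last two are held out for validation.
TRAIN = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.0)]
VALID = [(7, 14.1), (8, 15.9)]
TRAINERS = {"linear": fit_linear, "1-nn": fit_knn(1), "3-nn": fit_knn(3)}

board = leaderboard(TRAINERS, TRAIN, VALID)
for name, error in board:
    print(f"{name}: {error:.3f}")
```

Here the linear model wins because the held-out rows lie beyond the training range, where nearest-neighbour models cannot extrapolate. How the validation data is chosen therefore shapes the ranking, which is one reason the citizen user should look at more than the top line of a leaderboard.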
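One inexpensive sanity check is to compare a model against the most naive alternative available. The sketch below, with hypothetical names, labels and threshold, checks whether a classifier beats the strategy of always predicting the most common label.

```python
from collections import Counter

def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def sanity_check(predictions, labels, min_gain=0.05):
    """Compare a model with always predicting the most common label."""
    majority_label = Counter(labels).most_common(1)[0][0]
    baseline = accuracy([majority_label] * len(labels), labels)
    model = accuracy(predictions, labels)
    return {
        "model_accuracy": model,
        "baseline_accuracy": baseline,
        # min_gain is an arbitrary illustrative threshold, not a standard value.
        "passes": model >= baseline + min_gain,
    }

# Made-up labels: nine healthy assets and one fault.
LABELS = ["ok"] * 9 + ["fault"]
ALWAYS_OK = ["ok"] * 10  # a "model" that never raises a fault

report = sanity_check(ALWAYS_OK, LABELS)
print(report)
```

The model in this example scores 90% accuracy, which sounds convincing until it is set next to a baseline that also scores 90% and never detects a single fault. Questions like "better than what?" are the kind the platform cannot ask on the user’s behalf.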
Automated machine learning: enabling AI to scale
A CDS platform enables SMEs and domain experts to work like data scientists. It takes us closer to the data and AI democratization vision. We’re quickly moving towards the next revolution with AI and its democratization, which enables organizations and society to use data for faster growth as well as for the common good.
In this blog post, we discussed how the citizen data science platform can help democratize AI. We showed that CDS is an end-to-end analytics/ML tool that enables SMEs to use analytics/ML to gain insights and make better decisions based on their data, and that enlarges Ericsson’s data science capacity beyond the data scientist. It is also a collaboration platform that can drive collaboration efforts, drive down data-to-insight costs, and cut time-to-value.
Finally, a platform that enables automation and the reuse of recipes, workflows, algorithms, parameter tuning and analytics/ML application models across market areas, business areas and group functions not only drives down costs and improves time to market, but also opens up a business innovation opportunity to reuse and sell our organizational capabilities to our customers.
Want to learn more?
Find out more about AI by Design – AI solutions solving the right problems.
Read our previous blog post, How can we democratize machine learning on IoT devices?
What does data-driven mean exactly? Here’s our introduction to data-driven network architecture.