
How much telecommunication knowledge can a small language model retain?

  • Generative AI and large language models can be very useful for supporting troubleshooting and solution finding, but they require substantial compute and storage resources.
  • This blog post explores whether such models can be shrunk and still effectively handle both general knowledge queries and telecommunication questions.

Principal Researcher, AI systems

Hashtags
#LLM
Generative AI has taken the world by storm, demonstrating impressive capabilities in generating text, images, audio, and video, and solving difficult tasks. In telecommunications, this technology has become indispensable, enhancing several key areas. It serves as an unflagging assistant in Customer Experience and Support, helping users navigate the network effortlessly. In Network Optimization and Management, it automates network operations for greater efficiency. Additionally, it bolsters security by detecting emerging vulnerabilities within networks, ensuring robust protection. 

GenAI has the potential to support technicians in identifying faults and providing solutions. Integrating GenAI with mobile phones would be beneficial for field engineers addressing issues in a mobile network, particularly when working in remote areas without coverage. In this post, we will delve into the challenges and possible solutions of using GenAI in such situations.

One class of generative AI is large language models (LLMs), which focus on generating text. The most popular type of LLM is the generative pre-trained transformer (GPT).

How does GPT work? 

In simple terms, a GPT consists of several blocks called transformers that analyze text. Text is divided into small segments known as tokens. Transformer blocks contextualize text tokens and determine which words in a sentence are important. The final layer of the GPT, also known as the output layer, consists of all possible outputs (words) that can be used to complete a sentence. The figure below offers a simplified visual of a GPT-based LLM.

Figure 1: Simplified visual of a GPT-based LLM. The input tokens (This, is, a, sentence) feed into a stack of Transformer blocks, whose outputs connect to the output layer. The output layer assigns a probability to each candidate next word: about (69%), regarding (22%), forward (8%), even (1%).
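To make the tokenization step mentioned above more concrete, here is a minimal sketch using the Hugging Face transformers library. The choice of the GPT-2 tokenizer is our own assumption for illustration; any GPT-style tokenizer behaves similarly.

```python
# A minimal sketch of splitting text into tokens, assuming the Hugging Face
# "gpt2" tokenizer purely for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

tokens = tokenizer.tokenize("This is a sentence")
print(tokens)   # e.g. ['This', 'Ġis', 'Ġa', 'Ġsentence'] ('Ġ' marks a leading space)

ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)      # the integer IDs the transformer blocks actually operate on
```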

If a GPT is trained on English-language data sets, and we consider a typical English dictionary to contain roughly 470,000 words, the output layer would have about 470,000 outputs. As indicated in Figure 1, each output carries a number: the probability that the corresponding word follows the given sequence.
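As a toy illustration of how the output layer turns raw scores into probabilities, the sketch below applies a softmax to the four candidate words from Figure 1. The logit values are made up by us purely so that the resulting probabilities match the figure.

```python
# A toy sketch of the output layer's next-word probabilities,
# using the four candidate words from Figure 1.
import numpy as np

vocabulary = ["about", "regarding", "forward", "even"]
logits = np.array([3.2, 2.05, 1.05, -1.0])   # assumed raw scores from the final layer

probs = np.exp(logits) / np.exp(logits).sum()  # softmax: scores -> probabilities
for word, p in zip(vocabulary, probs):
    print(f"{word}: {p:.0%}")   # about: 69%, regarding: 22%, forward: 8%, even: 1%
```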

LLM challenges

LLMs are indeed impressive, often capable of producing entire articles, such as those from Wikipedia, using only a couple of words from the article’s initial sentence as input. Give it a try using your favorite LLM!

However, they rely on a large set of parameters that must be learned to ensure the accuracy of these outputs. A parameter can be thought of as a “knob” that has to be turned just the right way for the model to produce (or estimate) output correctly. Today’s LLMs may have billions of such parameters. Learning those parameters often takes several weeks and requires substantial compute capability and energy consumption.

The result of this expensive process is often too large to store on a small device such as a smartphone. A simple calculation reveals that an LLM like Llama, with 70 billion parameters, would require about 168 GB of memory. In contrast, a smartphone has only about 8 GB of random access memory (RAM). It would therefore take 21 smartphones to host such a model in RAM.
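The calculation behind these figures can be sketched as follows. The 2.4 bytes per parameter is our own assumption (16-bit weights plus runtime overhead), chosen so that the result lands on the 168 GB figure above.

```python
# Rough back-of-the-envelope memory estimate for hosting an LLM.
import math

params = 70e9            # Llama-class model: 70 billion parameters
bytes_per_param = 2.4    # assumed: fp16 weights plus runtime overhead
phone_ram_gb = 8         # typical smartphone RAM

model_gb = params * bytes_per_param / 1e9
phones_needed = math.ceil(model_gb / phone_ram_gb)

print(f"Model size: ~{model_gb:.0f} GB")        # ~168 GB
print(f"Smartphones needed: {phones_needed}")   # 21
```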

Since LLMs require significant compute, they are usually deployed in data centers and accessed as a service. This setup enables smaller devices to communicate with the model remotely and ask relevant questions.

Consider now the field engineers we mentioned earlier – a relevant use case for Ericsson and many other businesses. Field engineers often work in remote environments with poor connectivity, which makes it difficult to connect to a data center. This user group could benefit greatly from small AI assistants that help them with their tasks.

Shrinking the model

Would it be possible to use less memory to store LLMs? If we shrink such models, can they still understand English? What about telecom acronyms such as UE (User Equipment), gNB (next generation NodeB), and RRC (Radio Resource Control)?

As part of our research, we worked with a group of students from Uppsala University to address these questions. Collaborating with student teams to study specific aspects of a research question is something we do on a regular basis. 

Small model evaluation 

We selected two small language models known from the literature, Flan-T5 Base and Flan-T5 Small, and tested them on general knowledge and telecommunications.

The general knowledge evaluation focused on their ability to understand English and common information from Wikipedia.

Telecommunications testing used information from 3GPP standardization to evaluate their understanding of the telecommunications domain. 
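For readers who want to try such a model themselves, the sketch below loads a Flan-T5 checkpoint with the Hugging Face transformers library and asks it a single question. The checkpoint names and the example question are our own choices for illustration, not the exact setup of the evaluation.

```python
# A minimal sketch of querying a small language model, assuming the public
# Hugging Face checkpoints "google/flan-t5-small" / "google/flan-t5-base".
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-small"   # swap for "google/flan-t5-base" to compare
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

question = "In a mobile network, what does the acronym UE stand for?"
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```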

Model           Parameters (million)   General knowledge (0=worst, 1=best)   Telecommunications (0=worst, 1=best)
Flan-T5 Base    250                    0.7033                                0.8683
Flan-T5 Small   80                     0.1419                                0.8310

The table above illustrates that a trade-off is possible. A Flan-T5 Small model with 80 million parameters can learn to answer telecommunication-related questions while losing general knowledge. In contrast, Flan-T5 Base, with its 250 million parameters, leverages its size to handle both general knowledge queries and domain-specific questions, in this case telecommunications, effectively.

The evaluation used two datasets: for general knowledge, the Stanford Question Answering Dataset (SQuAD); for telecommunications, TSpec-LLM. The scores were obtained using a sentence text similarity metric. This metric works by representing sentences as vectors and then measuring the distance between them. Representing a sentence as a vector can be viewed as placing it on a map where sentences with similar meanings end up close together and sentences with different meanings end up farther apart. By measuring this distance, we can identify how well an LLM answers a given question.

Figure 2: Sentence similarity as vectors. The real answer (“The acronym UE stands for User Equipment”) and the LLM response (“A UE is short for User Equipment”) point in nearly the same direction, while an incorrect yet convincing response (“UE means User Entertainment”) points in a very different direction. The distance between the first two vectors indicates how close the LLM’s answer is to the correct one.
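A minimal sketch of how such a similarity score can be computed, using the UE example from the figure above. The embedding model (“all-MiniLM-L6-v2”) and the sentence-transformers library are assumptions for illustration, not necessarily what was used in the evaluation.

```python
# A minimal sketch of scoring LLM answers with sentence text similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

reference = "The acronym UE stands for User Equipment."
answers = [
    "A UE is short for User Equipment.",   # correct paraphrase
    "UE means User Entertainment.",        # convincing but wrong
]

ref_vec = model.encode(reference, convert_to_tensor=True)
for answer in answers:
    ans_vec = model.encode(answer, convert_to_tensor=True)
    score = util.cos_sim(ref_vec, ans_vec).item()  # closer to 1 = more similar
    print(f"{score:.2f}  {answer}")
```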

Our student collaboration

Ericsson Research and Uppsala University have a long history of collaboration within the scope of the computer science project course. As industry professionals, we present a challenging problem to a group of brave students. Within this premise, our guidance combined with the supervision provided by the university enables the students to self-organize and tackle this challenge. 

Team: Oguzhan Ersoy, Konstantinos Vandikas, Mahtab Mirhaj, Yixin Huang, Shiqi Shu

Teaching Assistants: Xuezhi Niu, Chencheng Liang

Supervising Professor: Didem Gürdür Broo

In the context of Project CS, students get to work as a team on a challenging task, receiving guidance from Uppsala faculty and Ericsson Research.

Student collaboration - group photo of the team.

Read more

Project CS report

Read earlier blog posts on this topic

 
