Foundation models and LLMs present new challenges to trustworthiness. The training data used for LLMs is often unknown, yet it fundamentally determines model behavior and reliability. Often, that data comes from unreliable or potentially malicious sources, such as scrapes of the public internet. LLMs are inherently non-deterministic, which means they can generate different text for the same input, may generate factually incorrect content, and can be sensitive to the phrasing of the input or prompt. Even with large context windows, their performance degrades as the context length increases [11]. Architectures of LLMs are often unpublished, which makes it difficult to explain their behavior. LLM explainability remains challenging, even if the model architecture is known, because of its black-box architecture, billions of parameters, and emergent behavior.
One approach is to ask the model to produce step-by-step natural language explanations for its output; however, this is prone to common LLM reliability challenges. Attention visualization shows the relative strength of attention weights between the tokens in the model and can provide insight into which tokens were used, or given attention, when predicting the next token.
Both model accuracy and explanation quality can be enhanced by integrating knowledge bases that include semantic graphs and ontologies. By explicitly modeling the semantics of telecommunications concepts and storing actual network configurations within this framework, we can more effectively ground LLMs in the telecom domain and, for example, improve the accuracy of telecom-specific reasoning. This approach allows for the generation of explanations based on well-defined concepts derived from the underlying ontology.
As with traditional ML, LLM training data may include sensitive content such as PII, raising similar privacy concerns. The model can also absorb sensitive data if the system is structured such that it learns continuously from user input. Even if such data is de-identified, the analytical and highly correlative nature of the LLM opens the possibility of using it to attack that de-identification. This may be an easier attack to mount for an LLM than for a traditional ML model because of the typical conversational nature of LLM systems. Bias in outputs is also a concern, since the model will generally reflect any bias in the training data. Researchers have shown that only a small amount of training or fine-tuning data needs to be affected to impart bias to the overall model [13].
Implementing a data governance framework is essential for training, fine-tuning, testing, and using LLMs, so that their outputs can be trusted. Data sources must be trustworthy, collected data must be properly curated, and licenses, intellectual property, and copyright requirements must be respected. Data retention policies and regulatory compliance are mandatory.
End-to-end data traceability or data provenance must be implemented from data sources to their use in training, testing, and inference. It is essential to be able to assure the desired data quality. Because LLMs may regurgitate training data, anonymization of training data, removal of PII, and implementation of data leakage prevention mechanisms are necessary. If Retrieval Augmented Generation (RAG) or knowledge bases are used, the data governance framework should cover them as well.
LLMs share the same security concerns as any large application and introduce some unique ones. One notable attack is prompt injection. The input fed to the model consists of three parts. The system prompt is instructions to the model written by its creators or deployers. The user prompt is instructions or questions to the model from the user. User data is any data provided for the model to process. These three components are concatenated and fed to the model as a single stream. As a result, if the data, for example, contains instructions, the model may interpret them as such and act on them. These instructions might override the system and user prompts and cause the model to act incorrectly. There is particular danger when the user data is retrieved from an untrusted source, such as content pulled from the internet. Similarly, the user may attempt to put content in the user prompt that overrides controls or guardrails in the system prompt. Prompt injection is an example of a classic vulnerability known as confusion of control and data channels, where data is mixed with instructions and therefore interpreted as commands.
The only sure prevention for prompt injection would be to separate the control and data channels, but the inherent nature of LLMs makes this impossible. Various techniques have been devised to protect against prompt injection, but none have been found to work with a high degree of success [14]. Best practices currently involve using a layered defense strategy, based on a risk analysis. Controls include instructions in the system prompt to constrain model operation (also known as guardrails), filtering of the user prompt, restrictions on allowed data sources, filtering of user data, using multiple models separated into trusted and untrusted zones, and human-in-the-loop architectures where outputs are checked before use. In situations where model output can affect other systems or the physical world, for example, in agentic systems, surrounding the model with deterministic or algorithmic code that constrains its actions can be useful.