Magical Generative AI buzzwords and where to use them

“Our RAG application integrates seamlessly with AI agents and utilizes Knowledge Graph technology to autonomously fulfill user requests.” – If that sentence sounds confusing, you’re not alone. Many customer inquiries mix these buzzwords, leaving non-technical audiences in particular overwhelmed by the vast array of available technologies. That’s why Niklas Frühauf, Data Scientist at sovanta, is here to cut through the noise. He’ll clarify these concepts, bring structure to the buzzword jungle, and, most importantly, highlight the real business value and future use cases they enable.

LLMs and their limitations

Let’s start simple: Large Language Models (LLMs) such as ChatGPT, Gemini and, more recently, DeepSeek are the main building block for most of today’s Generative AI approaches. At their core, they are trained on terabytes of public text data, rendering them capable of “generating” text based on arbitrary input “prompts”. Due to the nature of their training process, they are unable to recall information that was made public after their training was completed, and obviously also unable to recall information that is not public, such as company-internal guidelines and documents. Instead, they will in the best case tell you that they cannot answer the question, and in the worst case confidently “hallucinate” an incorrect answer. Asking ChatGPT about an existing contract with sovanta is therefore unlikely to give you any meaningful information. Sure, you could fine-tune these LLMs on company-specific data, but many studies show that unless you have loads of high-quality documents available, this will degrade their overall performance. Add to that the considerable time (and cloud costs!) required to do so, and it quickly becomes infeasible, especially for small and medium-sized companies.

Shift to Retrieval-Augmented Generation (RAG)

As a result, companies can shift to an approach called “Retrieval-Augmented Generation” (RAG) or “document grounding”. Basically, your internal documents are first indexed into a vector database, using a specific “embedding model” which turns each section of the document into a list of numbers that represent the semantic content of this “chunk”. Whenever a user wants to ask a question that may be related to these documents, the question is also embedded (using the exact same model as during indexing), and the chunks with the highest semantic similarity are considered further (“retrieval”). Next, these chunks (sometimes also called “context information”) are passed to the LLM together with the user question, enabling answers based on internal data as well as allowing the user to check the contents of files that were found during the retrieval process, providing much-needed transparency. In our example, indexing your contracts with RAG would then allow a user to ask questions such as “What’s the title and duration of the contract we have with sovanta?” without the need for LLM fine-tuning.
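To make the RAG flow concrete, here is a minimal retrieval sketch in Python. It assumes the sentence-transformers package with an open-source embedding model; the chunk texts, the model name and the top-k value are purely illustrative, and a production setup would store the vectors in a proper vector database rather than in memory.

from sentence_transformers import SentenceTransformer

# The same embedding model must be used for indexing and for querying.
model = SentenceTransformer("all-MiniLM-L6-v2")

# 1) Indexing: embed every document chunk once and keep the vectors.
chunks = [
    "Contract 'GenAI Discovery Workshop' with sovanta, duration: 6 months.",
    "Travel guideline: economy class for all flights under 4 hours.",
]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

# 2) Retrieval: embed the question and rank chunks by cosine similarity.
question = "What's the title and duration of the contract we have with sovanta?"
question_vector = model.encode([question], normalize_embeddings=True)[0]
scores = chunk_vectors @ question_vector  # cosine similarity, since vectors are normalized
top_chunks = [chunks[i] for i in scores.argsort()[::-1][:1]]

# 3) Generation: hand the retrieved context plus the question to the LLM.
prompt = f"Answer using only this context:\n{top_chunks}\n\nQuestion: {question}"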

What’s more, it is technically feasible to self-host all components (Ollama for Open-Source LLMs, Transformers + ONNX for embeddings), thus reducing the dependency on 3rd party vendors. Our GenAI DocumentChat follows this technological approach, too, relying on SAP-hosted Open Source Models via their Business Technology Platform (BTP), and leveraging the SAP HANA Vector Engine for quick document retrieval.
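As a small illustration of the self-hosting option, the snippet below calls an open-source model through Ollama’s local HTTP API; it assumes Ollama is running on its default port and that a model (here “llama3”) has already been pulled.

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Answer using only this context: <retrieved chunks>\n\nQuestion: <user question>",
        "stream": False,  # return the full completion in a single JSON response
    },
    timeout=120,
)
print(response.json()["response"])  # the generated answer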

GenAI DocumentChat

Our GenAI DocumentChat lets users “chat” with your internal documentation, legal texts, compliance guides, and other complex documents.

Next: Knowledge Graphs

However, this classical RAG approach has drawbacks: What if answering a user question requires more than, say, the top 10 most similar chunks, for example running aggregations across all of them? Imagine having loaded all your PDF-based contracts with hundreds of thousands of sections into your RAG system, just for a user to ask a question such as “How many contracts do we have with companies in Berlin?”. This can be addressed using what is often referred to as “Knowledge Graphs” or “Triplet Stores”, an approach rooted in 1990s research on the “Semantic Web”. In addition to (or instead of) using classic RAG with embeddings, all documents are turned into a graph representation, where the nodes usually represent concepts/entities, and the edges represent relationships.

In our above example, you could have a node for each contract partner (“sovanta”), connected to each legal entity or business partner (e.g. “sovanta Germany”), in turn linked to the city (“Heidelberg”) and a node for each contract (“GenAI Discovery Workshop”). The neat part: You don’t have to generate this graph manually; instead, you can ask the LLM to extract those nodes and relationships from each document while it is being indexed. Using specific graph databases (Neo4j, the upcoming HANA Knowledge Graph Engine and others) together with a specific query language (such as SPARQL or Cypher) allows you to quickly insert and query complex graphs. Tasked with a user request such as the one above, the LLM turns it into its corresponding query representation:

SELECT (COUNT(?contract) AS ?contractCount) WHERE {
  ?company a :Company ;
           :hasLocation ?location .
  ?location :locatedIn "Berlin" .
  ?company :hasContract ?contract .
}

This query can be executed against the graph database, returning the correct number to the LLM to create a final answer for the user. As you can see, Knowledge Graphs are mainly useful when you are working with a set of similar documents that have a somewhat clear domain data model, but which are not yet available in tabular/structured form in any of your company systems.
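To make the graph idea tangible, here is a small, self-contained sketch using the rdflib package; the namespace, the extracted triples and the resulting count are purely illustrative, and a real deployment would use a dedicated graph database such as Neo4j or the HANA Knowledge Graph Engine instead.

from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/contracts#")
g = Graph()

# Triples an LLM might have extracted from the indexed contract PDFs.
g.add((EX.sovanta, RDF.type, EX.Company))
g.add((EX.sovanta, EX.hasLocation, EX.HeidelbergOffice))
g.add((EX.HeidelbergOffice, EX.locatedIn, Literal("Heidelberg")))
g.add((EX.acme, RDF.type, EX.Company))
g.add((EX.acme, EX.hasLocation, EX.BerlinOffice))
g.add((EX.BerlinOffice, EX.locatedIn, Literal("Berlin")))
g.add((EX.acme, EX.hasContract, EX.Contract_4711))

# "How many contracts do we have with companies in Berlin?"
query = """
    PREFIX : <http://example.org/contracts#>
    SELECT (COUNT(?contract) AS ?contractCount) WHERE {
        ?company a :Company ;
                 :hasLocation ?location .
        ?location :locatedIn "Berlin" .
        ?company :hasContract ?contract .
    }
"""
for row in g.query(query):
    print(row.contractCount)  # -> 1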

Function Calling and Tools

But what if you already have your data in structured form somewhere? Or you have an external system with a clearly defined REST API to access? In these cases, it may make sense to switch to the method of “function calling” (or user-provided “tools”). Ideally, you could also hook up not one but multiple functions to the system, allowing the user to ask questions such as “What was the last purchase order we signed with sovanta?” but also “Who’s the client partner for sovanta, and what’s their phone number?” in the same context. The approach is similar to the one using Knowledge Graphs – First, the LLM is presented with the user question and the available tools/functions, then selects a function to execute together with the function inputs; the application runs the function and returns the result back to the LLM for a final answer.

Technically speaking, a Knowledge Graph lookup is also simply a function! In our example, we may have two functions: “lookup_purchase_orders” (expecting e.g. a filter argument) and “lookup_business_partners” (with the same signature). The initial LLM call would then decide to execute “lookup_purchase_orders” with the filter “vendor LIKE '%sovanta%' ORDER BY createdAt DESC LIMIT 1”, which in turn calls the responsible system, producing the correct result and returning it for a final user answer. This approach also allows the LLM to trigger e.g. business processes and workflows, open tickets, and much more, making it a prime approach for supporting end-to-end user requests with a clearly defined scope.
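A stripped-down sketch of this loop is shown below. The two business functions, their filter syntax and the LLM’s JSON tool selection are hypothetical stand-ins; a real implementation would use your LLM provider’s native tool-calling interface and your actual backend APIs.

import json

def lookup_purchase_orders(filter: str) -> list[dict]:
    # Hypothetical call into e.g. an ERP system's REST API.
    return [{"id": "PO-1042", "vendor": "sovanta", "createdAt": "2024-11-02"}]

def lookup_business_partners(filter: str) -> list[dict]:
    # Hypothetical call into e.g. a CRM system.
    return [{"name": "Jane Doe", "role": "client partner", "phone": "+49 ..."}]

TOOLS = {
    "lookup_purchase_orders": lookup_purchase_orders,
    "lookup_business_partners": lookup_business_partners,
}

# Step 1: the LLM sees the user question plus the tool descriptions and replies with a
# structured tool call (mocked here as a fixed JSON string).
llm_tool_call = '{"name": "lookup_purchase_orders", "arguments": {"filter": "vendor LIKE \'%sovanta%\' ORDER BY createdAt DESC LIMIT 1"}}'

# Step 2: the application executes the selected function with the provided arguments ...
call = json.loads(llm_tool_call)
result = TOOLS[call["name"]](**call["arguments"])

# Step 3: ... and feeds the result back to the LLM to phrase the final answer for the user.
final_prompt = f"User asked for the latest purchase order with sovanta. Tool result: {result}"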

Spotlight on “(AI) Agents”

Sadly, user requests are not always straightforward, and sometimes require cross-referencing results from various different tools. Additionally, they may refer to specific edge cases that cannot be solved by a simple function call. What would you do if a user uploads a vendor offer and asks the system to create the corresponding purchase requisition? Obviously, the exact approach would depend on your corporate guidelines and the system that is used for managing the purchase requisitions. For instance, different fields may be required for service and for product requisitions.

This is where the approach of AI Agents shines – instead of having a very narrow system that can only fulfil straightforward requests, you rely on LLMs to autonomously plan and iteratively execute various tools to ultimately fulfil a user request. The output of the planning step for our example could look something like this: “First, I need to check the PDF to extract line item information. Then, I need to check the company guidelines to verify if additional information is needed. Next, I need to find the system that stores the purchase requisitions and fetch the documentation of its external REST API to find the correct format. Once this is done, I need to re-check the required fields and ask the user for clarification and approval. Only then can I send the final HTTP request.” Keeping track of its internal state, the agent iterates until it has fulfilled its task, reacting to errors and incorporating new knowledge as it is uncovered along the way.
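Below is a deliberately simplified sketch of such an agent loop; the planning function stands in for the LLM call and returns a fixed, illustrative plan, and the single “read_pdf” tool is hypothetical.

def plan_next_step(goal: str, history: list[dict]) -> dict:
    # Stand-in for the LLM planning call, which would see the goal, the available tools
    # and all previous observations. Here it finishes after one illustrative tool step.
    if not history:
        return {"tool": "read_pdf", "args": {"path": "vendor_offer.pdf"}}
    return {"tool": "finish", "args": {"answer": "Draft purchase requisition prepared."}}

def run_agent(goal: str, tools: dict, max_steps: int = 10) -> str:
    history: list[dict] = []
    for _ in range(max_steps):
        step = plan_next_step(goal, history)
        if step["tool"] == "finish":
            return step["args"]["answer"]
        try:
            observation = tools[step["tool"]](**step["args"])
        except Exception as exc:  # errors are fed back so the agent can react to them
            observation = f"Tool failed: {exc}"
        history.append({"step": step, "observation": observation})
    return "Stopped after reaching the step limit."

answer = run_agent(
    goal="Create a purchase requisition from the uploaded vendor offer.",
    tools={"read_pdf": lambda path: f"Line items extracted from {path}"},
)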

AI Agent Starter

AI agents are intelligent software solutions powered by artificial intelligence, designed to automate complex tasks, make decisions, and independently optimize processes. They can be applied in areas such as customer …

Summing it up: Generative AI buzzwords

That was a lot of information at once. Let’s quickly recap at a high level:

  • Standard LLMs are quick and cheap, but only work on publicly available data, and fine-tuning them usually doesn’t make sense
  • Retrieval-Augmented Generation can be a cheap way to ground LLM replies in your company data, reducing hallucinations, but breaks down when faced with complex aggregation questions
  • This can be solved using Knowledge Graph Technology, turning heaps of unstructured documents into a graph representation that can quickly be queried
  • When data is in external systems and readily available via APIs, function calling or tools can be used to provide results from external systems for user questions
  • If that is not enough and user questions require complex interactions, self-organising AI Agents may be a solution

Since AI Agents seem to be the most capable, why shouldn’t we let them handle everything for us? The answer is, once again, simple: cost and time! Generally speaking, the further down you go in the above list of approaches, the higher the incurred LLM token cost, but also the higher the overall execution time. While we observe sub-second delays and sub-cent costs for simple LLM calls, that can quickly increase to 3-10 seconds and 5 or more cents for function calling. Agents, ultimately, require a LOT of tokens to keep track of their internal state, sometimes several tens of thousands, resulting in execution times of up to two minutes, with associated costs sometimes well above 2€. The second drawback is transparency and maintainability: Analysing why an agent has failed is much more difficult than understanding why RAG may have failed.

It is thus necessary to select the correct approach for your business case, something that we are more than happy to provide consulting on. Let’s connect!

Niklas Frühauf
Senior Data Scientist

Your Contact

Niklas Frühauf works as a Senior Data Scientist at sovanta. He focuses on keeping up with the latest AI methodologies and state-of-the-art approaches in the areas of natural language understanding, computer vision and time series forecasting, leveraging these to support customers in solving their business challenges and automating their processes with AI.
Tags
AI / GenAI