Retrieval Augmented Generation (RAG)

What is RAG?

Retrieval Augmented Generation

  • Give additional context to the LLM by providing trusted content

  • Inserts an information retrieval step to find relevant information in a knowledge store

  • Retrieved information is then added to the model’s context before the LLM responds

  • Can help reduce hallucinations

  • Not always predictable which content a given query will retrieve from the knowledge store

Do you need RAG?

  • Can you provide the relevant context in the system prompt?
  • Can you register a tool to look up content?
  • RAG requires managing an index and possibly a more complex retrieval pipeline

Can you use a tool instead?

  • Data lives in an API, database, or other structured source
  • Look up data on demand (retrieval happens on each query)
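When the data is structured, a plain lookup function can serve as the "retrieval" step. A minimal sketch, using a hypothetical in-memory SQLite `orders` table (the table, column names, and `lookup_order_status` are all made up for illustration); the function's docstring is what a tool-calling LLM would see:

```python
import sqlite3

# Hypothetical example data; in practice this is your real database or API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT, status TEXT)")
conn.execute("INSERT INTO orders VALUES ('A100', 'shipped')")

def lookup_order_status(order_id: str) -> str:
    """Look up the shipping status of an order by its ID."""
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else "not found"

# The function could then be registered as a tool, as shown later in the deck:
# chat.register_tool(lookup_order_status)
```

The LLM decides when to call the tool and with what arguments, so no index needs to be built or maintained.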

When is RAG more useful?

  • Lots of unstructured data (PDFs, documents, notes, etc.)

But can you add structure to it?

Create a vector store

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(docs)

Optional: save vector store

index.storage_context.persist(persist_dir="./storage")

Retrieve content

from llama_index.core import StorageContext, load_index_from_storage

# Load the knowledge store (index) from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

def retrieve_trusted_content(query):
    retriever = index.as_retriever(similarity_top_k=5)
    nodes = retriever.retrieve(query)
    return [f"<excerpt>{x.text}</excerpt>" for x in nodes]
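With static retrieval, the retrieved excerpts have to be inserted into the prompt yourself. A minimal sketch of that step (the `build_prompt` helper and the `<trusted-content>` wrapper tag are illustrative choices, not part of any library):

```python
def build_prompt(question: str, excerpts: list[str]) -> str:
    """Combine retrieved excerpts and the user question into one prompt."""
    # Wrapping the excerpts in a tag helps the model distinguish
    # trusted content from the question itself.
    context = "\n".join(excerpts)
    return (
        "Answer using only the trusted content below. "
        "If the answer is not in the content, say so.\n"
        f"<trusted-content>\n{context}\n</trusted-content>\n\n"
        f"Question: {question}"
    )

# Usage: build_prompt(query, retrieve_trusted_content(query))
```

The downside of this approach is that retrieval runs on every query whether or not it is needed, which motivates the dynamic (tool-based) retrieval shown next.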

Dynamic retrieval

  • Makes the retrieval step a tool for the LLM

  • More flexible and robust

Dynamic retrieval: Function

Create a function

from chatlas import ChatOpenAI
from llama_index.core import StorageContext, load_index_from_storage

# Load the knowledge store (index) from disk
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

def retrieve_trusted_content(query: str, top_k: int = 5):
    """
    Retrieve relevant content from the knowledge store.

    Parameters
    ----------
    query
        The query used to semantically search the knowledge store.
    top_k
        The number of results to retrieve from the knowledge store.
    """
    retriever = index.as_retriever(similarity_top_k=top_k)
    nodes = retriever.retrieve(query)
    return [f"<excerpt>{x.text}</excerpt>" for x in nodes]

Dynamic retrieval: Register tool

chat = ChatOpenAI(
    system_prompt="You are a helpful, but terse, assistant. "
    "If you can't answer the question based on the trusted content, say so."
)

chat.register_tool(retrieve_trusted_content)

Dynamic retrieval: Chat

chat.chat("Who created the unicorn programming language?")

Demo: Local RAG example

https://github.com/chendaniely/nydsaic2025-llm/tree/main/code/06-rag