Retrieval-Augmented Generation (RAG) is an exciting advancement in AI that combines traditional language models with an external knowledge retrieval system. Rather than relying solely on its pre-training data, a RAG system dynamically pulls in relevant information to improve response accuracy, making it a powerful tool for various applications.

I recently put RAG to the test, evaluating its performance on different hardware setups to get a sense of how computing power affects processing time. While I won’t dive too deeply into the mechanics, I’ll share my experience and key findings from these tests. Spoiler alert: newer hardware makes a big difference!

How RAG Works (Briefly!)

At its core, RAG works by retrieving relevant documents from a knowledge source and using them to generate responses. This approach mitigates the limitations of static language models by incorporating updated, external information, allowing for more accurate and contextually aware outputs. Think of it as an AI that not only “remembers” but also “looks things up” before answering.

This method is particularly useful in domains where up-to-date or specialized knowledge is crucial, such as legal research, scientific discovery, and enterprise AI solutions.
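The retrieve-then-generate loop described above can be sketched in a few lines of Python. This is a toy illustration, not a real pipeline: the word-overlap scoring stands in for an embedding model, and the `generate` function stands in for an LLM call.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query (toy scorer)."""
    q_words = tokenize(query)
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & tokenize(doc)),
        reverse=True,
    )
    return scored[:top_k]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for an LLM call: answer grounded in the retrieved context."""
    return f"Based on: {' '.join(context)}"

docs = [
    "RAG combines retrieval with generation.",
    "GPUs accelerate parallel matrix math.",
    "The weather will be good tomorrow.",
]
hits = retrieve("How does RAG work?", docs, top_k=1)
print(generate("How does RAG work?", hits))
```

A real system swaps the scorer for vector similarity over embeddings, but the shape of the loop is the same: retrieve first, then generate with the retrieved text in the prompt.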

Testing RAG Across Different Hardware

To see how hardware impacts RAG’s performance, I ran tests on two different setups:

  • Newer system: Intel Core i9-13900H, 48GB DDR4 RAM, NVIDIA GeForce RTX 4070 (8GB VRAM)
  • Older system: Intel Core i5-6600K, 16GB DDR3 RAM, NVIDIA GeForce GTX 1060

Each test involved running the same RAG query on each machine and measuring the time it took to generate a response. The results?

  • Newer system: Around 3 minutes to complete the task
  • Older system: Between 8 and 10 minutes

This isn’t entirely surprising: hardware plays a significant role in AI model execution. However, the roughly threefold difference in processing time highlights how much a modern CPU, more RAM, and a powerful GPU contribute to efficiency.
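A timing harness for this kind of benchmark can be sketched as below. `dummy_pipeline` is a hypothetical stand-in for the actual query call; in practice you would pass in the real pipeline function instead.

```python
import time

def time_query(run_query, prompt: str) -> tuple[str, float]:
    """Run a query once and return (response, elapsed seconds)."""
    start = time.perf_counter()
    response = run_query(prompt)
    elapsed = time.perf_counter() - start
    return response, elapsed

# Dummy pipeline so the harness is runnable on its own:
def dummy_pipeline(prompt: str) -> str:
    time.sleep(0.01)  # simulate model latency
    return f"answer to: {prompt}"

answer, seconds = time_query(dummy_pipeline, "How will the weather be tomorrow?")
print(f"{seconds:.2f}s -> {answer}")
```

For numbers like the 3-minute vs. 8–10-minute ones above, a single run per machine is enough to see the gap, though averaging several runs would give a fairer comparison.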

Testing RAG with Different Documents

The tests were performed on a modified version of the Jupyter Notebook provided by LlamaIndex: https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval/

Beyond hardware performance, I also experimented with how RAG retrieves information from various document types. One interesting case was a PDF document containing two contradictory sentences:

  • “The weather will be good tomorrow.”
  • “The weather will be bad tomorrow.”

When asked, “How will the weather be tomorrow?”, RAG retrieved both statements and displayed them as-is:

“The weather will be good tomorrow. The weather will be bad tomorrow.”

This suggests that RAG does not automatically resolve contradictions; it simply retrieves all relevant information. While this can be useful in certain contexts, it also means that additional processing or filtering may be necessary to ensure consistency in responses. The underlying language model and its version also play a crucial role here: the model used in this test did not reconcile the conflicting statements, but a more advanced or better-trained one might recognize the contradiction or even provide a synthesized response.

Key Takeaways

  1. Hardware Matters – The difference in response times between these two setups is significant. If you’re working with large-scale AI applications, investing in high-end hardware can yield noticeable speed improvements.
  2. GPU Acceleration is Crucial – While RAG leverages both CPU and RAM, the GPU’s role in parallel processing cannot be overlooked. The RTX 4070 outperformed the GTX 1060, showing how modern GPUs enhance performance.
  3. Scaling Considerations – While an 8–10 minute response time might be acceptable for certain offline tasks, real-time or near-real-time applications demand faster turnaround, making newer hardware essential.
  4. Handling Contradictory Information – RAG retrieves all relevant data without resolving inconsistencies. If a document contains conflicting statements, RAG may present both, requiring post-processing for clarity.

Final Thoughts

RAG presents an innovative way to improve AI responses by leveraging external knowledge retrieval. However, as my tests show, hardware significantly impacts performance. Additionally, its approach to fetching information requires careful handling when dealing with contradictions. If you’re considering deploying RAG-based solutions, hardware investment and data filtering should be key considerations.

Have you experimented with RAG? I’d love to hear about your experience and how it compares! Let’s discuss in the comments.