Evaluations for Retrieval-Augmented Generation (RAG) are essential for assessing the effectiveness and reliability of AI systems that integrate retrieved information with generated responses.

By using Atla, we can measure how accurately and relevantly the AI utilizes the provided context to generate answers. This ensures that the AI not only retrieves pertinent information but also integrates it seamlessly into its responses, maintaining factual accuracy and contextual relevance.

Atla's models have been trained on RAG-specific metrics, such as groundedness, to ensure strong performance on these evaluations.

Running evals for RAG applications

To evaluate a RAG application with Atla, pass the retrieved context via the context parameter.

from atla import Atla

client = Atla()

# Evaluate a response against its retrieved context using the groundedness metric.
score = client.evaluation.create(
    input="Is it legal to monitor employee emails under European privacy laws?",
    response="Monitoring employee emails is permissible under European privacy laws like GDPR, provided there's a legitimate purpose.",
    context="European privacy laws, including GDPR, allow for the monitoring of employee emails under strict conditions. The employer must demonstrate that the monitoring is necessary for a legitimate purpose, such as protecting company assets or compliance with legal obligations. Employees must be informed about the monitoring in advance, and the privacy impact should be assessed to minimize intrusion.",
    metrics=["groundedness"],
)

print(f"Atla's score: {score.evaluations['groundedness'].score} / 5")
print(f"Atla's critique: {score.evaluations['groundedness'].critique}")
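In practice you will often score many retrieved-context examples rather than one. The sketch below, which assumes the same client.evaluation.create call shown above, loops over a hypothetical batch of (input, response, context) triples and applies an illustrative pass threshold; the threshold value and the example data are assumptions, not part of the Atla API.

```python
def is_grounded(score: int, threshold: int = 4) -> bool:
    """Treat a groundedness score at or above the threshold as a pass.

    The threshold of 4 out of 5 is an illustrative choice, not an Atla default.
    """
    return score >= threshold


# Hypothetical batch of (input, response, context) triples to evaluate.
examples = [
    (
        "Must employees be told their emails are monitored under GDPR?",
        "Yes, employees must be informed about the monitoring in advance.",
        "Employees must be informed about the monitoring in advance, and the "
        "privacy impact should be assessed to minimize intrusion.",
    ),
]


def evaluate_batch(client, examples):
    """Score each example with Atla and return a list of pass/fail booleans."""
    results = []
    for input_text, response, context in examples:
        # Same call shape as the single-example snippet above.
        score = client.evaluation.create(
            input=input_text,
            response=response,
            context=context,
            metrics=["groundedness"],
        )
        results.append(is_grounded(score.evaluations["groundedness"].score))
    return results
```

Separating the threshold check into its own small function keeps the pass/fail policy easy to adjust without touching the evaluation loop.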