When using Retrieval-Augmented Generation (RAG), it’s crucial to ensure that model responses accurately reflect the retrieved context. By comparing each response against the context it was given, you can detect and measure hallucinations.

Using Retrieved Context

When evaluating RAG responses, Atla can check whether the output stays faithful to the provided context using the model_context parameter:

from atla import Atla

client = Atla()

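# Score the response's faithfulness to the retrieved context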
evaluation = client.evaluation.create(
  model_id="atla-selene",
  model_input="Is it legal to monitor employee emails under European privacy laws?",
  model_output="Monitoring employee emails is permissible under European privacy laws like GDPR, provided there's a legitimate purpose.",
  model_context="European privacy laws, including GDPR, allow for the monitoring of employee emails under strict conditions. The employer must demonstrate that the monitoring is necessary for a legitimate purpose, such as protecting company assets or compliance with legal obligations. Employees must be informed about the monitoring in advance, and the privacy impact should be assessed to minimize intrusion.",
  metric_name="atla_default_faithfulness",
).result.evaluation

print(f"Atla's score: {evaluation.score} out of 5")
print(f"Atla's critique: {evaluation.critique}")

For detecting RAG hallucinations, we recommend using the default atla_default_faithfulness metric, which evaluates how well responses align with the provided context.
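If you want to act on the result programmatically, you can compare the returned score against a threshold. The sketch below is illustrative only: the helper name flag_hallucination and the threshold of 3 are assumptions for this example, not part of the Atla API; the evaluation.create call itself is the same one shown above.

from atla import Atla

client = Atla()

def flag_hallucination(question: str, answer: str, context: str, threshold: int = 3) -> bool:
  # Run the same faithfulness evaluation as above and flag responses
  # whose score falls below the chosen threshold (illustrative value).
  evaluation = client.evaluation.create(
    model_id="atla-selene",
    model_input=question,
    model_output=answer,
    model_context=context,
    metric_name="atla_default_faithfulness",
  ).result.evaluation
  return evaluation.score < threshold

A threshold around the middle of the 5-point scale is a reasonable starting point; tune it against a labelled sample of your own data before relying on it in production.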