When using Retrieval-Augmented Generation (RAG), it’s crucial to ensure that model responses accurately reflect the retrieved context.
By comparing responses against the provided context, you can detect and measure hallucinations.
Using Retrieved Context
When evaluating RAG responses, Atla can check if the output stays faithful to the provided context using the model_context parameter:
```python
from atla import Atla

client = Atla()

evaluation = client.evaluation.create(
    model_id="atla-selene",
    model_input="Is it legal to monitor employee emails under European privacy laws?",
    model_output="Monitoring employee emails is permissible under European privacy laws like GDPR, provided there's a legitimate purpose.",
    model_context="European privacy laws, including GDPR, allow for the monitoring of employee emails under strict conditions. The employer must demonstrate that the monitoring is necessary for a legitimate purpose, such as protecting company assets or compliance with legal obligations. Employees must be informed about the monitoring in advance, and the privacy impact should be assessed to minimize intrusion.",
    metric_name="atla_default_faithfulness",
).result.evaluation

print(f"Atla's score: {evaluation.score} out of 5")
print(f"Atla's critique: {evaluation.critique}")
```
For detecting RAG hallucinations, we recommend using the default atla_default_faithfulness metric, which evaluates how well responses align with the provided context.
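In a production RAG pipeline, you can use the faithfulness score as a gate before returning an answer to the user. The sketch below wraps the evaluation call from the example above in a small helper; the is_faithful name and the threshold of 4 are illustrative assumptions, not part of the Atla SDK.

```python
from atla import Atla

client = Atla()

# Hypothetical helper: flag a RAG response whose faithfulness score falls
# below a chosen threshold. The threshold of 4 is an assumption, not an
# Atla recommendation -- tune it for your own application.
def is_faithful(question: str, answer: str, context: str, threshold: int = 4) -> bool:
    evaluation = client.evaluation.create(
        model_id="atla-selene",
        model_input=question,
        model_output=answer,
        model_context=context,
        metric_name="atla_default_faithfulness",
    ).result.evaluation
    return evaluation.score >= threshold

# Example: if is_faithful(...) returns False, you might re-run retrieval or
# return a fallback such as "I don't have enough information" instead of the
# potentially hallucinated answer.
```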