Ground Truth
Evaluating against known reference answers
Ground truth evaluations help determine how well AI responses match known correct answers. By comparing generated responses against reference answers, you can measure accuracy and ensure outputs meet expected standards.
Using Reference Answers
When you have a reference answer, Atla can evaluate LLM outputs against it using the expected_model_output parameter:
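The sketch below shows what such a call might look like, assuming the Atla Python SDK exposes an evaluation.create method; the expected_model_output parameter comes from the text above, while the model_id value and the response shape are illustrative assumptions to verify against the SDK reference.

```python
# Minimal sketch of a ground truth evaluation, assuming the Atla Python SDK.
# Only expected_model_output is confirmed by the docs above; other names are
# assumptions -- check the SDK reference for the exact signature.
from atla import Atla

client = Atla()  # assumes ATLA_API_KEY is set in the environment

result = client.evaluation.create(
    model_id="atla-selene",  # illustrative evaluator model identifier
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
    expected_model_output="Paris",  # the known correct reference answer
)
print(result)  # inspect the returned score and critique
```

The evaluator grades model_output for agreement with the reference answer rather than judging it in isolation, so the score reflects factual correctness against your ground truth.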
For evaluations against known correct answers, we recommend using the default atla_default_correctness metric.
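Selecting that metric might look like the following sketch; passing it via a metric_name parameter is an assumption, not a confirmed part of the API.

```python
# Sketch only: metric_name as the metric selector is an assumption.
# Reuses the client from the previous snippet.
result = client.evaluation.create(
    model_id="atla-selene",  # illustrative evaluator model identifier
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
    expected_model_output="Paris",
    metric_name="atla_default_correctness",  # default correctness metric
)
```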