Atla provides specific metrics for evaluating against a reference (also known as ground truth) that help assess the accuracy and completeness of the AI’s responses. These metrics check that generated answers align with the reference answers provided. Atla scores results on a 1–5 Likert scale.

You can pass any or all of these ground truth metrics to the evaluate function of the Atla client, and it will perform independent evaluations over the input and generated output provided.
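As a rough sketch of this flow, the snippet below assembles a reference-based evaluation request. The `build_eval_request` helper, its field names, and the example texts are illustrative assumptions, not the Atla SDK's actual API; consult the client library for the real `evaluate` signature.

```python
# Hypothetical sketch: the exact client API may differ from your installed
# Atla SDK version. Metric names come from the table in this section.
ground_truth_metrics = ["hallucination", "atla_precision", "atla_recall"]

def build_eval_request(input_text, generated_output, reference, metrics):
    """Assemble a payload for a reference-based evaluation.
    (Helper shown for illustration only; not part of the Atla SDK.)"""
    return {
        "input": input_text,          # the prompt given to the AI
        "response": generated_output, # the AI's generated answer
        "reference": reference,       # the ground-truth answer
        "metrics": list(metrics),     # any or all ground truth metrics
    }

request = build_eval_request(
    "What is the boiling point of water at sea level?",
    "Water boils at 100 °C (212 °F) at sea level.",
    "100 °C at standard atmospheric pressure.",
    ground_truth_metrics,
)
print(request["metrics"])
```

Each metric in the list is evaluated independently over the same input, response, and reference.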

Metrics prefixed with `atla_` additionally use a multi-step process to achieve even more reliable scores on long and complex samples. This method uses slightly more tokens.

| Metric Name | Description |
| --- | --- |
| `hallucination` | Assesses the presence of incorrect or unrelated content in the AI’s response. |
| `atla_precision` | Assesses the relevance of all the information in the response. |
| `atla_recall` | Measures how completely the response captures the key facts and details. |
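Since each metric returns a score on the 1–5 Likert scale, a common pattern is to flag responses whose scores fall below a chosen threshold. The result shape and threshold below are assumptions for illustration, not the SDK's actual response format:

```python
# Hypothetical result shape: a mapping from metric name to its 1-5 Likert
# score. The real SDK response structure may differ.
scores = {"hallucination": 5, "atla_precision": 4, "atla_recall": 3}

# Flag any metric scoring below an application-chosen quality bar.
THRESHOLD = 4  # illustrative value; tune per use case
flagged = [metric for metric, score in scores.items() if score < THRESHOLD]
print(flagged)
```

Here only `atla_recall` falls below the bar, suggesting the response misses key facts from the reference even though it contains no hallucinated or irrelevant content.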

Implementation Details