Our default metrics cover common evaluation scenarios and are optimized for immediate use.

| Metric | Description | Scale | When to use | Requires |
| --- | --- | --- | --- | --- |
| `atla_default_conciseness` | How concise and to-the-point the LLM’s response is. | Likert (1-5) | When you want to evaluate whether responses are brief and efficient. | `model_input`, `model_output` |
| `atla_default_correctness` | How factually accurate the LLM’s response is. | Binary | When you want to check whether responses contain correct information. | `model_input`, `model_output`, `expected_model_output` |
| `atla_default_faithfulness` | How faithful the LLM is to the provided context. | Likert (1-5) | When you want to check for hallucinations. | `model_input`, `model_output`, `model_context` |
| `atla_default_helpfulness` | How effectively the LLM’s response addresses the user’s needs. | Likert (1-5) | When you want to assess practical value to users. | `model_input`, `model_output` |
| `atla_default_logical_coherence` | How well-reasoned and internally consistent the response is. | Likert (1-5) | When you want to check whether responses follow logical reasoning. | `model_input`, `model_output` |
| `atla_default_relevance` | How well the response addresses the specific query or context. | Likert (1-5) | When you want to ensure responses stay on topic. | `model_input`, `model_output` |
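
To illustrate how the Requires column maps onto an evaluation request, here is a minimal sketch. The field names (`model_input`, `model_output`, `expected_model_output`, `model_context`) come from the table above; the helper function and example values are assumptions for illustration, not part of any SDK.

```python
from typing import Optional

def build_eval_request(
    metric: str,
    model_input: str,
    model_output: str,
    expected_model_output: Optional[str] = None,
    model_context: Optional[str] = None,
) -> dict:
    """Assemble a payload for one evaluation, including only the fields
    the chosen metric requires (see the Requires column above)."""
    payload = {
        "metric": metric,
        "model_input": model_input,
        "model_output": model_output,
    }
    if expected_model_output is not None:
        # Needed by atla_default_correctness.
        payload["expected_model_output"] = expected_model_output
    if model_context is not None:
        # Needed by atla_default_faithfulness.
        payload["model_context"] = model_context
    return payload

# Example: faithfulness requires model_input, model_output, and model_context.
request = build_eval_request(
    metric="atla_default_faithfulness",
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
    model_context="France is a country in Western Europe. Its capital is Paris.",
)
```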

Scoring Scales

Our evaluator models produce scores that indicate how well an LLM interaction performs against a specific metric.

The interpretation of our default metric scales is as follows:

Binary

| 0 | 1 |
| --- | --- |
| Failure or Incorrect | Success or Correct |

Likert (1-5)

| 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- |
| Very Poor | Poor | Acceptable | Good | Excellent |

Understanding these scales helps you interpret evaluation results and compare performance across different prompts and models.
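
As a sketch of how such an interpretation might look in practice, the helper below maps raw scores to the labels in the tables above and to a pass/fail decision. The pass threshold for Likert metrics (>= 4 here) is an assumption; choose one that matches your own quality bar.

```python
# Labels taken from the scoring-scale tables above.
LIKERT_LABELS = {1: "Very Poor", 2: "Poor", 3: "Acceptable", 4: "Good", 5: "Excellent"}
BINARY_LABELS = {0: "Failure or Incorrect", 1: "Success or Correct"}

def interpret(metric: str, score: int) -> tuple[str, bool]:
    """Return a human-readable label and a pass/fail flag for a score."""
    if metric == "atla_default_correctness":
        # The only binary default metric: 1 means success/correct.
        return BINARY_LABELS[score], score == 1
    # All other default metrics use the Likert (1-5) scale.
    return LIKERT_LABELS[score], score >= 4  # assumed pass threshold

print(interpret("atla_default_correctness", 1))  # ('Success or Correct', True)
print(interpret("atla_default_helpfulness", 3))  # ('Acceptable', False)
```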