Default metrics cover common evaluation scenarios and are ready to use out of the box. If a default metric serves your use case, we recommend starting with it.

Metrics have two kinds of scoring scales:

  • Binary (0-1): 0 = failure/incorrect, 1 = success/correct
  • Likert (1-5): 1 is the lowest score, 5 is the highest
| Metric | Description | Scale | When to use | Requires |
| --- | --- | --- | --- | --- |
| `atla_default_conciseness` | How concise and to-the-point the LLM’s response is. | Likert (1-5) | When you want to evaluate whether responses are brief and efficient. | `model_input`, `model_output` |
| `atla_default_correctness` | How factually accurate the LLM’s response is. | Binary (0-1) | When you want to check whether responses contain correct information. | `model_input`, `model_output`, `expected_model_output` |
| `atla_default_faithfulness` | How faithful the LLM is to the provided context. | Likert (1-5) | When you want to check for hallucinations. | `model_input`, `model_output`, `model_context` |
| `atla_default_helpfulness` | How effectively the LLM’s response addresses the user’s needs. | Likert (1-5) | When you want to assess practical value to users. | `model_input`, `model_output` |
| `atla_default_logical_coherence` | How well-reasoned and internally consistent the response is. | Likert (1-5) | When you want to check whether responses follow logical reasoning. | `model_input`, `model_output` |
| `atla_default_relevance` | How well the response addresses the specific query or context. | Likert (1-5) | When you want to ensure responses stay on topic. | `model_input`, `model_output` |

You can access default metrics in your evaluations through either the SDK or the Eval Copilot.
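For orientation, here is a minimal sketch of scoring a single datapoint with a default metric via the SDK. The import path, client class, and `evaluation.create` method are assumptions for illustration; check the SDK reference for the exact interface. The metric name and its required fields come from the table above.

```python
# Hypothetical SDK usage sketch -- names below are assumptions, not the
# confirmed API. The metric name and required fields match the table above.
from atla import Atla  # assumed import path

client = Atla()  # reads the API key from the environment (assumed behavior)

# atla_default_correctness uses a binary scale and requires
# model_input, model_output, and expected_model_output.
result = client.evaluation.create(  # assumed method name
    metric_name="atla_default_correctness",
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
    expected_model_output="Paris",
)

print(result.score)  # 0 = failure/incorrect, 1 = success/correct
```

A Likert metric such as `atla_default_faithfulness` would be called the same way, but with `model_context` supplied instead of `expected_model_output`, and its score would fall on the 1-5 scale.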