Default Metrics
Using our default metrics
Default metrics cover common evaluation scenarios and are optimized for immediate use. If a default metric serves your use case, we recommend starting with it.
Metrics use one of two scoring scales:
- Binary (0-1): 0 means failure/incorrect, 1 means success/correct
- Likert (1-5): 1 is the lowest score, 5 the highest
| Metric | Description | Scale | When to use | Requires |
|---|---|---|---|---|
| `atla_default_conciseness` | How concise and to-the-point the LLM’s response is. | Likert (1-5) | When you want to evaluate whether responses are brief and efficient. | `model_input`, `model_output` |
| `atla_default_correctness` | How factually accurate the LLM’s response is. | Binary (0-1) | When you want to check whether responses contain correct information. | `model_input`, `model_output`, `expected_model_output` |
| `atla_default_faithfulness` | How faithful the LLM is to the provided context. | Likert (1-5) | When you want to check for hallucinations. | `model_input`, `model_output`, `model_context` |
| `atla_default_helpfulness` | How effectively the LLM’s response addresses the user’s needs. | Likert (1-5) | When you want to assess practical value to users. | `model_input`, `model_output` |
| `atla_default_logical_coherence` | How well-reasoned and internally consistent the response is. | Likert (1-5) | When you want to check whether responses follow logical reasoning. | `model_input`, `model_output` |
| `atla_default_relevance` | How well the response addresses the specific query or context. | Likert (1-5) | When you want to ensure responses stay on topic. | `model_input`, `model_output` |
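Because each metric requires specific fields, a quick pre-flight check on your evaluation data can catch missing fields before you run an evaluation. The sketch below is illustrative only: `REQUIRED_FIELDS` and `missing_fields` are hypothetical helpers transcribed from the table above, not part of the Atla SDK.

```python
# Required fields per default metric, transcribed from the table above.
REQUIRED_FIELDS = {
    "atla_default_conciseness": {"model_input", "model_output"},
    "atla_default_correctness": {"model_input", "model_output", "expected_model_output"},
    "atla_default_faithfulness": {"model_input", "model_output", "model_context"},
    "atla_default_helpfulness": {"model_input", "model_output"},
    "atla_default_logical_coherence": {"model_input", "model_output"},
    "atla_default_relevance": {"model_input", "model_output"},
}

def missing_fields(metric: str, row: dict) -> set:
    """Return the fields a data row is missing (or has set to None) for a metric."""
    present = {key for key, value in row.items() if value is not None}
    return REQUIRED_FIELDS[metric] - present

# Example row: fine for helpfulness, incomplete for correctness.
row = {"model_input": "What is 2 + 2?", "model_output": "4"}
print(missing_fields("atla_default_helpfulness", row))   # set()
print(missing_fields("atla_default_correctness", row))   # {'expected_model_output'}
```

Running the check per metric before submission makes it obvious which rows need, say, an `expected_model_output` reference answer.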
You can access default metrics in your evaluations through either the SDK or the Eval Copilot.