Inputs

Base Input Variables

Every evaluation requires the following variables:

Input Variable | Description
--- | ---
model_id | The Atla evaluator model to use. See our models page for details.
model_input | The user input to your GenAI app (e.g., a question, instruction, or chat dialogue).
model_output | The response from your LLM.
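
As a concrete sketch, a minimal call might look like the following. This assumes the Atla Python SDK and an `evaluation.create` method; check the SDK reference for the exact names. Every evaluation also needs a metric or evaluation criteria, which the next section covers, so a simple criteria string is included here.

```python
from atla import Atla  # assumes the Atla Python SDK is installed

client = Atla()  # assumes your API key is configured, e.g. via an environment variable

# Minimal evaluation using the three base input variables, plus a simple
# evaluation_criteria (every evaluation also needs a metric or criteria).
response = client.evaluation.create(
    model_id="atla-selene",  # the Atla evaluator model to use
    model_input="What is the capital of France?",  # the user input to your app
    model_output="The capital of France is Paris.",  # your LLM's response
    evaluation_criteria="Score 1 if the answer is factually correct, otherwise 0.",
)
```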

Evaluation Metrics vs Evaluation Criteria

After selecting an evaluator model and specifying the LLM interaction, you need to define what to evaluate.

Metrics or evaluation criteria that you create and refine for one evaluator model are optimized for that model. We advise against using models interchangeably on the same criteria without testing.

Use either an evaluation metric or evaluation criteria, not both.

Choose one of these approaches (both are shown in the sketch after this list):

  1. Evaluation metrics

    Use a prompt that captures a specific metric (e.g., logical_coherence) to evaluate the LLM interaction. You can use our default metrics or create your own custom metrics.

    Input Variable | Description
    --- | ---
    metric_name | The name of the metric to use. See our metrics page for details.
  2. Evaluation criteria

    For rapid experimentation or to use an existing evaluation prompt, provide evaluation criteria directly:

    Input Variable | Description
    --- | ---
    evaluation_criteria | A prompt instruction defining how to evaluate the LLM interaction.
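
The two approaches differ only in which input variable defines the evaluation. A sketch under the same SDK assumptions as above:

```python
from atla import Atla

client = Atla()

shared = {
    "model_id": "atla-selene",
    "model_input": "Summarize the article in one sentence.",
    "model_output": "The article argues that remote work boosts productivity.",
}

# Approach 1: name a metric (one of the default metrics, or a custom one).
by_metric = client.evaluation.create(metric_name="logical_coherence", **shared)

# Approach 2: pass evaluation criteria directly. Do not combine with metric_name.
by_criteria = client.evaluation.create(
    evaluation_criteria="Score 1 if the summary is one coherent sentence, otherwise 0.",
    **shared,
)
```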

When using atla-selene-mini as your evaluator model, we strongly advise using the recommended template for best results.

The evaluation_criteria template for Selene Mini has the following three components (see the sketch after this list):

  1. Description of the evaluation
  2. List of scores and their corresponding criteria
  3. A sentence that specifies constraints on the score. This sentence should contain the literal string "Your score should be", followed by the corresponding criteria for the binary or Likert type.
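
For example, a binary criteria string following this template might look like the sketch below; the wording is illustrative, not a prescribed prompt.

```python
# Illustrative evaluation_criteria string with the three Selene Mini template parts.
evaluation_criteria = (
    # 1. Description of the evaluation
    "Evaluate whether the response answers the user's question accurately.\n"
    # 2. List of scores and their corresponding criteria
    "Score 0: the response is inaccurate or does not answer the question.\n"
    "Score 1: the response answers the question accurately.\n"
    # 3. Constraint sentence containing the literal string "Your score should be"
    "Your score should be either 0 or 1."
)
```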

Additional Inputs

Depending on your evaluation, you may need to provide additional inputs.

RAG Contexts

For RAG evaluations, provide the context available to the model:

Input Variable | Description
--- | ---
model_context | The context provided to the LLM for grounding.
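
As a sketch (same SDK assumptions as above), the retrieved context is passed via `model_context`; the context string here is illustrative only:

```python
from atla import Atla

client = Atla()

# RAG evaluation: include the context the LLM saw so the evaluator can
# check that the answer is grounded in it.
response = client.evaluation.create(
    model_id="atla-selene",
    model_input="When was Acme Corp founded?",
    model_output="Acme Corp was founded in 2019.",
    model_context="Acme Corp, founded in 2019, manufactures industrial robots.",
    evaluation_criteria="Score 1 if the answer is fully supported by the context, otherwise 0.",
)
```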

Reference Answers

When available, providing a reference answer is recommended for evaluation:

Input Variable | Description
--- | ---
expected_model_output | A reference “ground truth” answer that meets the evaluation criteria.
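
For example (same SDK assumptions), the ground-truth answer goes in `expected_model_output`:

```python
from atla import Atla

client = Atla()

# Reference-based evaluation: the evaluator can compare the LLM's output
# against a known-good answer.
response = client.evaluation.create(
    model_id="atla-selene",
    model_input="What is the capital of France?",
    model_output="Paris.",
    expected_model_output="The capital of France is Paris.",
    evaluation_criteria="Score 1 if the answer matches the reference, otherwise 0.",
)
```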

Few-Shot Examples

Providing few-shot examples is one of the best ways to align your evaluation, regardless of your use case:

Input Variable | Description
--- | ---
few_shot_examples | A list of examples with known evaluation scores.
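
A sketch of passing few-shot examples follows. The per-example field names here are an assumption, not the confirmed schema; consult the SDK reference for the exact format.

```python
from atla import Atla

client = Atla()

# Hypothetical few-shot example schema: each entry pairs an interaction with
# its known score. The field names are assumptions, not the confirmed API.
few_shot_examples = [
    {"model_input": "What is 2 + 2?", "model_output": "4", "score": 1},
    {"model_input": "What is 2 + 2?", "model_output": "5", "score": 0},
]

response = client.evaluation.create(
    model_id="atla-selene",
    model_input="What is 3 + 3?",
    model_output="6",
    evaluation_criteria="Score 1 if the arithmetic is correct, otherwise 0.",
    few_shot_examples=few_shot_examples,
)
```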

Evaluator Output

Each evaluation produces two outputs:

Output Variable | Description
--- | ---
score | A numerical score indicating how well the LLM interaction meets the criteria.
critique | A brief explanation justifying the score.

Atla models generate the critique before deciding on a score.
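
Reading both outputs from a response might look like the sketch below; the exact attribute path on the response object is an assumption, so adjust it to match the SDK reference.

```python
from atla import Atla

client = Atla()

response = client.evaluation.create(
    model_id="atla-selene",
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
    evaluation_criteria="Score 1 if the answer is factually correct, otherwise 0.",
)

# Assumed response shape: evaluation results exposed under result.evaluation.
evaluation = response.result.evaluation
print(evaluation.score)     # numerical score, e.g. 1
print(evaluation.critique)  # brief justification, generated before the score
```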