Input Variables

| Input Variable | Description |
| --- | --- |
| input | The user input to your GenAI app, e.g. a question, an instruction, or a previous chat dialogue. For previous chat dialogue, the input should be a list of alternating user and assistant messages. |
| context | Additional context fetched by your retrieval model. |
| ground_truth | The ‘gold standard’ or expected response. |
| response | The response given by your LLM. When using chat dialogue, the response should be the last assistant message in the conversation. |
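To make the expected shapes concrete, here is a minimal sketch of an evaluation payload using these variables. The dictionary keys follow the table above; the message structure for chat dialogue (role/content pairs) is an assumption for illustration, not a documented schema.

```python
# A minimal sketch of evaluation payloads using the input variables above.
# Field names follow the table; the role/content message structure is an
# assumption for illustration, not a documented schema.

# Single-turn: input is a plain string.
single_turn_example = {
    "input": "What is the capital of France?",
    "context": "France is a country in Western Europe. Its capital is Paris.",
    "ground_truth": "The capital of France is Paris.",
    "response": "Paris is the capital of France.",
}

# Multi-turn: input is a list of alternating user and assistant messages,
# and response is the last assistant message in the conversation.
multi_turn_example = {
    "input": [
        {"role": "user", "content": "Recommend a book on evolution."},
        {"role": "assistant", "content": "Try 'The Selfish Gene' by Richard Dawkins."},
        {"role": "user", "content": "Is it suitable for beginners?"},
    ],
    "response": "Yes, it is written for a general audience and assumes no prior biology background.",
}
```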

Evaluation Metrics

Evaluation metrics represent the different dimensions along which to assess the performance of your GenAI app.

Use our predefined metrics to comprehensively evaluate the different components of your GenAI application. The prompts underlying these metrics have been carefully prepared to maximise the effectiveness of our eval model.

  • Retrieval evaluation
  • Response generation evaluation
  • Language quality evaluation

Alternatively, build your own custom metrics in the custom eval prompt UI (coming soon) and deploy them for use with our API.
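As an illustration, a request against a predefined metric might look like the sketch below. The endpoint URL, authentication header, and the metric name "context_relevance" are all hypothetical placeholders; consult the API reference for the actual interface.

```python
# Hypothetical sketch of calling an evaluation API with a predefined metric.
# The endpoint URL, auth header, and the metric name "context_relevance"
# are placeholders, not documented values.
import requests

payload = {
    "metric": "context_relevance",  # hypothetical predefined metric name
    "input": "What is the capital of France?",
    "context": "France is a country in Western Europe. Its capital is Paris.",
    "response": "Paris is the capital of France.",
}

resp = requests.post(
    "https://api.example.com/v1/eval",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"score": 5, "critique": "..."}
```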

Scoring Format

| Scoring | Description |
| --- | --- |
| Score of 1 - 5 | A Likert-scale scoring system. Commonly used by human raters for subjective assessments. The default option across our predefined metrics. |
| Binary 0 / 1 | A simple binary scoring system. Commonly used for classification purposes. 0 typically represents a negative outcome (no, fail, incorrect), while 1 represents a positive outcome (yes, pass, correct). |
| Float 0.0 - 1.0 | A continuous scoring format. Commonly used as a precise representation to quantify accuracy. |
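The three formats are interchangeable with simple arithmetic. The sketch below is our own illustration, not behaviour defined by the API: it normalises a 1 - 5 Likert score onto the 0.0 - 1.0 float range and thresholds it to a binary outcome.

```python
# Illustration only: converting between the three scoring formats.
# The linear normalisation and the 0.5 threshold are our own choices,
# not behaviour defined by the API.

def likert_to_float(score: int) -> float:
    """Map a 1-5 Likert score linearly onto [0.0, 1.0]."""
    if not 1 <= score <= 5:
        raise ValueError("Likert score must be between 1 and 5")
    return (score - 1) / 4

def float_to_binary(score: float, threshold: float = 0.5) -> int:
    """Collapse a continuous score to a 0/1 outcome."""
    return 1 if score >= threshold else 0

assert likert_to_float(5) == 1.0
assert likert_to_float(1) == 0.0
assert float_to_binary(likert_to_float(4)) == 1  # 0.75 >= 0.5
```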

Evaluation Result

| Response | Description |
| --- | --- |
| score | The score, in one of the formats above: 1 - 5, binary 0 / 1, or float 0.0 - 1.0. |
| critique | A brief justification for the provided score. Use it to understand your GenAI app’s performance as you experiment with different model architectures, prompts, hyperparameters, etc. |
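A result with these two fields can be consumed directly. The sketch below assumes the JSON shape implied by the table; the dataclass wrapper and parse_result helper are our own illustration.

```python
# Sketch of handling an evaluation result with the two fields above.
# The JSON shape is inferred from the table; the dataclass wrapper
# and parse_result helper are our own illustration.
from dataclasses import dataclass

@dataclass
class EvalResult:
    score: float   # 1-5 Likert, binary 0/1, or float 0.0-1.0
    critique: str  # brief justification for the score

def parse_result(data: dict) -> EvalResult:
    return EvalResult(score=data["score"], critique=data["critique"])

result = parse_result({"score": 4, "critique": "Accurate, but omits a key caveat."})
print(f"{result.score}: {result.critique}")
```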