Learn about best practices and metrics
Defining the right scoring criteria for your LLM output is at the core of an evaluation. Even though “good” and “bad” quality can be vague and highly context-dependent concepts, you can build strong criteria using these tips:
For example, a copywriting AI model might be evaluated on clarity, engagement, and brand relevance, whereas a medical AI model might be evaluated on clinical relevance and legal compliance.
You can define your scoring criteria with the help of Atla metrics. You can choose from the set of default metrics that Atla provides, or create your own custom metrics when the defaults do not fit your use case.
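Before mapping criteria onto default or custom metrics, it can help to write them down as plain data. The sketch below is only an illustration and does not use the Atla SDK: the `Criterion` class, the `weighted_score` helper, the weights, and the 1 to 5 scoring scale are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One scoring criterion (hypothetical structure, not an Atla type)."""
    name: str
    description: str
    weight: float  # relative importance; the values below are assumptions


# Criteria for the copywriting example, with illustrative weights.
COPYWRITING_CRITERIA = [
    Criterion("clarity", "Is the copy easy to read and unambiguous?", 0.4),
    Criterion("engagement", "Does the copy hold the reader's attention?", 0.3),
    Criterion("brand_relevance", "Does the copy match the brand's voice?", 0.3),
]


def weighted_score(criteria, scores):
    """Combine per-criterion scores (assumed 1 to 5) into a weighted average."""
    missing = [c.name for c in criteria if c.name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    total_weight = sum(c.weight for c in criteria)
    return sum(c.weight * scores[c.name] for c in criteria) / total_weight


if __name__ == "__main__":
    example_scores = {"clarity": 5, "engagement": 3, "brand_relevance": 4}
    print(round(weighted_score(COPYWRITING_CRITERIA, example_scores), 2))
```

Writing each criterion with an explicit description forces the vague notion of "good" into something a reviewer or an evaluator model can check, and the weights make the trade-offs between criteria visible.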