These are the docs for the beta version of Evalite. Install with pnpm add evalite@beta

Scorers

Evals are great for putting a ton of data through your system and checking the outputs. But scanning through all that data manually is hard work.

Scorers give you a way to benchmark your system. Instead of “yeah, that looks good”, you can say “our system scored 90% on this test”.

Built-in Scorers

Evalite provides ready-to-use scorers for common evaluation patterns. See Built-in Scorers for the complete list.

Custom Scorers

Evalite lets you create custom scorers to suit your needs. You specify them in the scorers array of the evalite function:

import { evalite } from "evalite";

evalite("My Eval", {
  data: [{ input: "Hello" }],
  task: async (input) => {
    return input + " World!";
  },
  scorers: [
    {
      name: "Contains Paris",
      description: "Checks if the output contains the word 'Paris'.",
      scorer: ({ output }) => {
        return output.includes("Paris") ? 1 : 0;
      },
    },
  ],
});

The scorer function

The scorer function receives the input, output, and expected values. It must return a number between 0 and 1.

({ input, output, expected }) => {
  return output.includes("Paris") ? 1 : 0;
},
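
Because the return value is a number between 0 and 1, a scorer does not have to be a strict pass/fail check; it can also award partial credit. The following is a minimal sketch, assuming a hypothetical `keywordCoverage` scorer whose `expected` value is a list of keywords. Neither the name nor that `expected` shape comes from Evalite; only the `{ input, output, expected }` argument shape mirrors the description above.

```typescript
// Hypothetical scorer: fraction of expected keywords found in the output.
// The argument shape mirrors the scorer function described above.
type KeywordScorerArgs = {
  input: string;
  output: string;
  expected?: string[];
};

const keywordCoverage = ({ output, expected }: KeywordScorerArgs): number => {
  // With nothing to look for, treat the output as fully correct
  // (a choice made for this sketch, not an Evalite rule).
  if (!expected || expected.length === 0) return 1;

  const lowerOutput = output.toLowerCase();
  const hits = expected.filter((word) =>
    lowerOutput.includes(word.toLowerCase())
  ).length;

  // hits is never larger than expected.length, so the result stays in 0..1.
  return hits / expected.length;
};
```

A score of 0.5 here means half of the expected keywords were found, which is more informative across a large dataset than a binary 0 or 1.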

It can also be async, allowing you to perform asynchronous work such as calling LLMs or querying databases:

async ({ input, output, expected }) => {
  const response = await generateObject({
    // Your AI SDK call here
  });
  return response.object.score;
},
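
Values that come back from an external service are not guaranteed to fall between 0 and 1, so it can help to clamp them before returning. The sketch below assumes a hypothetical `judgeWithLlm` helper that stands in for a real LLM or database call; its name, signature, and placeholder logic are illustrative only and are not part of Evalite or any SDK.

```typescript
// Hypothetical judge: stands in for an LLM or database call.
// Swap in a real client; the name and return shape are assumptions.
const judgeWithLlm = async (
  question: string,
  answer: string
): Promise<{ score: number }> => {
  // Placeholder logic so the sketch runs without network access.
  // It deliberately returns out-of-range values to exercise the clamp.
  return { score: answer.trim().length > 0 ? 1.2 : -0.1 };
};

// Force any number into the 0..1 range a scorer must return.
// Non-finite values (NaN, Infinity) are treated as 0.
const clamp01 = (value: number): number =>
  Number.isFinite(value) ? Math.min(1, Math.max(0, value)) : 0;

// Async scorer with the same argument shape as the examples above.
const judgedScorer = async ({
  input,
  output,
}: {
  input: string;
  output: string;
}): Promise<number> => {
  const response = await judgeWithLlm(input, output);
  return clamp01(response.score);
};
```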

Metadata

You can provide metadata along with your custom scorer:

import { createScorer } from "evalite";

const containsParis = createScorer<string, string, string>({
  name: "Contains Paris",
  description: "Checks if the output contains the word 'Paris'.",
  scorer: ({ output }) => {
    return {
      score: output.includes("Paris") ? 1 : 0,
      metadata: {
        // Can be anything!
      },
    };
  },
});

The metadata is shown alongside the score in the Evalite UI. This is especially useful when you call LLMs inside scorers: you can include the LLM's reasoning in the metadata.
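
As an illustration of metadata that explains a score, the sketch below returns a short `reasoning` string and the list of missing words next to the score. It is written as a plain function so it runs on its own; the `requiredWords` list, the `containsRequiredWords` name, and the metadata field names are all hypothetical, since the metadata can be any value.

```typescript
// Hypothetical result shape: a score plus free-form metadata.
type ScoreWithMetadata = {
  score: number;
  metadata: { reasoning: string; missing: string[] };
};

// Illustrative list of words the output should mention.
const requiredWords = ["Paris", "France"];

const containsRequiredWords = ({
  output,
}: {
  output: string;
}): ScoreWithMetadata => {
  const missing = requiredWords.filter((word) => !output.includes(word));

  return {
    // Fraction of required words present, always between 0 and 1.
    score: (requiredWords.length - missing.length) / requiredWords.length,
    metadata: {
      reasoning:
        missing.length === 0
          ? "All required words are present."
          : `Missing: ${missing.join(", ")}`,
      missing,
    },
  };
};
```

When a score is lower than expected, the `reasoning` field shows why without re-running the eval.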

Reusable Scorers

If you have a scorer you want to use across multiple files, you can use createScorer to create a reusable scorer. See createScorer() for more details.

These are typed using the three type arguments passed to createScorer:

import { createScorer } from "evalite";

const containsWord = createScorer<
  string, // Type of 'input'
  string, // Type of 'output'
  string // Type of 'expected'
>({
  name: "Contains Word",
  description: "Checks if the output contains the specified word.",
  scorer: ({ output, input, expected }) => {
    return output.includes(expected) ? 1 : 0;
  },
});