Scorers
Evals are great for putting a ton of data through your system and checking the outputs. But scanning through all that data manually is hard work.
Scorers give you a way to benchmark your system. Instead of “yeah, that looks good”, you can say “our system scored 90% on this test”.
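To make the "scored 90%" idea concrete, here is a minimal sketch of how per-case scores could roll up into a single figure. It assumes the headline number is a plain arithmetic mean of scores between 0 and 1; the `caseScores` values and the `averageScore` helper are hypothetical and are not part of Evalite.

```typescript
// Hypothetical per-case scores, each between 0 and 1 (not real results).
const caseScores: number[] = [1, 1, 0.5, 1, 1];

// Assumption: the headline figure is the arithmetic mean of the case scores.
function averageScore(scores: number[]): number {
  if (scores.length === 0) return 0;
  const total = scores.reduce((sum, score) => sum + score, 0);
  return total / scores.length;
}

const percent = Math.round(averageScore(caseScores) * 100);
console.log(`Scored ${percent}% on this test`);
```

A single number like this is easier to track over time than a manual read-through of every output.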
Built-in Scorers
Evalite provides ready-to-use scorers for common evaluation patterns. See Built-in Scorers for the complete list.
Custom Scorers
Evalite lets you create custom scorers to suit your needs. You specify them in the scorers array of the evalite function:
import { evalite } from "evalite";
evalite("My Eval", {
  data: [{ input: "Hello" }],
  task: async (input) => {
    return input + " World!";
  },
  scorers: [
    {
      name: "Contains Paris",
      description: "Checks if the output contains the word 'Paris'.",
      scorer: ({ output }) => {
        return output.includes("Paris") ? 1 : 0;
      },
    },
  ],
});
The scorer function
The scorer function receives the input, output, and expected values. It must return a number between 0 and 1.
({ input, output, expected }) => {
  return output.includes("Paris") ? 1 : 0;
},
It can also be async, allowing you to perform asynchronous operations such as calling LLMs or querying databases:
async ({ input, output, expected }) => {
  const response = await generateObject({
    // Your AI SDK call here
  });
  return response.object.score;
},
Metadata
You can provide metadata along with your custom scorer:
import { createScorer } from "evalite";
const containsParis = createScorer<string, string, string>({
  name: "Contains Paris",
  description: "Checks if the output contains the word 'Paris'.",
  // The scorer receives an object, so destructure 'output' from it.
  scorer: ({ output }) => {
    return {
      score: output.includes("Paris") ? 1 : 0,
      metadata: {
        // Can be anything!
      },
    };
  },
});
The metadata is shown alongside the score in the Evalite UI. This is especially useful when you call LLMs inside scorers: you can include the LLM's reasoning in the metadata.
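As an illustration of carrying LLM reasoning in metadata, the sketch below pairs an async scorer function with a stubbed judge. `askJudge` and `JudgeVerdict` are hypothetical stand-ins for a real LLM call and its response; nothing here calls Evalite or any AI SDK. Only the `{ score, metadata }` return shape follows the example above.

```typescript
// Hypothetical shape of a judge verdict; not part of Evalite.
type JudgeVerdict = { score: number; reasoning: string };

// Stand-in for a real LLM call (an assumption for illustration only).
async function askJudge(output: string): Promise<JudgeVerdict> {
  const mentionsParis = output.includes("Paris");
  return {
    score: mentionsParis ? 1 : 0,
    reasoning: mentionsParis
      ? "The output names Paris."
      : "The output never mentions Paris.",
  };
}

// Async scorer function: returns a score plus metadata that carries
// the judge's reasoning.
const judgeScorer = async ({ output }: { output: string }) => {
  const verdict = await askJudge(output);
  // Clamp to the 0..1 range in case the judge returns something unexpected.
  const score = Math.min(1, Math.max(0, verdict.score));
  return { score, metadata: { reasoning: verdict.reasoning } };
};

judgeScorer({ output: "The capital of France is Paris." }).then((result) => {
  console.log(result.score, result.metadata.reasoning);
});
```

Clamping the judge's score is a defensive choice, since the scorer must return a number between 0 and 1.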
Reusable Scorers
If you have a scorer you want to use across multiple files, you can use createScorer to create a reusable scorer. See createScorer() for more details.
Reusable scorers are typed using the three type arguments passed to createScorer:
import { createScorer } from "evalite";
const containsWord = createScorer<
  string, // Type of 'input'
  string, // Type of 'output'
  string // Type of 'expected'
>({
  name: "Contains Word",
  description: "Checks if the output contains the specified word.",
  scorer: ({ output, input, expected }) => {
    return output.includes(expected) ? 1 : 0;
  },
});
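The three types do not have to match. The sketch below shows a scorer function whose expected value is a list of words and which awards partial credit. `ScorerArgs` is a hypothetical local type standing in for the argument object, and treating `expected` as optional is an assumption; Evalite's own types may differ.

```typescript
// Local stand-in for the argument object a scorer receives.
// Hypothetical type for illustration; Evalite's own types may differ.
type ScorerArgs<TInput, TOutput, TExpected> = {
  input: TInput;
  output: TOutput;
  expected?: TExpected;
};

// Input and output are strings; expected is a list of words to look for.
function containsWords({
  output,
  expected,
}: ScorerArgs<string, string, string[]>): number {
  // With nothing to check against, award no credit.
  if (!expected || expected.length === 0) return 0;
  const lowered = output.toLowerCase();
  const found = expected.filter((word) =>
    lowered.includes(word.toLowerCase())
  );
  // Fraction of expected words present, always between 0 and 1.
  return found.length / expected.length;
}

console.log(
  containsWords({
    input: "Name two French cities.",
    output: "Paris and Lyon",
    expected: ["Paris", "Lyon", "Nice"],
  })
);
```

A function of this shape could be passed as the scorer option to createScorer with matching type arguments.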