These are the docs for the beta version of Evalite. Install with `pnpm add evalite@beta`.

# createScorer()
Create a reusable scorer function for evaluating LLM outputs.
## Signature

```ts
createScorer<TInput, TOutput, TExpected = TOutput>(opts: {
  name: string;
  description?: string;
  scorer: (input: {
    input: TInput;
    output: TOutput;
    expected?: TExpected;
  }) =>
    | Promise<number | { score: number; metadata?: unknown }>
    | number
    | { score: number; metadata?: unknown };
}): Scorer<TInput, TOutput, TExpected>
```

## Parameters
### opts.name

Type: `string` (required)

The name of the scorer. Displayed in the UI and test output.
```ts
createScorer({
  name: "Exact Match",
  scorer: ({ output, expected }) => (output === expected ? 1 : 0),
});
```

### opts.description
Type: `string` (optional)
A description of what the scorer evaluates. Helps document scoring logic.
```ts
createScorer({
  name: "Length Check",
  description: "Checks if output is at least 10 characters",
  scorer: ({ output }) => (output.length >= 10 ? 1 : 0),
});
```

### opts.scorer
Type: `(input: { input, output, expected }) => number | { score: number; metadata?: unknown }`
The scoring function. Receives the input, output, and expected values. Must return:

- A number between 0 and 1, or
- An object with `score` (0-1) and optional `metadata`
```ts
createScorer({
  name: "Word Count",
  scorer: ({ output }) => {
    const wordCount = output.split(" ").length;
    return {
      score: wordCount >= 10 ? 1 : 0,
      metadata: { wordCount },
    };
  },
});
```

## Return Value
Returns a `Scorer` function that can be passed to the `scorers` array of `evalite()`.
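Because the `scorer` option is an ordinary function of `{ input, output, expected }`, its logic can be exercised directly, without running an eval. A minimal sketch (the standalone `exactMatch` function and `ScorerArgs` type below are illustrative, not part of the Evalite API — they only mirror the shape of the object Evalite passes to `scorer`):

```typescript
// Shape of the argument Evalite passes to the `scorer` option
// (illustrative type, not imported from the library).
type ScorerArgs<TOutput, TExpected = TOutput> = {
  output: TOutput;
  expected?: TExpected;
};

// Same logic as the "Exact Match" examples in these docs,
// written as a plain function so it can be called directly.
const exactMatch = ({ output, expected }: ScorerArgs<string>): number =>
  output === expected ? 1 : 0;

console.log(exactMatch({ output: "Hi", expected: "Hi" })); // 1
console.log(exactMatch({ output: "Hello", expected: "Hi" })); // 0
```

Writing the logic this way lets you unit-test it in isolation and then hand the same function to `createScorer()`.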
## Usage

### Basic Scorer
```ts
import { createScorer, evalite } from "evalite";

const exactMatch = createScorer({
  name: "Exact Match",
  scorer: ({ output, expected }) => {
    return output === expected ? 1 : 0;
  },
});

evalite("My Eval", {
  data: [{ input: "Hello", expected: "Hi" }],
  task: async (input) => callLLM(input),
  scorers: [exactMatch],
});
```

### Scorer with Metadata
```ts
const lengthChecker = createScorer({
  name: "Length Check",
  description: "Validates output length is within acceptable range",
  scorer: ({ output }) => {
    const length = output.length;
    const isValid = length >= 10 && length <= 100;

    return {
      score: isValid ? 1 : 0,
      metadata: {
        length,
        minLength: 10,
        maxLength: 100,
      },
    };
  },
});
```

### Async Scorer
Scorers can be async for LLM-based evaluation:
```ts
const llmScorer = createScorer({
  name: "LLM Judge",
  description: "Uses GPT-4 to evaluate output quality",
  scorer: async ({ output, expected }) => {
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content: "Rate the output quality from 0 to 1.",
        },
        {
          role: "user",
          content: `Output: ${output}\nExpected: ${expected}`,
        },
      ],
    });

    const score = parseFloat(response.choices[0].message.content);
    return score;
  },
});
```

### Reusable Scorers
Create a library of scorers to reuse across evals:
```ts
// scorers.ts
import { createScorer } from "evalite";

export const hasEmoji = createScorer({
  name: "Has Emoji",
  scorer: ({ output }) => (/\p{Emoji}/u.test(output) ? 1 : 0),
});

export const containsKeyword = (keyword: string) =>
  createScorer({
    name: `Contains "${keyword}"`,
    scorer: ({ output }) => (output.includes(keyword) ? 1 : 0),
  });
```
```ts
// my-eval.eval.ts
import { evalite } from "evalite";
import { hasEmoji, containsKeyword } from "./scorers";

evalite("My Eval", {
  data: [{ input: "Hello" }],
  task: async (input) => callLLM(input),
  scorers: [hasEmoji, containsKeyword("greeting")],
});
```

### Inline Scorers
You can also define scorers inline without `createScorer()`:
```ts
evalite("My Eval", {
  data: [{ input: "Hello", expected: "Hi" }],
  task: async (input) => callLLM(input),
  scorers: [
    // Inline scorer (same shape as createScorer opts)
    {
      name: "Exact Match",
      scorer: ({ output, expected }) => (output === expected ? 1 : 0),
    },
  ],
});
```

Both approaches are equivalent. Use `createScorer()` when you want to reuse the scorer across multiple evals.
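Scores are not limited to binary 0 or 1; any value in the 0-1 range is valid, which is useful for partial credit. A hedged sketch of a word-overlap scorer (the `wordOverlap` helper is illustrative, not part of the Evalite API) whose function could be passed as the `scorer` option:

```typescript
// Fraction of expected words that appear in the output: partial credit in [0, 1].
// Illustrative helper, not an Evalite built-in.
const wordOverlap = ({ output, expected }: { output: string; expected?: string }) => {
  if (!expected) {
    return { score: 0, metadata: { hits: 0, total: 0 } };
  }
  const outputWords = new Set(output.toLowerCase().split(/\s+/));
  const expectedWords = expected.toLowerCase().split(/\s+/);
  const hits = expectedWords.filter((w) => outputWords.has(w)).length;
  return {
    score: hits / expectedWords.length,
    metadata: { hits, total: expectedWords.length },
  };
};

console.log(wordOverlap({ output: "hello there", expected: "hello world" }));
// { score: 0.5, metadata: { hits: 1, total: 2 } }
```

The `metadata` (hit and total counts) surfaces alongside the fractional score in the UI, making partial matches easy to inspect.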
## See Also
- Scorers Guide - Overview of scoring strategies
- `evalite()` - Using scorers in evals