These are the docs for the beta version of Evalite. Install with `pnpm add evalite@beta`.

# createScorer()
Create a reusable scorer function for evaluating LLM outputs.
## Signature

```ts
createScorer<TInput, TOutput, TExpected = TOutput>(opts: {
  name: string;
  description?: string;
  scorer: (input: {
    input: TInput;
    output: TOutput;
    expected?: TExpected;
  }) =>
    | Promise<number | { score: number; metadata?: unknown }>
    | number
    | { score: number; metadata?: unknown };
}): Scorer<TInput, TOutput, TExpected>
```

## Parameters
### opts.name

Type: `string` (required)

The name of the scorer. Displayed in the UI and test output.
```ts
createScorer({
  name: "Exact Match",
  scorer: ({ output, expected }) => (output === expected ? 1 : 0),
});
```

### opts.description
Type: `string` (optional)
A description of what the scorer evaluates. Helps document scoring logic.
```ts
createScorer({
  name: "Length Check",
  description: "Checks if output is at least 10 characters",
  scorer: ({ output }) => (output.length >= 10 ? 1 : 0),
});
```

### opts.scorer
Type: `(input: { input, output, expected }) => number | { score: number; metadata?: unknown }`
The scoring function. Receives the input, output, and expected values. Must return:

- A number between 0 and 1, or
- An object with `score` (0-1) and optional `metadata`
```ts
createScorer({
  name: "Word Count",
  scorer: ({ output }) => {
    const wordCount = output.split(" ").length;
    return {
      score: wordCount >= 10 ? 1 : 0,
      metadata: { wordCount },
    };
  },
});
```

## Return Value
Returns a `Scorer` function that can be passed to the `scorers` array of `evalite()`.
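Because the `scorer` option is an ordinary function of `{ input, output, expected }`, its logic can be exercised directly, without running an eval. A minimal sketch (the standalone `exactMatch` function and `ScorerArgs` type below are illustrative, not part of the Evalite API — they only mirror the shape of the object Evalite passes to `scorer`):

```typescript
// Shape of the argument Evalite passes to the `scorer` option
// (illustrative type, not imported from the library).
type ScorerArgs<TOutput, TExpected = TOutput> = {
  output: TOutput;
  expected?: TExpected;
};

// Same logic as the "Exact Match" examples in these docs,
// written as a plain function so it can be called directly.
const exactMatch = ({ output, expected }: ScorerArgs<string>): number =>
  output === expected ? 1 : 0;

console.log(exactMatch({ output: "Hi", expected: "Hi" })); // 1
console.log(exactMatch({ output: "Hello", expected: "Hi" })); // 0
```

Writing the logic this way lets you unit-test it in isolation and then hand the same function to `createScorer()`.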
## Usage

### Basic Scorer
```ts
import { createScorer, evalite } from "evalite";

const exactMatch = createScorer({
  name: "Exact Match",
  scorer: ({ output, expected }) => {
    return output === expected ? 1 : 0;
  },
});

evalite("My Eval", {
  data: [{ input: "Hello", expected: "Hi" }],
  task: async (input) => callLLM(input),
  scorers: [exactMatch],
});
```

### Scorer with Metadata
```ts
const lengthChecker = createScorer({
  name: "Length Check",
  description: "Validates output length is within acceptable range",
  scorer: ({ output }) => {
    const length = output.length;
    const isValid = length >= 10 && length <= 100;

    return {
      score: isValid ? 1 : 0,
      metadata: {
        length,
        minLength: 10,
        maxLength: 100,
      },
    };
  },
});
```

### Async Scorer
Scorers can be async for LLM-based evaluation:
```ts
const llmScorer = createScorer({
  name: "LLM Judge",
  description: "Uses GPT-4 to evaluate output quality",
  scorer: async ({ output, expected }) => {
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content: "Rate the output quality from 0 to 1.",
        },
        {
          role: "user",
          content: `Output: ${output}\nExpected: ${expected}`,
        },
      ],
    });

    const score = parseFloat(response.choices[0].message.content);
    return score;
  },
});
```

### Reusable Scorers
Create a library of scorers to reuse across evals:
```ts
// scorers.ts
import { createScorer } from "evalite";

export const hasEmoji = createScorer({
  name: "Has Emoji",
  scorer: ({ output }) => (/\p{Emoji}/u.test(output) ? 1 : 0),
});

export const containsKeyword = (keyword: string) =>
  createScorer({
    name: `Contains "${keyword}"`,
    scorer: ({ output }) => (output.includes(keyword) ? 1 : 0),
  });
```
```ts
// my-eval.eval.ts
import { evalite } from "evalite";
import { hasEmoji, containsKeyword } from "./scorers";

evalite("My Eval", {
  data: [{ input: "Hello" }],
  task: async (input) => callLLM(input),
  scorers: [hasEmoji, containsKeyword("greeting")],
});
```

### Inline Scorers
You can also define scorers inline without `createScorer()`:
```ts
evalite("My Eval", {
  data: [{ input: "Hello", expected: "Hi" }],
  task: async (input) => callLLM(input),
  scorers: [
    // Inline scorer (same shape as createScorer opts)
    {
      name: "Exact Match",
      scorer: ({ output, expected }) => (output === expected ? 1 : 0),
    },
  ],
});
```

Both approaches are equivalent. Use `createScorer()` when you want to reuse the scorer across multiple evals.
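Scores are not limited to binary 0 or 1; any value in the 0-1 range is valid, which is useful for partial credit. A hedged sketch of a word-overlap scorer (the `wordOverlap` helper is illustrative, not part of the Evalite API) whose function could be passed as the `scorer` option:

```typescript
// Fraction of expected words that appear in the output: partial credit in [0, 1].
// Illustrative helper, not an Evalite built-in.
const wordOverlap = ({ output, expected }: { output: string; expected?: string }) => {
  if (!expected) {
    return { score: 0, metadata: { hits: 0, total: 0 } };
  }
  const outputWords = new Set(output.toLowerCase().split(/\s+/));
  const expectedWords = expected.toLowerCase().split(/\s+/);
  const hits = expectedWords.filter((w) => outputWords.has(w)).length;
  return {
    score: hits / expectedWords.length,
    metadata: { hits, total: expectedWords.length },
  };
};

console.log(wordOverlap({ output: "hello there", expected: "hello world" }));
// { score: 0.5, metadata: { hits: 1, total: 2 } }
```

The `metadata` (hit and total counts) surfaces alongside the fractional score in the UI, making partial matches easy to inspect.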
## See Also
- Scorers Guide - Overview of scoring strategies
- `evalite()` - Using scorers in evals