pnpm add evalite@beta

answerCorrectness
Checks if your AI’s answer is correct by comparing it to a reference answer. Combines factual accuracy (75%) and semantic similarity (25%) by default.
When to use: When you need comprehensive answer evaluation that balances exact correctness with semantic equivalence. Ideal for QA systems where both factual accuracy and meaning matter.
When NOT to use: If you only care about exact facts (use faithfulness) or only about semantic similarity (use answerSimilarity). Not suitable for creative tasks where divergence from the reference is desired.
Example
```ts
import { openai } from "@ai-sdk/openai";
import { evalite } from "evalite";
import { answerCorrectness } from "evalite/scorers";

evalite("Answer Correctness", {
  data: [
    {
      input: "What is the capital of France?",
      expected: {
        reference: "Paris is the capital of France.",
      },
    },
    {
      input: "Who invented the telephone?",
      expected: {
        reference:
          "Alexander Graham Bell invented the telephone. The telephone was patented in 1876.",
      },
    },
  ],
  task: async (input) => {
    // Your AI task here
    return "Paris is the capital of France and has many museums.";
  },
  scorers: [
    {
      scorer: ({ input, output, expected }) =>
        answerCorrectness({
          question: input,
          answer: output,
          reference: expected.reference,
          model: openai("gpt-4o-mini"),
          embeddingModel: openai.embedding("text-embedding-3-small"),
        }),
    },
  ],
});
```

Signature
```ts
function answerCorrectness(opts: {
  question: string;
  answer: string;
  reference: string;
  model: LanguageModel;
  embeddingModel: EmbeddingModel;
  weights?: [number, number];
  beta?: number;
}): Promise<{
  name: string;
  description: string;
  score: number;
  metadata: {
    classification: {
      TP: Array<{ statement: string; reason: string }>;
      FP: Array<{ statement: string; reason: string }>;
      FN: Array<{ statement: string; reason: string }>;
    };
    factualityScore: number;
    similarityScore: number;
    responseStatements: string[];
    referenceStatements: string[];
  };
}>;
```

Parameters
question
Type: string
The question being asked.
answer
Type: string
The AI’s answer to evaluate.
reference
Type: string
Reference answer for comparison. Should be a complete, accurate answer.
model
Type: LanguageModel
Language model to use for evaluation.
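The language model acts as the judge: it decomposes the answer and the reference into atomic statements and classifies each one as a true positive, false positive, or false negative against the reference (see the classification field in the return value). evalite's actual prompts and schemas are internal; the sketch below only illustrates that general pattern with the AI SDK's generateObject and a hypothetical schema.

```ts
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical schema for illustration only; not evalite's internal one.
const classificationSchema = z.object({
  TP: z.array(z.object({ statement: z.string(), reason: z.string() })),
  FP: z.array(z.object({ statement: z.string(), reason: z.string() })),
  FN: z.array(z.object({ statement: z.string(), reason: z.string() })),
});

const { object: classification } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: classificationSchema,
  prompt:
    "Classify each statement in the answer against the reference. " +
    "TP: supported by the reference. FP: unsupported or contradicted. " +
    "FN: in the reference but missing from the answer.\n\n" +
    "Answer: Paris is the capital of France and has many museums.\n" +
    "Reference: Paris is the capital of France.",
});
```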
embeddingModel
Type: EmbeddingModel
Embedding model to use for semantic similarity calculation.
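The embedding model only feeds the similarity half of the score: the answer and the reference are embedded and compared with cosine similarity. The sketch below shows that idea using the AI SDK's embedMany and cosineSimilarity helpers; it is not evalite's internal code.

```ts
import { cosineSimilarity, embedMany } from "ai";
import { openai } from "@ai-sdk/openai";

// Embed answer and reference, then compare them. This is roughly what
// the similarityScore in the metadata reflects.
const { embeddings } = await embedMany({
  model: openai.embedding("text-embedding-3-small"),
  values: [
    "Paris is the capital of France and has many museums.",
    "Paris is the capital of France.",
  ],
});

const similarityScore = cosineSimilarity(embeddings[0], embeddings[1]);
```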
weights (optional)
Type: [number, number]
Default: [0.75, 0.25]
Weights for combining factuality and similarity scores: [factualityWeight, similarityWeight]. Default weighs factual accuracy at 75% and semantic similarity at 25%.
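Assuming the final score is a plain weighted average (which the default [0.75, 0.25] suggests), a factualityScore of 0.8 and a similarityScore of 0.6 would combine to 0.75 * 0.8 + 0.25 * 0.6 = 0.75. To lean even harder on factual accuracy, override the weights in the scorer from the example above:

```ts
// Drop-in replacement for the scorer in the example above.
scorer: ({ input, output, expected }) =>
  answerCorrectness({
    question: input,
    answer: output,
    reference: expected.reference,
    model: openai("gpt-4o-mini"),
    embeddingModel: openai.embedding("text-embedding-3-small"),
    weights: [0.9, 0.1], // 90% factuality, 10% similarity
  }),
```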
beta (optional)
Type: number
Default: 1.0
Beta parameter for the F-beta score calculation: beta > 1 favors recall (catching all reference statements); beta < 1 favors precision (avoiding false positives).
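The factualityScore is an F-beta score over the classified statements, so the TP, FP, and FN counts map to precision and recall in the usual way. Below is a sketch of the standard F-beta formula; evalite's exact implementation (for example, how zero denominators are handled) is not specified here.

```ts
// Standard F-beta over statement counts.
function fBeta(tp: number, fp: number, fn: number, beta = 1.0): number {
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  const b2 = beta * beta;
  return ((1 + b2) * precision * recall) / (b2 * precision + recall);
}

fBeta(3, 0, 2);    // precision 1.0, recall 0.6 -> F1 = 0.75 (default beta = 1)
fBeta(3, 0, 2, 2); // beta = 2 weights recall more heavily -> ~0.65
```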
Return Value
Returns an object with:
- name: "Answer Correctness"
- description: Description of what was evaluated
- score: Number between 0 and 1 (weighted combination of factuality and similarity)
- metadata: Object containing:
  - classification: TP (true positives), FP (false positives), FN (false negatives) with statements and reasons
  - factualityScore: F-beta score based on statement classification
  - similarityScore: Cosine similarity between embeddings
  - responseStatements: Decomposed statements from the answer
  - referenceStatements: Decomposed statements from the reference
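You can also call the scorer directly, outside of an evalite run, and inspect these fields, for example to see which statements were counted against the answer. A minimal sketch, using only the fields from the signature above:

```ts
import { openai } from "@ai-sdk/openai";
import { answerCorrectness } from "evalite/scorers";

const result = await answerCorrectness({
  question: "What is the capital of France?",
  answer: "Paris is the capital of France and has many museums.",
  reference: "Paris is the capital of France.",
  model: openai("gpt-4o-mini"),
  embeddingModel: openai.embedding("text-embedding-3-small"),
});

console.log(result.score); // weighted combination, 0-1
console.log(result.metadata.factualityScore, result.metadata.similarityScore);

// Statements in the answer that the judge could not support from the reference:
for (const { statement, reason } of result.metadata.classification.FP) {
  console.log(statement, "->", reason);
}
```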