These are the docs for the beta version of Evalite. Install with pnpm add evalite@beta

answerRelevancy

Checks if your AI actually answered the question asked (vs going off-topic or being evasive).

When to use: When you want to catch answers that are technically correct but don’t address what was asked. Perfect for customer support bots, Q&A systems, or any scenario where staying on-topic matters.

When NOT to use: If your use case allows tangential or exploratory responses, or when creative interpretations of questions are desired.

Example

import { openai } from "@ai-sdk/openai";
import { evalite } from "evalite";
import { answerRelevancy } from "evalite/scorers";

evalite("Answer Relevancy", {
  data: [
    {
      input: "What is the capital of France?",
    },
    {
      input: "Who invented the telephone?",
    },
    {
      input: "What are the health benefits of exercise?",
    },
  ],
  task: async (input) => {
    if (input.includes("capital of France")) {
      return "Paris is the capital of France. It's known for the Eiffel Tower and the Louvre Museum.";
    } else if (input.includes("telephone")) {
      return "Alexander Graham Bell is credited with inventing the telephone in 1876.";
    } else if (input.includes("health benefits")) {
      return "I don't know about that topic.";
    }
    return "I'm not sure about that.";
  },
  scorers: [
    {
      scorer: ({ input, output }) =>
        answerRelevancy({
          question: input,
          answer: output,
          model: openai("gpt-4o-mini"),
          embeddingModel: openai.embedding("text-embedding-3-small"),
        }),
    },
  ],
});

Signature

function answerRelevancy(opts: {
  question: string;
  answer: string;
  model: LanguageModel;
  embeddingModel: EmbeddingModel;
}): Promise<{
  name: string;
  description: string;
  score: number;
  metadata: {
    generatedQuestions: string[];
    similarities: number[];
    allNoncommittal: boolean;
  };
}>;
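
You can also call the scorer directly, outside of an evalite run. A minimal standalone sketch, reusing the same OpenAI models as the example above:

import { openai } from "@ai-sdk/openai";
import { answerRelevancy } from "evalite/scorers";

const result = await answerRelevancy({
  question: "What is the capital of France?",
  answer: "Paris is the capital of France.",
  model: openai("gpt-4o-mini"),
  embeddingModel: openai.embedding("text-embedding-3-small"),
});

console.log(result.score);
console.log(result.metadata.generatedQuestions);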

Parameters

question

Type: string

The original question being asked.

answer

Type: string

The AI’s answer to evaluate. Note: only plain string outputs are supported, not multi-turn conversations.

model

Type: LanguageModel

Language model to use for generating hypothetical questions.

embeddingModel

Type: EmbeddingModel

Embedding model to use for computing semantic similarity between questions.

How It Works

Looks at your AI’s answer and generates hypothetical questions it could be answering (three by default). It then compares those generated questions to your original question using embeddings and cosine similarity; the more similar they are, the better the answer stayed on topic.

Also detects evasive/noncommittal answers like “I don’t know” or “I’m not sure” and scores them as 0.
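
To make the scoring step concrete, here is a rough sketch of the similarity computation. This is not Evalite’s actual implementation: `embed` is a hypothetical helper standing in for the embedding model, and the generated questions are assumed to come from the language-model step above.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: embeds text with the configured embedding model.
declare function embed(text: string): Promise<number[]>;

async function scoreRelevancy(
  originalQuestion: string,
  generatedQuestions: string[],
  allNoncommittal: boolean,
): Promise<number> {
  // Evasive answers ("I don't know", "I'm not sure") score 0.
  if (allNoncommittal) return 0;
  const original = await embed(originalQuestion);
  const similarities = await Promise.all(
    generatedQuestions.map(async (q) => cosineSimilarity(original, await embed(q))),
  );
  // The score is the mean similarity across all generated questions.
  return similarities.reduce((sum, s) => sum + s, 0) / similarities.length;
}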

Return Value

Returns an object with:

  • name: “Answer Relevancy”
  • description: Description of what was evaluated
  • score: Number between 0 and 1 (mean cosine similarity of the generated questions to the original question, or 0 when all generated questions are flagged as noncommittal)
  • metadata: Object containing:
    • generatedQuestions: Array of hypothetical questions generated from the answer
    • similarities: Array of cosine similarity scores for each generated question
    • allNoncommittal: Boolean indicating if all generated questions were flagged as noncommittal
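
When a score is lower than expected, the metadata can show why. A small sketch, reusing the result object from the standalone example after the signature:

const { score, metadata } = result;

if (metadata.allNoncommittal) {
  console.log("All generated questions were noncommittal; score is 0.");
}

metadata.generatedQuestions.forEach((question, i) => {
  console.log(`${question} -> similarity ${metadata.similarities[i].toFixed(2)}`);
});

console.log(`Final score: ${score}`);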

See Also