```bash
pnpm add evalite@beta
```
noiseSensitivity
Checks if your AI is being misled by irrelevant documents in retrieval results.
When to use: Use it to debug RAG systems with accuracy issues. It helps pinpoint whether problems come from bad retrieval (you’re retrieving the wrong documents) or from poor reasoning (your AI isn’t using good documents correctly).
When NOT to use: Skip for non-RAG systems, or when you haven’t identified accuracy issues yet (start with faithfulness first).
Example
```ts
import { openai } from "@ai-sdk/openai";
import { evalite } from "evalite";
import { noiseSensitivity } from "evalite/scorers";

evalite("RAG Noise Sensitivity", {
  data: [
    {
      input: "What is the capital of France?",
      expected: {
        reference: "Paris is the capital of France.",
        groundTruth: [
          "Paris is the capital and largest city of France. It is located in the north-central part of the country.",
          "Lyon is the third-largest city in France and an important cultural center.",
          "Marseille is a major French port city on the Mediterranean coast.",
        ],
      },
    },
  ],
  task: async () => {
    return "Lyon is the capital of France. Paris is the largest city in France.";
  },
  scorers: [
    {
      scorer: ({ input, output, expected }) =>
        noiseSensitivity({
          question: input,
          answer: output,
          reference: expected.reference,
          groundTruth: expected.groundTruth,
          model: openai("gpt-4o-mini"),
          mode: "relevant",
        }),
    },
  ],
});
```
Signature
```ts
function noiseSensitivity(opts: {
  question: string;
  answer: string;
  reference: string;
  groundTruth: string[];
  model: LanguageModel;
  mode?: "relevant" | "irrelevant";
}): Promise<{
  name: string;
  description: string;
  score: number;
  metadata: {
    referenceStatements: string[];
    answerStatements: string[];
    incorrectStatements: string[];
    relevantContextIndices: number[];
    irrelevantContextIndices: number[];
    mode: "relevant" | "irrelevant";
    retrievedToGroundTruth: boolean[][];
    retrievedToAnswer: boolean[][];
    groundTruthToAnswer: boolean[];
  };
}>;
```
Parameters
question
Type: string
The question being asked.
answer
Type: string
The AI’s answer to evaluate. Note: only string outputs are supported, not multi-turn outputs.
reference
Type: string
Correct/reference answer to the question.
groundTruth
Type: string[]
Array of retrieved context documents.
model
Type: LanguageModel
Language model to use for evaluation.
mode
Type: "relevant" | "irrelevant"
Default: "relevant"
Evaluation mode:
"relevant": Measures how often your AI makes mistakes even when correct docs are present"irrelevant": Measures how often your AI is confused by wrong/irrelevant documents
Return Value
Returns an object with:
- name: “Noise Sensitivity”
- description: Description of what was evaluated
- score: Number between 0 and 1 (the proportion of incorrect statements attributed to relevant or irrelevant contexts, depending on mode)
- metadata: Extensive attribution data, including:
  - referenceStatements: Statements decomposed from the reference answer
  - answerStatements: Statements decomposed from the AI’s answer
  - incorrectStatements: AI statements not supported by the reference
  - relevantContextIndices: Indices of contexts containing reference information
  - irrelevantContextIndices: Indices of contexts not containing reference information
  - mode: Mode used for evaluation
  - retrievedToGroundTruth: Boolean matrix mapping contexts to reference statements
  - retrievedToAnswer: Boolean matrix mapping contexts to answer statements
  - groundTruthToAnswer: Boolean array mapping reference statements to answer statements
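The metadata block is what makes this scorer useful for debugging: it shows which answer statements were wrong and which retrieved documents mattered. A minimal sketch of a helper that logs it (logAttribution is a hypothetical name; it takes the awaited return value of noiseSensitivity):

```ts
import type { noiseSensitivity } from "evalite/scorers";

// Sketch: log the attribution metadata from a noiseSensitivity result.
// `result` is the awaited return value of noiseSensitivity().
function logAttribution(result: Awaited<ReturnType<typeof noiseSensitivity>>) {
  const { score, metadata } = result;

  console.log(`Noise sensitivity (${metadata.mode}): ${score}`);

  // Answer statements not supported by the reference answer.
  console.log("Incorrect statements:", metadata.incorrectStatements);

  // Which retrieved documents contained (or lacked) reference information.
  console.log("Relevant context indices:", metadata.relevantContextIndices);
  console.log("Irrelevant context indices:", metadata.irrelevantContextIndices);
}
```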