These are the docs for the beta version of Evalite. Install with pnpm add evalite@beta.

Built-in Scorers

Ready-to-use scorer functions for common evaluation patterns. Import from evalite/scorers.

```typescript
import { exactMatch, faithfulness, toolCallAccuracy } from "evalite/scorers";
```

Evalite’s built-in scorers are deeply integrated with the Vercel AI SDK, making it easy to evaluate LLM outputs using standardized models and embeddings.

Many scorers require AI SDK models for LLM-based evaluation. Evalite makes these cheap to use by caching their results with wrapAISDKModel.
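The cost saving comes from not re-running identical model calls. The sketch below illustrates that caching idea in isolation. It is not how wrapAISDKModel is implemented: `GenerateFn` is an assumed, simplified prompt-in/text-out signature and `withCache` is a hypothetical helper. The cache stores the pending Promise so that concurrent identical prompts share one underlying call.

```typescript
// Hypothetical sketch of response caching for an async model call.
// GenerateFn is an assumed, simplified signature (prompt in, text out);
// this is not the wrapAISDKModel implementation.
type GenerateFn = (prompt: string) => Promise<string>;

function withCache(generate: GenerateFn): GenerateFn {
  // Keyed by prompt. Storing the Promise (not the resolved text) lets
  // concurrent identical requests share a single underlying call.
  const cache = new Map<string, Promise<string>>();
  return async (prompt: string): Promise<string> => {
    const hit = cache.get(prompt);
    if (hit !== undefined) return hit;
    const pending = generate(prompt);
    cache.set(prompt, pending);
    try {
      return await pending;
    } catch (err) {
      cache.delete(prompt); // failed calls are not cached, so they can be retried
      throw err;
    }
  };
}

// Usage with a fake model that counts how often it is actually invoked.
let calls = 0;
const fakeModel: GenerateFn = async (prompt) => {
  calls++;
  return `echo: ${prompt}`;
};
const model = withCache(fakeModel);
```

Calling `model` twice with the same prompt invokes `fakeModel` only once. A real cache would usually also key on model settings and persist to disk, which this sketch leaves out.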

String Scorers

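Simple deterministic scorers for text matching. No AI SDK required.

  • exactMatch - Exact string matching
  • contains - Substring matching
  • levenshtein - Fuzzy string matching based on Levenshtein edit distance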
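To show what these deterministic scorers compute, here is a minimal sketch of the three matching ideas. The function names are hypothetical and this is not Evalite's code. Details such as case sensitivity, trimming, and how edit distance is turned into a 0 to 1 score (here: divided by the longer string's length) are assumptions.

```typescript
// Hypothetical sketches of deterministic string scorers. They mirror the
// ideas behind exactMatch, contains and levenshtein but are not Evalite's code.
// Each returns a score between 0 and 1.

function exactMatchScore(output: string, expected: string): number {
  return output === expected ? 1 : 0; // case- and whitespace-sensitive
}

function containsScore(output: string, expected: string): number {
  return output.includes(expected) ? 1 : 0;
}

// Classic dynamic-programming edit distance (insertions, deletions, substitutions).
function levenshteinDistance(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function levenshteinScore(output: string, expected: string): number {
  const maxLen = Math.max(output.length, expected.length);
  if (maxLen === 0) return 1; // two empty strings are identical
  return 1 - levenshteinDistance(output, expected) / maxLen;
}
```

For example, "kitten" and "sitting" are 3 edits apart, so under this normalization they score 1 - 3/7, or about 0.57, while exactMatchScore gives them 0.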

RAG Scorers

Scorers for evaluating retrieval-augmented generation (RAG) systems. All of them require AI SDK models: an LLM, an embedding model, or both.

  • faithfulness - Detects hallucinations by checking whether answers stick to the provided context
  • answerSimilarity - Measures semantic similarity between answers using embeddings
  • answerCorrectness - Comprehensive evaluation combining factual accuracy and semantic similarity
  • answerRelevancy - Checks whether the AI actually answered the question instead of going off-topic
  • contextRecall - Evaluates whether the retrieval system found the right documents
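The arithmetic behind two of these scorers can be sketched without any model calls. In this sketch, embeddings and the judge model's claim verdicts are passed in as plain data, whereas Evalite would obtain them from AI SDK models. The formulas are assumptions modeled on common RAG metrics (cosine similarity for embedding comparison, supported claims divided by total claims for faithfulness), not Evalite's confirmed implementation. `ClaimVerdict` and both function names are hypothetical.

```typescript
// Hypothetical sketch of the arithmetic behind two RAG scorers.
// Inputs are plain data; real scorers would get them from AI SDK models.

// answerSimilarity-style: cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// faithfulness-style: share of the answer's claims that a judge model
// marked as supported by the retrieved context.
interface ClaimVerdict {
  claim: string;
  supportedByContext: boolean;
}

function faithfulnessScore(verdicts: ClaimVerdict[]): number {
  if (verdicts.length === 0) return 0; // assumption: no extractable claims scores 0
  const supported = verdicts.filter((v) => v.supportedByContext).length;
  return supported / verdicts.length;
}
```

With this formula, an answer with four claims of which one is not backed by the context scores 0.75. Keeping the LLM step (extracting and judging claims) separate from the scoring step makes the score itself deterministic once the verdicts are fixed.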

Advanced Scorers

Specialized scorers for specific use cases.

  • toolCallAccuracy - Verifies that the AI calls the correct functions with the correct parameters
  • noiseSensitivity - Diagnoses whether irrelevant retrieved documents mislead the AI in RAG systems
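Because toolCallAccuracy needs no model, its core idea can be sketched as a plain comparison. In this sketch, the `ToolCall` shape, the order-insensitive argument comparison, and the scoring rule (matched calls divided by the larger of the expected and actual call counts, so both missing and extra calls lower the score) are all assumptions. They are not Evalite's documented behavior.

```typescript
// Hypothetical sketch of a tool-call accuracy check: compare the calls a
// model made against the expected calls, by tool name and arguments.
interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

// Structural equality for JSON-like values; object key order is ignored.
function sameArgs(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (k) =>
      Object.prototype.hasOwnProperty.call(b, k) &&
      sameArgs((a as Record<string, unknown>)[k], (b as Record<string, unknown>)[k])
  );
}

function toolCallAccuracyScore(actual: ToolCall[], expected: ToolCall[]): number {
  const total = Math.max(actual.length, expected.length);
  if (total === 0) return 1; // no calls expected and none made
  const unused = [...actual];
  let matched = 0;
  for (const exp of expected) {
    const idx = unused.findIndex(
      (call) => call.toolName === exp.toolName && sameArgs(call.args, exp.args)
    );
    if (idx !== -1) {
      matched++;
      unused.splice(idx, 1); // each actual call can satisfy only one expected call
    }
  }
  return matched / total;
}
```

Under this rule, a correct getWeather call plus an unexpected extra call scores 0.5, and calling the right tool with the wrong city scores 0.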

Quick Reference

| Scorer | AI SDK Required | Use Case |
|---|---|---|
| exactMatch | No | Exact string matching |
| contains | No | Substring matching |
| levenshtein | No | Fuzzy string matching |
| faithfulness | Yes (LLM) | RAG hallucination detection |
| answerSimilarity | Yes (Embeddings) | Semantic similarity |
| answerCorrectness | Yes (LLM + Embeddings) | Comprehensive evaluation |
| answerRelevancy | Yes (LLM + Embeddings) | On-topic checking |
| contextRecall | Yes (LLM) | Retrieval quality |
| toolCallAccuracy | No | Function calling verification |
| noiseSensitivity | Yes (LLM) | RAG debugging |

See Also