# CLI

```sh
pnpm add evalite@beta
```
The evalite command-line interface for running evaluations.
## Commands
### `evalite` (default)
Alias for `evalite run`. Runs evals once and exits.
```sh
evalite
```

### `evalite run`

Run evals once and exit. This is the default command when no subcommand is specified.

```sh
evalite run
evalite run path/to/eval.eval.ts
```

**Positional Arguments:**
- `[path]` (optional) - Path filter to run specific eval files. If not provided, runs all `.eval.ts` files.
**Flags:**
- `--threshold <number>` - Fails the process if the score is below the threshold. Specified as 0-100. Default: 100.
- `--outputPath <path>` - Path to write test results in JSON format after evaluation completes.
- `--hideTable` - Hides the detailed table output in the CLI.
- `--no-cache` - Disables caching of AI SDK model outputs. See Vercel AI SDK caching.
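The `--threshold` and `--outputPath` flags pair well in CI: you can parse the results file and fail the job on low scores. Below is a minimal sketch; the `EvalResult` shape, `failingEvals` helper, and `results.json` path are illustrative assumptions, not Evalite's documented schema.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical shape of the --outputPath JSON; inspect your actual
// results file, since Evalite's real schema may differ.
interface EvalResult {
  name: string;
  score: number; // 0-100, matching the --threshold scale
}

// Returns the names of evals scoring below the threshold.
function failingEvals(results: EvalResult[], threshold: number): string[] {
  return results.filter((r) => r.score < threshold).map((r) => r.name);
}

// After `evalite run --outputPath results.json`:
if (existsSync("results.json")) {
  const results: EvalResult[] = JSON.parse(
    readFileSync("results.json", "utf8"),
  );
  const failures = failingEvals(results, 80);
  if (failures.length > 0) {
    console.error(`Below threshold: ${failures.join(", ")}`);
    process.exit(1);
  }
}
```

With the default threshold of 100, any imperfect score fails the run, so most CI setups pass an explicit value.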
**Examples:**
```sh
# Run all evals
evalite run

# Run a specific eval file
evalite run example.eval.ts

# Fail if the score drops below 80%
evalite run --threshold 80

# Export results to JSON
evalite run --outputPath results.json

# Hide the detailed table
evalite run --hideTable
```

### `evalite watch`
Watches your eval files and re-runs them automatically on change. Starts the UI server at http://localhost:3006.
```sh
evalite watch
evalite watch path/to/eval.eval.ts
```

**Positional Arguments:**
- `[path]` (optional) - Path filter to watch specific eval files.
**Flags:**
- `--threshold <number>` - Fails the process if the score is below the threshold. Specified as 0-100. Default: 100.
- `--hideTable` - Hides the detailed table output in the CLI.
- `--no-cache` - Disables caching of AI SDK model outputs. See Vercel AI SDK caching.
Note: `--outputPath` is not supported in watch mode.
#### Watching Additional Files
By default, `evalite watch` only triggers reruns when your `*.eval.ts` files change.
If your evals depend on other files that Vitest can't automatically detect (e.g., prompt templates, external data files, or CLI build outputs), you can configure extra watch globs in `evalite.config.ts`:
```ts
import { defineConfig } from "evalite/config";

export default defineConfig({
  forceRerunTriggers: [
    "src/**/*.ts", // helper / model code
    "prompts/**/*", // prompt templates
    "data/**/*.json", // test data
  ],
});
```

These globs are passed through to Vitest's `forceRerunTriggers` option, so any change to a matching file will trigger a full eval rerun.
Note: Globs are resolved relative to the directory where you run `evalite` (the Evalite cwd).
**Examples:**
```sh
# Watch all evals
evalite watch

# Watch a specific eval
evalite watch example.eval.ts

# Watch with the table hidden (useful for debugging with console.log)
evalite watch --hideTable
```

### `evalite serve`
Run evals once and serve the UI without watching for changes. Useful when evals take a long time to run.
```sh
evalite serve
evalite serve path/to/eval.eval.ts
```

**Positional Arguments:**
- `[path]` (optional) - Path filter to run specific eval files.
**Flags:**
- `--threshold <number>` - Fails the process if the score is below the threshold. Specified as 0-100. Default: 100.
- `--outputPath <path>` - Path to write test results in JSON format after evaluation completes.
- `--hideTable` - Hides the detailed table output in the CLI.
- `--no-cache` - Disables caching of AI SDK model outputs. See Vercel AI SDK caching.
**Examples:**
```sh
# Run once and serve the UI
evalite serve

# Serve specific eval results
evalite serve example.eval.ts
```

### `evalite export`
Exports a standalone HTML bundle of the UI that can be viewed offline or uploaded as a CI artifact.
```sh
evalite export
```

**Flags:**
- `--output <path>` - Output directory for the static export. Default: `./evalite-export`.
- `--runId <number>` - Specific run ID to export. Default: the latest run.
**Examples:**
```sh
# Export the latest run to the default directory
evalite export

# Export to a custom directory
evalite export --output ./my-export

# Export a specific run
evalite export --runId 123

# Specify both options
evalite export --output ./artifacts --runId 42
```

Note: If no runs exist in storage, `evalite export` will automatically run evaluations first.
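In CI, these commands compose naturally: fail the job on a score regression, then keep the static bundle as a build artifact. A hedged sketch — the `command -v` guard and `status` variable are illustrative scaffolding, not part of Evalite:

```shell
# Illustrative CI step: fail on low scores, then export the UI bundle
# as an artifact. Skips gracefully where evalite is not installed.
if command -v evalite >/dev/null 2>&1; then
  evalite run --threshold 80 --outputPath results.json
  evalite export --output ./evalite-artifact
  status="ran"
else
  status="skipped (evalite not installed)"
fi
echo "evalite CI step: $status"
```

The exported directory can then be uploaded with your CI provider's artifact mechanism.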
## Global Flags
All commands support these flags:
- `--help` - Show help for the command.
- `--version` - Show version information.
## Configuration
CLI behavior can be configured via `evalite.config.ts`:
```ts
import { defineConfig } from "evalite/config";

export default defineConfig({
  scoreThreshold: 80, // Default threshold for all runs
  hideTable: true, // Hide table by default
  server: {
    port: 3006, // UI server port
  },
});
```

## See Also
- `runEvalite()` - Run evals programmatically from Node.js
- `defineConfig()` - Configure Evalite behavior
- Watch Mode - Tips for using watch mode effectively
- CI/CD - Running evals in continuous integration