Note: These are the docs for the beta version of Evalite. Install with pnpm add evalite@beta.

runEvalite()

Run evaluations programmatically from Node.js scripts or custom tooling.

Signature

runEvalite(opts: {
  mode: "run-once-and-exit" | "watch-for-file-changes" | "run-once-and-serve";
  path?: string;
  cwd?: string;
  scoreThreshold?: number;
  outputPath?: string;
  hideTable?: boolean;
  storage?: Evalite.Storage;
  forceRerunTriggers?: string[];
}): Promise<void>

Parameters

opts.mode

Type: "run-once-and-exit" | "watch-for-file-changes" | "run-once-and-serve" (required)

The execution mode for running evals.

Modes:

  • "run-once-and-exit" - Run evals once and exit. Ideal for CI/CD pipelines.
  • "watch-for-file-changes" - Watch for file changes and re-run automatically. Starts the UI server.
  • "run-once-and-serve" - Run evals once and serve the UI without watching for changes.

import { runEvalite } from "evalite/runner";

// CI/CD mode
await runEvalite({
  mode: "run-once-and-exit",
});

// Development mode with watch
await runEvalite({
  mode: "watch-for-file-changes",
});

// Run once and keep UI open
await runEvalite({
  mode: "run-once-and-serve",
});

opts.path

Type: string (optional)

Path filter to run specific eval files. If not provided, runs all .eval.ts files.

await runEvalite({
  mode: "run-once-and-exit",
  path: "my-eval.eval.ts",
});

opts.cwd

Type: string (optional)

The working directory to run evals from. Defaults to process.cwd().

await runEvalite({
  mode: "run-once-and-exit",
  cwd: "/path/to/my/project",
});

opts.scoreThreshold

Type: number (optional, 0-100)

Minimum average score threshold. If the average score falls below this threshold, the process will exit with code 1.

await runEvalite({
  mode: "run-once-and-exit",
  scoreThreshold: 80, // Fail if average score < 80
});

Useful for CI/CD pipelines where you want to fail the build if evals don’t meet a quality threshold.

opts.outputPath

Type: string (optional)

Path to write test results in JSON format after evaluation completes.

await runEvalite({
  mode: "run-once-and-exit",
  outputPath: "./results.json",
});

The exported JSON contains the complete run data including all evals, results, scores, and traces.

Note: Not supported in watch-for-file-changes mode.
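
A common follow-up is to read the exported file back in a post-run script, for example to publish scores to a dashboard. The sketch below deliberately treats the parsed JSON as unknown, since the exact structure of the exported run data is not specified here; inspect a real results.json from your own run before depending on specific fields.

import { readFile } from "node:fs/promises";
import { runEvalite } from "evalite/runner";

await runEvalite({
  mode: "run-once-and-exit",
  outputPath: "./results.json",
});

// Read the exported run data back in. The JSON shape is not documented
// in this section, so treat it as unknown until you have inspected it.
const raw = await readFile("./results.json", "utf-8");
const results: unknown = JSON.parse(raw);
console.log("Exported run data:", results);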

opts.hideTable

Type: boolean (optional, default: false)

Hide the detailed results table in the terminal output while keeping the score summary.

await runEvalite({
  mode: "watch-for-file-changes",
  hideTable: true, // Useful for debugging with console.log
});

opts.storage

Type: Evalite.Storage (optional)

Custom storage backend instance. If not provided, uses the storage from evalite.config.ts or defaults to in-memory storage.

import { runEvalite } from "evalite/runner";
import { createSqliteStorage } from "evalite/sqlite-storage";

await runEvalite({
  mode: "run-once-and-exit",
  storage: createSqliteStorage("./custom.db"),
});

See Storage for more details.

opts.forceRerunTriggers

Type: string[] (optional)

Extra file globs that trigger eval reruns in watch mode. This overrides any forceRerunTriggers setting in evalite.config.ts.

await runEvalite({
  mode: "watch-for-file-changes",
  forceRerunTriggers: ["src/**/*.ts", "prompts/**/*", "data/**/*.json"],
});

This is useful when your evals depend on files that aren’t automatically detected as dependencies (e.g., prompt templates, external data files).

Tip: forceRerunTriggers globs are evaluated relative to the cwd you pass (or process.cwd() if unspecified).
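
To make that tip concrete, the sketch below uses a hypothetical directory layout (/repo/packages/evals containing a prompts/ folder) and passes a cwd so the globs resolve against that directory rather than wherever the script is launched from.

import { runEvalite } from "evalite/runner";

// Hypothetical layout: /repo/packages/evals holds the eval files and a
// prompts/ directory. The glob below resolves relative to the cwd passed in.
await runEvalite({
  mode: "watch-for-file-changes",
  cwd: "/repo/packages/evals",
  forceRerunTriggers: ["prompts/**/*.md"], // matches /repo/packages/evals/prompts/...
});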

Usage Examples

Basic CI/CD Script

import { runEvalite } from "evalite/runner";

async function runTests() {
  try {
    await runEvalite({
      mode: "run-once-and-exit",
      scoreThreshold: 75,
      outputPath: "./results.json",
    });
    console.log("All evals passed!");
  } catch (error) {
    console.error("Evals failed:", error);
    process.exit(1);
  }
}

runTests();

Development Script

import { runEvalite } from "evalite/runner";

// Run a specific eval in watch mode
await runEvalite({
  mode: "watch-for-file-changes",
  path: "chat.eval.ts",
  hideTable: true,
});

Custom Storage

import { runEvalite } from "evalite/runner";
import { createSqliteStorage } from "evalite/sqlite-storage";

const storage = createSqliteStorage("./evalite.db");

await runEvalite({
  mode: "run-once-and-exit",
  storage,
});

Multi-Environment Testing

import { runEvalite } from "evalite/runner";

const environments = [
  { name: "staging", url: "https://staging.example.com" },
  { name: "production", url: "https://example.com" },
];

for (const env of environments) {
  console.log(`Running evals for ${env.name}...`);
  process.env.API_URL = env.url;

  await runEvalite({
    mode: "run-once-and-exit",
    scoreThreshold: 80,
    outputPath: `./results-${env.name}.json`,
  });
}

Parallel Eval Execution

import { runEvalite } from "evalite/runner";

// Run multiple eval sets in parallel
await Promise.all([
  runEvalite({
    mode: "run-once-and-exit",
    path: "chat.eval.ts",
  }),
  runEvalite({
    mode: "run-once-and-exit",
    path: "completion.eval.ts",
  }),
]);

Configuration Priority

Options merge in this order (highest to lowest priority):

  1. Function arguments (opts)
  2. Config file (evalite.config.ts)
  3. Defaults

Example:

// evalite.config.ts
export default defineConfig({
  scoreThreshold: 70,
  hideTable: true,
});

// script.ts
await runEvalite({
  mode: "run-once-and-exit",
  scoreThreshold: 80, // Overrides config (80 used)
  // hideTable not specified, uses config (true)
});

Error Handling

The function throws an error if:

  • Evals fail to run
  • Score threshold is not met
  • Invalid options are provided

try {
  await runEvalite({
    mode: "run-once-and-exit",
    scoreThreshold: 90,
  });
} catch (error) {
  console.error("Eval run failed:", error);
  // Handle error or exit
  process.exit(1);
}

Return Value

Returns a Promise<void>. The function completes when:

  • run-once-and-exit: All evals finish
  • watch-for-file-changes: Never (runs indefinitely)
  • run-once-and-serve: All evals finish, but UI server keeps process alive
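
A practical consequence of the watch-mode behavior is that any code placed after an awaited watch-mode call will never run, so do your setup beforehand. A minimal illustration:

import { runEvalite } from "evalite/runner";

console.log("Starting watch mode..."); // runs before the call

await runEvalite({
  mode: "watch-for-file-changes",
});

// Never reached: the returned promise does not resolve in watch mode.
console.log("This line will not run.");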

See Also