These are the docs for the beta version of Evalite. Install it with pnpm add evalite@beta.

The Dev Loop

When you’re developing evals, you want to iterate quickly. Running your entire test suite every time you make a change can be slow and frustrating.

Evalite gives you several tools to speed up your development workflow: watch mode for automatic re-runs, serve mode for running slow evals on demand, file filtering to run specific evals, and per-eval controls to skip an entire eval or focus on particular test cases.

Watch Mode

Watch mode automatically re-runs your evals whenever you make changes to your files.

evalite watch

This command watches all .eval.ts files in your project and re-runs them whenever they change.

Serve Mode

If you have slow-running evals, you might not want them to re-run every time you make a change. Serve mode runs your evals once and then keeps the UI available for inspection.

evalite serve

This runs your evals once and then serves the UI at http://localhost:3006. Your evals won't re-run when files change.

You can then re-run your evals by pressing the “Rerun” button in the UI.

Run Specific Files

Sometimes you don’t want to run your entire eval suite. You can run specific eval files by passing them as arguments:

evalite my-eval.eval.ts

You can also run multiple files at once:

evalite eval1.eval.ts eval2.eval.ts

This works with both watch and serve modes:

evalite watch my-eval.eval.ts
evalite serve my-eval.eval.ts
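To make the selection rules concrete, here is a small TypeScript model of which files a run picks up. It is a sketch, not Evalite's implementation: isEvalFile and selectEvalFiles are hypothetical helpers, and the code assumes that a file argument selects any eval file whose path contains that argument. Evalite's real matching may differ.

```typescript
// Hypothetical model of which eval files a run picks up; NOT Evalite's code.
// Assumptions: eval files are the ones ending in ".eval.ts", and a file
// argument selects any eval file whose path contains that argument.

function isEvalFile(path: string): boolean {
  return path.endsWith(".eval.ts");
}

function selectEvalFiles(files: string[], fileArgs: string[]): string[] {
  const evalFiles = files.filter(isEvalFile);
  // No file arguments (like plain `evalite watch`): every eval file is selected.
  if (fileArgs.length === 0) {
    return evalFiles;
  }
  // With arguments: keep only eval files matched by at least one argument.
  return evalFiles.filter((file) => fileArgs.some((arg) => file.includes(arg)));
}

const projectFiles = [
  "src/my-eval.eval.ts",
  "src/eval1.eval.ts",
  "src/eval2.eval.ts",
  "src/helpers.ts",
];

// Like `evalite watch`: all three .eval.ts files, but not helpers.ts.
console.log(selectEvalFiles(projectFiles, []));

// Like `evalite eval1.eval.ts eval2.eval.ts`: only eval1 and eval2.
console.log(selectEvalFiles(projectFiles, ["eval1.eval.ts", "eval2.eval.ts"]));
```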

Skip Entire Evals

If you want to temporarily disable an eval without deleting it, you can use evalite.skip():

import { evalite } from "evalite";

// Skipped: this eval is registered but does not run.
evalite.skip("My Eval", {
  data: () => [],
  task: () => {},
});

Focus on Specific Test Cases

Sometimes you want to focus on a single test case within an eval. To do this, add only: true to that entry in the data array:

import { evalite } from "evalite";

evalite("My Eval", {
  data: () => [
    { input: "test1", expected: "output1" },
    { input: "test2", expected: "output2", only: true },
    { input: "test3", expected: "output3" },
  ],
  task: async (input) => {
    // Only runs for "test2"
  },
});

When any data entry has only: true, only those entries will run.
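The rule above can be sketched as a small filter over one eval's data array. This is a hypothetical model: entriesToRun and DataEntry are illustrative names, not part of Evalite's API, and the sketch assumes the only flag is evaluated per eval.

```typescript
// Hypothetical model of the rule "when any data entry has only: true,
// only those entries will run". NOT Evalite's source code.

interface DataEntry<TInput, TExpected> {
  input: TInput;
  expected?: TExpected;
  only?: boolean;
}

function entriesToRun<TInput, TExpected>(
  data: DataEntry<TInput, TExpected>[],
): DataEntry<TInput, TExpected>[] {
  const focused = data.filter((entry) => entry.only === true);
  // If nothing is focused, every entry runs; otherwise only the focused ones do.
  return focused.length > 0 ? focused : data;
}

const data: DataEntry<string, string>[] = [
  { input: "test1", expected: "output1" },
  { input: "test2", expected: "output2", only: true },
  { input: "test3", expected: "output3" },
];

const inputs = entriesToRun(data).map((entry) => entry.input);
console.log(inputs); // [ 'test2' ]
```

If no entry sets only: true, entriesToRun returns every entry, so all three test cases would run.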