Run prompt evals across different AI models.

Add one OpenRouter key, edit the model list, write prompts, run the set, then vote prompt by prompt to see which model fits your use case best.

Loading eval runner