AlpacaEval 1.1k
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Promptfoo 2.8k
Test your prompts, models, and RAG pipelines. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.