We help your Agent get smarter
every time it fails.

EvalFix automatically finds what's breaking in your AI product, figures out why, and fixes it — so your team spends less time debugging and more time shipping.

CLI & Open Source Free

Run from your terminal or CI. Self-hosted, no account needed.

$ pip install evalfix

Managed UI Hosted

Visual dashboard, prompt versioning, and team collaboration. No setup.

Get started free

evalfix run my-agent/

avg score

evalfix fix my-agent/ AI

prompt diff — v2 → v3

- You are a helpful assistant that answers

- questions clearly and concisely.

+ You are a helpful assistant. When asked for

+ structured output (JSON, haiku, numbered steps),

+ follow the exact format requested. Otherwise,

+ answer clearly and concisely.

reasoning

3 tests failed due to format non-compliance. The prompt lacked explicit format instructions. Added format guidance targeting json_only_output, respond_in_haiku, and numbered_steps.

score 0.42 → 0.91

Get started in seconds

Two packages. That's it.

terminal CLI

$ pip install evalfix

Run evals, analyze failures, and fix prompts from your terminal or CI pipeline.

$ evalfix run my-agent/

$ evalfix fix my-agent/

✓ Fixed in 1 iteration score 0.42 → 0.91

support_bot.py SDK

$ pip install evalfix-sdk

from evalfix_sdk import capture, configure

configure(queue_file="support-bot/.evalfix/failures.jsonl")

if quality_score < 0.7:

capture(

input=user_msg,

output=response,

expected="Empathetic, actionable reply",

score=quality_score,

)

This is how real-world failures become eval cases. Every bad response your agent produces in production gets captured here — so evalfix fix is always optimizing against what actually breaks, not examples you invented.

Never blocks. Never throws.

Why not just edit the prompt yourself?

You can't fix what you can't see.

Manual prompt editing is a guess. evalfix gives you the failure, the context, and the fix — verified against your real test cases before it ships.

Without evalfix

# you get a Slack message

⚠ support-bot responses feel off today

# you open the prompt file

You are a helpful assistant that answers

questions clearly and concisely.

# you make a guess

+ Please be more empathetic.

# you deploy and hope

No evals. No diff. No rollback.

With evalfix

# evalfix captured 7 failures overnight

✗empathy_check0.18

got: "Your ticket is closed." expected: acknowledgement

✗action_items0.31

no next steps offered in 6/7 cases

# evalfix fix support-bot/

▸ root cause: prompt lacks tone + CTA guidance

+ Acknowledge the issue, then offer a clear next step.

✓ score 0.24 → 0.87 · verified on 7 real cases

Evaluation methods

Exact match Contains Regex LLM-as-judge Custom

Evaluate the way your use case demands — from deterministic checks to AI-graded rubrics.

We help your Agent get smarter
every time it fails.

Two packages. That's it.

You can't fix what you can't see.

Stay in the loop.

You're on the list.

We help your Agent get smarter every time it fails.

Two packages. That's it.

You can't fix what you can't see.

Stay in the loop.

You're on the list.

We help your Agent get smarter
every time it fails.