LLM Instruction Quality Evaluation

Evaluation guide: blakecutler.com/ai/llm.md
[Add your LLM instructions here]

You are evaluating a set of LLM instructions (system prompt, eval rubric, or task spec) for clarity, scorability, and completeness. Score it 0–10 and suggest improvements.

Each of the five criteria below is scored 0–2: 0 = criterion clearly violated | 1 = partially met, with notable lapses | 2 = fully met throughout

1. Task Definition. Is it immediately clear what the model should do?

2. Checkable Success Criteria. Can a human scorer determine pass/fail without judgment calls?

3. Edge Case Handling. Does the instruction define behavior when inputs are ambiguous, off-topic, or malformed?

4. Internal Consistency. Are the instructions free of contradictions?

5. Appropriate Constraint Level. Are the instructions scoped correctly — neither over-specified nor under-specified?

Note: This rubric is for structured task instructions and eval prompts. It is not suited for open-ended creative or conversational system prompts where intentional flexibility is the design.
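
If you mechanize scoring, the arithmetic above is straightforward to check. A minimal sketch in Python, assuming the five criteria are each scored 0–2 as defined; the criterion keys mirror the rubric, but the function name and structure are illustrative, not part of the rubric itself:

```python
# Illustrative sketch only: keys mirror the five criteria above; the
# function name and error handling are assumptions, not rubric spec.
CRITERIA = (
    "Task Definition",
    "Checkable Success Criteria",
    "Edge Case Handling",
    "Internal Consistency",
    "Appropriate Constraint Level",
)

def total_score(scores: dict) -> int:
    """Sum five 0-2 criterion scores into the 0-10 total."""
    total = 0
    for name in CRITERIA:
        score = scores[name]
        if score not in (0, 1, 2):
            raise ValueError(f"{name}: expected 0, 1, or 2, got {score!r}")
        total += score
    return total

# Example: a spec with a thin edge-case story and one contradiction.
print(total_score({
    "Task Definition": 2,
    "Checkable Success Criteria": 2,
    "Edge Case Handling": 1,
    "Internal Consistency": 1,
    "Appropriate Constraint Level": 2,
}))  # -> 8
```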

Output format

Total: [score]/10 [One-sentence verdict]

Suggestions: [For each deduction, quote the offending passage] → [Suggested rewrite]
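
If verdicts are collected programmatically, the Total line above is easy to validate. A hedged sketch, assuming the exact "Total: [score]/10 [verdict]" shape; the regex and function name are my assumptions, not part of the output format:

```python
import re

# Assumed shape: "Total: <n>/10 <one-sentence verdict>". The pattern and
# names are illustrative, not mandated by the output format above.
TOTAL_LINE = re.compile(r"^Total:\s*(\d{1,2})/10\s+(.+)$")

def parse_total(line: str):
    """Return (score, verdict) from a Total line, or raise ValueError."""
    match = TOTAL_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed Total line: {line!r}")
    score = int(match.group(1))
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    return score, match.group(2)

print(parse_total("Total: 8/10 Clear task, but edge cases are underspecified."))
# -> (8, 'Clear task, but edge cases are underspecified.')
```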