Grade and improve your LLM system prompt — clarity, structure, token bloat, common anti-patterns
A system prompt is the most underrated piece of code in many AI products. Subtle failures there cascade into expensive evaluation loops downstream. This linter runs 12+ structural and semantic checks in the browser and surfaces issues before you ship, covering length, structure, vagueness, contradictions, and output-format specification.
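To make the check categories concrete, here is a minimal sketch of what such checks can look like. The function name, thresholds, and regexes are illustrative assumptions, not this tool's actual rules:

```typescript
// Illustrative sketch of prompt lint checks. Thresholds and patterns
// are assumptions for demonstration, not the linter's real rule set.

type Finding = { check: string; message: string };

function lintPrompt(prompt: string): Finding[] {
  const findings: Finding[] = [];

  // Length check: very long prompts dilute attention and waste tokens.
  const words = prompt.trim().split(/\s+/).length;
  if (words > 1500) {
    findings.push({ check: "length", message: `Prompt is ${words} words; consider trimming.` });
  }

  // Vagueness check: hedged verbs give the model no testable instruction.
  if (/\b(try to|maybe|if possible|as needed)\b/i.test(prompt)) {
    findings.push({ check: "vagueness", message: "Hedged phrasing found; state rules as requirements." });
  }

  // Output-format check: downstream parsers break when the format is unspecified.
  if (!/\b(JSON|XML|markdown|format)\b/i.test(prompt)) {
    findings.push({ check: "output-format", message: "No output format specified." });
  }

  return findings;
}
```

Real linters layer more semantic checks (for example, contradiction detection) on top of pattern rules like these, but the report shape is the same: a list of findings keyed by check name.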
A clean lint report is not a guarantee: prompts are empirical. This linter catches common footguns, but the final measure is evaluation on your actual task. Use it as a pre-flight check, not a substitute for evals.
The checks reflect 2024-2026 best practice from Anthropic's Claude prompting guide, OpenAI's GPT-4 system message docs, and widely reported patterns from production prompt-engineering teams.
Generic boilerplate such as "You are a helpful assistant" is already heavily embedded in RLHF training. Repeating it rarely changes behavior; spend those tokens on domain-specific rules the model doesn't already know.
Claude was trained on XML-structured examples and reliably respects <instructions>, <context>, <examples>, and <output_format> sections. GPT-4 benefits modestly; Gemini benefits less, but structure never hurts.
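As an illustration of that sectioning convention, here is a small sketch that assembles a system prompt from the tags named above. The helper name and the Acme Corp example content are hypothetical placeholders:

```typescript
// Sketch of an XML-sectioned system prompt using the tags mentioned above.
// xmlSection and the Acme Corp content are illustrative, not a real API.

function xmlSection(tag: string, body: string): string {
  return `<${tag}>\n${body.trim()}\n</${tag}>`;
}

const systemPrompt = [
  xmlSection("instructions", "Answer billing questions for Acme Corp customers."),
  xmlSection("context", "Plan tiers: Free, Pro, Enterprise."),
  xmlSection("examples", "Q: Can I downgrade? A: Yes, at the next billing cycle."),
  xmlSection("output_format", "Reply in plain text, under 120 words."),
].join("\n\n");
```

Keeping each concern in its own tagged block makes the prompt easier to lint mechanically, too: a checker can verify that an output_format section exists at all.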
Nothing is sent to a server: all linting runs in your browser.