Case study
AI-assisted content quality scoring and generation
As the company's product suite scaled, customer-facing content became increasingly inconsistent across dashboards, APIs, and emails. UX writing reviews could not keep pace with the speed of product development, and emerging AI-generated content introduced even more variability in tone, terminology, and clarity.
I saw an opportunity to turn UX writing guidance into something measurable and operationalized.
I built an AI-assisted content quality scoring and generation workflow. I created the evaluation framework based on internal style guides and UX writing best practices, wrote the regex-based rules that translated UX writing standards into measurable checks, and shaped the workflow teams used to score, revise, and improve content.
Impact
Made UX writing quality measurable and scalable
Improved dashboard content quality by 34.8%
In initial testing on enterprise dashboard page descriptions, the average quality score rose from 7.42 to 10 after teams applied revisions generated from the system's recommendations.
Standardized UX writing guidance across multiple content types
The framework expanded beyond UI copy to support error messages, API descriptions, changelogs, customer emails, and blog content.
Reduced dependency on manual UX writing reviews
The system provided immediate, actionable feedback directly to teams, enabling faster iteration and reducing review bottlenecks for urgent releases.
Created an operational system for measuring content quality
The project transformed UX writing guidance from subjective review feedback into a repeatable scoring framework tied to terminology, readability, tone, and usability standards.
The problem
UX writing standards were treated like reference material, not operational infrastructure
Most organizations treat UX writing standards as reference material. The assumption is that if teams have access to a glossary or style guide, consistency will naturally follow.
In practice, that rarely happens. Teams were shipping content from dozens of different perspectives, each with their own assumptions about terminology, tone, and user familiarity. Some descriptions were overly technical. Others were vague or abstract. Many explained implementation details without clarifying user outcomes.
The inconsistencies became especially visible in the dashboard, where users were already navigating highly complex workflows. Even small wording differences changed how users interpreted products, policies, and system behavior.
What initially appeared to be a writing consistency issue revealed a larger systems problem. UX writing guidance was difficult to operationalize at scale, teams lacked immediate feedback loops during content creation, and content quality could not be measured consistently.
Review processes also relied heavily on institutional knowledge, while AI-generated copy introduced additional variability in tone and terminology.
The introduction of generative AI accelerated the urgency of the problem. Teams could now generate content faster than ever, but speed amplified inconsistency. AI tools were effective at producing drafts, yet they lacked awareness of company-specific terminology, UX conventions, and product context.
The challenge was no longer just "How do we review more content?" It became: "How do we create a scalable system that helps teams produce clearer content before UX review is needed?"
The framework
A custom AI-assisted evaluation system for UX writing
I designed and built a custom AI-assisted UX writing evaluation system tailored specifically to the company's products, terminology, and voice guidelines.
At the time, AI writing workflows were still in their infancy and focused primarily on generating copy with ChatGPT or Gemini. I approached the problem differently: how could AI help evaluate and improve content quality instead?
The system transformed UX writing principles into programmable scoring logic. I translated subjective writing guidance into measurable evaluation criteria using regex-based language detection, indexed terminology validation, structural writing checks, contextual AI analysis, and content-type-specific scoring models.
This meant the system could evaluate far more than grammar. It identified passive voice, missing Oxford commas, unclear action framing, terminology inconsistencies, tone mismatches, and UX writing anti-patterns using custom-built rules and AI interpretation layers.
I wrote the regular expressions and scoring heuristics so that UX writing standards could be applied as quantitative, repeatable checks.
Regex-based language detection
Found repeatable writing patterns like passive voice, missing Oxford commas, hidden verbs, and overly formal phrasing.
Indexed terminology validation
Checked copy against preferred company terms so product language stayed consistent across surfaces.
Sentence-level readability checks
Measured length, density, acronyms, and action framing to identify copy that was accurate but hard to scan.
Contextual AI analysis
Used AI interpretation for judgment-based guidance, including tone, clarity, and whether the recommendation fit the content type.
Content-type-specific scoring
Weighted rules differently for UI copy, docs, emails, changelogs, and other formats based on their user context.
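To make the checks above more concrete, here is a minimal Python sketch of how terminology validation, a sentence-length check, and content-type weighting could fit together. It is an illustration, not the production system: the glossary entries, the weights, the 25-word threshold, the 10-point scale, and every function name are assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical glossary: discouraged term -> preferred company term.
PREFERRED_TERMS = {
    "log in": "sign in",
    "whitelist": "allowlist",
}

# Hypothetical penalty points per issue, weighted by content type.
WEIGHTS = {
    "ui_copy": {"terminology": 2.0, "long_sentence": 1.5},
    "changelog": {"terminology": 2.0, "long_sentence": 0.5},
}

MAX_SCORE = 10.0             # assumed scale
MAX_WORDS_PER_SENTENCE = 25  # assumed readability threshold


@dataclass
class Issue:
    rule: str
    detail: str


def check_terminology(text):
    """Flag discouraged terms that have a preferred replacement."""
    issues = []
    for discouraged, preferred in PREFERRED_TERMS.items():
        if re.search(rf"\b{re.escape(discouraged)}\b", text, re.IGNORECASE):
            issues.append(
                Issue("terminology", f'use "{preferred}" instead of "{discouraged}"')
            )
    return issues


def check_sentence_length(text):
    """Flag sentences longer than the assumed word limit."""
    issues = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        word_count = len(sentence.split())
        if word_count > MAX_WORDS_PER_SENTENCE:
            issues.append(Issue("long_sentence", f"{word_count} words"))
    return issues


def score(text, content_type):
    """Return (score, issues); each issue subtracts its content-type weight."""
    issues = check_terminology(text) + check_sentence_length(text)
    weights = WEIGHTS[content_type]
    penalty = sum(weights[issue.rule] for issue in issues)
    return max(0.0, MAX_SCORE - penalty), issues
```

With these illustrative weights, the same long sentence costs more in UI copy than in a changelog, which is the kind of content-type difference the scoring models were meant to capture.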
Regex examples
Passive voice
Flagged passive constructions so teams could rewrite with clearer action and ownership.
\b(?:are|was|were|be|been|have|is|am|had|by|has)\s+[a-zA-Z]+(?:d|ing|en|ne|de)\b
Hidden verbs
Detected nominalized phrasing that made product copy feel heavier than it needed to be.
\b(?:achieve|effect|give|make|reach|take|have)\w*\s(?:an|a|the)\b.*(?:ment|tion|ance|sis)\b
\b(?:the|a|an)\b\s*(?:\w*[-]?(?:ing|tion|ment|sion))\s*\bof\b
Oxford commas
Flagged lists that were missing the Oxford comma the company's style expected, for clarity and consistency.
[^,.]+, [^,.]+ \b(and|or)\b [^,]+
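As a quick way to see these patterns in action, the sketch below compiles them with Python's `re` module and reports which ones fire on a piece of copy. The pattern keys and the `find_issues` helper are hypothetical names for this example. The first hidden-verb pattern is used with a whitespace token (`\s`) between the verb and the article, on the assumption that this is the intended form. No regex flags are set, so matching is case-sensitive.

```python
import re

# Patterns from the examples above, keyed by hypothetical rule names.
PATTERNS = {
    "passive_voice": r"\b(?:are|was|were|be|been|have|is|am|had|by|has)\s+[a-zA-Z]+(?:d|ing|en|ne|de)\b",
    # Assumption: \s separates the verb from the article.
    "hidden_verb_phrase": r"\b(?:achieve|effect|give|make|reach|take|have)\w*\s(?:an|a|the)\b.*(?:ment|tion|ance|sis)\b",
    "hidden_verb_of": r"\b(?:the|a|an)\b\s*(?:\w*[-]?(?:ing|tion|ment|sion))\s*\bof\b",
    "missing_oxford_comma": r"[^,.]+, [^,.]+ \b(and|or)\b [^,]+",
}

COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}


def find_issues(text: str) -> dict[str, list[str]]:
    """Return the matched snippets for every pattern that fires on the text."""
    issues = {}
    for name, pattern in COMPILED.items():
        hits = [match.group(0) for match in pattern.finditer(text)]
        if hits:
            issues[name] = hits
    return issues


if __name__ == "__main__":
    samples = [
        "The report was created by the admin.",
        "You can make an adjustment to your plan.",
        "Review the configuration of alerts.",
        "Export logs, metrics and traces.",
        "Export logs, metrics, and traces.",
    ]
    for sample in samples:
        print(sample, "->", find_issues(sample))
```

In this sketch, "Export logs, metrics and traces." is flagged for the missing comma, while the version with the serial comma passes. Patterns like these are heuristics, so some false positives are expected and a human or AI review layer still has to judge the flagged text.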
How it worked
Fast feedback loops, tailored scoring, and actionable recommendations
Different content types, including error messages, API parameter descriptions, UI descriptions, emails, changelogs, and blog content, each used tailored scoring criteria based on their user context and communication goals.
I also designed the product experience for the tool itself, focusing on fast iteration loops and actionable recommendations rather than abstract scoring. Teams could paste content into the system, receive immediate feedback, generate AI-assisted revisions, and iterate toward stronger content quality directly within the workflow.
The result was a company-specific UX writing system designed to scale clarity, consistency, and usability across a rapidly growing enterprise platform.
Scoring workflow

Recommendation output

01
Evaluate
Analyze content against company-specific terminology, voice, tone, usability, and content-type expectations.
02
Recommend
Surface specific issues and generate actionable AI-assisted revisions instead of only returning a score.
03
Iterate
Help teams refine content quickly inside the workflow before a manual UX writing review is needed.
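The three steps above can be sketched as a small loop in Python. This is a simplified illustration under stated assumptions: the glossary is hypothetical, `evaluate` only checks terminology, and `revise` is a deterministic stand-in for the AI-assisted revision step, which in the real workflow involved a language model rather than a lookup.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical glossary: discouraged term -> preferred company term.
PREFERRED_TERMS = {"log in": "sign in", "whitelist": "allowlist"}


def evaluate(text: str) -> List[str]:
    """Step 01: return the discouraged terms found in the text."""
    return [
        term
        for term in PREFERRED_TERMS
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)
    ]


def revise(text: str, issues: List[str]) -> str:
    """Step 02: stand-in for an AI-assisted revision; swaps in preferred terms."""
    for term in issues:
        text = re.sub(
            rf"\b{re.escape(term)}\b", PREFERRED_TERMS[term], text, flags=re.IGNORECASE
        )
    return text


def iterate(
    text: str,
    evaluate_fn: Callable[[str], List[str]],
    revise_fn: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Step 03: re-evaluate after each revision until no issues remain."""
    history = [text]
    for _ in range(max_rounds):
        issues = evaluate_fn(text)
        if not issues:
            break
        text = revise_fn(text, issues)
        history.append(text)
    return text, history
```

Passing the evaluation and revision steps in as functions keeps the loop independent of any particular rule set or model, and `max_rounds` caps the number of revision passes so a draft that never fully clears the checks still returns.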
Reflection
Scalable UX systems matter more than isolated copy improvements
Here are principles I would carry into any organization trying to make UX quality measurable and repeatable:
01
Scalable UX systems matter more than isolated copy improvements
The long-term value of the system was not individual recommendations. It was the creation of a reusable system that embedded UX writing standards directly into day-to-day workflows. That shift made consistency more sustainable as the organization scaled.
02
AI content generation increases the importance of UX standards
Generative AI dramatically accelerates content creation, but acceleration without guardrails increases inconsistency. Content quality systems become more valuable in AI-assisted environments because they provide measurable standards for clarity, terminology, and usability.
03
Content quality is a product experience issue
One of the strongest takeaways from this project was that unclear content often signals unclear product thinking. The most effective revisions were rarely cosmetic. They clarified intent, simplified mental models, and made system behavior easier to predict.