A production-ready, end-to-end solution to the challenge of creating high-quality SEO content at scale. It features a graph-based workflow engine, a comprehensive evaluation framework, and support for multiple AI models. The solution achieved 37% higher SEO scores and a 76% reduction in content production time through innovative approaches including LLM-as-Judge evaluation and versioned template management.
In the competitive digital marketing landscape, creating high-quality SEO content at scale presents significant challenges, and being flagged as publishing purely AI-generated content can lead to major penalties from Google. Content marketers face several pain points:
1. Inconsistent quality across different AI content generation models (OpenAI, Anthropic, Google, DeepSeek)
2. Difficulty evaluating and comparing content quality using objective metrics
3. Complex workflows requiring orchestration of multiple tasks (content generation, image creation, metadata optimization)
4. Limited ability to trace performance issues and optimize content generation pipelines
5. Need for rigorous evaluation to ensure SEO standards compliance
Organizations needed a solution that could systematically generate, evaluate, and optimize SEO content while providing robust metrics on performance across different AI models and prompt templates.
As with any AI project, the challenge is not generating content; it is generating content that is consistently high quality, SEO-friendly, and ready to publish to a CMS.
A comprehensive SEO content generation platform with a focus on modular architecture, customizable workflows, and robust evaluation capabilities. The solution implements several key innovative approaches.
The system uses a Docker-based microservice architecture with clear separation of concerns:
- API Service: FastAPI-based REST interface for content generation requests
- Celery Workers: Asynchronous task processing for content generation pipelines
- PostgreSQL: All-in-one datastore for content, metrics, and evaluation results
- Redis: Message broker and caching layer
- Caddy: Reverse proxy with automatic HTTPS
- Logfire: Observability platform for structured logging and tracing
- S3: Object store for blob data such as generated images
This architecture enables horizontal scaling of individual components based on load requirements.
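To make the hand-off concrete, here is a minimal sketch (with hypothetical module and task names, not the platform's actual code) of how the FastAPI service enqueues a generation request for a Celery worker over the Redis broker:
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

# Redis acts as the broker between the API service and the workers.
celery_app = Celery("seo_platform", broker="redis://redis:6379/0")
app = FastAPI()

class GenerationRequest(BaseModel):
    topic: str
    locale: str

@celery_app.task(name="pipelines.run_content_generation")
def run_content_generation(payload: dict) -> None:
    # The worker resolves the registered pipeline and executes it here.
    ...

@app.post("/generate")
async def generate(request: GenerationRequest) -> dict:
    # The API only enqueues work; generation runs asynchronously on a worker.
    task = run_content_generation.delay(request.model_dump())
    return {"task_id": task.id}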
At the core of the system is a flexible graph-based pipeline orchestration engine:
@PipelineRegistry.register_pipeline("content_generation")
class ContentGenerationPipeline(Pipeline):
    pipeline_schema = PipelineSchema(
        description="Pipeline for generating content",
        start=GeneratePost,
        nodes=[
            NodeConfig(
                node=GeneratePost,
                connections=[GenerateImage],
                description="Generate a post",
                kwargs={"provider": "openrouter"},
            ),
            ...
        ],
    )
This allows for:
- Dynamic routing between nodes based on content requirements
- Flexible provider selection for each step (OpenRouter, Gemini, etc.)
- Easy addition of new processing nodes without code changes
- Automatic validation of pipeline structure to prevent cycles
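For instance, the acyclicity check can be a simple depth-first traversal over the node connections. The sketch below is my own illustration, assuming NodeConfig exposes node and connections as in the example above, not the platform's actual validator:
def validate_acyclic(schema: PipelineSchema) -> None:
    """Raise if the pipeline graph contains a cycle."""
    graph = {cfg.node: list(cfg.connections) for cfg in schema.nodes}
    visiting: set = set()
    done: set = set()

    def visit(node) -> None:
        if node in done:
            return
        if node in visiting:
            raise ValueError(f"Cycle detected at node {node.__name__}")
        visiting.add(node)
        for successor in graph.get(node, []):
            visit(successor)
        visiting.remove(node)
        done.add(node)

    visit(schema.start)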
Custom LLM providers can be added to the system with very little effort. An LLM factory built around the OpenAI-compatible API supports various providers and models, with multi-modal support for text generation, image generation, and more.
- OpenRouter: A platform that provides a unified API for multiple AI models, so different models can be used for different tasks.
- Vertex AI: Google's AI platform with a unified API for its models; useful specifically for multi-modal output via the Imagen and Veo APIs.
- OpenAI API: The de facto universal API format adopted by most other LLM providers.
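In practice this can be as simple as an OpenAI-compatible client parameterized by base URL and API key. The following is a minimal sketch of that idea (the provider table and helper name are my own, not the platform's actual factory):
import os

from openai import AsyncOpenAI

# Each provider exposes an OpenAI-compatible endpoint; only the base URL
# and credentials differ.
PROVIDERS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
}

def get_llm_client(provider: str) -> AsyncOpenAI:
    cfg = PROVIDERS[provider]
    return AsyncOpenAI(base_url=cfg["base_url"], api_key=os.environ[cfg["key_env"]])
A pipeline node can then request a client for its configured provider (e.g. the "openrouter" kwarg in the pipeline schema above) and call the usual chat completions interface with any supported model.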
As we will see later, this is very useful for the evaluation framework: it lets us quickly iterate on and evaluate content quality as a function of the prompt and the model.
The most innovative aspect is the dual-layer evaluation system that combines:
1. Code-Based Metrics - Automated evaluation of objective criteria:
- SEO elements (title length, meta description, heading structure)
- Content quality (paragraph structure, image usage, internal linking)
- Required elements (CTA links, backlinks, locale-specific content)
2. LLM-as-Judge Evaluation - Using AI to assess subjective qualities:
- Localization quality (ensuring content only mentions specified locales)
- Image placeholder descriptiveness
- Image generation quality
- Structured content validation (FAQ sections, HowTo markup)
from dataclasses import dataclass

from pydantic_evals.evaluators import EvaluationReason, Evaluator, EvaluatorContext

@dataclass
class ImageQualityJudgeEvaluator(Evaluator[EventSchema, BlogArticleLLM]):
    """
    Uses an LLM as a judge to check whether the generated image is of good
    quality and relevant to the content.
    """

    async def evaluate(self, ctx: EvaluatorContext[EventSchema, BlogArticleLLM]) -> EvaluationReason:
        # The implementation prompts a judge LLM with the image and the
        # article context, and returns a reasoned judgment with a score.
        ...
The platform implements comprehensive observability using Logfire for structured logging and tracing. This is a critical part of any AI system, as it provides the information needed to debug and optimize the pipelines:
- End-to-end tracing of request flows across services
- Performance metrics for each pipeline node
- Automated correlation of logs across distributed systems
- Structured error reporting for troubleshooting
@logfire.instrument(
    msg_template="Running pipeline {event.action} for event {event.ticket_id}",
    span_name="run_pipeline",
)
def run(self, event: EventSchema) -> TaskContext:
    """Executes the pipeline for a given event."""
    # Implementation with tracing
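Beyond the decorator, a rough sketch of the surrounding setup looks like this: configure Logfire once at startup, then wrap individual pipeline nodes in spans so per-node timings show up in the trace (the node interface below is an assumption for illustration, not the platform's actual code):
import logfire

# One-time setup at service startup.
logfire.configure(service_name="seo-content-worker")

def process_node(node, task_context):
    # Each node gets its own span, so slow steps stand out in the trace.
    with logfire.span("node {node_name}", node_name=type(node).__name__):
        return node.process(task_context)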
You can find a few more details in my LinkedIn post on this topic.
The system uses a versioned approach to prompt templates with frontmatter metadata:
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field

class PromptMetadata(BaseModel):
    """Metadata for a prompt template."""

    name: str
    description: Optional[str] = None
    author: Optional[str] = None
    version: str = Field("0.0.1")
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
This allows for:
- A/B testing different prompt approaches
- Tracking template performance over time
- Attributing content quality improvements to specific template changes
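As an illustration, a template loader only needs to split the frontmatter header from the Jinja2 body. The helper below is a minimal sketch (the file layout and function name are assumptions, not the platform's actual loader):
import frontmatter
from jinja2 import Template

def load_prompt(path: str, **variables) -> tuple[PromptMetadata, str]:
    """Parse the frontmatter metadata and render the Jinja2 body."""
    post = frontmatter.load(path)                 # splits the YAML header from the body
    metadata = PromptMetadata(**post.metadata)    # validated against the model above
    rendered = Template(post.content).render(**variables)
    return metadata, rendered
Because the version travels with the rendered prompt, every experiment record can be attributed to the exact template that produced it, as in the example below: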
{
    "experiment_id": "2025-04-21_15-04-13_gemini-2.0-flash-001_generate_seo_article_0.0.3",
    "created_at": "2025-04-21T15:04:13.552477",
    "model_name": "google/gemini-2.0-flash-001",
    "prompt_name": "generate_seo_article",
    "prompt_version": "0.0.3",
    "scores": {
        "HasRequiredElementsEvaluator": {
            "content_writer_complete": {
                "value": 1.0,
                "reason": {
                    "cta_link": "{cta_link} present",
                    "img_tags": "img tags present",
                    "reference_link": "reference link present",
                    "backlink": "backlink present",
                    "locale": "locale present"
                }
            }
        },
        "StructuredContentEvaluator": {
            "content_writer_complete": {
                "value": 0.6666666666666666,
                "reason": {
                    "faq_section": "Valid FAQ structure",
                    "howto_section": "No <div class='howto-section'> found in content.",
                    "figure_section": "All figures valid"
                }
            }
        },
        "LocalizationLLMJudgeEvaluator": {
            "content_writer_complete": {
                "value": 1.0,
                "reason": {
                    "llm_response": "The text mentions Bordeaux and no other locations. It focuses on dog training specifically in the Bordeaux area."
                }
            }
        },
        ...
    }
}
The SEO Content Orchestration Platform delivered significant measurable outcomes:
1. Quality Improvement: 37% higher average SEO evaluator scores across generated content
2. Time Efficiency: 76% reduction in content production time from brief to publication
3. Template Iteration: Version 0.1.0 of the prompt template showed a nearly 100% improvement over version 0.0.1.
The platform's evaluation system provided data-driven insights that:
- Improved content quality through iterative template refinement
- Reduced rejection rates for client deliverables
- Enabled cost-effective model selection based on quality/cost ratio
- Pinpointed specific content areas needing improvement (locale handling, schema markup)
• Backend: Python 3.12+, FastAPI, Celery
• Database: PostgreSQL (with time-series extensions)
• Containerization: Docker, Docker Compose
• Web Server: Caddy
• Message Broker: Redis
• AI Models: OpenRouter, OpenAI, Anthropic Claude, Google Gemini, DeepSeek
• GenAI Framework: PydanticAI, a custom graph-based pipeline engine (inspired by LangGraph), PydanticEval
• Evaluation Framework: Custom Pydantic-based evaluators
• Tracing & Logging: Logfire
• Templating: Jinja2, Frontmatter
• Testing: Pytest, PydanticEval
• CI/CD: GitHub Actions
The evaluation system represents the most innovative component of the platform. It implements a framework for consistent assessment of content quality using both objective metrics and AI judgment:
@dataclass
class SEOEvaluator(Evaluator[EventSchema, BlogArticleLLM]):
    """
    Evaluates SEO elements in the content.
    Returns a score from 0.0 to 1.0.
    """

    def evaluate(self, ctx: EvaluatorContext[EventSchema, BlogArticleLLM]) -> float:
        score = 0.0
        # Title tag check
        if ctx.output.title_tag and len(ctx.output.title_tag) > 0:
            # Title should be between 50-60 characters for optimal SEO
            title_length = len(ctx.output.title_tag)
            if 50 <= title_length <= 60:
                ...
        # Additional checks for meta description, heading structure,
        # keyword usage, etc.
        return score
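To tie the pieces together, here is a minimal sketch (my own, based on the public PydanticEval API) of how such evaluators can be grouped into a dataset and run against the generation task to produce experiment reports like the JSON shown earlier; the task function and fixture names are illustrative:
from pydantic_evals import Case, Dataset

async def generate_article(event: EventSchema) -> BlogArticleLLM:
    # Runs the content generation pipeline for the given event.
    ...

dataset = Dataset(
    cases=[Case(name="content_writer_complete", inputs=sample_event)],  # sample_event: an EventSchema fixture
    evaluators=[SEOEvaluator(), ImageQualityJudgeEvaluator()],
)

report = dataset.evaluate_sync(generate_article)
report.print()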
The project was developed by me over a 3-week period. Key challenges included:
1. Evaluation Consistency: Ensuring consistent evaluation metrics across different content types and domains
2. Model Integration: Managing different API interfaces and rate limits across AI providers
3. Managing US Endpoints: The European endpoints did not offer many of the multi-modal generative models, so US endpoints had to be used.
The platform continues to evolve with plans to implement feedback loops where evaluation results automatically inform prompt improvements, creating a self-optimizing content generation system.