20 min read

Content Generation System

A production-ready, end-to-end solution to the challenge of creating high-quality SEO content at scale. It features a graph-based workflow engine, a comprehensive evaluation framework, and support for multiple AI models. The solution achieved 37% higher SEO scores and a 76% reduction in content production time through innovative approaches, including LLM-as-Judge evaluation and versioned template management.

SEO CONTENT ORCHESTRATION PLATFORM

Challenge

In the competitive digital marketing landscape, creating high-quality SEO content at scale presents significant challenges. Being detected as relying solely on generative AI for content creation can lead to major ranking penalties from Google, and content marketers face several other pain points:

1. Inconsistent quality across different AI content generation models (OpenAI, Anthropic, Google, DeepSeek)

2. Difficulty evaluating and comparing content quality using objective metrics

3. Complex workflows requiring orchestration of multiple tasks (content generation, image creation, metadata optimization)

4. Limited ability to trace performance issues and optimize content generation pipelines

5. Need for rigorous evaluation to ensure SEO standards compliance

Organizations needed a solution that could systematically generate, evaluate, and optimize SEO content while providing robust metrics on performance across different AI models and prompt templates.

As with any other AI project, the challenge is not generating content, but generating content that is high quality, consistent, SEO-friendly, and ready to publish to a CMS.

Our Approach

We built a comprehensive SEO content generation platform focused on modular architecture, customizable workflows, and robust evaluation capabilities. The solution implements several key innovations.

Architecture & Workflow of the Solution

The system uses a Docker-based microservice architecture with clear separation of concerns:

- API Service: FastAPI-based REST interface for content generation requests

- Celery Workers: Asynchronous task processing for content generation pipelines

- PostgreSQL: Single datastore for generated content, metrics, and evaluation results

- Redis: Message broker and caching layer

- Caddy: Reverse proxy with automatic HTTPS

- Logfire: Observability platform for structured logging and tracing

- S3: Object store for binary assets (blobs)

This architecture enables horizontal scaling of individual components based on load requirements.
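
To make the flow concrete, here is a minimal sketch of how a generation request might travel from the API service to a Celery worker over the Redis broker. The endpoint path, task name, and request fields below are illustrative assumptions, not the actual project code.

# Hypothetical sketch: FastAPI endpoint handing work off to a Celery worker via Redis.
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

celery_app = Celery("seo_platform", broker="redis://redis:6379/0", backend="redis://redis:6379/1")
app = FastAPI()


class GenerationRequest(BaseModel):
    ticket_id: str  # illustrative field names
    action: str = "content_generation"
    brief: str


@celery_app.task(name="pipelines.run")
def run_pipeline_task(payload: dict) -> dict:
    # A worker would rebuild the event and execute the registered pipeline here.
    return {"ticket_id": payload["ticket_id"], "status": "queued"}


@app.post("/generate")
async def generate(request: GenerationRequest) -> dict:
    # Enqueue the pipeline run and return immediately; results land in PostgreSQL.
    task = run_pipeline_task.delay(request.model_dump())
    return {"task_id": task.id}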

AI Framework - Graph-Based Workflow Engine

At the core of the system is a flexible graph-based pipeline orchestration engine:

@PipelineRegistry.register_pipeline("content_generation")
class ContentGenerationPipeline(Pipeline):
    pipeline_schema = PipelineSchema(
        description="Pipeline for generating content",
        start=GeneratePost,
        nodes=[
            NodeConfig(
                node=GeneratePost,
                connections=[GenerateImage],
                description="Generate a post",
                kwargs={"provider": "openrouter"},
            ),
            ...
        ],
    )

This allows for:

- Dynamic routing between nodes based on content requirements

- Flexible provider selection for each step (OpenRouter, Gemini, etc.)

- Easy addition of new processing nodes without code changes

- Automatic validation of pipeline structure to prevent cycles
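
As a rough illustration of the last point, the sketch below shows one way cycle validation could work over a PipelineSchema-like structure; the class and field names mirror the snippet above, but the validation logic itself is an assumption for illustration.

# Hypothetical sketch: cycle detection over the node graph via depth-first search.
from collections import defaultdict


def validate_acyclic(nodes: list) -> None:
    """Raise ValueError if the node connections contain a cycle."""
    graph = defaultdict(list)
    for config in nodes:
        graph[config.node].extend(config.connections)

    visiting, visited = set(), set()

    def dfs(node) -> None:
        if node in visiting:
            raise ValueError(f"Cycle detected at node {node.__name__}")
        if node in visited:
            return
        visiting.add(node)
        for successor in graph[node]:
            dfs(successor)
        visiting.remove(node)
        visited.add(node)

    for config in nodes:
        dfs(config.node)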

LLM Providers

Custom LLM providers can be added to the system with minimal effort.

An LLM factory, built on the OpenAI-compatible API, routes requests to various providers and models, with multi-modal support for text generation, image generation, and more.

- OpenRouter: A platform that provides a unified API for multiple AI models, making it easy to use different models for different tasks.

- Vertex AI: Google's own AI platform, providing a unified API for its models. Useful here specifically for multi-modal output via the Imagen and Veo APIs.

- OpenAI API: The de facto standard API format, supported by most other LLM providers.

As we will see later, this is very useful for the evaluation framework: it lets us quickly iterate on prompts and models and evaluate the quality of the resulting content.
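
As a simplified sketch, the factory can hand back an OpenAI-compatible client for whichever provider a pipeline node requests; the base URLs, environment variable names, and factory interface below are illustrative assumptions.

# Simplified provider factory built on the OpenAI-compatible client (illustrative values).
import os

from openai import OpenAI

PROVIDERS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY"},  # None = default OpenAI endpoint
}


def get_client(provider: str) -> OpenAI:
    """Return an OpenAI-compatible client for the requested provider."""
    config = PROVIDERS[provider]
    return OpenAI(base_url=config["base_url"], api_key=os.environ[config["key_env"]])


# The same chat-completions call then works regardless of the underlying provider.
client = get_client("openrouter")
response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Draft an SEO title about dog training in Bordeaux."}],
)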

Comprehensive Evaluation Framework

The most innovative aspect is the dual-layer evaluation system that combines:

1. Code-Based Metrics - Automated evaluation of objective criteria:

  - SEO elements (title length, meta description, heading structure)

  - Content quality (paragraph structure, image usage, internal linking)

  - Required elements (CTA links, backlinks, locale-specific content)

2. LLM-as-Judge Evaluation - Using AI to assess subjective qualities:

  - Localization quality (ensuring content only mentions specified locales)

  - Image placeholder descriptiveness

  - Image generation quality

  - Structured content validation (FAQ sections, HowTo markup)

@dataclass
class ImageQualityJudgeEvaluator(Evaluator[EventSchema, BlogArticleLLM]):
    """
    Uses an LLM as a judge to check that the image is of good quality and relevant to the content.
    """

    async def evaluate(self, ctx: EvaluatorContext[EventSchema, BlogArticleLLM]) -> EvaluationReason:
        # Implementation uses an LLM to evaluate image quality
        # and returns a reasoned judgment with a score
        ...
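
For comparison with the LLM-based judge above, a code-based evaluator such as the HasRequiredElementsEvaluator seen in the experiment output further down can be approximated with plain string checks. The sketch below is an illustrative reconstruction, not the production code; the content, backlink, and locale attributes are assumed field names.

@dataclass
class HasRequiredElementsEvaluator(Evaluator[EventSchema, BlogArticleLLM]):
    """Checks that mandatory elements (CTA link, images, backlink, locale) are present."""

    def evaluate(self, ctx: EvaluatorContext[EventSchema, BlogArticleLLM]) -> float:
        content = ctx.output.content  # assumed field holding the rendered article HTML
        checks = {
            "cta_link": "{cta_link}" in content,
            "img_tags": "<img" in content,
            "backlink": ctx.inputs.backlink in content,
            "locale": ctx.inputs.locale in content,
        }
        # Fraction of required elements found, following the 0.0-1.0 scoring convention.
        return sum(checks.values()) / len(checks)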

Instrumentation & Tracing

The platform implements comprehensive observability using Logfire for structured logging and tracing.

This is a critical part of any AI system, as it provides essential information for debugging:

- End-to-end tracing of request flows across services

- Performance metrics for each pipeline node

- Automated correlation of logs across distributed systems

- Structured error reporting for troubleshooting

@logfire.instrument(
    msg_template="Running pipeline {event.action} for event {event.ticket_id}",
    span_name="run_pipeline",
)
def run(self, event: EventSchema) -> TaskContext:
    """Executes the pipeline for a given event."""
    # Implementation with tracing

You can see my LinkedIn post about it for a few more details here.
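
For completeness, here is a minimal setup sketch, assuming the standard Logfire SDK entry points; the service name and span attributes are illustrative.

import logfire

# One-time configuration at service startup; the write token comes from the environment.
logfire.configure(service_name="seo-content-api")

# Manual spans can wrap any pipeline step in addition to the decorator shown above.
with logfire.span("generate_image", ticket_id="demo-123"):
    logfire.info("Calling image provider {provider}", provider="vertex-ai")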

Prompt Engineering - Versioned Template Management

The system uses a versioned approach to prompt templates with frontmatter metadata:

from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field


class PromptMetadata(BaseModel):
    """Metadata for a prompt template."""
    name: str
    description: Optional[str] = None
    author: Optional[str] = None
    version: str = Field("0.0.1")
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

This allows for:

- A/B testing different prompt approaches

- Tracking template performance over time

- Attributing content quality improvements to specific template changes
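
A minimal sketch of how such a versioned template might be loaded and rendered, assuming the python-frontmatter and Jinja2 packages from the tech stack (the file path and template variables are illustrative):

import frontmatter
from jinja2 import Template

# Load the template file; YAML frontmatter carries the metadata, the body is the prompt.
post = frontmatter.load("prompts/generate_seo_article.md")
metadata = PromptMetadata(**post.metadata)

prompt = Template(post.content).render(
    locale="Bordeaux",  # illustrative variables
    cta_link="{cta_link}",
)
print(metadata.name, metadata.version)

Each experiment run then records the model, prompt name, and prompt version alongside the evaluator scores, as in the excerpt below: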

{
  "experiment_id": "2025-04-21_15-04-13_gemini-2.0-flash-001_generate_seo_article_0.0.3",
  "created_at": "2025-04-21T15:04:13.552477",
  "model_name": "google/gemini-2.0-flash-001",
  "prompt_name": "generate_seo_article",
  "prompt_version": "0.0.3",
  "scores": {
    "HasRequiredElementsEvaluator": {
      "content_writer_complete": {
        "value": 1.0,
        "reason": {
          "cta_link": "{cta_link} present",
          "img_tags": "img tags present",
          "reference_link": "reference link present",
          "backlink": "backlink present",
          "locale": "locale present"
        }
      }
    },
    "StructuredContentEvaluator": {
      "content_writer_complete": {
        "value": 0.6666666666666666,
        "reason": {
          "faq_section": "Valid FAQ structure",
          "howto_section": "No <div class='howto-section'> found in content.",
          "figure_section": "All figures valid"
        }
      }
    },
    "LocalizationLLMJudgeEvaluator": {
      "content_writer_complete": {
        "value": 1.0,
        "reason": {
          "llm_response": "The text mentions Bordeaux and no other locations. It focuses on dog training specifically in the Bordeaux area."
        }
      }
    },
...
}

Results & Impact

The SEO Content Orchestration Platform delivered significant measurable outcomes:

1. Quality Improvement: 37% higher average SEO evaluator scores across generated content

2. Time Efficiency: 76% reduction in content production time from brief to publication

3. Template Iteration: Version 0.1.0 of the prompt template showed close to a 100% improvement over version 0.0.1

The platform's evaluation system provided data-driven insights that:

- Improved content quality through iterative template refinement

- Reduced rejection rates for client deliverables

- Enabled cost-effective model selection based on quality/cost ratio

- Pinpointed specific content areas needing improvement (locale handling, schema markup)

Tech Stack

Backend: Python 3.12+, FastAPI, Celery

Database: PostgreSQL (with time-series extensions)

Containerization: Docker, Docker Compose

Web Server: Caddy

Message Broker: Redis

AI Models: OpenRouter, OpenAI, Anthropic Claude, Google Gemini, DeepSeek

GenAI Framework: PydanticAI - custom graph-based pipeline engine (inspired by LangGraph), PydanticEval

Evaluation Framework: Custom Pydantic-based evaluators

Tracing & Logging: Logfire

Templating: Jinja2, Frontmatter

Testing: Pytest, PydanticEval

CI/CD: GitHub Actions

Code Repository Spotlight

Evaluation System

The evaluation system represents the most innovative component of the platform. It implements a framework for consistent assessment of content quality using both objective metrics and AI judgment:

@dataclass
class SEOEvaluator(Evaluator[EventSchema, BlogArticleLLM]):
    """
    Evaluates SEO elements in the content.
    Returns a score from 0.0 to 1.0
    """

    def evaluate(self, ctx: EvaluatorContext[EventSchema, BlogArticleLLM]) -> float:
        score = 0.0

        # Title tag check
        if ctx.output.title_tag and len(ctx.output.title_tag) > 0:
            # Title should be between 50-60 characters for optimal SEO
            title_length = len(ctx.output.title_tag)
            if 50 <= title_length <= 60:
                ...

        # Additional checks for meta description, heading structure,
        # keyword usage, etc.
        
        return score
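
To give a feel for how these evaluators are consumed, here is a rough sketch of collecting several of them into a per-experiment report similar to the JSON shown earlier; the aggregation loop is an illustrative assumption rather than the project's actual PydanticEval-based runner.

def run_evaluations(ctx: EvaluatorContext[EventSchema, BlogArticleLLM]) -> dict[str, float]:
    """Run each registered evaluator and collect its 0.0-1.0 score."""
    evaluators = [
        SEOEvaluator(),
        HasRequiredElementsEvaluator(),
    ]
    return {type(evaluator).__name__: evaluator.evaluate(ctx) for evaluator in evaluators}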

Additional Context

I developed the project solo over a three-week period. Key challenges included:

1. Evaluation Consistency: Ensuring consistent evaluation metrics across different content types and domains

2. Model Integration: Managing different API interfaces and rate limits across AI providers

3. Managing US Endpoints: The European endpoints did not support many of the multi-modal generative models

The platform continues to evolve, with plans to implement feedback loops where evaluation results automatically inform prompt improvements, creating a self-optimizing content generation system.
