[SECURITY FEATURE]: Configurable Well-Known URI Handler including security.txt and robots.txt #540

@crivetimihai

🌐 FEATURE: Configurable Well-Known URI Handler

Summary: Implement a flexible /.well-known/* endpoint handler that supports standard well-known URIs like security.txt and robots.txt with user-configurable content. Defaults assume private API deployment with crawling disabled.

Implementation

1. Update config.py with Well-Known Configuration

# In mcpgateway/config.py

from typing import Dict, Optional
import json
from pydantic import field_validator
from pydantic_settings import BaseSettings
from mcpgateway.utils.logging import logger

class Settings(BaseSettings):
    # ... existing settings ...
    
    # ===================================
    # Well-Known URI Configuration
    # ===================================
    
    # Enable well-known URI endpoints
    well_known_enabled: bool = True
    
    # robots.txt content (default: disallow all crawling for private API)
    well_known_robots_txt: str = """User-agent: *
Disallow: /

# MCP Gateway is a private API gateway
# Public crawling is disabled by default"""
    
    # security.txt content (optional, user-defined)
    # Example: "Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en"
    well_known_security_txt: str = ""
    
    # Enable security.txt only if content is provided
    well_known_security_txt_enabled: bool = False
    
    # Additional custom well-known files (JSON format)
    # Example: {"ai.txt": "This service uses AI for...", "dnt-policy.txt": "Do Not Track policy..."}
    well_known_custom_files: str = "{}"
    
    # Cache control for well-known files (seconds)
    well_known_cache_max_age: int = 3600  # 1 hour default
    
    @property
    def custom_well_known_files(self) -> Dict[str, str]:
        """Parse custom well-known files from JSON string."""
        try:
            return json.loads(self.well_known_custom_files) if self.well_known_custom_files else {}
        except json.JSONDecodeError:
            logger.error(f"Invalid JSON in WELL_KNOWN_CUSTOM_FILES: {self.well_known_custom_files}")
            return {}
    
    def model_post_init(self, context) -> None:
        """Enable security.txt exactly when content is provided.

        A field validator would not run when WELL_KNOWN_SECURITY_TXT_ENABLED
        is left at its default, so the flag is derived after initialization.
        """
        self.well_known_security_txt_enabled = bool(self.well_known_security_txt.strip())
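
The custom_well_known_files property above returns whatever json.loads produces, so valid JSON that is not an object (a list, a number) would pass through unchecked. A minimal sketch of a stricter parser, assuming only string contents should be served; the helper name parse_custom_files is hypothetical and not part of the proposal:

```python
import json
from typing import Dict


def parse_custom_files(raw: str) -> Dict[str, str]:
    """Parse a JSON object of filename -> content, dropping invalid entries."""
    if not raw or not raw.strip():
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        # Valid JSON such as a list or a number is still not a file mapping.
        return {}
    # JSON object keys are always strings; keep only string contents.
    return {name: body for name, body in data.items() if isinstance(body, str)}
```

Returning an empty mapping on bad input mirrors the fallback in the property, so a misconfigured value disables custom files rather than failing requests.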

2. Create Well-Known Handler Router

# Create mcpgateway/routers/well_known.py

from datetime import datetime, timezone, timedelta
from typing import Optional

from fastapi import APIRouter, HTTPException, Response, Request, Depends
from fastapi.responses import PlainTextResponse

from mcpgateway.config import settings
from mcpgateway.utils.logging import logger
from mcpgateway.auth import require_auth

router = APIRouter(tags=["well-known"])

# Well-known URI registry with validation
WELL_KNOWN_REGISTRY = {
    "robots.txt": {
        "content_type": "text/plain",
        "description": "Robot exclusion standard",
        "rfc": "RFC 9309"
    },
    "security.txt": {
        "content_type": "text/plain", 
        "description": "Security contact information",
        "rfc": "RFC 9116"
    },
    "ai.txt": {
        "content_type": "text/plain",
        "description": "AI usage policies",
        "rfc": "Draft"
    },
    "dnt-policy.txt": {
        "content_type": "text/plain",
        "description": "Do Not Track policy", 
        "rfc": "W3C"
    },
    "change-password": {
        "content_type": "text/plain",
        "description": "Change password URL",
        "rfc": "RFC 8615"
    }
}


def validate_security_txt(content: str) -> Optional[str]:
    """Validate security.txt format and add headers if missing."""
    if not content or not content.strip():
        return None

    lines = content.strip().split('\n')
    now = datetime.now(timezone.utc).replace(microsecond=0)
    # UTC timestamp ending in "Z"; isoformat() on an aware datetime already
    # emits "+00:00", so appending "Z" to it would produce an invalid value
    timestamp_format = "%Y-%m-%dT%H:%M:%SZ"

    # Check if Expires field exists (field names are matched case-insensitively)
    has_expires = any(line.strip().lower().startswith('expires:') for line in lines)

    # Add Expires field if missing (6 months from now)
    if not has_expires:
        expires = now + timedelta(days=180)
        lines.append(f"Expires: {expires.strftime(timestamp_format)}")

    validated = []

    # Add header comment if not present
    if not lines[0].startswith('#'):
        validated.append("# Security contact information for MCP Gateway")
        validated.append(f"# Generated: {now.strftime(timestamp_format)}")
        validated.append("")

    validated.extend(lines)

    return '\n'.join(validated)
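
The Expires handling depends on producing and reading UTC timestamps of the form 2025-12-31T23:59:59Z. A small sketch of both directions, using only the standard library; the helper names format_expires and find_expires are illustrative assumptions, not part of the proposed module:

```python
from datetime import datetime, timezone
from typing import Optional


def format_expires(moment: datetime) -> str:
    """Format an aware datetime as a UTC timestamp ending in 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def find_expires(content: str) -> Optional[datetime]:
    """Return the first Expires value in security.txt content, if parseable."""
    for line in content.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "expires":
            text = value.strip()
            if text.upper().endswith("Z"):
                # Older Python versions of fromisoformat() reject a trailing "Z".
                text = text[:-1] + "+00:00"
            try:
                return datetime.fromisoformat(text)
            except ValueError:
                return None
    return None
```

A caller could compare the result of find_expires against datetime.now(timezone.utc) to warn operators before a configured security.txt goes stale.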


@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response, request: Request):
    """
    Serve well-known URI files.
    
    Supports:
    - robots.txt: Robot exclusion (default: disallow all)
    - security.txt: Security contact information (if configured)
    - Custom files: Additional well-known files via configuration
    
    Args:
        filename: The well-known filename requested
        response: FastAPI response object for headers
        request: FastAPI request object for logging
        
    Returns:
        Plain text content of the requested file
        
    Raises:
        HTTPException: 404 if file not found or well-known disabled
    """
    if not settings.well_known_enabled:
        raise HTTPException(status_code=404, detail="Not found")
    
    # Normalize filename (remove any leading slashes)
    filename = filename.strip('/')
    
    # Cache headers go on the returned response: headers set on the injected
    # `response` object are not applied when a Response is returned directly
    headers = {"Cache-Control": f"public, max-age={settings.well_known_cache_max_age}"}

    # Handle robots.txt
    if filename == "robots.txt":
        headers["X-Robots-Tag"] = "noindex, nofollow"
        return PlainTextResponse(
            content=settings.well_known_robots_txt,
            media_type="text/plain; charset=utf-8",
            headers=headers
        )

    # Handle security.txt
    elif filename == "security.txt":
        if not settings.well_known_security_txt_enabled:
            raise HTTPException(status_code=404, detail="security.txt not configured")

        content = validate_security_txt(settings.well_known_security_txt)
        if not content:
            raise HTTPException(status_code=404, detail="security.txt not configured")

        return PlainTextResponse(
            content=content,
            media_type="text/plain; charset=utf-8",
            headers=headers
        )

    # Handle custom files
    elif filename in settings.custom_well_known_files:
        content = settings.custom_well_known_files[filename]

        # Determine content type
        content_type = "text/plain; charset=utf-8"
        if filename in WELL_KNOWN_REGISTRY:
            content_type = f"{WELL_KNOWN_REGISTRY[filename]['content_type']}; charset=utf-8"

        return PlainTextResponse(
            content=content,
            media_type=content_type,
            headers=headers
        )
    
    # File not found
    else:
        # Provide helpful error for known well-known URIs
        if filename in WELL_KNOWN_REGISTRY:
            raise HTTPException(
                status_code=404,
                detail=f"{filename} is not configured. "
                       f"This is a {WELL_KNOWN_REGISTRY[filename]['description']} file."
            )
        else:
            raise HTTPException(status_code=404, detail="Not found")


@router.get("/admin/well-known", response_model=dict)
async def get_well_known_status(user: str = Depends(require_auth)):
    """
    Get status of well-known URI configuration.
    
    Returns current configuration and available well-known files.
    """
    configured_files = []
    
    # Always available
    configured_files.append({
        "path": "/.well-known/robots.txt",
        "enabled": True,
        "description": "Robot exclusion standard",
        "cache_max_age": settings.well_known_cache_max_age
    })
    
    # Conditionally available
    if settings.well_known_security_txt_enabled:
        configured_files.append({
            "path": "/.well-known/security.txt",
            "enabled": True,
            "description": "Security contact information",
            "cache_max_age": settings.well_known_cache_max_age
        })
    
    # Custom files
    for filename in settings.custom_well_known_files:
        configured_files.append({
            "path": f"/.well-known/{filename}",
            "enabled": True,
            "description": "Custom well-known file",
            "cache_max_age": settings.well_known_cache_max_age
        })
    
    return {
        "enabled": settings.well_known_enabled,
        "configured_files": configured_files,
        "supported_files": list(WELL_KNOWN_REGISTRY.keys()),
        "cache_max_age": settings.well_known_cache_max_age
    }

3. Update main.py to Include Router

# In mcpgateway/main.py

from mcpgateway.routers import well_known

# Include the well-known router (no prefix needed since paths start with /.well-known)
app.include_router(well_known.router)

4. Update .env.example

#####################################
# Well-Known URI Configuration
#####################################

# Enable well-known URI endpoints (/.well-known/*)
WELL_KNOWN_ENABLED=true

# robots.txt content - Default blocks all crawlers (private API)
# Use multiline with proper escaping or keep on one line
WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /\n\n# MCP Gateway is a private API gateway\n# Public crawling is disabled by default"

# security.txt content - Define your security contact information
# Format: RFC 9116 (https://www.rfc-editor.org/rfc/rfc9116.html)
# Leave empty to disable security.txt
# Example:
# WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en\nCanonical: https://example.com/.well-known/security.txt"
WELL_KNOWN_SECURITY_TXT=""

# Additional custom well-known files (JSON format)
# Example: {"ai.txt": "AI Usage: This service uses AI for tool orchestration...", "dnt-policy.txt": "We respect DNT headers..."}
WELL_KNOWN_CUSTOM_FILES="{}"

# Cache control for well-known files (seconds; 3600 = 1 hour)
WELL_KNOWN_CACHE_MAX_AGE=3600

5. Add Tests

# In tests/test_well_known.py

import pytest
from fastapi.testclient import TestClient

def test_robots_txt_default(client: TestClient):
    """Test default robots.txt blocks all crawlers."""
    response = client.get("/.well-known/robots.txt")
    assert response.status_code == 200
    assert "User-agent: *" in response.text
    assert "Disallow: /" in response.text
    assert response.headers["content-type"] == "text/plain; charset=utf-8"
    assert "Cache-Control" in response.headers

def test_security_txt_not_configured(client: TestClient):
    """Test security.txt returns 404 when not configured."""
    response = client.get("/.well-known/security.txt")
    assert response.status_code == 404

def test_security_txt_configured(client: TestClient, monkeypatch):
    """Test security.txt when configured."""
    # Patch the already-loaded settings object; monkeypatch restores it after the test
    from mcpgateway.config import settings
    monkeypatch.setattr(settings, "well_known_security_txt", "Contact: [email protected]")
    monkeypatch.setattr(settings, "well_known_security_txt_enabled", True)

    response = client.get("/.well-known/security.txt")
    assert response.status_code == 200
    assert "Contact: [email protected]" in response.text
    assert "Expires:" in response.text  # Auto-added

def test_custom_well_known_file(client: TestClient, monkeypatch):
    """Test custom well-known files."""
    # Patch the already-loaded settings object; monkeypatch restores it after the test
    from mcpgateway.config import settings
    monkeypatch.setattr(settings, "well_known_custom_files", '{"ai.txt": "AI Policy: We use AI responsibly"}')

    response = client.get("/.well-known/ai.txt")
    assert response.status_code == 200
    assert "AI Policy: We use AI responsibly" in response.text

def test_unknown_well_known_file(client: TestClient):
    """Test unknown well-known file returns 404."""
    response = client.get("/.well-known/unknown.txt")
    assert response.status_code == 404

def test_well_known_disabled(client: TestClient, monkeypatch):
    """Test well-known endpoints when disabled."""
    # Patch the already-loaded settings object; monkeypatch restores it after the test
    from mcpgateway.config import settings
    monkeypatch.setattr(settings, "well_known_enabled", False)

    response = client.get("/.well-known/robots.txt")
    assert response.status_code == 404

def test_well_known_admin_status(client: TestClient, auth_headers):
    """Test admin status endpoint."""
    response = client.get("/admin/well-known", headers=auth_headers)
    assert response.status_code == 200
    data = response.json()
    assert data["enabled"] is True
    assert any(f["path"] == "/.well-known/robots.txt" for f in data["configured_files"])

6. Example Configurations

# Example 1: Basic security.txt
WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]
Contact: https://mycompany.com/security
Encryption: https://mycompany.com/pgp-key.txt
Preferred-Languages: en, es
Canonical: https://api.mycompany.com/.well-known/security.txt"

# Example 2: Custom AI policy
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "# AI Usage Policy\n\nThis MCP Gateway uses AI for:\n- Tool orchestration\n- Response generation\n- Error handling\n\nWe do not use AI for:\n- User data analysis\n- Behavioral tracking\n- Decision making without human oversight"}

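# Example 3: Allow specific crawlers
# A crawler obeys only its most specific group, so the monitor group needs its own Disallow
WELL_KNOWN_ROBOTS_TXT="User-agent: internal-monitor
Allow: /health
Allow: /metrics
Disallow: /

User-agent: *
Disallow: /"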

# Example 4: Multiple custom files
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "# AI Usage Policy\n\nThis MCP Gateway uses AI for:\n- Tool orchestration\n- Response generation\n- Error handling\n\nWe do not use AI for:\n- User data analysis\n- Behavioral tracking\n- Decision making without human oversight", "dnt-policy.txt": "# Do Not Track Policy\n\nWe respect the DNT header.\nNo tracking cookies are used.\nOnly essential session data is stored.", "change-password": "https://mycompany.com/account/password"}
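
One way to check how a crawler would read a policy like Example 3 is Python's urllib.robotparser. The sketch below assumes a variant of that policy in which the monitor group carries its own Disallow: / line; the helper name can_fetch and the agent name somebot are illustrative:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """User-agent: internal-monitor
Allow: /health
Allow: /metrics
Disallow: /

User-agent: *
Disallow: /"""


def can_fetch(robots_txt: str, agent: str, path: str) -> bool:
    """Report whether `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The monitor may reach /health but nothing else; other agents are blocked everywhere.
monitor_health = can_fetch(ROBOTS_TXT, "internal-monitor", "/health")
monitor_admin = can_fetch(ROBOTS_TXT, "internal-monitor", "/admin")
other_health = can_fetch(ROBOTS_TXT, "somebot", "/health")
```

Without the Disallow: / line in the monitor group, the same check reports that internal-monitor may fetch every path, because a matching group with only Allow rules restricts nothing.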

Usage Examples

1. Basic Setup (Private API)

# Default configuration blocks all crawlers
curl https://api.example.com/.well-known/robots.txt
# Returns:
# User-agent: *
# Disallow: /
# 
# MCP Gateway is a private API gateway
# Public crawling is disabled by default

2. Security Contact Configuration

# Configure security contact
export WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]
Contact: https://example.com/security
Acknowledgments: https://example.com/security/thanks
Preferred-Languages: en, fr, es
Hiring: https://example.com/careers"

# Access security.txt
curl https://api.example.com/.well-known/security.txt
# Returns formatted security.txt with auto-generated Expires header

3. AI Usage Policy

# Configure AI policy
export WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "# AI Usage Policy\n\nAI Model: Tool orchestration only\nData Retention: No training on user data\nHuman Oversight: Required for all operations"}'

# Access AI policy
curl https://api.example.com/.well-known/ai.txt

4. Admin Monitoring

# Check well-known configuration status
curl -H "Authorization: Bearer $API_KEY" \
  https://api.example.com/admin/well-known

# Returns:
{
  "enabled": true,
  "configured_files": [
    {
      "path": "/.well-known/robots.txt",
      "enabled": true,
      "description": "Robot exclusion standard",
      "cache_max_age": 3600
    },
    {
      "path": "/.well-known/security.txt",
      "enabled": true,
      "description": "Security contact information",
      "cache_max_age": 3600
    }
  ],
  "supported_files": [
    "robots.txt",
    "security.txt",
    "ai.txt",
    "dnt-policy.txt",
    "change-password"
  ],
  "cache_max_age": 3600
}

Security Considerations

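  1. Content Validation: The security.txt validator adds a missing Expires field and a header comment; it does not check that a Contact field is present
  2. Cache Headers: Configurable cache control prevents excessive requests
  3. Path Traversal Protection: Files are looked up by exact name in the configured values and never read from disk, so a crafted path cannot reach the filesystem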
  4. Admin-Only Status: Configuration status requires authentication
  5. No Dynamic Content: All content is statically configured via environment variables
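
If custom filenames are ever accepted from less trusted input, an explicit allow-list check on the name is a cheap extra guard. A minimal sketch, assuming names are limited to letters, digits, dots, underscores and hyphens; the pattern and the helper name is_safe_well_known_name are assumptions, not part of the proposal:

```python
import re

# Hypothetical rule: start with a letter or digit, at most 64 characters,
# no path separators and no parent-directory sequences.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def is_safe_well_known_name(name: str) -> bool:
    """Return True if `name` is a single, plain path segment."""
    return ".." not in name and _SAFE_NAME.fullmatch(name) is not None
```

The handler could apply such a check before the lookup and return 404 for anything that fails it.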

Deployment Guide

Docker Deployment

# In your Docker environment
ENV WELL_KNOWN_ENABLED=true
ENV WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /api/\nAllow: /api/health"
ENV WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z"
ENV WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "AI Policy: Responsible use only"}'
ENV WELL_KNOWN_CACHE_MAX_AGE=3600
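
Depending on how a variable is set, the process may receive a literal backslash followed by n instead of a real line break, in which case the file would be served as a single line. A small sketch of a normalizing step, assuming the gateway wants to tolerate both forms; the helper name decode_newlines is hypothetical:

```python
def decode_newlines(value: str) -> str:
    """Turn literal backslash-n sequences into real line breaks."""
    # Handle the two-character sequences "\r\n" and "\n" written out as text;
    # values that already contain real line breaks pass through unchanged.
    return value.replace("\\r\\n", "\n").replace("\\n", "\n")
```

Applying this to the robots.txt and security.txt values before serving them keeps the one-line form and the multi-line form equivalent.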

Kubernetes ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-gateway-wellknown
data:
  WELL_KNOWN_ENABLED: "true"
  WELL_KNOWN_ROBOTS_TXT: |
    User-agent: *
    Disallow: /
    
    # Private API - No public crawling
  WELL_KNOWN_SECURITY_TXT: |
    Contact: mailto:[email protected]
    Expires: 2025-12-31T23:59:59Z
    Preferred-Languages: en
  WELL_KNOWN_CUSTOM_FILES: |
    {
      "ai.txt": "This service uses AI for tool orchestration only.",
      "dnt-policy.txt": "We honor Do Not Track headers."
    }

Docker Compose

services:
  mcp-gateway:
    environment:
      WELL_KNOWN_ENABLED: "true"
      WELL_KNOWN_ROBOTS_TXT: |
        User-agent: monitoring-bot
        Allow: /health
        Disallow: /

        User-agent: *
        Disallow: /
      WELL_KNOWN_SECURITY_TXT: |
        Contact: mailto:[email protected]
        Encryption: https://example.com/pgp
      WELL_KNOWN_CUSTOM_FILES: '{"ai.txt": "AI is used for tool orchestration"}'
      WELL_KNOWN_CACHE_MAX_AGE: "7200"

Monitoring and Observability

Prometheus Metrics

Add metrics to track well-known URI usage:

# In well_known.py
from prometheus_client import Counter, Histogram

well_known_requests = Counter(
    'mcp_gateway_well_known_requests_total',
    'Total well-known URI requests',
    ['filename', 'status']
)

well_known_request_duration = Histogram(
    'mcp_gateway_well_known_request_duration_seconds',
    'Well-known URI request duration',
    ['filename']
)

# In the handler
@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response, request: Request):
    # The path is client-controlled, so map unknown names to one label to bound cardinality
    label = filename if filename in WELL_KNOWN_REGISTRY else "other"
    with well_known_request_duration.labels(filename=label).time():
        # ... existing logic ...
        well_known_requests.labels(filename=label, status="found").inc()

Logging

The feature includes structured logging for security monitoring:

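# Log well-known access
# "filename" is a reserved LogRecord attribute, so the extra key uses a different name
logger.info(
    "Well-known URI accessed",
    extra={
        "well_known_file": filename,
        "ip": request.client.host if request.client else None,
        "user_agent": request.headers.get("user-agent"),
        "cache_hit": False
    }
)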
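
Python's logging module raises KeyError when an extra key collides with a built-in LogRecord attribute such as filename, name or message. A minimal sketch of a guard that prefixes colliding keys; the helper name safe_extra, the wk_ prefix and the logger name are illustrative assumptions:

```python
import logging

logger = logging.getLogger("well_known_demo")


def safe_extra(fields: dict) -> dict:
    """Prefix keys that would collide with built-in LogRecord attributes."""
    # An empty LogRecord exposes every attribute name the logging module reserves.
    reserved = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}
    return {(f"wk_{key}" if key in reserved else key): value for key, value in fields.items()}


extra = safe_extra({"filename": "robots.txt", "ip": "127.0.0.1"})
logger.warning("Well-known URI accessed", extra=extra)
```

Renaming at the call site is simpler when the set of fields is fixed; a helper like this is mainly useful when field names come from elsewhere.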

Testing Checklist

  • Default robots.txt blocks all crawlers
  • Security.txt auto-generates Expires header
  • Custom files are served with correct content-type
  • Unknown files return 404
  • Cache headers are properly set
  • Path traversal attempts are blocked
  • Admin status endpoint requires authentication
  • Disabled well-known returns 404 for all files

Future Enhancements

  1. Dynamic Content: Support for template variables (e.g., {{DOMAIN}}, {{CONTACT_EMAIL}})
  2. File Upload: Admin API to upload well-known files
  3. Signature Support: GPG signing for security.txt
  4. Rate Limiting: Specific limits for well-known endpoints
  5. A/B Testing: Serve different robots.txt based on user agent
  6. Internationalization: Multi-language support for policy files

FAQ

Q: Why disable crawling by default?
A: MCP Gateway is typically a private API gateway. Public crawling could expose API structure and endpoints.

Q: Can I serve HTML files?
A: The current implementation focuses on plain text files per well-known URI standards. HTML would require additional security considerations.

Q: How do I update well-known files?
A: Update environment variables and restart the service. For zero-downtime updates, use rolling deployments.

Q: Are there size limits?
A: Environment variable size limits apply (typically 32KB-1MB depending on platform). Large files should be served differently.

Q: Can I disable caching?
A: Set WELL_KNOWN_CACHE_MAX_AGE=0 to disable caching, though this increases server load.
