Description
🌐 FEATURE: Configurable Well-Known URI Handler
Summary: Implement a flexible /.well-known/* endpoint handler that supports standard well-known URIs such as security.txt and robots.txt with user-configurable content. Defaults assume a private API deployment with crawling disabled.
Implementation
1. Update config.py with Well-Known Configuration
# In mcpgateway/config.py
from typing import Dict, Optional
import json
from pydantic import field_validator
from pydantic_settings import BaseSettings
from mcpgateway.utils.logging import logger
class Settings(BaseSettings):
    # ... existing settings ...

    # ===================================
    # Well-Known URI Configuration
    # ===================================

    # Enable well-known URI endpoints
    well_known_enabled: bool = True

    # robots.txt content (default: disallow all crawling for private API)
    well_known_robots_txt: str = """User-agent: *
Disallow: /

# MCP Gateway is a private API gateway
# Public crawling is disabled by default"""

    # security.txt content (optional, user-defined)
    # Example: "Contact: [email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en"
    well_known_security_txt: str = ""

    # Enable security.txt only if content is provided
    well_known_security_txt_enabled: bool = False

    # Additional custom well-known files (JSON format)
    # Example: {"ai.txt": "This service uses AI for...", "dnt-policy.txt": "Do Not Track policy..."}
    well_known_custom_files: str = "{}"

    # Cache control for well-known files (seconds)
    well_known_cache_max_age: int = 3600  # 1 hour default

    @property
    def custom_well_known_files(self) -> Dict[str, str]:
        """Parse custom well-known files from JSON string."""
        try:
            return json.loads(self.well_known_custom_files) if self.well_known_custom_files else {}
        except json.JSONDecodeError:
            logger.error(f"Invalid JSON in WELL_KNOWN_CUSTOM_FILES: {self.well_known_custom_files}")
            return {}

    @field_validator("well_known_security_txt_enabled", mode="after")
    @classmethod
    def _auto_enable_security_txt(cls, v, info):
        """Auto-enable security.txt if content is provided."""
        if "well_known_security_txt" in info.data:
            return bool(info.data["well_known_security_txt"].strip())
        return v
2. Create Well-Known Handler Router
# Create mcpgateway/routers/well_known.py
from datetime import datetime, timezone, timedelta
from typing import Optional
from fastapi import APIRouter, HTTPException, Response, Request, Depends
from fastapi.responses import PlainTextResponse
from mcpgateway.config import settings
from mcpgateway.utils.logging import logger
from mcpgateway.auth import require_auth
router = APIRouter(tags=["well-known"])
# Well-known URI registry with validation
WELL_KNOWN_REGISTRY = {
    "robots.txt": {
        "content_type": "text/plain",
        "description": "Robot exclusion standard",
        "rfc": "RFC 9309"
    },
    "security.txt": {
        "content_type": "text/plain",
        "description": "Security contact information",
        "rfc": "RFC 9116"
    },
    "ai.txt": {
        "content_type": "text/plain",
        "description": "AI usage policies",
        "rfc": "Draft"
    },
    # dnt-policy.txt is the EFF Do Not Track policy file, not a W3C resource
    "dnt-policy.txt": {
        "content_type": "text/plain",
        "description": "Do Not Track policy",
        "rfc": "EFF"
    },
    # change-password is defined by a W3C specification; RFC 8615 only defines /.well-known/ itself
    "change-password": {
        "content_type": "text/plain",
        "description": "Change password URL",
        "rfc": "W3C"
    }
}
def validate_security_txt(content: str) -> Optional[str]:
    """Validate security.txt format and add an Expires field if missing."""
    if not content or not content.strip():
        return None
    lines = content.strip().split('\n')
    # Check if Expires field exists (RFC 9116 field names are case-insensitive)
    has_expires = any(line.strip().lower().startswith('expires:') for line in lines)
    # Add Expires field if missing (6 months from now)
    if not has_expires:
        expires = datetime.now(timezone.utc).replace(microsecond=0) + timedelta(days=180)
        # strftime avoids the invalid "+00:00Z" suffix that isoformat() + "Z" produces
        lines.append(f"Expires: {expires.strftime('%Y-%m-%dT%H:%M:%SZ')}")
    validated = []
    # Add header comment if not present
    if not lines[0].startswith('#'):
        generated = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
        validated.append("# Security contact information for MCP Gateway")
        validated.append(f"# Generated: {generated}")
        validated.append("")
    validated.extend(lines)
    return '\n'.join(validated)
@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response, request: Request):
    """
    Serve well-known URI files.

    Supports:
    - robots.txt: Robot exclusion (default: disallow all)
    - security.txt: Security contact information (if configured)
    - Custom files: Additional well-known files via configuration

    Args:
        filename: The well-known filename requested
        response: FastAPI response object (unused; headers go on the returned response)
        request: FastAPI request object for logging

    Returns:
        Plain text content of the requested file

    Raises:
        HTTPException: 404 if file not found or well-known disabled
    """
    if not settings.well_known_enabled:
        raise HTTPException(status_code=404, detail="Not found")
    # Normalize filename (remove any leading or trailing slashes)
    filename = filename.strip('/')
    # Cache headers are attached to the returned response: FastAPI does not merge
    # headers set on the injected Response when a Response object is returned directly
    headers = {"Cache-Control": f"public, max-age={settings.well_known_cache_max_age}"}
    # Handle robots.txt
    if filename == "robots.txt":
        headers["X-Robots-Tag"] = "noindex, nofollow"
        return PlainTextResponse(
            content=settings.well_known_robots_txt,
            media_type="text/plain; charset=utf-8",
            headers=headers,
        )
    # Handle security.txt
    elif filename == "security.txt":
        if not settings.well_known_security_txt_enabled:
            raise HTTPException(status_code=404, detail="security.txt not configured")
        content = validate_security_txt(settings.well_known_security_txt)
        if not content:
            raise HTTPException(status_code=404, detail="security.txt not configured")
        return PlainTextResponse(
            content=content,
            media_type="text/plain; charset=utf-8",
            headers=headers,
        )
    # Handle custom files
    elif filename in settings.custom_well_known_files:
        content = settings.custom_well_known_files[filename]
        # Determine content type
        content_type = "text/plain; charset=utf-8"
        if filename in WELL_KNOWN_REGISTRY:
            content_type = f"{WELL_KNOWN_REGISTRY[filename]['content_type']}; charset=utf-8"
        return PlainTextResponse(
            content=content,
            media_type=content_type,
            headers=headers,
        )
    # File not found
    else:
        # Provide helpful error for known well-known URIs
        if filename in WELL_KNOWN_REGISTRY:
            raise HTTPException(
                status_code=404,
                detail=f"{filename} is not configured. "
                f"This is a {WELL_KNOWN_REGISTRY[filename]['description']} file."
            )
        else:
            raise HTTPException(status_code=404, detail="Not found")
@router.get("/admin/well-known", response_model=dict)
async def get_well_known_status(user: str = Depends(require_auth)):
    """
    Get status of well-known URI configuration.

    Returns current configuration and available well-known files.
    """
    configured_files = []
    # Always available
    configured_files.append({
        "path": "/.well-known/robots.txt",
        "enabled": True,
        "description": "Robot exclusion standard",
        "cache_max_age": settings.well_known_cache_max_age
    })
    # Conditionally available
    if settings.well_known_security_txt_enabled:
        configured_files.append({
            "path": "/.well-known/security.txt",
            "enabled": True,
            "description": "Security contact information",
            "cache_max_age": settings.well_known_cache_max_age
        })
    # Custom files
    for filename in settings.custom_well_known_files:
        configured_files.append({
            "path": f"/.well-known/{filename}",
            "enabled": True,
            "description": "Custom well-known file",
            "cache_max_age": settings.well_known_cache_max_age
        })
    return {
        "enabled": settings.well_known_enabled,
        "configured_files": configured_files,
        "supported_files": list(WELL_KNOWN_REGISTRY.keys()),
        "cache_max_age": settings.well_known_cache_max_age
    }
3. Update main.py to Include Router
# In mcpgateway/main.py
from mcpgateway.routers import well_known
# Include the well-known router (no prefix needed since paths start with /.well-known)
app.include_router(well_known.router)
4. Update .env.example
#####################################
# Well-Known URI Configuration
#####################################
# Enable well-known URI endpoints (/.well-known/*)
WELL_KNOWN_ENABLED=true
# robots.txt content - Default blocks all crawlers (private API)
# Use multiline with proper escaping or keep on one line
WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /\n\n# MCP Gateway is a private API gateway\n# Public crawling is disabled by default"
# security.txt content - Define your security contact information
# Format: RFC 9116 (https://www.rfc-editor.org/rfc/rfc9116.html)
# Leave empty to disable security.txt
# Example:
# WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en\nCanonical: https://example.com/.well-known/security.txt"
WELL_KNOWN_SECURITY_TXT=""
# Additional custom well-known files (JSON format)
# Example: {"ai.txt": "AI Usage: This service uses AI for tool orchestration...", "dnt-policy.txt": "We respect DNT headers..."}
WELL_KNOWN_CUSTOM_FILES="{}"
# Cache control for well-known files (seconds)
WELL_KNOWN_CACHE_MAX_AGE=3600 # 1 hour
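Because WELL_KNOWN_CUSTOM_FILES is free-form JSON, a value such as a list or a number parses successfully but is not a filename-to-content mapping. The following is a minimal sketch of stricter parsing, assuming a hypothetical helper named `parse_custom_files` that is not part of the proposal above; it returns an empty mapping for unusable input and drops entries whose content is not a string.

```python
import json
from typing import Dict


def parse_custom_files(raw: str) -> Dict[str, str]:
    """Parse a WELL_KNOWN_CUSTOM_FILES-style JSON string into {filename: content}.

    Hypothetical helper for illustration only.
    """
    if not raw or not raw.strip():
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Invalid JSON: treat as "no custom files configured"
        return {}
    if not isinstance(data, dict):
        # A JSON list or scalar is valid JSON but not a filename mapping
        return {}
    # Keep only entries whose content is a string
    return {name: body for name, body in data.items() if isinstance(body, str)}
```

Whether invalid entries should be dropped silently or reported at startup is a design choice; failing fast may be preferable in production.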
5. Add Tests
# In tests/test_well_known.py
import pytest
from fastapi.testclient import TestClient

from mcpgateway.config import settings


def test_robots_txt_default(client: TestClient):
    """Test default robots.txt blocks all crawlers."""
    response = client.get("/.well-known/robots.txt")
    assert response.status_code == 200
    assert "User-agent: *" in response.text
    assert "Disallow: /" in response.text
    assert response.headers["content-type"] == "text/plain; charset=utf-8"
    assert "Cache-Control" in response.headers


def test_security_txt_not_configured(client: TestClient):
    """Test security.txt returns 404 when not configured."""
    response = client.get("/.well-known/security.txt")
    assert response.status_code == 404


def test_security_txt_configured(client: TestClient, monkeypatch):
    """Test security.txt when configured."""
    # Patch the already-loaded settings object; monkeypatch restores the
    # original values afterwards so state does not leak into other tests
    monkeypatch.setattr(settings, "well_known_security_txt", "Contact: [email protected]")
    monkeypatch.setattr(settings, "well_known_security_txt_enabled", True)
    response = client.get("/.well-known/security.txt")
    assert response.status_code == 200
    assert "Contact: [email protected]" in response.text
    assert "Expires:" in response.text  # Auto-added


def test_custom_well_known_file(client: TestClient, monkeypatch):
    """Test custom well-known files."""
    monkeypatch.setattr(
        settings, "well_known_custom_files", '{"ai.txt": "AI Policy: We use AI responsibly"}'
    )
    response = client.get("/.well-known/ai.txt")
    assert response.status_code == 200
    assert "AI Policy: We use AI responsibly" in response.text


def test_unknown_well_known_file(client: TestClient):
    """Test unknown well-known file returns 404."""
    response = client.get("/.well-known/unknown.txt")
    assert response.status_code == 404


def test_well_known_disabled(client: TestClient, monkeypatch):
    """Test well-known endpoints when disabled."""
    monkeypatch.setattr(settings, "well_known_enabled", False)
    response = client.get("/.well-known/robots.txt")
    assert response.status_code == 404


def test_well_known_admin_status(client: TestClient, auth_headers):
    """Test admin status endpoint."""
    response = client.get("/admin/well-known", headers=auth_headers)
    assert response.status_code == 200
    data = response.json()
    assert data["enabled"] is True
    assert any(f["path"] == "/.well-known/robots.txt" for f in data["configured_files"])
6. Example Configurations
# Example 1: Basic security.txt
WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]
Contact: https://mycompany.com/security
Encryption: https://mycompany.com/pgp-key.txt
Preferred-Languages: en, es
Canonical: https://api.mycompany.com/.well-known/security.txt"
# Example 2: Custom AI policy
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "# AI Usage Policy\n\nThis MCP Gateway uses AI for:\n- Tool orchestration\n- Response generation\n- Error handling\n\nWe do not use AI for:\n- User data analysis\n- Behavioral tracking\n- Decision making without human oversight"}
# Example 3: Allow specific crawlers
WELL_KNOWN_ROBOTS_TXT="User-agent: internal-monitor
Allow: /health
Allow: /metrics
User-agent: *
Disallow: /"
# Example 4: Multiple custom files
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "# AI Usage Policy\n\nThis MCP Gateway uses AI for:\n- Tool orchestration\n- Response generation\n- Error handling\n\nWe do not use AI for:\n- User data analysis\n- Behavioral tracking\n- Decision making without human oversight", "dnt-policy.txt": "# Do Not Track Policy\n\nWe respect the DNT header.\nNo tracking cookies are used.\nOnly essential session data is stored.", "change-password": "https://mycompany.com/account/password"}
Usage Examples
1. Basic Setup (Private API)
# Default configuration blocks all crawlers
curl https://api.example.com/.well-known/robots.txt
# Returns:
# User-agent: *
# Disallow: /
#
# MCP Gateway is a private API gateway
# Public crawling is disabled by default
2. Security Contact Configuration
# Configure security contact
export WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]
Contact: https://example.com/security
Acknowledgments: https://example.com/security/thanks
Preferred-Languages: en, fr, es
Hiring: https://example.com/careers"
# Access security.txt
curl https://api.example.com/.well-known/security.txt
# Returns formatted security.txt with auto-generated Expires header
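RFC 9116 requires both a Contact and an Expires field, and field names are case-insensitive. The sketch below shows one way to check a configured value before deploying it. The helper name `missing_required_fields` and the example.com addresses are assumptions made for illustration, not part of the proposal.

```python
from typing import List

# Fields that RFC 9116 requires in every security.txt
REQUIRED_FIELDS = ("contact", "expires")


def missing_required_fields(content: str) -> List[str]:
    """Return the required security.txt field names absent from content."""
    present = set()
    for line in content.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a field separator
        if not line or line.startswith("#") or ":" not in line:
            continue
        name = line.split(":", 1)[0].strip().lower()
        present.add(name)
    return [field for field in REQUIRED_FIELDS if field not in present]


if __name__ == "__main__":
    sample = "Contact: mailto:security@example.com\nHiring: https://example.com/careers"
    print(missing_required_fields(sample))
```

A non-empty result for a value like the one exported above would indicate that the gateway has to add the missing field itself or that the configuration should be corrected.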
3. AI Usage Policy
# Configure AI policy
export WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "# AI Usage Policy\n\nAI Model: Tool orchestration only\nData Retention: No training on user data\nHuman Oversight: Required for all operations"}'
# Access AI policy
curl https://api.example.com/.well-known/ai.txt
4. Admin Monitoring
# Check well-known configuration status
curl -H "Authorization: Bearer $API_KEY" \
https://api.example.com/admin/well-known
# Returns:
{
"enabled": true,
"configured_files": [
{
"path": "/.well-known/robots.txt",
"enabled": true,
"description": "Robot exclusion standard",
"cache_max_age": 3600
},
{
"path": "/.well-known/security.txt",
"enabled": true,
"description": "Security contact information",
"cache_max_age": 3600
}
],
"supported_files": [
"robots.txt",
"security.txt",
"ai.txt",
"dnt-policy.txt",
"change-password"
],
"cache_max_age": 3600
}
Security Considerations
- Content Validation: The security.txt validator adds a missing Expires field and a header comment; it does not check that the required Contact field is present
- Cache Headers: Configurable cache control prevents excessive requests
- Path Traversal Protection: Requested filenames are only used as lookup keys into configured content and never reach the filesystem, so directory traversal has nothing to act on
- Admin-Only Status: Configuration status requires authentication
- No Dynamic Content: All content is statically configured via environment variables
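If the handler is ever extended to read files from disk, key lookup alone would no longer be enough. A minimal sketch of an explicit filename check follows, assuming a hypothetical helper `is_safe_well_known_name` that accepts only a single path segment made of letters, digits, dots, underscores, and hyphens.

```python
import re

# One path segment: starts with an alphanumeric, then alphanumerics, '.', '_' or '-'
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_safe_well_known_name(filename: str) -> bool:
    """Return True if filename is a single, traversal-free path segment."""
    if ".." in filename:
        # Reject parent-directory references outright
        return False
    return _SAFE_NAME.fullmatch(filename) is not None
```

Rejecting slashes also rules out nested well-known paths, so this check would need to be relaxed if such paths are required.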
Deployment Guide
Docker Deployment
# In your Docker environment
ENV WELL_KNOWN_ENABLED=true
ENV WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /api/\nAllow: /api/health"
ENV WELL_KNOWN_SECURITY_TXT="Contact: [email protected]\nExpires: 2025-12-31T23:59:59Z"
ENV WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "AI Policy: Responsible use only"}'
ENV WELL_KNOWN_CACHE_MAX_AGE=3600
Kubernetes ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-gateway-wellknown
data:
  WELL_KNOWN_ENABLED: "true"
  WELL_KNOWN_ROBOTS_TXT: |
    User-agent: *
    Disallow: /
    # Private API - No public crawling
  WELL_KNOWN_SECURITY_TXT: |
    Contact: mailto:[email protected]
    Expires: 2025-12-31T23:59:59Z
    Preferred-Languages: en
  WELL_KNOWN_CUSTOM_FILES: |
    {
      "ai.txt": "This service uses AI for tool orchestration only.",
      "dnt-policy.txt": "We honor Do Not Track headers."
    }
Docker Compose
services:
  mcp-gateway:
    environment:
      WELL_KNOWN_ENABLED: "true"
      WELL_KNOWN_ROBOTS_TXT: |
        User-agent: monitoring-bot
        Allow: /health

        User-agent: *
        Disallow: /
      WELL_KNOWN_SECURITY_TXT: |
        Contact: [email protected]
        Encryption: https://example.com/pgp
      WELL_KNOWN_CUSTOM_FILES: '{"ai.txt": "AI is used for tool orchestration"}'
      WELL_KNOWN_CACHE_MAX_AGE: "7200"
Monitoring and Observability
Prometheus Metrics
Add metrics to track well-known URI usage:
# In well_known.py
from prometheus_client import Counter, Histogram
well_known_requests = Counter(
'mcp_gateway_well_known_requests_total',
'Total well-known URI requests',
['filename', 'status']
)
well_known_request_duration = Histogram(
'mcp_gateway_well_known_request_duration_seconds',
'Well-known URI request duration',
['filename']
)
# In the handler
@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response, request: Request):
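    # Use a bounded label value: raw request paths would create unbounded label cardinality
    label = filename if filename in WELL_KNOWN_REGISTRY else "other"
    with well_known_request_duration.labels(filename=label).time():
        # ... existing logic ...
        well_known_requests.labels(filename=label, status="found").inc()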
Logging
The feature includes structured logging for security monitoring:
# Log well-known access
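logger.info(
    "Well-known URI accessed",
    extra={
        # "filename" is a reserved LogRecord attribute; using it in extra raises KeyError
        "well_known_file": filename,
        "ip": request.client.host if request.client else None,
        "user_agent": request.headers.get("user-agent"),
        "cache_hit": False,
    },
)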
Testing Checklist
- Default robots.txt blocks all crawlers
- Security.txt auto-generates Expires header
- Custom files are served with correct content-type
- Unknown files return 404
- Cache headers are properly set
- Path traversal attempts are blocked
- Admin status endpoint requires authentication
- Disabled well-known returns 404 for all files
Future Enhancements
- Dynamic Content: Support for template variables (e.g., {{DOMAIN}}, {{CONTACT_EMAIL}})
- File Upload: Admin API to upload well-known files
- Signature Support: GPG signing for security.txt
- Rate Limiting: Specific limits for well-known endpoints
- A/B Testing: Serve different robots.txt based on user agent
- Internationalization: Multi-language support for policy files
FAQ
Q: Why disable crawling by default?
A: MCP Gateway is typically a private API gateway. Public crawling could expose API structure and endpoints.
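To confirm what a given robots.txt value permits, the standard library's `urllib.robotparser` can evaluate it offline. This is a small sketch; the helper name `is_allowed` and the crawler name `ExampleBot` are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Default content from the proposal above
DEFAULT_ROBOTS_TXT = """User-agent: *
Disallow: /

# MCP Gateway is a private API gateway
# Public crawling is disabled by default"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed(DEFAULT_ROBOTS_TXT, "ExampleBot", "/tools"))
```

Note that a crawler matching a specific User-agent group follows only that group, so a group containing only Allow lines does not inherit `Disallow: /` from the `*` group.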
Q: Can I serve HTML files?
A: The current implementation focuses on plain text files per well-known URI standards. HTML would require additional security considerations.
Q: How do I update well-known files?
A: Update environment variables and restart the service. For zero-downtime updates, use rolling deployments.
Q: Are there size limits?
A: Environment variable size limits apply (typically 32KB-1MB depending on platform). Large files should be served differently.
Q: Can I disable caching?
A: Set WELL_KNOWN_CACHE_MAX_AGE=0 to disable caching, though this increases server load.
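With a value of 0 the handler emits `public, max-age=0`, which marks the response as immediately stale but still lets caches store it and revalidate. One possible refinement, not part of the proposal, is to emit `no-store` in that case. The helper name `cache_control_header` is hypothetical.

```python
def cache_control_header(max_age: int) -> str:
    """Build a Cache-Control value for well-known responses.

    Hypothetical helper: a non-positive max_age means "do not cache at all".
    """
    if max_age <= 0:
        return "no-store"
    return f"public, max-age={max_age}"
```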
References
- RFC 8615 - Well-Known Uniform Resource Identifiers (URIs)
- RFC 9309 - Robots Exclusion Protocol
- RFC 9116 - security.txt
- Well-Known URI Registry - IANA registry