Skip to content

Enhancement: Support for Modern Cloudflare Challenge Format #298

@rsforbes

Description

@rsforbes

Enhancement: Support for Modern Cloudflare Challenge Format

Summary

My API library for prosportstransactions.com stopped working because of Cloudflare protection. I integrated cloudscraper v3.0.0 hoping to solve the problem. Unfortunately, I was unsuccessful in my attempt. This led me to conduct extensive research on modern Cloudflare challenges and develop enhanced V3 handler improvements that could benefit the cloudscraper project.

Key Finding: My enhanced V3 handler successfully detects and parses new window._cf_chl_opt challenge structures, but prosportstransactions.com (and similar advanced sites) remain inaccessible due to transport-layer TLS fingerprint detection.

While my work did not solve the core TLS fingerprinting problem that blocks advanced Cloudflare protection sites like prosportstransactions.com, I identified significant improvements to challenge detection and handling that would enhance cloudscraper's compatibility with modern challenge formats.

Research Context

Target Site: prosportstransactions.com (advanced Cloudflare protection)
Research Branch: https://github.com/rsforbes/pro_sports_transactions/tree/feature/cloudflare-bypass-research
Cloudscraper Version: v3.0.0 (from tag, not PyPI)
Test Methodology: 12 systematic configurations including:

Installation Used:

cloudscraper = {git = "https://github.com/VeNoMouS/cloudscraper.git", rev = "refs/pull/295/head"}

Key Findings

New Challenge Format Discovered

Modern Cloudflare challenges now use a different JavaScript structure that current cloudscraper doesn't detect:

Traditional format (currently detected):

window._cf_chl_ctx = {...};

Modern format (not detected):

window._cf_chl_opt = {
    cvId: '3',
    cZone: 'prosportstransactions.com',
    cType: 'managed',
    cRay: '956ddeea7c0960a1',
    cH: 'R7tFH...',
    cUPMDTk: 'R7tFH...',
    cFPWv: 'b',
    cITimeS: '1751119949',
    fa: '/cdn-cgi/challenge-platform/h/b/g/orchestrate/managed/v1?...',
    md: 'MhKFh...',
    mdrd: 'cOM9...'
};

Updated URL Patterns

Traditional: /cdn-cgi/challenge-platform/h/b/
Modern: /cdn-cgi/challenge-platform/h/b/jsd/r/{complex_identifier}/{ray_id}

Where {complex_identifier} is extracted from:

__CF$cv$params = {
    r: '0.01313896590161113:1751120168:uuGQcGrMYKAbiU7S-5nWc8aWLxMzIT5mqxDn71u5s1Q',
    t: 'MTc1MTEyMDkwNy4wMDAwMDA='
};

Payload Format Changes

Traditional: Form data (application/x-www-form-urlencoded)
Modern: JSON payload (text/plain;charset=UTF-8)

Proposed Enhancements

1. Enhanced Challenge Detection

Update is_V3_Challenge() to detect modern format:

@staticmethod
def is_V3_Challenge(resp):
    try:
        return (
            resp.headers.get("Server", "").startswith("cloudflare")
            and resp.status_code in [403, 429, 503]
            and (
                # Existing patterns...
                re.search(r"""cpo\.src\s*=\s*['\"]/cdn-cgi/challenge-platform/\S+orchestrate/jsch/v3""", resp.text, re.M | re.S)
                or re.search(r"window\._cf_chl_ctx\s*=", resp.text, re.M | re.S)
                or re.search(r'<form[^>]*id="challenge-form"[^>]*action="[^"]*__cf_chl_rt_tk=', resp.text, re.M | re.S)
                or
                # NEW: Modern challenge format detection
                (
                    "Just a moment" in resp.text
                    and "/challenge-platform/" in resp.text
                    and re.search(r"window\._cf_chl_opt\s*=", resp.text)
                    and resp.headers.get("cf-mitigated") == "challenge"
                )
            )
        )
    except AttributeError:
        pass
    return False

2. JavaScript Object Parser

Many modern challenges use JavaScript object notation instead of JSON:

def parse_js_object_manually(self, js_obj_str):
    """Manually parse JavaScript object when JSON parsing fails"""
    try:
        data = {}
        patterns = [
            (r"cvId:\s*'([^']+)'", "cvId"),
            (r'cZone:\s*"([^"]+)"', "cZone"),
            (r"cType:\s*'([^']+)'", "cType"),
            (r"cRay:\s*'([^']+)'", "cRay"),
            (r'cH:\s*"([^"]+)"', "cH"),
            (r'cUPMDTk:\s*"([^"]+)"', "cUPMDTk"),
            (r"cFPWv:\s*'([^']+)'", "cFPWv"),
            (r"cITimeS:\s*'([^']+)'", "cITimeS"),
        ]
        
        for pattern, key in patterns:
            match = re.search(pattern, js_obj_str)
            if match:
                data[key] = match.group(1)
        
        return data
    except Exception:
        return {}

3. Complex URL Construction

Support for modern challenge URL patterns:

# Extract complex identifier from __CF$cv$params
cf_params_match = re.search(
    r'__CF\$cv\$params\s*=\s*\{.*?r:\s*[\'"]([^\'"]+)[\'"]',
    resp.text,
    re.DOTALL,
)

if cf_params_match and "cRay" in opt_data:
    r_param = cf_params_match.group(1)
    ray_id = opt_data["cRay"]
    form_action = f"/cdn-cgi/challenge-platform/h/b/jsd/r/{r_param}/{ray_id}"

4. JSON Payload Support

Handle modern JSON payloads instead of form data:

# For modern challenges, send JSON payload
if challenge_data.get("is_modern", False):
    payload_data = {
        "chctx": opt_data,
        "answer": challenge_answer,
    }
    
    # Add specific fields from opt_data
    for key in ["cvId", "cRay", "cType", "cZone", "cUPMDTk", "cFPWv", "cITimeS"]:
        if key in opt_data:
            payload_data[key] = opt_data[key]
    
    # Use JSON content type
    headers.update({
        "Content-Type": "text/plain;charset=UTF-8",
        "Accept": "*/*",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
    })
    
    return json.dumps(payload_data)

Implementation Reference

My complete implementation is available in the research branch:

  • Enhanced V3 Handler: temp_cloudscraper/cloudscraper/cloudflare_v3_patched.py (495 lines)
  • Integration Code: src/pro_sports_transactions/search.py (CloudscraperConfig class)
  • Test Results: docs/cloudscraper/cloudscraper-testing.md (12 systematic tests)
  • Technical Analysis: docs/cloudscraper/TECHNICAL_ANALYSIS.md

Test Results

I tested across multiple configurations with cloudscraper v3.0.0:

Configuration Base Version Result Key Finding
Standard v3.0.0 Tag release TIMEOUT Basic implementation insufficient
PR #295 (Session) Pull request FAIL Session improvements help but insufficient
PR #283 (Headers) Pull request TIMEOUT Additional headers don't solve core issue
Custom V3 Patched My enhancement FAIL Successfully detects modern challenges but blocked by TLS fingerprinting

Despite these improvements, advanced sites like prosportstransactions.com still block requests due to TLS fingerprinting - the fundamental limitation that requires browser-level solutions. However, these enhancements would improve cloudscraper's compatibility with sites using modern challenge formats.

Key Discovery: My patched V3 handler successfully detected and parsed the modern challenge format, proving the enhancement works - the failure occurs at the TLS transport layer, not the challenge handling layer.

Benefits to Cloudscraper

  1. Broader Compatibility: Support sites using modern challenge format
  2. Future-Proofing: Handle evolving Cloudflare challenge patterns
  3. Better Parsing: Robust JavaScript object handling
  4. Modern Standards: JSON payload support for current challenges

Implementation Priority

High Priority: Challenge detection improvements (#1)
Medium Priority: JavaScript object parser (#2) and URL construction (#3)
Low Priority: JSON payload support (#4) - fewer sites use this format

Notes

  • Implementation maintains backward compatibility with existing challenges
  • TLS fingerprinting remains the primary obstacle for advanced protection sites - this is the core unsolved problem

Collaboration

While I haven't solved the TLS fingerprinting challenge, these improvements would benefit cloudscraper's compatibility with modern challenge formats. For my specific use case with prosportstransactions.com, I'll be exploring Playwright and curl_cffi to see if I can bypass the current TLS fingerprinting obstacle. If there's interest on addressing TLS fingerprinting within cloudscraper itself, I'd be happy to submit a PR for the above work.


Research Branch: https://github.com/rsforbes/pro_sports_transactions/tree/feature/cloudflare-bypass-research
Documentation: docs/cloudscraper/ directory contains complete technical analysis and test results

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions