-
-
Notifications
You must be signed in to change notification settings - Fork 563
Description
Enhancement: Support for Modern Cloudflare Challenge Format
Summary
My API library for prosportstransactions.com stopped working because of Cloudflare protection. I integrated cloudscraper v3.0.0 hoping to solve the problem. Unfortunately, I was unsuccessful in my attempt. This led me to conduct extensive research on modern Cloudflare challenges and develop enhanced V3 handler improvements that could benefit the cloudscraper project.
Key Finding: My enhanced V3 handler successfully detects and parses new window._cf_chl_opt
challenge structures, but prosportstransactions.com (and similar advanced sites) remain inaccessible due to transport-layer TLS fingerprint detection.
While my work did not solve the core TLS fingerprinting problem that blocks advanced Cloudflare protection sites like prosportstransactions.com, I identified significant improvements to challenge detection and handling that would enhance cloudscraper's compatibility with modern challenge formats.
Research Context
Target Site: prosportstransactions.com (advanced Cloudflare protection)
Research Branch: https://github.com/rsforbes/pro_sports_transactions/tree/feature/cloudflare-bypass-research
Cloudscraper Version: v3.0.0 (from tag, not PyPI)
Test Methodology: 12 systematic configurations including:
- Standard v3.0.0 base implementation
- PR Bug fixes to v3 #295 testing (session management improvements)
- PR cleanup Lib and CloudScraper TLS bypass is production ready #283 testing (additional headers and browser fingerprinting)
- Custom patched V3 handler for modern challenge detection
Installation Used:
cloudscraper = {git = "https://github.com/VeNoMouS/cloudscraper.git", rev = "refs/pull/295/head"}
Key Findings
New Challenge Format Discovered
Modern Cloudflare challenges now use a different JavaScript structure that current cloudscraper doesn't detect:
Traditional format (currently detected):
window._cf_chl_ctx = {...};
Modern format (not detected):
window._cf_chl_opt = {
cvId: '3',
cZone: 'prosportstransactions.com',
cType: 'managed',
cRay: '956ddeea7c0960a1',
cH: 'R7tFH...',
cUPMDTk: 'R7tFH...',
cFPWv: 'b',
cITimeS: '1751119949',
fa: '/cdn-cgi/challenge-platform/h/b/g/orchestrate/managed/v1?...',
md: 'MhKFh...',
mdrd: 'cOM9...'
};
Updated URL Patterns
Traditional: /cdn-cgi/challenge-platform/h/b/
Modern: /cdn-cgi/challenge-platform/h/b/jsd/r/{complex_identifier}/{ray_id}
Where {complex_identifier}
is extracted from:
__CF$cv$params = {
r: '0.01313896590161113:1751120168:uuGQcGrMYKAbiU7S-5nWc8aWLxMzIT5mqxDn71u5s1Q',
t: 'MTc1MTEyMDkwNy4wMDAwMDA='
};
Payload Format Changes
Traditional: Form data (application/x-www-form-urlencoded
)
Modern: JSON payload (text/plain;charset=UTF-8
)
Proposed Enhancements
1. Enhanced Challenge Detection
Update is_V3_Challenge()
to detect modern format:
@staticmethod
def is_V3_Challenge(resp):
try:
return (
resp.headers.get("Server", "").startswith("cloudflare")
and resp.status_code in [403, 429, 503]
and (
# Existing patterns...
re.search(r"""cpo\.src\s*=\s*['\"]/cdn-cgi/challenge-platform/\S+orchestrate/jsch/v3""", resp.text, re.M | re.S)
or re.search(r"window\._cf_chl_ctx\s*=", resp.text, re.M | re.S)
or re.search(r'<form[^>]*id="challenge-form"[^>]*action="[^"]*__cf_chl_rt_tk=', resp.text, re.M | re.S)
or
# NEW: Modern challenge format detection
(
"Just a moment" in resp.text
and "/challenge-platform/" in resp.text
and re.search(r"window\._cf_chl_opt\s*=", resp.text)
and resp.headers.get("cf-mitigated") == "challenge"
)
)
)
except AttributeError:
pass
return False
2. JavaScript Object Parser
Many modern challenges use JavaScript object notation instead of JSON:
def parse_js_object_manually(self, js_obj_str):
"""Manually parse JavaScript object when JSON parsing fails"""
try:
data = {}
patterns = [
(r"cvId:\s*'([^']+)'", "cvId"),
(r'cZone:\s*"([^"]+)"', "cZone"),
(r"cType:\s*'([^']+)'", "cType"),
(r"cRay:\s*'([^']+)'", "cRay"),
(r'cH:\s*"([^"]+)"', "cH"),
(r'cUPMDTk:\s*"([^"]+)"', "cUPMDTk"),
(r"cFPWv:\s*'([^']+)'", "cFPWv"),
(r"cITimeS:\s*'([^']+)'", "cITimeS"),
]
for pattern, key in patterns:
match = re.search(pattern, js_obj_str)
if match:
data[key] = match.group(1)
return data
except Exception:
return {}
3. Complex URL Construction
Support for modern challenge URL patterns:
# Extract complex identifier from __CF$cv$params
cf_params_match = re.search(
r'__CF\$cv\$params\s*=\s*\{.*?r:\s*[\'"]([^\'"]+)[\'"]',
resp.text,
re.DOTALL,
)
if cf_params_match and "cRay" in opt_data:
r_param = cf_params_match.group(1)
ray_id = opt_data["cRay"]
form_action = f"/cdn-cgi/challenge-platform/h/b/jsd/r/{r_param}/{ray_id}"
4. JSON Payload Support
Handle modern JSON payloads instead of form data:
# For modern challenges, send JSON payload
if challenge_data.get("is_modern", False):
payload_data = {
"chctx": opt_data,
"answer": challenge_answer,
}
# Add specific fields from opt_data
for key in ["cvId", "cRay", "cType", "cZone", "cUPMDTk", "cFPWv", "cITimeS"]:
if key in opt_data:
payload_data[key] = opt_data[key]
# Use JSON content type
headers.update({
"Content-Type": "text/plain;charset=UTF-8",
"Accept": "*/*",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "same-origin",
})
return json.dumps(payload_data)
Implementation Reference
My complete implementation is available in the research branch:
- Enhanced V3 Handler:
temp_cloudscraper/cloudscraper/cloudflare_v3_patched.py
(495 lines) - Integration Code:
src/pro_sports_transactions/search.py
(CloudscraperConfig class) - Test Results:
docs/cloudscraper/cloudscraper-testing.md
(12 systematic tests) - Technical Analysis:
docs/cloudscraper/TECHNICAL_ANALYSIS.md
Test Results
I tested across multiple configurations with cloudscraper v3.0.0:
Configuration | Base Version | Result | Key Finding |
---|---|---|---|
Standard v3.0.0 | Tag release | TIMEOUT | Basic implementation insufficient |
PR #295 (Session) | Pull request | FAIL | Session improvements help but insufficient |
PR #283 (Headers) | Pull request | TIMEOUT | Additional headers don't solve core issue |
Custom V3 Patched | My enhancement | FAIL | Successfully detects modern challenges but blocked by TLS fingerprinting |
Despite these improvements, advanced sites like prosportstransactions.com still block requests due to TLS fingerprinting - the fundamental limitation that requires browser-level solutions. However, these enhancements would improve cloudscraper's compatibility with sites using modern challenge formats.
Key Discovery: My patched V3 handler successfully detected and parsed the modern challenge format, proving the enhancement works - the failure occurs at the TLS transport layer, not the challenge handling layer.
Benefits to Cloudscraper
- Broader Compatibility: Support sites using modern challenge format
- Future-Proofing: Handle evolving Cloudflare challenge patterns
- Better Parsing: Robust JavaScript object handling
- Modern Standards: JSON payload support for current challenges
Implementation Priority
High Priority: Challenge detection improvements (#1)
Medium Priority: JavaScript object parser (#2) and URL construction (#3)
Low Priority: JSON payload support (#4) - fewer sites use this format
Notes
- Implementation maintains backward compatibility with existing challenges
- TLS fingerprinting remains the primary obstacle for advanced protection sites - this is the core unsolved problem
Collaboration
While I haven't solved the TLS fingerprinting challenge, these improvements would benefit cloudscraper's compatibility with modern challenge formats. For my specific use case with prosportstransactions.com, I'll be exploring Playwright and curl_cffi to see if I can bypass the current TLS fingerprinting obstacle. If there's interest on addressing TLS fingerprinting within cloudscraper itself, I'd be happy to submit a PR for the above work.
Research Branch: https://github.com/rsforbes/pro_sports_transactions/tree/feature/cloudflare-bypass-research
Documentation: docs/cloudscraper/
directory contains complete technical analysis and test results