feat: add OCR text filtering and content hiding for API endpoints #1816
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
name: pull request
about: submit changes to the project
title: "[pr] feat: add OCR text filtering and content hiding for API endpoints"
labels: 'security, enhancement'
assignees: ''
description
Adds comprehensive content filtering system to prevent sensitive information in OCR text from being exposed through API endpoints. This addresses a significant privacy gap where user data like passwords, API
keys, credit card numbers, and other sensitive information could be inadvertently leaked through screenpipe's API responses.
The Problem:
/search
,/frames/:frame_id
, and/stream/frames
endpoints could expose passwords, API keys, SSNs, etc.The Solution:
hide_window_keywords
"[REDACTED]"
insteadTechnical Implementation:
create_time_series_frame()
function to apply filtering before sending dataAppState
should_hide_content()
function for keyword detectionAdditional Improvements:
test_extract_frames_and_ocr
performance bottleneck (4+ minutes → 0.30s)Security Keywords Examples:
related issue: N/A (proactive security enhancement)
how to test
cargo test -p screenpipe-server --test content_hiding_test
cargo test -p screenpipe-server --test video_utils_test test_extract_frames_and_ocr
- Start screenpipe server with configuration: hide_window_keywords: ["password", "api key", "credit card"]
- Connect to websocket endpoint /stream/frames
- Send OCR text containing "Enter your password: secret123"
- Verify response shows "[REDACTED]" instead of actual sensitive text
- Configure keywords as above
- Make search request: GET /search?q=password&limit=10
- Verify search results with sensitive keywords show filtered content
- Request specific frame: GET /frames/{frame_id}
- Verify OCR text containing keywords is properly redacted
Expected Test Results:
manual cli testing:
Screenshots/Evidence: Test suite output showing 11/11 tests passing demonstrates the content filtering system works reliably across all scenarios, saving maintainer review time by providing comprehensive


verification.
Files Modified:
🔐 This PR transforms screenpipe from potentially leaking sensitive user data to providing robust privacy protection across all API endpoints. Critical for user trust and data security.