Skip to content

Conversation

perfectra1n
Copy link
Member

@perfectra1n perfectra1n commented Sep 2, 2025

this definitely isn't being merged, but keeping it open for Elian. Here's what the annoying AI said the fix was for below lol:

CKEditor Generic Type Syntax Fix

The Problem

We discovered a critical bug where CKEditor would crash when switching from read-only to edit mode on notes containing programming language generic type syntax. Here's what was happening:

The Crash

When a note contained code like this:

HashMap<String, PromptTemplate> templates = new HashMap<>();
List<String, Value> items = getItems();

Or Rust code like:

fn process(error: Box<dyn Error>) -> Result<()> {
    // ...
}

CKEditor would crash with this error:

InvalidCharacterError: Failed to execute 'createElement' on 'Document': 
The tag name provided ('string,') is not a valid name.

Why It Crashed

CKEditor was trying to parse the generic type syntax as HTML tags! When it saw <String, PromptTemplate>, it thought:

  • "Oh, this must be an HTML tag called String,"
  • "Let me create that element..."
  • 💥 Crash! Because String, isn't a valid HTML tag name (it has a comma!)

The same happened with:

  • Box<dyn → tried to create a box<dyn element
  • <OpenAIClient> → tried to create an openaiclient element

Why Existing Solutions Didn't Work

Server-side HTML Sanitizer

Trilium already has an HTML sanitizer that uses an allowedHtmlTags setting. You might think, "Why not just use that?" Here's why that wouldn't work:

The sanitizer removes non-allowed tags entirely:

// What the sanitizer does:
sanitizeHtml("Here is <String, Type> code", { allowedTags: ['div', 'p'] })
// Result: "Here is  code"  
// ❌ The generic type syntax is DELETED!

We don't want to delete the code - we want to preserve it as text!

The Key Insight

  • Sanitizing: Removes unwanted HTML for security (data loss)
  • Escaping: Converts angle brackets to HTML entities (data preserved)

We need escaping, not sanitizing!

Our Solution

We implemented a smart escaping mechanism that:

  1. Identifies what's actually HTML using the user's allowedHtmlTags setting
  2. Escapes everything else so it displays as text instead of being parsed as HTML

Here's how it works:

private escapeGenericTypeSyntax(content: string): string {
    // Get allowed HTML tags from user settings
    const allowedTags = JSON.parse(options.get("allowedHtmlTags"));
    const htmlTags = new Set(allowedTags.map(tag => tag.toLowerCase()));
    
    // Step 1: Protect real HTML tags with placeholders
    const htmlProtectionMap = new Map<string, string>();
    let protectedContent = content.replace(/<\/?([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*>/gi, (match, tagName) => {
        if (htmlTags.has(tagName.toLowerCase())) {
            const placeholder = `__HTML_PROTECTED_${counter++}__`;
            htmlProtectionMap.set(placeholder, match);
            return placeholder;
        }
        return match;
    });
    
    // Step 2: Escape ALL remaining angle brackets (these are code, not HTML!)
    protectedContent = protectedContent
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
    
    // Step 3: Restore the real HTML tags
    htmlProtectionMap.forEach((original, placeholder) => {
        protectedContent = protectedContent.replace(placeholder, original);
    });
    
    return protectedContent;
}

Example Transformation

Input:

<p>Here's a Java class:</p>
<pre>
HashMap<String, PromptTemplate> templates;
Box<dyn Error> error;
</pre>
<div>More content</div>

After Step 1 (Protect HTML):

__HTML_0__Here's a Java class:__HTML_1__
__HTML_2__
HashMap<String, PromptTemplate> templates;
Box<dyn Error> error;
__HTML_3__
__HTML_4__More content__HTML_5__

After Step 2 (Escape remaining brackets):

__HTML_0__Here's a Java class:__HTML_1__
__HTML_2__
HashMap&lt;String, PromptTemplate&gt; templates;
Box&lt;dyn Error&gt; error;
__HTML_3__
__HTML_4__More content__HTML_5__

Final Output:

<p>Here's a Java class:</p>
<pre>
HashMap&lt;String, PromptTemplate&gt; templates;
Box&lt;dyn Error&gt; error;
</pre>
<div>More content</div>

CKEditor receives this and:

  • ✅ Parses <p>, <pre>, <div> as HTML tags
  • ✅ Displays &lt;String, PromptTemplate&gt; as visible text: <String, PromptTemplate>
  • ✅ No crash!

Why Our Fix is Correct

  1. Preserves Data: Unlike sanitization which deletes unknown tags, escaping preserves the code content
  2. Respects User Settings: Uses the allowedHtmlTags option so users can customize what's treated as HTML
  3. No False Positives: Only escapes things that aren't in the allowed tags list
  4. Bidirectional: We unescape when saving, so the database content remains unchanged
  5. Fail-Safe: Has fallback error handling if escaping fails

Configuration

Users can customize which tags are treated as HTML through:
Settings → Other → Allowed HTML Tags

The default list includes common HTML tags (div, span, p, etc.) and Trilium-specific elements (includenote).

Testing

The fix handles all these patterns correctly:

  • ✅ Java/C#: HashMap<String, List<Item>>
  • ✅ TypeScript: Record<string, unknown>
  • ✅ Rust: Box<dyn Error + Send>
  • ✅ C++: std::vector<int>
  • ✅ Incomplete: <String, (missing closing bracket)
  • ✅ Mixed: HTML and code in the same note

Summary

The bug occurred because CKEditor tried to parse programming language generic syntax as HTML tags. Our fix intelligently escapes non-HTML angle brackets while preserving real HTML tags, preventing crashes while maintaining all content. The solution respects user preferences through the configurable allowedHtmlTags setting and ensures no data loss occurs.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Sep 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant