Rebase merge of AST-breaking changes #11065

kdy1 · 2025-09-06T01:42:02Z

Description:

changeset-bot · 2025-09-06T01:42:11Z

🦋 Changeset detected

Latest commit: 1db02a1

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

claude · 2025-09-06T01:42:28Z

Claude finished @kdy1's task —— View job

PR Review: Rebase merge of AST-breaking changes

I'll provide comprehensive feedback on this large PR focusing on code quality, potential bugs, performance, security, and test coverage.

Review Tasks:

Analyze AST changes and breaking modifications
Review Unicode handling improvements in lexer
Examine string literal processing changes
Check node ID additions to identifiers
Assess potential security implications
Evaluate test coverage for new functionality
Review performance impact of changes
Provide comprehensive feedback

🔍 Comprehensive Review Results

✅ Strengths

Robust Unicode Handling: The improvements to Unicode lone surrogate handling are well-implemented:
- Proper distinction between valid Unicode characters and lone surrogates
- Safe encoding using replacement character \u{FFFD}
- Comprehensive test coverage including edge cases like \uD800
AST Breaking Changes Are Well-Structured:
- NodeId addition to Ident provides proper node tracking
- Clean NodeId wrapper around u32 with sensible defaults
- Proper serialization support across all formats (serde, rkyv, arbitrary)
Backward Compatibility Measures:
- Default values maintain existing behavior where possible
- All string literal constructors default lone_surrogates: false
- Raw string preservation logic maintains performance for common cases

🔧 Code Quality Issues

Inconsistent Error Handling (crates/swc_ecma_codegen/src/lit.rs:56):
```
// Good: Checks lone_surrogates before using raw
&& !self.lone_surrogates
```
However, this pattern should be consistently applied everywhere raw strings are used.
Magic Numbers (crates/swc_common/src/node_id.rs:14):
```
pub const DUMMY: NodeId = NodeId(u32::MAX);
```
Consider adding documentation explaining why u32::MAX was chosen as the dummy value.

⚠️ Potential Security Concerns

Unicode Normalization: The lexer properly handles lone surrogates but doesn't address Unicode normalization attacks. However, this is likely outside the scope of this PR.
String Processing Safety: The new lone_surrogates field correctly prevents invalid Unicode from being passed through as raw strings, which is a security improvement.

🚀 Performance Considerations

Memory Impact: Each Ident now carries an additional 4 bytes (NodeId). With thousands of identifiers in large files, this could add up.
Codegen Performance: The lone surrogate checks add minimal overhead but are properly optimized with early returns.
CodSpeed Results: The PR shows no performance regression, which is excellent given the scope of changes.

🧪 Test Coverage Assessment

Excellent coverage observed:

Comprehensive Unicode test cases including edge cases (unicodeExtendedEscapesInStrings*)
Lone surrogate specific tests (lone_surrogates: true cases in lexer tests)
Both ES5 and ES6 target compatibility tests
JSX string handling with Unicode characters

🏗️ Breaking Change Impact

This is properly marked as a major version bump due to:

Ident struct changes requiring recompilation of all dependent code
String literal structure changes affecting AST consumers
New required fields in various constructors

📝 Minor Recommendations

Documentation: Consider adding examples of lone surrogate handling to the module docs.
Consistency: Ensure all Str::from() implementations consistently set lone_surrogates: false.
Error Messages: Consider improving error messages when lone surrogates are encountered to help developers understand the issue.

🎯 Overall Assessment: APPROVED ✅

This PR represents a significant and well-executed improvement to Unicode handling in SWC. The breaking changes are necessary and properly implemented, with excellent test coverage and no performance regressions. The code quality is high with only minor areas for improvement.

Key strengths:

Proper Unicode security handling
Comprehensive test suite
Clean API design
Performance-conscious implementation
Proper handling of edge cases

The changes successfully address Unicode lone surrogate handling while maintaining backward compatibility where possible.

codspeed-hq · 2025-09-06T02:00:58Z

CodSpeed Performance Report

Merging #11065 will degrade performances by 2.32%

_{Comparing dev/rust (1db02a1) with main (ceb0aff)¹}

Summary

❌ 2 regressions
✅ 138 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`es2015_destructuring`	347.9 µs	355.1 µs	-2.03%
❌	`es/target/es2017`	275.6 µs	282.1 µs	-2.32%

No successful run was found on main (d6e0a49) during the generation of this report, so ceb0aff was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

**Description:** Cherry pick e39a909 from #10739 because we are going to make a Wasm-breaking change anyway --------- Co-authored-by: magic-akari <[email protected]>

feat(es/ast): Add node id to Ident (#10973)

6dd6f9c

kdy1 added this to the Planned milestone Sep 6, 2025

kdy1 self-assigned this Sep 6, 2025

kdy1 and others added 2 commits September 6, 2025 05:03

refactor(es/ast): Cherry-pick #10763 (#11060)

a99a9ec

**Description:** Cherry pick e39a909 from #10739 because we are going to make a Wasm-breaking change anyway --------- Co-authored-by: magic-akari <[email protected]>

feat(es/ast): Reapply #10987 (#11066)

1db02a1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Rebase merge of AST-breaking changes #11065

Rebase merge of AST-breaking changes #11065

kdy1 commented Sep 6, 2025

Uh oh!

changeset-bot bot commented Sep 6, 2025 •

edited

Loading

Uh oh!

claude bot commented Sep 6, 2025 •

edited

Loading

Uh oh!

codspeed-hq bot commented Sep 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Rebase merge of AST-breaking changes #11065

Are you sure you want to change the base?

Rebase merge of AST-breaking changes #11065

Conversation

kdy1 commented Sep 6, 2025

Uh oh!

changeset-bot bot commented Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

claude bot commented Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Rebase merge of AST-breaking changes

Review Tasks:

🔍 Comprehensive Review Results

✅ Strengths

🔧 Code Quality Issues

⚠️ Potential Security Concerns

🚀 Performance Considerations

🧪 Test Coverage Assessment

🏗️ Breaking Change Impact

📝 Minor Recommendations

🎯 Overall Assessment: APPROVED ✅

Uh oh!

codspeed-hq bot commented Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #11065 will degrade performances by 2.32%

Summary

Benchmarks breakdown

Footnotes

Uh oh!

Uh oh!

changeset-bot bot commented Sep 6, 2025 •

edited

Loading

claude bot commented Sep 6, 2025 •

edited

Loading

codspeed-hq bot commented Sep 6, 2025 •

edited

Loading