kg
Member

@kg kg commented Jun 17, 2025

  • Wire up basic support for intrinsics in the CoreCLR interpreter
  • Add new DOTNET_InterpMode environment variable with 4 possible values (documented in the comments), where each higher value increases the amount of code we try to run in the interpreter, up to 'everything'

@Copilot Copilot AI review requested due to automatic review settings June 17, 2025 23:20
@kg kg requested review from BrzVlad and janvorli as code owners June 17, 2025 23:20
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 17, 2025
@kg
Member Author

kg commented Jun 17, 2025

cc @davidwrighton @jkotas still not really sure what to do here, this is the best idea I could come up with.

Copilot

This comment was marked as outdated.

@jkotas
Member

jkotas commented Jun 18, 2025

The (non-optimizing) interpreter JIT only needs to recognize "must expand" intrinsics. These should be a small fraction of the intrinsics recognized by the optimizing JIT. I do not think it is worth sharing the infrastructure to recognize intrinsics with the optimizing JIT. It would be just a bunch of extra useless code that does unnecessary work.

It would be useful to create a list of the "must expand" intrinsics that have to be implemented by the (non-optimizing) interpreter so that we do not need to chase them one at a time. The canonical pattern for must expand intrinsics is a trivial self-recursive call like this:

internal static unsafe MethodTable* GetMethodTable(object obj) => GetMethodTable(obj);

A large set of "must expand" intrinsics are hardware and SIMD intrinsics. We should ignore those for now and disable hardware and SIMD acceleration in fully interpreted mode.
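
A minimal sketch of the interpreter-only lookup described above, assuming a small hand-written table instead of the optimizing JIT's infrastructure. Every name here (`InterpIntrinsic`, `IntrinsicEntry`, `LookupMustExpandIntrinsic`) is hypothetical, and the class name attached to `GetMethodTable` is an assumption made for illustration; this is not the code from the PR.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical IDs for illustration only; this is not the JIT's NamedIntrinsic enum.
enum class InterpIntrinsic
{
    NotAnIntrinsic,
    RuntimeHelpers_GetMethodTable,
};

struct IntrinsicEntry
{
    const char*     className;
    const char*     methodName;
    InterpIntrinsic id;
};

// Only "must expand" intrinsics are listed. Anything missing from the table
// is treated as an ordinary method call by the caller of the lookup.
// The class name below is an assumption for the sake of the example.
static const IntrinsicEntry g_mustExpandIntrinsics[] = {
    { "RuntimeHelpers", "GetMethodTable", InterpIntrinsic::RuntimeHelpers_GetMethodTable },
};

// Linear search is enough for a table that is expected to stay small.
InterpIntrinsic LookupMustExpandIntrinsic(const char* className, const char* methodName)
{
    for (const IntrinsicEntry& entry : g_mustExpandIntrinsics)
    {
        if (std::strcmp(entry.className, className) == 0 &&
            std::strcmp(entry.methodName, methodName) == 0)
        {
            return entry.id;
        }
    }
    return InterpIntrinsic::NotAnIntrinsic;
}
```

The point of the table shape is that adding a newly discovered must expand intrinsic is a one-line change, and methods that are merely "nice to expand" never enter the interpreter's recognition path.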

@kg
Member Author

kg commented Jun 18, 2025

I think it's inevitable that we will eventually want to share the jit's NamedIntrinsic enum and name-to-enum mapping; the Mono interpreter ended up having to implement lots of intrinsics. IIRC the reasoning for this is a mix of 'it's too slow without it' and 'mixed mode requires consistency with the AOT'd code', but Vlad would know for sure.

I agree that we're not ever going to implement all of them, and in some cases we can just make IsSupported/IsHardwareAccelerated return false. We could NIH a much simpler intrinsic lookup for the interpreter-only startup scenario, but it feels like we'd have to throw it out not too long from now.

@jkotas
Member

jkotas commented Jun 18, 2025

> I think it's inevitable that we will eventually want to share the jit's NamedIntrinsic enum and name-to-enum mapping

I am not sure about that.

> Mono interpreter ended up having to implement lots of intrinsics. IIRC the reasoning for this is a mix of 'it's too slow without it'

I do not want us to turn the interpreter JIT into an optimizing JIT like it was in Mono. If we find that we need to build an optimizing interpreter JIT, it has to be built as RyuJIT backend so that we get all the RyuJIT optimizations for free (including all the intrinsic optimizations).

The primary goal of this project is to minimize our long-term maintenance costs. Building multiple optimizing code generators would go against that goal.

@kg
Member Author

kg commented Jun 18, 2025

I've updated this PR with an implementation for a few intrinsics along with logic to bypass a bunch of ones that aren't must expand. The comments should also make it clear why must expand doesn't currently work (those intrinsics just get invoked via the JIT/R2R code since it's already there).

Suggestions on how to fix that so I can restrict it to must expand intrinsics would be super helpful.

@kg
Member Author

kg commented Jun 19, 2025

I think the current rev of this is closer to what people wanted. It currently contains a workaround for a bug (somewhere) in our exception handling, and not all of the intrinsics are actually implemented, but I believe it identifies all of the must expand intrinsics from the list I created by auditing the source code manually.

I'm not sure what a good test for this feature looks like. It might not be possible to write one.

@kg
Member Author

kg commented Jun 20, 2025

Interestingly, this seems to cause the arm64 lanes to crash and coredump:
https://github.com/dotnet/runtime/pull/116769/files#diff-8e77125f64f72db99da148e08ff6ef1cd6667681ec7ccac3662cb2808363f911R900

I wonder if that means we've got something wrong about our calling convention for native methods on arm64?

EDIT: I suppose it's also possible that calling this intrinsic as if it were a method is just broken, since it has special handling in the JIT:

    // The vast majority of "special" intrinsics are Vector64/Vector128 methods.
    // The only exception is ArmBase.Yield which should be treated differently.
    if (intrinsic == NI_ArmBase_Yield)
    {
        assert(sig->numArgs == 0);
        assert(JITtype2varType(sig->retType) == TYP_VOID);
        assert(simdSize == 0);

        return gtNewScalarHWIntrinsicNode(TYP_VOID, intrinsic);
    }

@janvorli
Member

janvorli commented Jun 20, 2025

> Interestingly, this seems to cause the arm64 lanes to crash and coredump

@kg The error message of the failure you've mentioned looks like this:

Unhandled exception. System.PlatformNotSupportedException: Operation is not supported on this platform.
at System.Runtime.Intrinsics.X86.X86Base.Pause()
at InterpreterTest.TestIntrinsics()
at InterpreterTest.RunInterpreterTests()
at InterpreterTest.Main(String[] args)

So it looks like that old EH issue. Thinking about the added nop, I have realized that the compiler actually removes nops IIRC, so we may need to use e.g. INTOP_SAFEPOINT instead.

Edit: I now understand why the EH behaves like that, and I am figuring out the best way to fix it. It happens because EH adjusts the instruction pointer the same way it does for JITted/AOTed code (it subtracts a constant from it). That is done because software exceptions are compiled as calls in JIT/AOT, so the address we get for the first frame is after the call. For the interpreter, though, the address is at the INTOP_THROW opcode, so no compensation is needed. In other words, it behaves like a hardware exception with JIT/AOT, where we are also on the faulting instruction and thus don't do any compensation.
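
The adjustment rule described in this comment can be sketched as a small function. This is only an illustration under stated assumptions: `FrameKind`, `GetIpForEhLookup`, and the value of `kCallReturnAdjustment` are hypothetical and do not come from the runtime sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of the frame that raised the exception.
enum class FrameKind
{
    JitOrAot,
    Interpreter,
};

// Placeholder value: the real constant subtracted by the runtime is not
// stated in the discussion, so 1 is used purely for illustration.
constexpr uintptr_t kCallReturnAdjustment = 1;

// Returns the instruction pointer to use when looking up EH clauses.
uintptr_t GetIpForEhLookup(uintptr_t ip, FrameKind kind, bool isHardwareFault)
{
    // Interpreter frames report the address of the throwing opcode itself,
    // and hardware faults report the faulting instruction, so neither
    // needs compensation.
    if (kind == FrameKind::Interpreter || isHardwareFault)
    {
        return ip;
    }

    // A software exception in JIT/AOT code is raised through a call, so the
    // reported address is just past the call and has to be moved back.
    return ip - kCallReturnAdjustment;
}
```

Applying the subtraction to an interpreter frame would move the address to before the throwing opcode, which could make the lookup pick the wrong protected region when the throw is the first instruction of a try block.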

@kg
Member Author

kg commented Jun 20, 2025

Thanks for digging in to the EH issue and spotting that in the log. Somehow I overlooked the exception traceback when I dug through the console output.

@kg
Member Author

kg commented Jun 20, 2025

I've added and wired up DOTNET_InterpMode (not InterpreterMode, to match the other vars we have) with the 4 options as suggested by @jkotas. Right now none of the modes other than 0 are sufficient to successfully run the interpreter test suite, but the failures look roughly correct (missing opcodes, etc.)

I think once more of our open PRs land, mode 1 will probably work. Mode 2 and 3 are further off.
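
As a rough illustration of a four-valued mode variable, the sketch below reads `DOTNET_InterpMode` and maps it to an integer in the range 0 to 3. The helper names, the decimal parsing, and the fallback to 0 for missing or invalid input are all assumptions; the runtime's own config machinery may parse and validate the value differently.

```cpp
#include <cassert>
#include <cstdlib>

// The discussion only states that there are 4 modes and that higher values
// run more code in the interpreter; the bounds below follow from that.
constexpr int kInterpModeMin = 0;
constexpr int kInterpModeMax = 3;

// Parses the textual value of the variable. Falling back to mode 0 for
// missing or malformed input is an assumption made for this sketch.
int ParseInterpMode(const char* text)
{
    if (text == nullptr || *text == '\0')
    {
        return kInterpModeMin;
    }

    char* end = nullptr;
    long value = std::strtol(text, &end, 10);

    // Reject trailing garbage and out-of-range values.
    if (*end != '\0' || value < kInterpModeMin || value > kInterpModeMax)
    {
        return kInterpModeMin;
    }
    return static_cast<int>(value);
}

// Hypothetical wrapper that reads the value from the process environment.
int ReadInterpModeFromEnvironment()
{
    return ParseInterpMode(std::getenv("DOTNET_InterpMode"));
}
```

Keeping the parsing separate from the environment read makes the mapping easy to exercise without touching process state.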

@kg
Member Author

kg commented Jun 20, 2025

Something about how we define InterpConfig is very weird: the config values seem to be intentionally gated on DEBUG instead of FEATURE_INTERPRETER. Is that a mistake? As a result, I have to gate the mode config variable on DEBUG as well, which means intrinsics won't work in release (or checked?) configurations.

@kg kg requested a review from Copilot June 20, 2025 19:51
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds foundational support for interpreting hardware intrinsics in the CoreCLR interpreter and exposes a new DOTNET_InterpMode configuration to control interpreter usage. It also introduces a basic test to verify intrinsic behavior.

  • Added TestIntrinsics in the JIT interpreter tests to validate PNSE behavior for hardware intrinsics.
  • Implemented GetNamedIntrinsic and wiring for intrinsic dispatch (including INTOP_THROW_PNSE) in the interpreter.
  • Extended interpreter configuration (InterpMode) with four interpreter modes to control fallback behavior.
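
One way to picture the hardware intrinsic handling discussed in this thread (answering `IsSupported`/`IsHardwareAccelerated` with false and throwing PNSE for everything else) is the decision function below. It is a sketch under assumptions: `EmitAction` and `ChooseHardwareIntrinsicAction` are hypothetical names, and the actual `EmitNamedIntrinsicCall` in the PR may be structured differently.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical outcomes for a call to a method on a hardware intrinsic class.
enum class EmitAction
{
    LoadConstantFalse,          // answer the capability query with false
    ThrowPlatformNotSupported,  // e.g. by emitting an opcode like INTOP_THROW_PNSE
};

// Decides what to emit for a hardware intrinsic method when hardware
// acceleration is disabled in the interpreter. Property getters appear in
// metadata with a "get_" prefix.
EmitAction ChooseHardwareIntrinsicAction(const char* methodName)
{
    if (std::strcmp(methodName, "get_IsSupported") == 0 ||
        std::strcmp(methodName, "get_IsHardwareAccelerated") == 0)
    {
        return EmitAction::LoadConstantFalse;
    }

    // Any other hardware intrinsic cannot be executed, so managed code that
    // reaches it sees a PlatformNotSupportedException.
    return EmitAction::ThrowPlatformNotSupported;
}
```

With this split, library code that checks `IsSupported` before using an intrinsic takes its software fallback, and only code that calls an intrinsic unconditionally observes the exception.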

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/tests/JIT/interpreter/Interpreter.cs | New TestIntrinsics test and import for System.Runtime.Intrinsics. |
| src/coreclr/vm/interpexec.cpp | Pass CORINFO_ACCESS_ANY to GetMultiCallableAddrOfCode; add INTOP_THROW_PNSE handling. |
| src/coreclr/interpreter/intrinsics.h | Header for GetNamedIntrinsic. |
| src/coreclr/interpreter/intrinsics.cpp | Mapping from method metadata to NamedIntrinsic. |
| src/coreclr/interpreter/intops.def | Defined INTOP_THROW_PNSE opcode. |
| src/coreclr/interpreter/interpconfigvalues.h | Added InterpMode configuration and comments describing modes. |
| src/coreclr/interpreter/eeinterp.cpp | Integrated InterpMode into interpreter decision logic. |
| src/coreclr/interpreter/compiler.h | Declared EmitNamedIntrinsicCall. |
| src/coreclr/interpreter/compiler.cpp | Implemented EmitNamedIntrinsicCall and hooked it into calls. |
| src/coreclr/interpreter/CMakeLists.txt | Included intrinsics.cpp in the interpreter build. |

Comments suppressed due to low confidence (2)

src/coreclr/interpreter/interpconfigvalues.h:30

  • The comment for mode 3 ends with a trailing comma and reads like it’s incomplete. Consider finishing the sentence or removing the trailing comma to clearly document all implied environment variable changes.
// 3: use interpreter for everything, the full interpreter-only mode, no fallbacks to R2R or JIT whatsoever. Implies DOTNET_ReadyToRun=0, DOTNET_EnableHWIntrinsic=0,

src/tests/JIT/interpreter/Interpreter.cs:891

  • [nitpick] The HACK and FIXME comments here signal temporary workarounds; consider clarifying next steps or removing commented-out code to keep the test implementation clean.
        // HACK: Try block that should always throw

@kg kg added the NO-REVIEW Experimental/testing PR, do NOT review it label Jul 8, 2025
@kg kg force-pushed the interp-intrinsics-1 branch from 23671c5 to 63b8767 Compare July 8, 2025 18:04
@kg
Member Author

kg commented Jul 8, 2025

At present, setting InterpMode to anything other than 0 causes a crash. Recording it here for reference:

PS Z:\runtime> $env:DOTNET_InterpMode=1
PS Z:\runtime> Z:\runtime\artifacts\tests\coreclr\windows.x64.Debug\JIT\Interpreter\InterpreterTester\InterpreterTester.cmd
BEGIN EXECUTION
 "Z:\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe" -p "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true"  InterpreterTester.dll

Assert failure(PID 54356 [0x0000d454], Thread: 15356 [0x3bfc]): Consistency check failed: AV in clr at this callstack:
------
CORECLR! InterpreterStub + 0x84 (0x00007ff9`ca07bde4)
CORECLR! CallJittedMethodRetI8 + 0x17 (0x00007ff9`ca07c4e7)
CORECLR! InvokeCompiledMethod + 0x5AE (0x00007ff9`c9cb644e)
CORECLR! InterpExecMethod + 0x8320 (0x00007ff9`c9cb1de0)
CORECLR! ExecuteInterpretedMethod + 0x11B (0x00007ff9`c9a57f6b)
CORECLR! InterpreterStubRetI8 + 0x12 (0x00007ff9`ca07be32)
CORECLR! InterpreterStub + 0x87 (0x00007ff9`ca07bde7)
CORECLR! CallDescrWorkerInternal + 0x83 (0x00007ff9`ca07c6e3)
CORECLR! CallDescrWorkerWithHandler + 0x130 (0x00007ff9`c9b70cf0)
CORECLR! MethodDescCallSite::CallTargetWorker + 0xB8B (0x00007ff9`c9b7188b)
CORECLR! MethodDescCallSite::Call_RetArgSlot + 0x11F (0x00007ff9`c976891f)
CORECLR! RunMainInternal + 0x286 (0x00007ff9`c97757d6)
CORECLR! ``RunMain'::`21'::__Body::Run'::`5'::__Body::Run + 0x5A (0x00007ff9`c97751aa)
CORECLR! `RunMain'::`21'::__Body::Run + 0xA3 (0x00007ff9`c97752b3)
CORECLR! RunMain + 0x1B5 (0x00007ff9`c97754d5)
CORECLR! Assembly::ExecuteMainMethod + 0x542 (0x00007ff9`c976d6d2)
CORECLR! CorHost2::ExecuteAssembly + 0x555 (0x00007ff9`c984ac25)
CORECLR! coreclr_execute_assembly + 0x130 (0x00007ff9`ca0b5ce0)
CORERUN! run + 0x138A (0x00007ff6`659ee2ba)
-----
.AV on tid=0x3bfc (15356), cxr=000000CAE957A3C0, exr=000000CAE957A8B0

FAILED: false

CORECLR! CHECK::Trigger + 0x1F7 (0x00007ff9`c94d9097)
CORECLR! CLRVectoredExceptionHandlerPhase3 + 0x36C (0x00007ff9`c989417c)
CORECLR! CLRVectoredExceptionHandlerPhase2 + 0xA6 (0x00007ff9`c9893b16)
CORECLR! CLRVectoredExceptionHandler + 0x387 (0x00007ff9`c9893a47)
CORECLR! CLRVectoredExceptionHandlerShim + 0x21D (0x00007ff9`c98944cd)
NTDLL! RtlWow64GetCurrentCpuArea + 0x486 (0x00007ffa`b7d93e46)
NTDLL! RtlWow64GetCurrentCpuArea + 0x9E6 (0x00007ffa`b7d943a6)
NTDLL! KiUserExceptionDispatcher + 0x2E (0x00007ffa`b7e85b7e)
CORECLR! InterpreterStub + 0x84 (0x00007ff9`ca07bde4)
CORECLR! CallJittedMethodRetI8 + 0x17 (0x00007ff9`ca07c4e7)
    File: Z:\runtime\src\coreclr\vm\excep.cpp:6744
    Image: Z:\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe

Interpreted App returned -1073740286
System.Exception: Interpreted App failed execution
   at InterpreterTester.RunTests() in Z:\runtime\src\tests\JIT\interpreter\InterpreterTester.cs:line 30
   at __GeneratedMainWrapper.Main() in Z:\runtime\artifacts\tests\coreclr\obj\windows.x64.Debug\Managed\JIT\interpreter\InterpreterTester\XUnitWrapperGenerator\XUnitWrapperGenerator.XUnitWrapperGenerator\SimpleRunner.g.cs:line 7
Expected: 100
Actual: 101
END EXECUTION - FAILED
FAILED

@kg kg force-pushed the interp-intrinsics-1 branch from 63b8767 to 9505080 Compare July 8, 2025 20:59
Member

@janvorli janvorli left a comment

LGTM, thank you!

@kg kg removed NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) NO-REVIEW Experimental/testing PR, do NOT review it labels Jul 8, 2025
kg and others added 7 commits July 8, 2025 17:48
Checkpoint

Extract named intrinsic lookup out of JIT Compiler class

Maybe fix zbb bit

Basic wire-up of interpreter intrinsics

Bypass more intrinsics

Don't assert on unimplemented intrinsic by default, just don't handle it

Add more intrinsics to the switch

Fix linux build

Revert changes to JIT side

Checkpoint intrinsics rework

Checkpoint

Change EH workaround to something with a smaller blast radius

Fix linux build

Checkpoint: Add DOTNET_InterpMode config var and pipe it through

Minor adjustments suggested offline

Address PR feedback + maybe fix CI build

Rearrange intrinsics test so it won't fail on ARM64 CI
@kg kg force-pushed the interp-intrinsics-1 branch from 6aa4b1b to 1db3566 Compare July 9, 2025 00:49
@kg
Member Author

kg commented Jul 10, 2025

Windows x86 debug base64 seems to just actually be broken on main, or else we have multiple defective x86 CI machines. I'm not sure which explanation is more likely, but 'checked coreclr windows x86 debug' keeps failing in the base64/xml stuff even when I rerun it, and I haven't touched any of that in this PR.

@kg kg merged commit 9650084 into dotnet:main Jul 10, 2025
101 of 103 checks passed
@janvorli janvorli mentioned this pull request Jul 10, 2025
65 tasks
@github-actions github-actions bot locked and limited conversation to collaborators Aug 9, 2025

5 participants