Add missing CUDA_ARCH guard for `__nanosleep` in example #2558

Flamefire · 2025-08-11T09:02:32Z

Without this guard (similar guards already exist for other instances), compilation for pre-7.0 CUDA compute capabilities will fail because the function is not defined.

hwu36 · 2025-08-13T01:40:57Z

hwu36 · 2025-08-13T02:02:15Z

examples/common/dist_gemm_helpers.h


 __global__ void delay_kernel(const AtomicBoolean* atomic_flag_ptr) {
  while (not atomic_flag_ptr->load()) {
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700


Is it okay if the while loop body is empty here?

I think that should be fine; example 41 implements the spin wait without the sleep instruction. `nanosleep` is in the ISA for sm70 and later only.

This same pattern is in several files, and other files don't have guards when they probably should. Can you move this code into a helper function `cutlass::nanosleep_if_supported()` and replace all direct `__nanosleep()` calls with a call to that new function?

#2567 adds guards around the includes of this header, but I guess the guards could instead be placed in the header itself to avoid duplicating them in every includer.

`nanosleep_if_supported` is a good idea in any case.

alihassanijr · 2025-08-13T02:34:57Z

Thanks for your contribution.

This header file is specific to DistGEMM, and DistGEMM is only implemented for Hopper and Blackwell, so I don't see why this kernel needs to be compiled for older architectures?

Flamefire · 2025-08-13T07:54:18Z

The CMake configure process doesn't distinguish which SMs each example requires, so compiling the examples with a specific arch set will fail the whole build. See also #2559 (comment)

This fix is small enough, and it at least gets the build a step further.

alihassanijr · 2025-08-13T13:26:42Z

I think I see the issue. You're probably running just `make` after setting up CMake. That's not recommended, because it will try to build everything, even the targets that are incompatible, and while it can succeed, it will take up a lot of time.

We can go ahead and merge this, but I'm not sure if this is the only instance that needs fixing for avoiding all such issues when building all targets.

alihassanijr · 2025-08-13T17:34:21Z

I recommend merging #2567 first -- including this header file should be guarded by the example, and that is addressed in that PR.

github-actions · 2025-09-12T18:07:21Z

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

Add missing CUDA_ARCH guard for __nanosleep in example

62b9a6c

Without this guard (similar guards already exist for other instances), compilation for pre-7.0 CUDA compute capabilities will fail because the function is not defined.

hwu36 reviewed Aug 13, 2025

alihassanijr mentioned this pull request Aug 13, 2025

Fix arch guards in a few examples #2567

Open

Flamefire mentioned this pull request Aug 13, 2025

[BUG] 88_hopper_fmha_fp8 example fails to compile on some CUDA archs #2559

Open

d-k-b approved these changes Aug 13, 2025

github-actions bot added the inactive-30d label Sep 12, 2025

d-k-b Sep 15, 2025

Flamefire Sep 16, 2025
