

Flamefire

Without this guard (similarly done for other instances), compilation for pre-7.0 compute capabilities will fail because the function (`__nanosleep`) is not defined there.

@hwu36
Collaborator

hwu36 commented Aug 13, 2025

@alihassanijr


__global__ void delay_kernel(const AtomicBoolean* atomic_flag_ptr) {
while (not atomic_flag_ptr->load()) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
Collaborator


Is it okay if the while loop body is empty here?

Contributor


I think that should be fine; example 41 implements the spin wait without the sleep instruction. The nanosleep instruction is only in the ISA for sm_70 and later.
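To illustrate why an empty loop body is acceptable, here is a minimal sketch of the guarded spin wait. It assumes a flag type with a `load()` member, like the `AtomicBoolean` in the snippet above. `SKETCH_HOST_DEVICE`, `spin_wait`, `CountdownFlag`, and `polls_until_ready` are hypothetical names used only for illustration. On sm_70+ device passes, each poll backs off with `__nanosleep`. On older architectures, and in the host pass, the `#if` removes the call, so the loop compiles as a plain busy-wait.

```cpp
#include <cassert>

// Lets the sketch build with nvcc (host + device) and with a plain C++ compiler.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__ __forceinline__
#else
#define SKETCH_HOST_DEVICE inline
#endif

// Poll `flag` until load() returns true (hypothetical helper).
// sm_70+ device code sleeps briefly between polls; everywhere else the
// loop body is empty, i.e. a plain busy-wait as in example 41.
template <typename Flag>
SKETCH_HOST_DEVICE void spin_wait(const Flag& flag) {
  while (!flag.load()) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
    __nanosleep(32);  // back off; __nanosleep exists only in the sm_70+ ISA
#endif
  }
}

// Host-only stand-in for a device-side atomic flag (hypothetical): reports
// "set" on the `ready_after`-th call to load() and counts every poll.
struct CountdownFlag {
  int ready_after;
  mutable int loads = 0;
  bool load() const { return ++loads >= ready_after; }
};

// Host-side demonstration: returns how many polls the empty-bodied loop made.
inline int polls_until_ready(int ready_after) {
  assert(ready_after >= 1 && "the flag is always polled at least once");
  CountdownFlag flag{ready_after};
  spin_wait(flag);
  return flag.loads;
}
```

In a kernel, the same helper would be used as `spin_wait(*atomic_flag_ptr)`, assuming the flag's `load()` is callable from device code. The only thing the guard changes is whether a poll sleeps. Correctness never depends on it.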

Collaborator


This same pattern is in several files, and other files don't have guards when they probably should. Can you move this code into a helper function cutlass::nanosleep_if_supported() and replace all direct __nanosleep() calls with a call to that new function?

Author


#2567 adds guards around the includes of this header. But I guess the guards could instead be placed in this header itself to avoid duplicating them in every includer.

nanosleep_if_supported is a good idea in any case.
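Here is a minimal sketch of the helper suggested above. It assumes the helper only needs to wrap `__nanosleep`, which takes an `unsigned` nanosecond count. The proposed `cutlass::nanosleep_if_supported()` does not exist yet. The `cutlass_sketch` namespace, the `nanosleep_supported()` query, and the `SKETCH_HOST_DEVICE` macro are hypothetical stand-ins, not CUTLASS API. Because the architecture guard lives in one place, callers and includers would no longer need their own `#if`.

```cpp
// Lets the sketch build with nvcc and with a plain host C++ compiler.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__ __forceinline__
#else
#define SKETCH_HOST_DEVICE inline
#endif

namespace cutlass_sketch {  // hypothetical namespace, not part of CUTLASS

// True only in device compilation passes targeting sm_70 or newer,
// i.e. the passes in which __nanosleep is available.
SKETCH_HOST_DEVICE constexpr bool nanosleep_supported() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  return true;
#else
  return false;
#endif
}

// Suspends the calling thread for approximately `ns` nanoseconds where the
// ISA supports it; compiles to nothing on older architectures and on the host.
SKETCH_HOST_DEVICE void nanosleep_if_supported(unsigned int ns) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  __nanosleep(ns);
#else
  (void)ns;  // silence unused-parameter warnings when the call is compiled out
#endif
}

}  // namespace cutlass_sketch
```

A spin loop would then read `while (!flag.load()) { cutlass_sketch::nanosleep_if_supported(32); }` on every architecture. On pre-sm_70 targets it degrades to the empty-bodied busy-wait discussed above.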

@alihassanijr
Contributor

Thanks for your contribution.

This header file is specific to DistGEMM, and DistGEMM is only implemented for Hopper and Blackwell, so why does this kernel need to be compiled for older architectures?

@Flamefire
Author

The CMake configure process doesn't distinguish which SMs each example requires, so compiling the examples for a specific set of architectures fails the whole build. See also #2559 (comment)

This fix is small, and it at least gets the build a step further.

@alihassanijr
Contributor

I think I see the issue. You're probably running plain make after configuring with CMake. That's not recommended, because it tries to build everything, including incompatible targets; even when it succeeds, it takes a lot of time.

We can go ahead and merge this, but I'm not sure this is the only instance that needs fixing to avoid all such issues when building all targets.

@alihassanijr
Contributor

I recommend merging #2567 first: the inclusion of this header file should be guarded by the example, and that PR addresses this.


This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.
