-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add missing CUDA_ARCH guard for __nanosleep
in example
#2558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Without this (similary done for other instances) compilation on pre-7.0 CCCs fill fail as the function is not defined.
|
||
__global__ void delay_kernel(const AtomicBoolean* atomic_flag_ptr) { | ||
while (not atomic_flag_ptr->load()) { | ||
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it okay if the while loop body is empty here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should be fine; example 41 implements the spin wait without the sleep instruction. nanosleep
is in the ISA for sm70 and later only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This same pattern is in several files, and other files don't have guards when they probably should. Can you move this code into a helper function cutlass::nanosleep_if_supported()
and replace all direct __nanosleep()
calls with a call to that new function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2567 adds guards for this header. But I guess the guards could be placed in this header to avoid the duplication at the includers
nanosleep_if_supported
is a good idea in any case.
Thanks for your contribution. This header file is specific to DistGEMM, and DistGEMM is only implemented for Hopper and Blackwell, so I don't see why this kernel needs to be compiled for older architectures? |
The CMake configure process doesn't distinguish SMs required for examples so compiling the examples with a specific arch set will fail the whole build. See also #2559 (comment) This fix is small enough to get it a step further at least. |
I think I see the issue. You're probably running just We can go ahead and merge this, but I'm not sure if this is the only instance that needs fixing for avoiding all such issues when building all targets. |
I recommend merging #2567 first -- including this header file should be guarded by the example, and that is addressed in that PR. |
This PR has been labeled |
Without this (similary done for other instances) compilation on pre-7.0 CCCs fill fail as the function is not defined.