[GPU] Fix accuracy degradation issue for hbonet0.5/hbonet1.0/nanodet-m-1.5x-416 models caused by eltwise kernels #31140
The added b_fs_yx_fsv16 & fp16 oneDNN concatenation test case passes with the older oneDNN GPU commit e7d51221ff8aa4698c4dd63fffc136ce7522ef62 but fails with the newer oneDNN GPU commit a42b47ff2cb81df552887dd4a3575f964386b25e (introduced into OpenVINO by d7f0f34). The oneDNN GPU commit change does not affect the OCL concatenation, and this test case passes when the OCL implementation is selected for concatenation.
Note: the CI tests will pass only once the oneDNN concatenation issue is fixed.
src/plugins/intel_gpu/tests/unit/test_cases/concatenation_gpu_test.cpp
The CI tests will pass after oneDNN PR 3630 is merged to master.
@wilson-seok I changed the behavior so that blocked-format memory is zero-padded only when the mode is eltwise_mode::ASSIGN, and updated the PR description. The performance tests show that the degradation is small for eltwise_blocked_opt but very large for generic_eltwise_ref.
Further optimized the generic_eltwise_ref kernel by using multiple threads for zero-padding. Its performance degradation (for the eltwise kernel itself, not the E2E test) dropped from about 250% ~ 290% (depending on the shape) to about 100% ~ 150%.
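To illustrate the idea of spreading the zero-padding over many threads, here is a minimal host-side sketch in plain C++. It is not the kernel code from this PR: the function names, the one-work-item-per-(b, y, x) partitioning, and the sequential loop standing in for an NDRange launch are all assumptions made for illustration. The only thing taken from the discussion is that the padded lanes of an fsv16 block are cleared in parallel rather than by a single thread.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kFsv = 16;  // feature-slice width of b_fs_yx_fsv16

// Hypothetical per-work-item body: work-item `gid` owns one (b, y, x) position
// and clears only the padded lanes of the last feature slice at that position.
void zero_pad_work_item(std::size_t gid, float* data, std::size_t features,
                        std::size_t slices, std::size_t spatial /* y * x */) {
    const std::size_t first_pad_lane = features % kFsv;
    if (first_pad_lane == 0) {
        return;  // feature count is a multiple of 16, so there is no padding
    }
    const std::size_t b = gid / spatial;
    const std::size_t yx = gid % spatial;
    float* block = data + ((b * slices + (slices - 1)) * spatial + yx) * kFsv;
    for (std::size_t lane = first_pad_lane; lane < kFsv; ++lane) {
        block[lane] = 0.0f;
    }
}

// Sequential stand-in for launching batch * spatial independent work-items.
void launch_zero_pad(std::vector<float>& data, std::size_t batch,
                     std::size_t features, std::size_t spatial) {
    const std::size_t slices = (features + kFsv - 1) / kFsv;
    for (std::size_t gid = 0; gid < batch * spatial; ++gid) {
        zero_pad_work_item(gid, data.data(), features, slices, spatial);
    }
}
```

Because each work-item touches a disjoint block, the iterations are independent, which is what would let a real kernel run them concurrently.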
src/plugins/intel_gpu/src/kernel_selector/kernels/eltwise/eltwise_kernel_ref.cpp
Signed-off-by: yuan.xiong <[email protected]>
…crop testcase Signed-off-by: yuan.xiong <[email protected]>
…rmat output memory Signed-off-by: yuan.xiong <[email protected]>
…ise_ref kernel Signed-off-by: yuan.xiong <[email protected]>
The benchmark_app performance results for the models show a slight performance drop with this PR.
997b5c4
…m-1.5x-416 models caused by eltwise kernels (openvinotoolkit#31140)

### Details
- Fix the accuracy degradation of the hbonet0.5/hbonet1.0/nanodet-m-1.5x-416 models, caused by non-zero values in the padded memory area of blocked formats (b_fs_yx_fsv16/b_fs_yx_fsv32) in eltwise kernels.

### Description of the issue
#### Symptom
oneDNN concatenation produces different output values before and after commit d7f0f34 when using blocked-format memory.

#### Root cause
- The eltwise_blocked_opt kernel copies extra values from the input (read to pad features to a 16/32 block) into the padded memory area of the blocked-format (fsv16/fsv32) output, instead of writing zeros.
- oneDNN concatenation requires the padded memory area of a blocked format to be filled with zeros ([link](https://uxlfoundation.github.io/oneDNN/dev_guide_understanding_memory_formats.html#what-if-channels-are-not-multiples-of-8-or-16)). When the concatenation's input comes from the output of the eltwise_blocked_opt or generic_eltwise_ref kernel and its padded memory area is not zero-padded, the oneDNN concatenation produces wrong output values, which degrades the model's accuracy.
- When model inference runs a second time, the crop's output memory is shared with other primitives and may therefore hold non-zero values in the padded memory area. Since the eltwise_blocked_opt and generic_eltwise_ref kernels do not fill it with zeros by default, the oneDNN concatenation after the crop then produces wrong outputs.

#### How to fix it
- Skip copying extra values from the input in the eltwise_blocked_opt kernel.
- Fill the crop output with zeros if it has blocked-format memory that is shared with other primitives and it is followed by oneDNN concatenation.

#### The code and line that caused this issue
https://github.com/openvinotoolkit/openvino/blob/0dcc5adfd89dc9151f0c4448e346d0ec030f70e6/src/plugins/intel_gpu/src/kernel_selector/cl_kernels/eltwise_blocked_opt.cl#L64

#### Reproduction step and snapshot
- For hbonet-1.0 FP16-INT8
`python accuracy_check.py --target_framework openvino --target_devices GPU --config ./hbonet-1.0-onnx.yml --target_tags FP16-INT8 --models ./local_models --source ./datasets --annotations ./annotations --definitions ./dataset_definitions.yml --undefined_shapes_resolving_policy default --sub_evaluation true --use_new_api True`

#### Problematic graph
- eltwise_blocked_opt and generic_eltwise_ref kernels used in the hbonet1.0 model
<img width="1590" height="588" alt="image" src="https://github.com/user-attachments/assets/f558ca3c-bc7f-4a2e-a669-03606af54e7a" />

#### Checklist
- [x] Is it a proper fix? (not a workaround)
- [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing test that can be extended to cover this scenario? Which test did you review?
No existing test case covers this issue, so a new crop_gpu test case "basic_in1x176x52x52_crop_b_fs_yx_fsv16" was added.

### Tickets:
- CVS-169075

---------

Signed-off-by: yuan.xiong <[email protected]>
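For readers unfamiliar with the layout, the following is a minimal plain-C++ model of a b_fs_yx_fsv16 buffer, showing where the padded lanes live and what "zero-padded" means for them. It is only a sketch: `BlockedShape`, `blocked_offset`, `zero_feature_padding`, and `padding_is_zero` are hypothetical helper names, not functions from the plugin or from oneDNN, and float is used in place of fp16 for simplicity.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kBlock = 16;  // feature-slice width for fsv16

// Logical dimensions of a tensor stored as b_fs_yx_fsv16 (illustrative type).
struct BlockedShape {
    std::size_t b, f, y, x;
    std::size_t feature_slices() const { return (f + kBlock - 1) / kBlock; }
    std::size_t size() const { return b * feature_slices() * y * x * kBlock; }
};

// Linear offset of element (b, f, y, x): batch, feature slice, y, x,
// then the lane inside the 16-wide feature slice.
std::size_t blocked_offset(const BlockedShape& s, std::size_t b, std::size_t f,
                           std::size_t y, std::size_t x) {
    const std::size_t fs = f / kBlock;
    const std::size_t lane = f % kBlock;
    return (((b * s.feature_slices() + fs) * s.y + y) * s.x + x) * kBlock + lane;
}

// Write zeros to every lane beyond the logical feature count.
void zero_feature_padding(std::vector<float>& buf, const BlockedShape& s) {
    const std::size_t padded_f = s.feature_slices() * kBlock;
    for (std::size_t b = 0; b < s.b; ++b)
        for (std::size_t f = s.f; f < padded_f; ++f)
            for (std::size_t y = 0; y < s.y; ++y)
                for (std::size_t x = 0; x < s.x; ++x)
                    buf[blocked_offset(s, b, f, y, x)] = 0.0f;
}

// True if all padded lanes hold zero, which is the state the oneDNN
// documentation linked above expects for blocked formats.
bool padding_is_zero(const std::vector<float>& buf, const BlockedShape& s) {
    const std::size_t padded_f = s.feature_slices() * kBlock;
    for (std::size_t b = 0; b < s.b; ++b)
        for (std::size_t f = s.f; f < padded_f; ++f)
            for (std::size_t y = 0; y < s.y; ++y)
                for (std::size_t x = 0; x < s.x; ++x)
                    if (buf[blocked_offset(s, b, f, y, x)] != 0.0f) return false;
    return true;
}
```

With 20 features, for example, the second feature slice carries 4 real lanes and 12 padded lanes per (b, y, x) position. A buffer reused from an earlier primitive can leave stale values in those 12 lanes unless they are explicitly cleared.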
nullptr error in node: selected_impl is null. Related regression: openvinotoolkit#31140 ### Tickets: - *173291* Signed-off-by: hyunback <[email protected]>
### Details
- Fix the performance regression introduced by PR #31140.

### Description of the issue
#### Symptom
The manual_yolo11 model's performance dropped from 362.4 FPS to 318.25 FPS on GPU.

#### Root cause
- The previous PR forces every crop primitive that is followed by oneDNN concatenation to clean its GPU memory by filling it with zeros if it uses a blocked format.
- The manual_yolo11 model has many such crop primitives, so its performance drops.

#### How to fix it
- Filling GPU memory with zeros can be skipped if the crop primitive uses the eltwise_blocked_opt kernel and is not dynamic, so skip it by checking the crop primitive's kernel name.

#### The code and line that caused this issue
https://github.com/openvinotoolkit/openvino/blob/453c8ee337f4a1cadebb66551bb40d6a216c1001/src/plugins/intel_gpu/src/graph/primitive_inst.cpp#L2036

#### Reproduction step and snapshot
- benchmark_app
`benchmark_app -inference_only false -b 1 -t 60 -nireq 4 -d GPU.0 -hint none -nstreams 2 -m INT8/1/ov/optimized/manual_yolo11.xml`

#### Problematic graph
- crop primitive (eltwise_blocked_opt kernel) followed by oneDNN concatenation in manual_yolo11
<img width="570" height="616" alt="image" src="https://github.com/user-attachments/assets/c3c0b598-c147-4d23-940d-9a4ac9b4649e" />

#### Checklist
- [x] Is it a proper fix? (not a workaround)
- [ ] Did you include test case for this fix, if necessary? No need
- [ ] Did you review existing test that can be extended to cover this scenario? Which test did you review?

### Tickets:
- CVS-173402

---------

Signed-off-by: yuan.xiong <[email protected]>
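The skip condition described in this follow-up fix can be written as a small predicate. This is a sketch under stated assumptions: `CropInfo` and `needs_zero_fill` are hypothetical names, the fields are a simplified stand-in for whatever state the plugin actually inspects in primitive_inst.cpp, and the substring match on the kernel name is an illustrative guess at how "checking the kernel name" could look.

```cpp
#include <string>

// Hypothetical, simplified view of a crop primitive (illustrative fields only).
struct CropInfo {
    bool blocked_format;       // output uses b_fs_yx_fsv16 / b_fs_yx_fsv32
    bool feeds_onednn_concat;  // a oneDNN concatenation consumes the output
    bool is_dynamic;           // primitive has dynamic shapes
    std::string kernel_name;   // name of the selected kernel
};

// Returns true when the crop output should be zero-filled before execution.
bool needs_zero_fill(const CropInfo& crop) {
    // Only blocked-format outputs consumed by oneDNN concatenation are affected.
    if (!crop.blocked_format || !crop.feeds_onednn_concat) {
        return false;
    }
    // Per the fix description, the fill can be skipped for a static crop that
    // runs the eltwise_blocked_opt kernel.
    const bool uses_blocked_opt =
        crop.kernel_name.find("eltwise_blocked_opt") != std::string::npos;
    return !(uses_blocked_opt && !crop.is_dynamic);
}
```

Keeping the check as a pure function of a few flags makes the trade-off explicit: the costly fill remains for dynamic crops and for other kernels such as generic_eltwise_ref, and is skipped only in the case the description identifies as safe.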