Add MPS kernels #7643

Merged: 32 commits merged into pytorch:main on Aug 1, 2023

Conversation

@qqaatw (Contributor) commented May 30, 2023

Summary:

  1. Prerequisite PR in the PyTorch repository: [MPS] Prerequisite for MPS C++ extension pytorch#102483.
  2. This PR adds nms, roi_align, roi_pool, ps_roi_align, ps_roi_pool and their corresponding backward kernels where applicable. Most implementations are inspired by the CUDA implementations. (A brief usage sketch follows this list.)
  3. All the kernel code is placed in mps_kernels.h to make it easy to share helper functions and macros and to cache PSOs (pipeline state objects).
  4. Atomic operations are used in the backward kernels of the RoI functions. Because atomic_float is only supported in Metal 3 (macOS Ventura; MSL spec, section 2.6) and later, we implement a custom atomic addition function for systems with Metal 2.x.
  5. Apple GPUs natively support 64-bit signed and unsigned integer types, so we unify the integer types in the kernels to 64 bits. This might have performance implications when running the kernels on AMD or Intel GPUs (relevant discussion).
  6. MPS does not support float64, so the absolute tolerances of gradcheck in the RoI backward tests are adjusted accordingly.
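
A minimal usage sketch of the new ops on the MPS backend (illustrative only, not part of the PR; it assumes a torchvision build that includes these kernels and an MPS-enabled PyTorch, and the shapes and values are made up):

```python
import torch
from torchvision.ops import nms, roi_align

device = torch.device("mps")  # requires an MPS-enabled PyTorch build

# Hypothetical inputs: 8 candidate boxes in (x1, y1, x2, y2) format with scores.
boxes = torch.rand(8, 4, device=device).sort(dim=1).values * 100
scores = torch.rand(8, device=device)
keep = nms(boxes, scores, iou_threshold=0.5)  # dispatches to the MPS nms kernel

# RoI align on one feature map; rois are (batch_idx, x1, y1, x2, y2).
feats = torch.rand(1, 16, 32, 32, device=device, requires_grad=True)
rois = torch.tensor([[0.0, 2.0, 2.0, 20.0, 20.0]], device=device)
pooled = roi_align(feats, rois, output_size=(7, 7), spatial_scale=1.0,
                   sampling_ratio=2, aligned=True)
pooled.sum().backward()  # exercises the roi_align backward kernel (atomic adds)
```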

cc @NicolasHug @pmeier @albanD @kulinseth

@qqaatw qqaatw closed this May 30, 2023
@qqaatw qqaatw reopened this May 30, 2023
@qqaatw qqaatw changed the title from Add MPS kernels to [Draft] Add MPS kernels May 30, 2023
@qqaatw qqaatw marked this pull request as draft May 30, 2023 16:33
@qqaatw qqaatw force-pushed the add_mps_kernels branch 2 times, most recently from dd5f42a to 6f32285 June 13, 2023 14:42
@qqaatw qqaatw mentioned this pull request Jun 19, 2023
@NicolasHug (Member) commented:
Hi @qqaatw , I saw this PR isn't in draft state anymore. Is this ready for review?

@qqaatw (Contributor, PR author) commented Jun 26, 2023

> Hi @qqaatw, I saw this PR isn't in draft state anymore. Is this ready for review?

Hi @NicolasHug, yes, please.

There is an issue with f16 inputs for RoI ops, which doesn't have test coverage. Otherwise the added ops are tested.

@NicolasHug (Member) left a review:
Thanks a lot @qqaatw. I gave a quick first glance at the tests and made some minor comments / suggestions, but this looks great overall.

As discussed offline with @albanD, we're OK to introduce these new MPS kernels in torchvision, with the shared understanding that MPS-related support (typically bug reports and fixes) will be the responsibility of the MPS team.

> There is an issue with f16 inputs for RoI ops, which doesn't have test coverage. Otherwise the added ops are tested.

What's the issue? If float16 isn't supported for MPS that's OK, but maybe we should write a small test asserting the error message?

(Resolved review comments on setup.py and test/conftest.py)
int64_t w_stride = grad.stride(3);
int64_t output_size = grad.numel();

at::globalContext().alertNotDeterministic("roi_align_backward_kernel");
@NicolasHug (Member):
I'm curious, what makes this kernel and the other roi align / pool kernels non-deterministic?

For the CUDA kernels, it's the calls to atomicAdd, but I'm curious what the reason is here.

@qqaatw (Contributor, PR author):
The MPS kernels also make use of atomic addition, which is provided either by the Metal standard library or by a custom implementation, depending on the Metal version (see the atomic_add_float function in mps_kernels.h).

I've added a note in the PR description. Hope it properly explains the non-determinism.
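
To illustrate the source of the non-determinism (a standalone illustration, not code from the PR): floating-point addition is not associative, so when many threads accumulate into the same gradient element with atomic adds, the unordered arrival of those adds can change the rounded result between runs.

```python
import torch

# Summing the same float32 values in two different orders usually does not
# give bit-identical results, which is exactly what unordered atomic adds do.
vals = torch.rand(10_000, dtype=torch.float32)
perm = torch.randperm(vals.numel())
a = float(torch.cumsum(vals, 0)[-1])        # one accumulation order
b = float(torch.cumsum(vals[perm], 0)[-1])  # the same values, another order
print(a == b, abs(a - b))                   # typically False, with a tiny difference
```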

(Resolved review comments on torchvision/ops/roi_align.py and test/test_ops.py)
@@ -271,6 +277,8 @@ def test_jit_boxes_list(self):


class TestPSRoIPool(RoIOpTester):
mps_backward_atol = 5e-2
@NicolasHug (Member):
@albanD , any thought regarding this atol value for gradcheck()?

For ref we typically use 1e-5 for CPU/CUDA, although we seem to be testing on float64 while the MPS tests are currently running on float32.

@albanD:
The gradcheck is a bit tricky here, as we usually only run it in fp64 precision to get accurate results. Unfortunately, MPS doesn't support fp64, so we can only resort to comparing with CPU results or increasing the tolerance significantly.
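
A rough sketch of the two options mentioned above (illustrative only; the shapes, tolerances, and the choice of roi_align are assumptions, not taken from the PR):

```python
import torch
from torch.autograd import gradcheck
from torchvision.ops import roi_align

def fn(x, rois):
    return roi_align(x, rois, output_size=(5, 5), spatial_scale=1.0, sampling_ratio=2)

x_mps = torch.rand(1, 4, 16, 16, device="mps", dtype=torch.float32, requires_grad=True)
rois = torch.tensor([[0.0, 1.0, 1.0, 10.0, 10.0]], device="mps")

# Option 1: numeric gradcheck in float32 with loosened tolerances, since
# float64 is unavailable on MPS; nondet_tol accounts for the atomic adds.
gradcheck(fn, (x_mps, rois), atol=5e-2, rtol=1e-2, nondet_tol=1e-5, fast_mode=True)

# Option 2: compare the analytic MPS gradient against the CPU gradient.
x_cpu = x_mps.detach().cpu().clone().requires_grad_(True)
fn(x_mps, rois).sum().backward()
fn(x_cpu, rois.cpu()).sum().backward()
torch.testing.assert_close(x_mps.grad.cpu(), x_cpu.grad, atol=1e-4, rtol=1e-4)
```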

(Resolved review comments on test/test_ops.py)
@qqaatw (Contributor, PR author) left a review:

Thank you for reviewing @NicolasHug!

> What's the issue? If float16 isn't supported for MPS that's OK, but maybe we should write a small test asserting the error message?

The issue is that the atomic operations on MPS do not support half precision, and the RoI backward kernels rely on atomic addition. I've added checks to the RoI backward kernels; the forward kernels work fine!
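
In the spirit of the suggestion above, a hypothetical test sketch asserting that half-precision backward raises on MPS (the exception type is an assumption and this is not a test from the PR):

```python
import pytest
import torch
from torchvision.ops import roi_align

@pytest.mark.skipif(not torch.backends.mps.is_available(), reason="requires MPS")
def test_roi_align_half_backward_raises_on_mps():
    x = torch.rand(1, 4, 16, 16, device="mps", dtype=torch.float16, requires_grad=True)
    rois = torch.tensor([[0.0, 1.0, 1.0, 10.0, 10.0]], device="mps", dtype=torch.float16)

    # Forward in float16 is expected to work...
    out = roi_align(x, rois, output_size=(5, 5), spatial_scale=1.0, sampling_ratio=2)

    # ...while backward needs atomic adds, which the MPS kernels do not
    # support for half precision, so the added checks should raise.
    with pytest.raises(RuntimeError):
        out.sum().backward()
```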


@@ -158,12 +158,12 @@ def from_K(t):
y = (
from_K(roi_start_h)
+ ph[None, :, None] * from_K(bin_size_h)
-  + (iy[None, None, :] + 0.5) * from_K(bin_size_h / roi_bin_grid_h)
+  + (iy[None, None, :] + 0.5).to(input.dtype) * from_K(bin_size_h / roi_bin_grid_h)
@qqaatw (Contributor, PR author):
0.5 is f32 by default, so we cast to the input dtype.

) # [K, PH, IY]
x = (
from_K(roi_start_w)
+ pw[None, :, None] * from_K(bin_size_w)
-  + (ix[None, None, :] + 0.5) * from_K(bin_size_w / roi_bin_grid_w)
+  + (ix[None, None, :] + 0.5).to(input.dtype) * from_K(bin_size_w / roi_bin_grid_w)
@qqaatw (Contributor, PR author):
Same as above.
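
A standalone illustration of the promotion being handled here (not the PR's code, and the names below are hypothetical): adding a Python float such as 0.5 to an integer index tensor produces the default dtype, float32, so the explicit .to(input.dtype) keeps the term in the input's dtype, e.g. float16.

```python
import torch

iy = torch.arange(4)                      # int64 sampling indices
print((iy + 0.5).dtype)                   # torch.float32 (the default dtype)

input_dtype = torch.float16               # e.g. a half-precision feature map
print((iy + 0.5).to(input_dtype).dtype)   # torch.float16, matching the input
```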

@qqaatw (Contributor, PR author) commented Jul 17, 2023

Hi @NicolasHug, sorry for the delayed update. I've applied all the suggestions.

@qqaatw (Contributor, PR author) commented Jul 31, 2023

Gently pinging @NicolasHug.

@NicolasHug (Member) commented:
Sorry for the delay @qqaatw. I'll do another round of review tomorrow.

@NicolasHug (Member) left a review:

Thanks a lot @qqaatw , I took a last look at the tests and this LGTM.

@albanD, was there anything you wanted to check before merging this?

@kulinseth left a review:

Looks great, thanks @qqaatw.

@NicolasHug NicolasHug merged commit 16d62e3 into pytorch:main Aug 1, 2023
49 of 60 checks passed
@NicolasHug (Member) commented:
Thanks @qqaatw !!

@github-actions (bot) commented Aug 1, 2023

Hey @NicolasHug!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Aug 25, 2023
Reviewed By: matteobettini

Differential Revision: D48642285

fbshipit-source-id: 00534d4080565eb66ed6b2dbb8416f8d7526687e

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Nicolas Hug <nh.nicolas.hug@gmail.com>