[RFC] Specification of quantized reduction operation #1664
Conversation
sgtm
Per the discussion on OpenXLA Discuss this RFC is now approved! Thanks to all who provided feedback on this design!
From the discussion that emerged in openxla#1538 about the specification of the quantized `reduce` op, we zeroed in on a proposal which needs some structural changes in the op (adding additional blocks). This RFC explores the proposal in detail. Please let me know your feedback. See also the [OpenXLA Discuss post](https://groups.google.com/a/openxla.org/g/openxla-discuss/c/iwE9is49SS4).
The PR implements the approved [RFC for reduce op](#1664) by proposing the specification changes for the `reduce`, `reduce_window`, and `select_and_scatter` ops. In #1647, we discussed some of the other ops that will depend on the quantized specification of the reduce op. Initially, I considered creating separate PRs for them, but in the interest of time, and because their handling is going to be very similar to how the `reduce` op is handled, I propose to include them in the current PR. Here are the additional ops (other than `reduce`, `reduce_window`, `select_and_scatter`) whose specification is added:

- Ops with explicit computation regions: `all_reduce`, `reduce_scatter`, `scatter`. They are handled similarly to the `reduce` op.
- Ops without an explicit computation region: `batch_norm_grad`, `batch_norm_training`. For these ops, the semantics of the operation implicitly perform a reduction with a custom computation function. As there is no explicit computation function in the IR, the proposal in the RFC cannot be applied. I propose (A) handling these ops similarly to how `batch_norm_inference` is handled, with the `dequant-op-quant` strategy, and (B) revisiting them later if there is a use case for doing that implicit reduction using a higher accumulation type.

One implementation detail: the fact that `batch_norm_grad` and `batch_norm_training` return multiple outputs, while the current `dequantize_op_quantize` returns a single output, is handled using a dedicated meta function for these two ops.

Next steps: once the PR is approved, the plan is to propose the corresponding changes in the verifier/shape functions for these ops.

**Only support promotion of reduction operands to reduction-block arguments**

The specific question brought up in the discussion is: should we support demotion as well? It seems that there aren't any use cases for that. For example, consider the following cases of implicit conversion for reduction operations.
The examples are shown using integer types but can be generalized to other types as well.

| input element type | accumulator element type | result element type | use cases |
|:--:|:--:|:--:|:--|
| i8 | i8 | i8 | Reduction using min/max |
| i8 | i32 | i32 | Reduction using average/sum |
| i8 | i32 | i8 | Demotion from accumulation type to result type. (No use case) |
| i8 | i32 | i64 | Extra promotion from accumulation type to result type. (No use case) |
| i32 | i8 | i8 | Demotion from input value type to accumulation type. (No use case) |

With that in mind, we propose to develop the specification based on the scenario we have a use case for: just promotion. _None of the cases of demotion or extra promotion will be supported in the specification._
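To make the promotion rows above concrete, here is a minimal pure-Python sketch (not StableHLO code) of why a sum-like reduction over i8 inputs wants an i32 accumulator. The `wrap_i8` helper is a hypothetical stand-in for signed 8-bit wraparound semantics.

```python
def wrap_i8(x: int) -> int:
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (x + 128) % 256 - 128

def reduce_sum_i8(values):
    """Sum reduction with an i8 accumulator: wraps around on large sums."""
    acc = 0
    for v in values:
        acc = wrap_i8(acc + v)
    return acc

def reduce_sum_i32(values):
    """Sum reduction with an i32 accumulator: i8 inputs are promoted."""
    acc = 0  # i32 accumulator; sums of i8 inputs fit without wraparound
    for v in values:
        acc = acc + v
    return acc

inputs = [100] * 4  # four in-range i8 values; the true sum 400 exceeds i8
print(reduce_sum_i8(inputs))   # → -112 (wrapped)
print(reduce_sum_i32(inputs))  # → 400
```

This is exactly the "i8 input, i32 accumulator, i32 result" row: the reduction block computes in the wider type, and the result keeps that type.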
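The comment above also proposes handling `batch_norm_grad` and `batch_norm_training` with the `dequant-op-quant` strategy. Below is a minimal numeric sketch of that strategy under uniform quantization; the function and parameter names are illustrative, not StableHLO APIs.

```python
def dequantize(q, scale, zero_point):
    """Map quantized integers back to real values: (q - zp) * scale."""
    return [(v - zero_point) * scale for v in q]

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map real values to i8: scale, round, and clamp to the i8 range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]

def dequantize_op_quantize(op, q, scale, zero_point):
    """Run `op` in real arithmetic, then re-quantize its result."""
    return quantize(op(dequantize(q, scale, zero_point)), scale, zero_point)

# Example: a mean-subtracting computation, standing in for an op whose
# semantics implicitly reduce over its input.
def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

print(dequantize_op_quantize(center, [10, 20, 30], scale=0.5, zero_point=0))
# → [-10, 0, 10]
```

The point of the strategy is that the op's computation runs in real arithmetic, so no explicit reduction block (and hence no promotion machinery) is needed in the IR.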
…le semantics [RFC](openxla/stablehlo#1664), to MHLO. The CL does two things:

1. The upstream change openxla/stablehlo#1869 in StableHLO updates various APIs related to shape inference. MHLO shape inference functions use those APIs. The CL fixes the invocation of those APIs in the MHLO codebase.
2. There exist canonicalization passes like `group-reduction-dimensions` and `hlo-canonicalize-reduction` which create a reduce operation using builder methods that call the type inference of the reduce op with an empty reduction region ([example](https://github.com/openxla/xla/blob/a91877b9c9aa1edf307c5927782111b1a81cd81d/xla/mlir_hlo/mhlo/transforms/group_reduction_dimensions/group_reduction_dimensions.cc#L228)). This is problematic because, with the [change](openxla/stablehlo#1869), the type inference of the reduce op now depends on the reduction body. The CL updates all the call sites of the problematic builder (the one which calls type inference with an empty reduction block) to invoke a new custom builder method introduced for the mhlo::Reduce operation. Note that at the moment we do not need a similar custom builder for other reduction-based operations (like scatter, reduce_scatter, all_reduce, select_and_scatter, reduce_window), as they are presently created using a builder version that takes the result type as an input and hence does not call inference from within.

Also, the CL adds verification tests for the operations with promotable semantics.

PiperOrigin-RevId: 597407271
The upstream change #1869 in StableHLO updates various APIs related to shape inference. MHLO shape inference functions in [hlo_ops.cc](https://github.com/openxla/xla/blob/main/xla/mlir_hlo/mhlo/IR/hlo_ops.cc) use those APIs. The PR updates the visibility and signatures of those APIs for a clearer integration. Specifically, the PR does the following:

1. **Updates `getAccumulatorTypes` to return an error status when the input region is empty**: This function is used in the type inference of various reduction-based operations ([e.g.](https://github.com/openxla/stablehlo/blob/d5b464925371092095ac934b46ba93ebd4284223/stablehlo/dialect/TypeInference.cpp#L2589)). It enables inferring types based on the reduction block of the operation, which is what the [RFC](#1664) proposes. However, there could be [instances](https://github.com/openxla/xla/blob/a91877b9c9aa1edf307c5927782111b1a81cd81d/xla/mlir_hlo/mhlo/transforms/group_reduction_dimensions/group_reduction_dimensions.cc#L228) where type inference is called with an empty region, in which case we would like to report a meaningful diagnostic message.
2. **Allows `hlo::inferAllReduceOp` to accept information for multiple operands**: In StableHLO, the `all_reduce` op has a single operand ([e.g.](https://github.com/openxla/stablehlo/blob/d5b464925371092095ac934b46ba93ebd4284223/stablehlo/dialect/StablehloOps.td#L1355)), whereas in MHLO the op can take multiple operands ([e.g.](https://github.com/openxla/xla/blob/79aba0801ef75c1c2dffbb4ecc506a0d8144c9ac/xla/mlir_hlo/mhlo/IR/hlo_ops.td#L1528)). The `hlo::inferAllReduceOp` signature is updated to accommodate both cases.
3. **Removes unused arguments from `verifyReduceOpInputsAndInferShape` and `inferReduceOp`**.
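Point 1 above can be modeled in a few lines. The sketch below is a conceptual Python model of inferring a reduce op's result element type from its reduction block and reporting an error for an empty region; the names are illustrative and do not mirror the actual MHLO C++ API.

```python
def infer_reduce_result_type(block_return_types):
    """Infer the result element type from the reduction block's returns.

    Models the RFC behavior where the reduction block's return type (the
    accumulator type) drives the inferred result type, and the updated
    `getAccumulatorTypes`-style behavior of failing on an empty region.
    """
    if not block_return_types:
        # Mirrors returning an error status instead of crashing when a
        # builder calls type inference with an empty reduction region.
        raise ValueError("cannot infer result type: reduction region is empty")
    return block_return_types[0]

# i8 operands reduced with an i32-typed block yield an i32 result.
print(infer_reduce_result_type(["i32"]))  # → i32
```

This is why builders that construct a reduce op with an empty region (as in the `group-reduction-dimensions` example linked above) must be routed to a builder that does not call inference from within.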