From 0fc3c4069baf4d09f7a3d56d1c2b901693f19f4a Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Thu, 22 Jun 2023 22:16:49 +0000 Subject: [PATCH 01/11] rfc for reducntion ops --- rfcs/20230622-quantized-reduction.md | 521 +++++++++++++++++++++++++++ 1 file changed, 521 insertions(+) create mode 100644 rfcs/20230622-quantized-reduction.md diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md new file mode 100644 index 00000000000..e38db43203f --- /dev/null +++ b/rfcs/20230622-quantized-reduction.md @@ -0,0 +1,521 @@ +# RFC: StableHLO quantization for reduction ops + +Status: Review
+Initial version: 06/22/2023
+Last updated:
+Discussion thread: [GitHub](https://github.com/openxla/stablehlo/pull/1664) + +## Version log + +* 06/22/2023: Initial version. + +## Introduction + +The [reduce](https://github.com/openxla/stablehlo/blob/main/docs/spec.md#reduce) +op, for non-quantized types, has constraints like + +```python +(C2) element_type(inputs...) = element_type(init_values...) = element_type(results...). +(C6) body has type tensor, ..., tensor, tensor, ..., tensor) -> (tensor, ..., tensor) where Ei = element_type(inputs[i]). +``` + +which constrained the signature of reduce op and its associated reducer function +`body` to have the same element types for `inputs`, `results` and arguments and +return for `body`. For reducer function performing an accumulative operation like +add, this means that the the result of accumulation can overflow in which case +the result will be implementation defined (e.g., +[saturated](https://en.wikipedia.org/wiki/Saturation_arithmetic) or +[wrap around](https://en.wikipedia.org/wiki/Integer_overflow)). +From the conversation with customers it seems a reasonable behavior for non +quantized data types. However, with quantized data types, such loss in precision +is not acceptable and hence the motivation is to perform the accumulation in +some higher data type. + +The RFC highlights some of the options emerged out of discussion in the +[thread](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906) +along with their tradeoffs. The proposal option #1 looks promising at this +point, but we are open to further discussion on this. + +## Option 1: Introduce additional conversion functions + +[The thread](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906) +discuses an option, proposed by @loganchien, on how to achieve the structural +changes as mentioned above. We note that some of the examples/diagrams presented +here are borrowed from an internal doc @loganchien authored. + +The proposed options introduces on-the-fly type conversions, which (1) convert +the input type to the type of the `body` function argument and (2) convert the +result type of the `body` function to the output type. Following is the code +snippet with the proposed syntax of reduce op: + +```mlir +%result = "stablehlo.reduce"(%input, %init_value) ({ + ^input_conversion( + %input: tensor>): + %input_rescaled = "stablehlo.uniform_quantize"(%input) + : (tensor>) + -> tensor> + "stablehlo.return"(%input_rescaled) + : (tensor>) -> () + + }, { + ^reduce_computation( + %lhs: tensor>, + %rhs: tensor>): + %add = "stablehlo.add"(%lhs, %rhs) + : (tensor>, + tensor>) + -> tensor> + "stablehlo.return"(%add) + : (tensor>) -> () + }, { + ^output_conversion( + %intermediate_result: tensor>): + %output_rescaled = "stablehlo.uniform_quantize"(%intermediate_result) + : (tensor>) + -> tensor> + "stablehlo.return"(%output_rescaled) + : (tensor>) -> () + }) { + dimensions = dense<...> : tensor<1xi64> + } : (tensor<... x !quant.uniform>, + tensor<... x !quant.uniform>) + -> tensor<... x !quant.uniform> +``` + +### Semantics + +Here we will informally propose the semantics of the additional functions +`input_conversion` and `output_conversion` introduced. 
+ +```python ++----------+ +--------+ +--------+ +----------+ +--------+ +--------+ +|init_value| |input[0]| |input[1]| |init_value| |input[2]| |input[3]| ++----------+ +--------+ +--------+ +----------+ +--------+ +--------+ + | | | | | | ++----------+ +--------+ +--------+ +----------+ +--------+ +--------+ +|input | |input | |input | |input | |input | |input | +|convert | |convert | |convert | |convert | |convert | |convert | ++----------+ +--------+ +--------+ +----------+ +--------+ +--------+ + \ / / \ / / + +-------+ / +-------+ / + |compute| / |compute| / + +-------+ / +-------+ / + \ / \ / + +-------+ +-------+ + |compute| |compute| + +-------+ +-------+ + \___________ ___________/ + \ / + +-------+ + |compute| + +-------+ + | + +-------+ + |output | + |convert| + +-------+ +``` + +### Semantics of `input_conversion` block + +The `input_conversion` block is applied selectively to the leaf nodes of a +schedule tree as shown in above diagram. Note that the `input_conversion` cannot +be applied to the non-leaf nodes of the schedule tree. + +### Semantics of `output_conversion` block + +The `output_conversion` block is applied just after the `result` for a particular +index is computed as shown in the above diagram. + +Please refer to the [formal spec](#specification-of-reduce-op) of the proposed +reduce op. + +### Implementation details + +From the implementation POV of the proposed spec, we note that +`input_conversion` and `output_conversion` can very well be optional with +default values as identity functions. For example, the following code snippet + +```mlir +%result = "stablehlo.reduce"(%input, %init_value) ({ + ^reduce_computation( + %lhs: tensor>, + %rhs: tensor>): + %add = "stablehlo.add"(%lhs, %rhs) + : (tensor>, + tensor>) + -> tensor> + "stablehlo.return"(%add) + : (tensor>) -> () + }) { + dimensions = dense<...> : tensor<1xi64> + } : (tensor<... x !quant.uniform>, + tensor<... x !quant.uniform>) + -> tensor<... x !quant.uniform> +``` + +should be interpreted as + +```mlir +%result = "stablehlo.reduce"(%input, %init_value) ({ + ^input_conversion( + %input: tensor>): + "stablehlo.return"(%input) + : (tensor>) -> () + + }, { + ^reduce_computation( + %lhs: tensor>, + %rhs: tensor>): + %add = "stablehlo.add"(%lhs, %rhs) + : (tensor>, + tensor>) + -> tensor> + "stablehlo.return"(%add) + : (tensor>) -> () + }, { + ^output_conversion( + %intermediate_result: tensor>): + "stablehlo.return"(%intermediate_result) + : (tensor>) -> () + }) { + dimensions = dense<...> : tensor<1xi64> + } : (tensor<... x !quant.uniform>, + tensor<... x !quant.uniform>) + -> tensor<... x !quant.uniform> +``` + +Note that with default values, the input/result type of `reduce` op matches +with the argument or the result type of the `reduce_computation`, including the +quantization parameters. + +It is important to enforce the following aspect in the implementations: both +conversion functions must be provided or both must be missing. If this is not +followed, and there are two functions provided in the IR, it would be difficult +to determine which function is intended for which purpose. + +The existing pretty printing is currently producing the following output +`stablehlo.reduce(%input init: %init_value) applies stablehlo.add across +dimensions = [1] : (tensor<1x6xi64>, tensor) -> tensor<1xi64>`. IMO, +modifying the above format, with the default conversion function, will create +clutter. My proposal here is to follow the existing pretty printing when the +conversion functions are "not provided". 
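As a rough illustration (not authoritative printer output), such a reduce could then keep printing in the familiar compact form; the quantization parameters and dimensions below are placeholders chosen for this sketch:

```mlir
// Hypothetical compact form when the conversion functions are omitted
// (defaulting to identity). The quantized element type is illustrative only.
%result = stablehlo.reduce(%input init: %init_value) applies stablehlo.add
    across dimensions = [1]
    : (tensor<1x6x!quant.uniform<i8:f32, 0.1>>,
       tensor<!quant.uniform<i8:f32, 0.1>>)
    -> tensor<1x!quant.uniform<i8:f32, 0.1>>
```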
In the event, the conversion functions +are explicitly provided, then the pretty printers will fall back to default +generic printing, +**even if the explicitly provided conversion functions are identity function**: +To avoid identification of identity functions which could be tricky in general. + +#### Tradeoffs + +* (+) Enables programmers to program at (almost) baremetal. If the hardware + can support reduction computation in wider type (e.g. in the SIMD + instruction set, we typically do widening/compute/narrowing within the + kernel to save the memory bandwidth), the programmer can explicitly request + for that. +* (-) The disadvantage of this representation is that the syntax is more + verbose and requires significant changes to the specification. + +## Option 2: re-scale input to accumulation type + +This option is the simplest from the POV for specification of quantized `reduce` +op. This is adding `stablehlo.uniform_quantize` and `stablehlo.dequantize` ops +respectively before and after reduce op which operates on the "accumulator" +type. + +```mlir +%widen = "stablehlo.uniform_quantize"(%input) + : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> + +%reduce = "stablehlo.reduce"(%widen) { + ^reduce_computation(%lhs: !quant.uniform, %rhs: !qunat.uniform): + // reduce_computation_block + } + : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> + +%narrowed = "stablehlo.uniform_dequantize"(%reduce) + : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> +``` + +### Tradeoffs + +* (+) An advantage of this option is that we only need minor changes to the + specification (i.e. to allow quantized types). +* (-) The compiler must pattern match 3 operations and map them into some + internal representation before their compilation or execution. +* (-) The compiler must ensure that the `stablehlo.uniform_quantize` (or + `stablehlo.convert` in the case of `bf16` or `f16`) is not folded before the + backend matches the pattern. + [for more information](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906) + +## Other options considered + +There is another option considered which did not fly well because of limited +extensibility. Adding it just for completeness purposes. + +### Option 3: allow accumulator type to be different from input type + +The idea here is to convey the accumulator type using the `init_value` operand +of `reduce` op. The code snippet for `reduce` looks like: + +```mlir +%result = "stablehlo.reduce"(%input, %init_value) ({ + ^reduce_computation( + %elem: tensor>, + %acc: tensor>): + %elem_rescaled = "stablehlo.uniform_quantize"(%elem) + : (tensor>) + -> tensor> + %add = "stablehlo.add"(%elem_rescaled, %acc) + : (tensor>, + tensor>) + -> tensor> + "stablehlo.return"(%0) + : (tensor>) -> () + }) { + dimensions = dense<1> : tensor<1xi64> + } : (tensor<... x !quant.uniform>, + tensor<... x !quant.uniform>) + -> tensor<... x !quant.uniform> +``` + +In this option, the `init_value` type and the `result` type can be different +from the input type. The first argument of the compute block is fixed for the +traversed element and the second argument is fixed for the intermediate +(accumulation) result. + +#### Tradeoffs + +* (+) Make the accumulation type explicit in the IR. +* (-) This representation imposes a limitation on the evaluation order. 
+ Since we can’t express the computation between two intermediate (accumulation) + results, we can not arbitrarily insert `init_value` and start the + computation at an arbitrary location. The following shows the restricted + evaluation order with the method. + +```python ++----------+ +--------+ +--------+ +--------+ +--------+ +|init_value| |input[0]| |input[1]| |input[2]| |input[3]| ++----------+ +--------+ +--------+ +--------+ +--------+ + \ / / / / + +-------+ / / / + |compute| / / / + +-------+ / / / + \ / / / + +-------+ / / + |compute| / / + +-------+ / / + \ / / + +-------+ / + |compute| / + +-------+ / + \ / + +-------+ + |compute| + +-------+ +``` + +## Open Question + +### Should we restrict the proposal #1 to quantized types only? + +The above proposal #1 of introducing the additional functions is theoretically +not limited to quantized `reduce` op, but also can be applied to `reduce` op with +non-quantized types. For example, + +```mlir +%result = "stablehlo.reduce"(%input, %init_value) ({ + ^input_conversion(%arg0: tensor): + %0 = "stablehlo.convert"(%arg0): (tensor) -> (tensor) + "stablehlo.return"(%0) : (tensor) -> (tensor) + }, { + ^bb0(%arg0: tensor, %arg1: tensor): + %0 = "stablehlo.add"(%arg0, %arg1) : (tensor, tensor) -> + tensor + "stablehlo.return"(%0) : (tensor) -> () + }, + ^output_conversion(%arg0: tensor): + %0 = "stablehlo.convert"(%arg0): (tensor) -> (tensor) + "stablehlo.return"(%0) : (tensor) -> (tensor) + }) { + dimensions = dense<1> : tensor<1xbf16> +} : (tensor<1x6xbf16>, tensor) -> tensor<1xbf16> +``` + +However, it is not clear how such operations will be lowered to other IR +representations, like HLO, which does not support such additional computation +blocks. IMO there is no additional benefit to support such conversion +functions for regular type given that there already exists infrastructure +(backend support, lowering passes) to support regular types w/o conversion +functions. My proposal here would be to restrict the support to only quantized +types. + +## Appendix + +To provide an estimate of specification changes needed to implement option #1 +I have attempted to provide the blueprint here. + +### Specification of reduce op + +```python +#### Semantics + +Applies a reduction functions `input_conversion`, `body`, and +`output_conversion` to `inputs` and `init_values` along the `dimensions` and +produces `results` tensors. + +The order of reductions is implementation-defined, which means that `body` and +`init_values` must form a monoid to guarantee that the operation produces the +same results for all inputs on all implementations. However, this condition +doesn't hold for many popular reductions. E.g. floating-point addition for +`body` and zero for `init_values` don't actually form a monoid because +floating-point addition is not associative. + +More formally, `results...[j0, ..., jR-1] = +map(output_conversion, reduce(input_slices_converted))` where: + +* `input_slices = inputs...[j0, ..., :, ..., jR-1]`, where `:` are inserted + at `dimensions`. +* `input_slices_converted = map(input_conversion, input_slices...)`. +* `reduce(input_slices_converted) = exec(schedule)` for some binary tree + `schedule` where: + * `exec(node) = body(exec(node.left), exec(node.right))`. + * `exec(leaf) = leaf.value`. 
+* `schedule` is an implementation-defined full binary tree whose in-order + traversal consists of: + * `input_slices_converted...[index]` values, for all `index` in + `index_space(input_slices_converted)` in the ascending lexicographic order + of `index`. + * Interspersed with an implementation-defined amount of `init_values` + at implementation-defined positions. + +#### Inputs + +| Label | Name | Type | Constraints | +|-------|---------------------|----------------------------------------------|-------------| +| (I?) | `inputs` | variadic number of tensors | | +| (I?) | `init_values` | variadic number of 0-dimensional tensors | | +| (I?) | `dimensions` | 1-dimensional tensor constant of type `si64` | | +| (I?) | `input_conversion` | function | | +| (I?) | `body` | function | | +| (I?) | `output_conversion` | function | | + +#### Outputs + +| Name | Type | Constraints | +|-----------|----------------------------|-------------| +| `results` | variadic number of tensors | | + +#### Constraints + +* (C?) `same(shape(inputs...))`. +* (C?) `element_type(inputs...) = element_type(init_values...)`. +* (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. +* (C?) `0 < size(inputs) = size(init_values) = size(results) = N`. +* (C?) `0 <= dimensions < rank(inputs[0])`. +* (C?) `is_unique(dimensions)`. +* (C?) `input_conversion` has type `tensor, ..., tensor -> + (tensor, ..., tensor)` where `Ei = element_type(inputs[i])`. +* (C?) `body` has type `tensor, ..., tensor, tensor, ...,` + `tensor) -> (tensor, ..., tensor)` where + `Ei = element_type(output_types(input_conversion)[i])`. +* (C?) `output_conversion` has type `tensor, ..., tensor -> + (tensor, ..., tensor)` where + `E'i = element_type(results[i])`. +* (C?) `element_type(output_types(input_conversion)...) = + element_type(input_types(output_conversion)...)`. +* (C?) `shape(results...) = shape(inputs...)` except that the dimension + sizes of `inputs...` corresponding to `dimensions` are not included. +``` + +The above specification of `reduce` op can be used to define the specification +of other ops as shown below. For brevity, we are only presenting the relevant +portions of the spec which needs modification. + +### reduce_window + +```python +#### Semantics + +Applies a reduction functions `input_conversion`, `body`, and +`output_conversion` to windows of `inputs` and `init_values` and produces +`results`. + +... + +More formally, +`results...[result_index] = reduce(windows, init_values, axes(inputs...), + input_conversion, body, output_conversion)` +where: +.... + +#### Inputs + +| Label | Name | Type | +|-------|---------------------|----------| +| (I?) | `input_conversion` | function | +| (I8) | `body` | function | +| (I?) | `output_conversion` | function | + +#### Constraints + +* (C?) `element_type(inputs...) = element_type(init_values...)`. +* (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. +* (C?) `input_conversion` has type `tensor, ..., tensor -> + (tensor, ..., tensor)` where `Ei = element_type(inputs[i])`. +* (C?) `body` has type `tensor, ..., tensor, tensor, ...,` + `tensor) -> (tensor, ..., tensor)` where + `Ei = element_type(output_types(input_conversion)[i])`. +* (C?) `output_conversion` has type `tensor, ..., tensor -> + (tensor, ..., tensor)` where + `E'i = element_type(results[i])`. +* (C?) `element_type(output_types(input_conversion)...) = + element_type(input_types(output_conversion)...)`. 
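# --- Informal illustration, not part of the normative text above ---
# A minimal sketch of how the proposed conversion functions participate in the
# reduction that `reduce` and `reduce_window` are specified in terms of. This
# pseudo-code fixes one particular left-to-right schedule and converts the
# init value at a leaf, mirroring the schedule-tree diagram earlier in this
# RFC; the spec itself leaves the schedule implementation-defined.
def illustrative_reduce(slice_values, init_value,
                        input_conversion, body, output_conversion):
    acc = input_conversion(init_value)
    for value in slice_values:
        acc = body(acc, input_conversion(value))
    return output_conversion(acc)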
+``` + +### select_and_scatter + +This op originally takes two function arguments `select` and `scatter`. As the +`select` function is supposed to perform a non-accumulative operation, we may +not need additional conversion functions associated with `select`. But the +`scatter` function needs be accompanied with `input_conversion` and +`output_conversion` functions. + +```python +#### Semantics + +Scatters the values from the `source` tensor using `scatter` based on the +outcome of `reduce_window` of the `input` tensor using `select` and produces +a `result` tensor. + +More formally: +... +* `result[result_index] = reduce([source_values], [init_value], [0], + input_conversion, scatter, output_conversion)` + where: + ... + +#### Inputs + +| Label | Name | Type | +|-------|---------------------|----------| +| (I8) | `input_conversion` | function | +| (I8) | `scatter` | function | +| (I8) | `output_conversion` | function | + +#### Constraints + + +* (C1) `element_type(operand) = element_type(source)`. +* (C3) `element_type(init_value) = element_type(operand)`. +* (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. +* (C?) `input_conversion` has type `tensor -> (tensor)` where + `Ei = element_type(operand)`. +* (C10) `scatter` has type `(tensor, tensor) -> tensor` where + `E = element_type(output_types(input_conversion))`. +* (C?) `output_conversion` has type `tensor -> (tensor)` where + `E'i = element_type(result)`. +* (C?) `element_type(output_types(input_conversion)) = + element_type(input_types(output_conversion))`. +* (C11) `shape(operand) = shape(result)`. + From b6839c9cac716d720204c86a2343bf67a4f37fa3 Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Mon, 3 Jul 2023 01:07:49 +0000 Subject: [PATCH 02/11] any of on the conversion fuction can be optional + typo fixes --- rfcs/20230622-quantized-reduction.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md index e38db43203f..c5b1be8c4d6 100644 --- a/rfcs/20230622-quantized-reduction.md +++ b/rfcs/20230622-quantized-reduction.md @@ -2,7 +2,7 @@ Status: Review
Initial version: 06/22/2023
-Last updated:
+Last updated: 07/02/2023
Discussion thread: [GitHub](https://github.com/openxla/stablehlo/pull/1664) ## Version log @@ -134,7 +134,7 @@ reduce op. ### Implementation details From the implementation POV of the proposed spec, we note that -`input_conversion` and `output_conversion` can very well be optional with +`input_conversion` or `output_conversion` can very well be optional with default values as identity functions. For example, the following code snippet ```mlir @@ -190,10 +190,10 @@ Note that with default values, the input/result type of `reduce` op matches with the argument or the result type of the `reduce_computation`, including the quantization parameters. -It is important to enforce the following aspect in the implementations: both -conversion functions must be provided or both must be missing. If this is not -followed, and there are two functions provided in the IR, it would be difficult -to determine which function is intended for which purpose. +Also, note that the relative order of `input_conversion` or `output_conversion` +w.r.t the `reduce_computation` can be used to identify the appropriate +conversion function when any one of `input_conversion` or `output_conversion` is +missing. The existing pretty printing is currently producing the following output `stablehlo.reduce(%input init: %init_value) applies stablehlo.add across @@ -333,7 +333,7 @@ non-quantized types. For example, %0 = "stablehlo.add"(%arg0, %arg1) : (tensor, tensor) -> tensor "stablehlo.return"(%0) : (tensor) -> () - }, + }, { ^output_conversion(%arg0: tensor): %0 = "stablehlo.convert"(%arg0): (tensor) -> (tensor) "stablehlo.return"(%0) : (tensor) -> (tensor) @@ -519,3 +519,4 @@ More formally: element_type(input_types(output_conversion))`. * (C11) `shape(operand) = shape(result)`. +``` From e38451e9149a855bb81d48efbfb7119f1e322149 Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Thu, 13 Jul 2023 22:04:27 +0000 Subject: [PATCH 03/11] Address feedback: fix a few type checks in the code snippets provodfed --- rfcs/20230622-quantized-reduction.md | 32 ++++++++++++++-------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md index c5b1be8c4d6..812a4503fb0 100644 --- a/rfcs/20230622-quantized-reduction.md +++ b/rfcs/20230622-quantized-reduction.md @@ -162,21 +162,21 @@ should be interpreted as ^input_conversion( %input: tensor>): "stablehlo.return"(%input) - : (tensor>) -> () + : (tensor>) -> () }, { ^reduce_computation( - %lhs: tensor>, - %rhs: tensor>): + %lhs: tensor>, + %rhs: tensor>): %add = "stablehlo.add"(%lhs, %rhs) - : (tensor>, - tensor>) - -> tensor> + : (tensor>, + tensor>) + -> tensor> "stablehlo.return"(%add) - : (tensor>) -> () + : (tensor>) -> () }, { ^output_conversion( - %intermediate_result: tensor>): + %intermediate_result: tensor>): "stablehlo.return"(%intermediate_result) : (tensor>) -> () }) { @@ -262,21 +262,21 @@ of `reduce` op. The code snippet for `reduce` looks like: %result = "stablehlo.reduce"(%input, %init_value) ({ ^reduce_computation( %elem: tensor>, - %acc: tensor>): + %acc: tensor>): %elem_rescaled = "stablehlo.uniform_quantize"(%elem) : (tensor>) - -> tensor> + -> tensor> %add = "stablehlo.add"(%elem_rescaled, %acc) - : (tensor>, - tensor>) - -> tensor> + : (tensor>, + tensor>) + -> tensor> "stablehlo.return"(%0) - : (tensor>) -> () + : (tensor>) -> () }) { dimensions = dense<1> : tensor<1xi64> } : (tensor<... x !quant.uniform>, - tensor<... x !quant.uniform>) - -> tensor<... x !quant.uniform> + tensor<... 
x !quant.uniform>) + -> tensor<... x !quant.uniform> ``` In this option, the `init_value` type and the `result` type can be different From 0eec267bc50916e6c1c2d13cbc90e380c660ad86 Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Thu, 13 Jul 2023 22:36:05 +0000 Subject: [PATCH 04/11] Address feedback: fix header indentation and code block language --- rfcs/20230622-quantized-reduction.md | 31 +++++++++++----------------- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md index 812a4503fb0..24249697778 100644 --- a/rfcs/20230622-quantized-reduction.md +++ b/rfcs/20230622-quantized-reduction.md @@ -2,12 +2,13 @@ Status: Review
Initial version: 06/22/2023
-Last updated: 07/02/2023
+Last updated: 07/13/2023
Discussion thread: [GitHub](https://github.com/openxla/stablehlo/pull/1664) ## Version log * 06/22/2023: Initial version. +* 07/13/2023: Fixed typo in code blocks, header indentation. ## Introduction @@ -128,7 +129,7 @@ be applied to the non-leaf nodes of the schedule tree. The `output_conversion` block is applied just after the `result` for a particular index is computed as shown in the above diagram. -Please refer to the [formal spec](#specification-of-reduce-op) of the proposed +Please refer to the [formal spec](#revised-specification-of-reduce-op) of the proposed reduce op. ### Implementation details @@ -206,7 +207,7 @@ generic printing, **even if the explicitly provided conversion functions are identity function**: To avoid identification of identity functions which could be tricky in general. -#### Tradeoffs +### Tradeoffs * (+) Enables programmers to program at (almost) baremetal. If the hardware can support reduction computation in wider type (e.g. in the SIMD @@ -248,13 +249,10 @@ type. backend matches the pattern. [for more information](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906) -## Other options considered - -There is another option considered which did not fly well because of limited -extensibility. Adding it just for completeness purposes. - -### Option 3: allow accumulator type to be different from input type +## Option 3: allow accumulator type to be different from input type +This is another option we considered which does not fly well because of limited +expressibility. Adding it just for completeness purposes. The idea here is to convey the accumulator type using the `init_value` operand of `reduce` op. The code snippet for `reduce` looks like: @@ -284,7 +282,7 @@ from the input type. The first argument of the compute block is fixed for the traversed element and the second argument is fixed for the intermediate (accumulation) result. -#### Tradeoffs +### Tradeoffs * (+) Make the accumulation type explicit in the IR. * (-) This representation imposes a limitation on the evaluation order. @@ -355,9 +353,8 @@ types. To provide an estimate of specification changes needed to implement option #1 I have attempted to provide the blueprint here. -### Specification of reduce op +### Revised specification of reduce op -```python #### Semantics Applies a reduction functions `input_conversion`, `body`, and @@ -426,15 +423,13 @@ map(output_conversion, reduce(input_slices_converted))` where: element_type(input_types(output_conversion)...)`. * (C?) `shape(results...) = shape(inputs...)` except that the dimension sizes of `inputs...` corresponding to `dimensions` are not included. -``` The above specification of `reduce` op can be used to define the specification of other ops as shown below. For brevity, we are only presenting the relevant portions of the spec which needs modification. -### reduce_window +### Revised specification of reduce_window op -```python #### Semantics Applies a reduction functions `input_conversion`, `body`, and @@ -471,9 +466,8 @@ where: `E'i = element_type(results[i])`. * (C?) `element_type(output_types(input_conversion)...) = element_type(input_types(output_conversion)...)`. -``` -### select_and_scatter +### Revised specification of select_and_scatter op This op originally takes two function arguments `select` and `scatter`. As the `select` function is supposed to perform a non-accumulative operation, we may @@ -481,7 +475,6 @@ not need additional conversion functions associated with `select`. 
But the `scatter` function needs be accompanied with `input_conversion` and `output_conversion` functions. -```python #### Semantics Scatters the values from the `source` tensor using `scatter` based on the @@ -490,6 +483,7 @@ a `result` tensor. More formally: ... + * `result[result_index] = reduce([source_values], [init_value], [0], input_conversion, scatter, output_conversion)` where: @@ -519,4 +513,3 @@ More formally: element_type(input_types(output_conversion))`. * (C11) `shape(operand) = shape(result)`. -``` From 29ec7003c942b5232fe5bc55e3f23ac260a3dbd9 Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Tue, 25 Jul 2023 18:26:21 +0000 Subject: [PATCH 05/11] fix a few typos --- rfcs/20230622-quantized-reduction.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md index 24249697778..1251a59f4d6 100644 --- a/rfcs/20230622-quantized-reduction.md +++ b/rfcs/20230622-quantized-reduction.md @@ -220,9 +220,8 @@ To avoid identification of identity functions which could be tricky in general. ## Option 2: re-scale input to accumulation type This option is the simplest from the POV for specification of quantized `reduce` -op. This is adding `stablehlo.uniform_quantize` and `stablehlo.dequantize` ops -respectively before and after reduce op which operates on the "accumulator" -type. +op. This is adding `stablehlo.uniform_quantize`ops before and after reduce op +which operates on the "accumulator" type. ```mlir %widen = "stablehlo.uniform_quantize"(%input) @@ -234,7 +233,7 @@ type. } : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> -%narrowed = "stablehlo.uniform_dequantize"(%reduce) +%narrowed = "stablehlo.uniform_quantize"(%reduce) : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> ``` From e106ae8c6267f1965a5e71035145c0f61c6bb257 Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Fri, 11 Aug 2023 22:07:38 +0000 Subject: [PATCH 06/11] revised plan --- rfcs/20230622-quantized-reduction.md | 93 +++++++++++++++++++++++++++- 1 file changed, 92 insertions(+), 1 deletion(-) diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md index 1251a59f4d6..2152a5cc9ea 100644 --- a/rfcs/20230622-quantized-reduction.md +++ b/rfcs/20230622-quantized-reduction.md @@ -2,7 +2,9 @@ Status: Review
Initial version: 06/22/2023
-Last updated: 07/13/2023
+updated: 07/13/2023
: Minor refactoring of the examples. +Last updated: 08/11/2023
: Revision of the proposal to introduce an +attribute to capture accumulation type.
Discussion thread: [GitHub](https://github.com/openxla/stablehlo/pull/1664) ## Version log @@ -512,3 +514,92 @@ More formally: element_type(input_types(output_conversion))`. * (C11) `shape(operand) = shape(result)`. + +## [11 Aug'23] Revised proposal + +### Context + +Option #2 should be avoided because it is hard to control the transformation +which might disrupt the pattern to be matched. The option #1 sounds good except +that the extra input/output conversion blocks are surplus information. The +specification would benefit if the intent of the conversion blocks can be +expressed precisely. The conversion blocks provides a way to capture the +accumulation type needed to compute the accumulative operation on. + +The revised proposal is: + +* To capture the accumulation type via an additional StableHLO attribute like + `accumulation_element_type`. +* The attribute seems beneficial for other ops as well like `dot_general` and + `convolution`. +* `precision_config`, currently used for `dot_general` and `convolution`, is + used to override the precision specified by the input parameters, allowing the + choice of low precision vs high precision computation. We should consider + adding `precision_config` to all reduction based op as well. + +### Few implementation details + +#### On StableHLO side + +The reduce syntax to be augmented with a optional [type +attribute](https://github.com/llvm/llvm-project/blob/51a57074bc63842970c4c160b05c1a7e42db7523/mlir/include/mlir/IR/OpBase.td#L1466) +as follows: + +```mlir +%0 = stablehlo.reduce(%arg0 init: %arg1) across dimensions = [0] { + accumulation_type = tensor> +} : (tensor<16x!quant.uniform>, tensor>) -> tensor> + reducer(%arg2: tensor>, %arg3: tensor>) { + %1 = stablehlo.add %arg2, %arg3 : tensor> + stablehlo.return %1 : tensor> + } + +// using tablegen specification like +// OptionalAttr>:$accumulation_type +``` + +Note that the main difference between this option and the option #1 is that the +input and output conversion blocks are no longer used as their intent is +specified via the `accumulation_type` attribute. However, the reducer block +still needs to express the computation in accumulation type only. + +**Why optional attribute?** + +* At times, it might be desirable not to hard-code the accumulation type. For + example, when we would like to write a generic code and let the downstream + compilation tools to decide the exact accumulation type based on the hardware + of choice. +* It allows the stablehlo, used in various existing pipelines, to remain + largely unaffected by this change. + +Next, the StableHLO specification should be updated with the syntax and +semantics aspects of this attribute. + +#### On StableHLO Consumers side + +The consumers can pattern match the op taking the accumulation type in account +if the targeted hardware supports accumulation at higher type. +There are still to explore things about maintaining StableHLO-HLO parity which +needs to be addresses as well. + +### Action Plan + +I propose to follow the action plan (order matters): + +* Update the specification of ReduceOp, ReduceWindowOp, and SelectAndScatterOp + op, taking the accumulation type into account, via [open + pr](https://github.com/openxla/stablehlo/pull/1538). +* Finalize the quantized specification of AllReduceOp, BatchNormTrainingOp, + BatchNormGradOp and ReduceScatterOp, whose semantics depend on ReduceOp, + via [open ticket](https://github.com/openxla/stablehlo/issues/1666). +* Add implementation for additional attribute in the above ops. 
This includes
+updating the TableGen spec, verifiers, and type inference. [Need a ticket for
+this].
+* Address the disparity between StableHLO and HLO introduced by this new
+attribute in StableHLO: should XLA consume the additional attribute, and if
+so, how? [Need a ticket for this].
+* Spec the behavior of `precision_config` in DotGeneralOp.
+[open issue](https://github.com/openxla/stablehlo/issues/755)
+* Consider adding `precision_config` in reduction op. [Need a ticket for this
+item].
+* Consider adding `accumulation_type` to `dot_general`/`convolution` op.
+[Need a ticket for this item].

From cc036473dddd640d1f2ad47d3589cfb46315ec66 Mon Sep 17 00:00:00 2001
From: Sandeep Dasgupta
Date: Mon, 28 Aug 2023 02:08:03 +0000
Subject: [PATCH 07/11] Iteration 2: revised proposal: drop the accumulation
 type attribute for reduce op
---
 rfcs/20230622-quantized-reduction.md | 723 +++++++++------------------
 1 file changed, 245 insertions(+), 478 deletions(-)

diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md
index 2152a5cc9ea..248d97c097a 100644
--- a/rfcs/20230622-quantized-reduction.md
+++ b/rfcs/20230622-quantized-reduction.md
@@ -2,8 +2,8 @@
 Status: Review
Initial version: 06/22/2023
-updated: 07/13/2023
: Minor refactoring of the examples. -Last updated: 08/11/2023
: Revision of the proposal to introduce an +updated: 07/13/2023: Minor refactoring of the examples.
+Last updated: 08/11/2023: Revision of the proposal to introduce an attribute to capture accumulation type.
Discussion thread: [GitHub](https://github.com/openxla/stablehlo/pull/1664) @@ -11,6 +11,9 @@ Discussion thread: [GitHub](https://github.com/openxla/stablehlo/pull/1664) * 06/22/2023: Initial version. * 07/13/2023: Fixed typo in code blocks, header indentation. +* 08/11/2023: Revision of the proposal to introduce an attribute to capture + accumulation type. +* 08/25/2023: The additional attribute is redundant. ## Introduction @@ -24,44 +27,32 @@ op, for non-quantized types, has constraints like which constrained the signature of reduce op and its associated reducer function `body` to have the same element types for `inputs`, `results` and arguments and -return for `body`. For reducer function performing an accumulative operation like -add, this means that the the result of accumulation can overflow in which case -the result will be implementation defined (e.g., -[saturated](https://en.wikipedia.org/wiki/Saturation_arithmetic) or -[wrap around](https://en.wikipedia.org/wiki/Integer_overflow)). -From the conversation with customers it seems a reasonable behavior for non -quantized data types. However, with quantized data types, such loss in precision -is not acceptable and hence the motivation is to perform the accumulation in -some higher data type. - -The RFC highlights some of the options emerged out of discussion in the +return for `body`. For reducer function performing an accumulative operation +like add, this means that the the result of accumulation can overflow in which +case the result will be implementation defined (e.g., + [saturated](https://en.wikipedia.org/wiki/Saturation_arithmetic) or + [wrap around](https://en.wikipedia.org/wiki/Integer_overflow)). From +the conversation with customers it seems a reasonable behavior for non quantized +data types. However, with quantized data types, such loss in precision is not +acceptable and hence the motivation is to perform the accumulation in some +higher data type. + +The RFC introduces the following proposal, emerged out of discussion in the [thread](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906) -along with their tradeoffs. The proposal option #1 looks promising at this -point, but we are open to further discussion on this. - -## Option 1: Introduce additional conversion functions - -[The thread](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906) -discuses an option, proposed by @loganchien, on how to achieve the structural -changes as mentioned above. We note that some of the examples/diagrams presented -here are borrowed from an internal doc @loganchien authored. - -The proposed options introduces on-the-fly type conversions, which (1) convert -the input type to the type of the `body` function argument and (2) convert the -result type of the `body` function to the output type. Following is the code -snippet with the proposed syntax of reduce op: +, along with their tradeoffs. + +The proposal allows the reducer block to express the computation in a different +element type (preferably higher accumulation type) than the one used in reduce +op's ops arguments and return type. For illustrative purposes, in the following +example, the operand element type `tensor>` is different from the element type for + reduction region's block arguments. Similarly, the element type of the + reduce op's result `!quant.uniform>` is + different from that of block return (`tensor>`). 
```mlir %result = "stablehlo.reduce"(%input, %init_value) ({ - ^input_conversion( - %input: tensor>): - %input_rescaled = "stablehlo.uniform_quantize"(%input) - : (tensor>) - -> tensor> - "stablehlo.return"(%input_rescaled) - : (tensor>) -> () - - }, { ^reduce_computation( %lhs: tensor>, %rhs: tensor>): @@ -71,310 +62,63 @@ snippet with the proposed syntax of reduce op: -> tensor> "stablehlo.return"(%add) : (tensor>) -> () - }, { - ^output_conversion( - %intermediate_result: tensor>): - %output_rescaled = "stablehlo.uniform_quantize"(%intermediate_result) - : (tensor>) - -> tensor> - "stablehlo.return"(%output_rescaled) - : (tensor>) -> () }) { - dimensions = dense<...> : tensor<1xi64> - } : (tensor<... x !quant.uniform>, - tensor<... x !quant.uniform>) - -> tensor<... x !quant.uniform> + dimensions = dense<1> : tensor + } : (tensor<5 x 1 x !quant.uniform>, + tensor>) + -> tensor<5 x !quant.uniform> ``` ### Semantics -Here we will informally propose the semantics of the additional functions -`input_conversion` and `output_conversion` introduced. - -```python -+----------+ +--------+ +--------+ +----------+ +--------+ +--------+ -|init_value| |input[0]| |input[1]| |init_value| |input[2]| |input[3]| -+----------+ +--------+ +--------+ +----------+ +--------+ +--------+ - | | | | | | -+----------+ +--------+ +--------+ +----------+ +--------+ +--------+ -|input | |input | |input | |input | |input | |input | -|convert | |convert | |convert | |convert | |convert | |convert | -+----------+ +--------+ +--------+ +----------+ +--------+ +--------+ - \ / / \ / / - +-------+ / +-------+ / - |compute| / |compute| / - +-------+ / +-------+ / - \ / \ / - +-------+ +-------+ - |compute| |compute| - +-------+ +-------+ - \___________ ___________/ - \ / - +-------+ - |compute| - +-------+ - | - +-------+ - |output | - |convert| - +-------+ -``` - -### Semantics of `input_conversion` block - -The `input_conversion` block is applied selectively to the leaf nodes of a -schedule tree as shown in above diagram. Note that the `input_conversion` cannot -be applied to the non-leaf nodes of the schedule tree. - -### Semantics of `output_conversion` block - -The `output_conversion` block is applied just after the `result` for a particular -index is computed as shown in the above diagram. - -Please refer to the [formal spec](#revised-specification-of-reduce-op) of the proposed -reduce op. - -### Implementation details - -From the implementation POV of the proposed spec, we note that -`input_conversion` or `output_conversion` can very well be optional with -default values as identity functions. For example, the following code snippet - -```mlir -%result = "stablehlo.reduce"(%input, %init_value) ({ - ^reduce_computation( - %lhs: tensor>, - %rhs: tensor>): - %add = "stablehlo.add"(%lhs, %rhs) - : (tensor>, - tensor>) - -> tensor> - "stablehlo.return"(%add) - : (tensor>) -> () - }) { - dimensions = dense<...> : tensor<1xi64> - } : (tensor<... x !quant.uniform>, - tensor<... x !quant.uniform>) - -> tensor<... 
x !quant.uniform> -``` - -should be interpreted as - -```mlir -%result = "stablehlo.reduce"(%input, %init_value) ({ - ^input_conversion( - %input: tensor>): - "stablehlo.return"(%input) - : (tensor>) -> () - - }, { - ^reduce_computation( - %lhs: tensor>, - %rhs: tensor>): - %add = "stablehlo.add"(%lhs, %rhs) - : (tensor>, - tensor>) - -> tensor> - "stablehlo.return"(%add) - : (tensor>) -> () - }, { - ^output_conversion( - %intermediate_result: tensor>): - "stablehlo.return"(%intermediate_result) - : (tensor>) -> () - }) { - dimensions = dense<...> : tensor<1xi64> - } : (tensor<... x !quant.uniform>, - tensor<... x !quant.uniform>) - -> tensor<... x !quant.uniform> -``` - -Note that with default values, the input/result type of `reduce` op matches -with the argument or the result type of the `reduce_computation`, including the -quantization parameters. - -Also, note that the relative order of `input_conversion` or `output_conversion` -w.r.t the `reduce_computation` can be used to identify the appropriate -conversion function when any one of `input_conversion` or `output_conversion` is -missing. - -The existing pretty printing is currently producing the following output -`stablehlo.reduce(%input init: %init_value) applies stablehlo.add across -dimensions = [1] : (tensor<1x6xi64>, tensor) -> tensor<1xi64>`. IMO, -modifying the above format, with the default conversion function, will create -clutter. My proposal here is to follow the existing pretty printing when the -conversion functions are "not provided". In the event, the conversion functions -are explicitly provided, then the pretty printers will fall back to default -generic printing, -**even if the explicitly provided conversion functions are identity function**: -To avoid identification of identity functions which could be tricky in general. - -### Tradeoffs - -* (+) Enables programmers to program at (almost) baremetal. If the hardware - can support reduction computation in wider type (e.g. in the SIMD - instruction set, we typically do widening/compute/narrowing within the - kernel to save the memory bandwidth), the programmer can explicitly request - for that. -* (-) The disadvantage of this representation is that the syntax is more - verbose and requires significant changes to the specification. - -## Option 2: re-scale input to accumulation type - -This option is the simplest from the POV for specification of quantized `reduce` -op. This is adding `stablehlo.uniform_quantize`ops before and after reduce op -which operates on the "accumulator" type. - -```mlir -%widen = "stablehlo.uniform_quantize"(%input) - : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> - -%reduce = "stablehlo.reduce"(%widen) { - ^reduce_computation(%lhs: !quant.uniform, %rhs: !qunat.uniform): - // reduce_computation_block - } - : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> - -%narrowed = "stablehlo.uniform_quantize"(%reduce) - : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> -``` - -### Tradeoffs - -* (+) An advantage of this option is that we only need minor changes to the - specification (i.e. to allow quantized types). -* (-) The compiler must pattern match 3 operations and map them into some - internal representation before their compilation or execution. -* (-) The compiler must ensure that the `stablehlo.uniform_quantize` (or - `stablehlo.convert` in the case of `bf16` or `f16`) is not folded before the - backend matches the pattern. 
- [for more information](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906) - -## Option 3: allow accumulator type to be different from input type - -This is another option we considered which does not fly well because of limited -expressibility. Adding it just for completeness purposes. -The idea here is to convey the accumulator type using the `init_value` operand -of `reduce` op. The code snippet for `reduce` looks like: - -```mlir -%result = "stablehlo.reduce"(%input, %init_value) ({ - ^reduce_computation( - %elem: tensor>, - %acc: tensor>): - %elem_rescaled = "stablehlo.uniform_quantize"(%elem) - : (tensor>) - -> tensor> - %add = "stablehlo.add"(%elem_rescaled, %acc) - : (tensor>, - tensor>) - -> tensor> - "stablehlo.return"(%0) - : (tensor>) -> () - }) { - dimensions = dense<1> : tensor<1xi64> - } : (tensor<... x !quant.uniform>, - tensor<... x !quant.uniform>) - -> tensor<... x !quant.uniform> -``` - -In this option, the `init_value` type and the `result` type can be different -from the input type. The first argument of the compute block is fixed for the -traversed element and the second argument is fixed for the intermediate -(accumulation) result. - -### Tradeoffs - -* (+) Make the accumulation type explicit in the IR. -* (-) This representation imposes a limitation on the evaluation order. - Since we can’t express the computation between two intermediate (accumulation) - results, we can not arbitrarily insert `init_value` and start the - computation at an arbitrary location. The following shows the restricted - evaluation order with the method. - -```python -+----------+ +--------+ +--------+ +--------+ +--------+ -|init_value| |input[0]| |input[1]| |input[2]| |input[3]| -+----------+ +--------+ +--------+ +--------+ +--------+ - \ / / / / - +-------+ / / / - |compute| / / / - +-------+ / / / - \ / / / - +-------+ / / - |compute| / / - +-------+ / / - \ / / - +-------+ / - |compute| / - +-------+ / - \ / - +-------+ - |compute| - +-------+ -``` - -## Open Question - -### Should we restrict the proposal #1 to quantized types only? - -The above proposal #1 of introducing the additional functions is theoretically -not limited to quantized `reduce` op, but also can be applied to `reduce` op with -non-quantized types. For example, - -```mlir -%result = "stablehlo.reduce"(%input, %init_value) ({ - ^input_conversion(%arg0: tensor): - %0 = "stablehlo.convert"(%arg0): (tensor) -> (tensor) - "stablehlo.return"(%0) : (tensor) -> (tensor) - }, { - ^bb0(%arg0: tensor, %arg1: tensor): - %0 = "stablehlo.add"(%arg0, %arg1) : (tensor, tensor) -> - tensor - "stablehlo.return"(%0) : (tensor) -> () - }, { - ^output_conversion(%arg0: tensor): - %0 = "stablehlo.convert"(%arg0): (tensor) -> (tensor) - "stablehlo.return"(%0) : (tensor) -> (tensor) - }) { - dimensions = dense<1> : tensor<1xbf16> -} : (tensor<1x6xbf16>, tensor) -> tensor<1xbf16> -``` - -However, it is not clear how such operations will be lowered to other IR -representations, like HLO, which does not support such additional computation -blocks. IMO there is no additional benefit to support such conversion -functions for regular type given that there already exists infrastructure -(backend support, lowering passes) to support regular types w/o conversion -functions. My proposal here would be to restrict the support to only quantized -types. 
+Depending on (1) the input operand type is different from the reduction block +argument type or (2) the op result type is different from the reduction block +return type, there will be implicit type conversion defined by either +`stablehlo.convert`, `stablehlo.uniform_quantize`, or +`stablehlo.uniform_dequantize`. For example, + + | Implicit type conversion op | element type of operand or result type | element type of block argument or block return type | + |-----------------------------------|----------------------------------------|-----------------------------------------------------| + | (A) `stablehlo.uniorm_quantize` | quantized tensor | quantized tensor | + | (B) `stablehlo.uniorm_quantize` | floating point | quantized tensor | + | (C) `stablehlo.uniorm_dequantize` | quantized tensor | floating point | + | (D) `stablehlo.convert` | floating-point | integer | + | (E) `stablehlo.convert` | integer | floating-point | + | (F) `stablehlo.convert` | floating-point | floating-point | + | (G) `stablehlo.convert` | integer | integer | + | (G) `stablehlo.convert` | complex | complex | + +At this point there is no use for cases other than (A), (F), and (G). My +proposal here would be to address (A), (F), and (G) only. Note that the (F) + partially addresses [Decide on mixed + precision](https://github.com/openxla/stablehlo/issues/369) for reduce op in + that it allows the the input or init value to differ from the corresponding + block arguments w.r.t the precision of floating-point types. However, the + mixed precision implementation in HLO seems more detailed in the sense that + even allows `inputs` and `init_values` to differ in floating-point + precision. My proposal would be to treat the above ticket separately. ## Appendix -To provide an estimate of specification changes needed to implement option #1 -I have attempted to provide the blueprint here. +To provide an estimate of specification changes needed to implement the +proposal, I have attempted to provide the blueprint here. ### Revised specification of reduce op -#### Semantics +Here we include only the relevant portions of the spec with the proposed update. -Applies a reduction functions `input_conversion`, `body`, and -`output_conversion` to `inputs` and `init_values` along the `dimensions` and -produces `results` tensors. +#### Semantics -The order of reductions is implementation-defined, which means that `body` and -`init_values` must form a monoid to guarantee that the operation produces the -same results for all inputs on all implementations. However, this condition -doesn't hold for many popular reductions. E.g. floating-point addition for -`body` and zero for `init_values` don't actually form a monoid because -floating-point addition is not associative. +... More formally, `results...[j0, ..., jR-1] = -map(output_conversion, reduce(input_slices_converted))` where: +reduce_implicit_convert(reduce(input_slices_converted), + type(func_outputs(body)...), type(results...)))` where: * `input_slices = inputs...[j0, ..., :, ..., jR-1]`, where `:` are inserted at `dimensions`. -* `input_slices_converted = map(input_conversion, input_slices...)`. +* `input_slices_converted = reduce_implicit_convert(input_slices..., + type(inputs...), type(func_inputs(body)...)`. * `reduce(input_slices_converted) = exec(schedule)` for some binary tree `schedule` where: * `exec(node) = body(exec(node.left), exec(node.right))`. 
@@ -384,89 +128,52 @@ map(output_conversion, reduce(input_slices_converted))` where: * `input_slices_converted...[index]` values, for all `index` in `index_space(input_slices_converted)` in the ascending lexicographic order of `index`. - * Interspersed with an implementation-defined amount of `init_values` + * Interspersed with an implementation-defined amount of + `reduce_implicit_convert(init_values..., type(init_values...), type(func_inputs(body)[:len(func_inputs(body)//2)])...)` at implementation-defined positions. -#### Inputs - -| Label | Name | Type | Constraints | -|-------|---------------------|----------------------------------------------|-------------| -| (I?) | `inputs` | variadic number of tensors | | -| (I?) | `init_values` | variadic number of 0-dimensional tensors | | -| (I?) | `dimensions` | 1-dimensional tensor constant of type `si64` | | -| (I?) | `input_conversion` | function | | -| (I?) | `body` | function | | -| (I?) | `output_conversion` | function | | - -#### Outputs - -| Name | Type | Constraints | -|-----------|----------------------------|-------------| -| `results` | variadic number of tensors | | - #### Constraints * (C?) `same(shape(inputs...))`. * (C?) `element_type(inputs...) = element_type(init_values...)`. * (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. -* (C?) `0 < size(inputs) = size(init_values) = size(results) = N`. -* (C?) `0 <= dimensions < rank(inputs[0])`. -* (C?) `is_unique(dimensions)`. -* (C?) `input_conversion` has type `tensor, ..., tensor -> - (tensor, ..., tensor)` where `Ei = element_type(inputs[i])`. * (C?) `body` has type `tensor, ..., tensor, tensor, ...,` `tensor) -> (tensor, ..., tensor)` where - `Ei = element_type(output_types(input_conversion)[i])`. -* (C?) `output_conversion` has type `tensor, ..., tensor -> - (tensor, ..., tensor)` where - `E'i = element_type(results[i])`. -* (C?) `element_type(output_types(input_conversion)...) = - element_type(input_types(output_conversion)...)`. + `is_integer(element_type(inputs[i])) = is_integer(element_type(Ei]` or + `is_float(element_type(inputs[i])) = is_float(element_type(Ei]` or + `is_complex(element_type(inputs[i])) = is_complex(element_type(Ei]` or + `is_quantized(element_type(inputs[i])) = is_quantized(element_type(Ei]`. * (C?) `shape(results...) = shape(inputs...)` except that the dimension sizes of `inputs...` corresponding to `dimensions` are not included. +`reduce_implicit_convert` is defined as + +```python +def reduce_implicit_convert(x: Value, source_type: Type, destination_type: + Type): + if source_type == destination_type: + return x + if is_quantized(source_type) and is_quantized(destination_type): + return quantize(x, destination_type) + return convert(x, destination_type) +``` + The above specification of `reduce` op can be used to define the specification -of other ops as shown below. For brevity, we are only presenting the relevant +of other ops as shown below. As before, we are only presenting the relevant portions of the spec which needs modification. ### Revised specification of reduce_window op -#### Semantics - -Applies a reduction functions `input_conversion`, `body`, and -`output_conversion` to windows of `inputs` and `init_values` and produces -`results`. - -... - -More formally, -`results...[result_index] = reduce(windows, init_values, axes(inputs...), - input_conversion, body, output_conversion)` -where: -.... - -#### Inputs - -| Label | Name | Type | -|-------|---------------------|----------| -| (I?) 
| `input_conversion` | function | -| (I8) | `body` | function | -| (I?) | `output_conversion` | function | - #### Constraints * (C?) `element_type(inputs...) = element_type(init_values...)`. * (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. -* (C?) `input_conversion` has type `tensor, ..., tensor -> - (tensor, ..., tensor)` where `Ei = element_type(inputs[i])`. * (C?) `body` has type `tensor, ..., tensor, tensor, ...,` `tensor) -> (tensor, ..., tensor)` where - `Ei = element_type(output_types(input_conversion)[i])`. -* (C?) `output_conversion` has type `tensor, ..., tensor -> - (tensor, ..., tensor)` where - `E'i = element_type(results[i])`. -* (C?) `element_type(output_types(input_conversion)...) = - element_type(input_types(output_conversion)...)`. + `is_integer(element_type(inputs[i])) = is_integer(element_type(Ei]` or + `is_float(element_type(inputs[i])) = is_float(element_type(Ei]` or + `is_complex(element_type(inputs[i])) = is_complex(element_type(Ei]` or + `is_quantized(element_type(inputs[i])) = is_quantized(element_type(Ei]`. ### Revised specification of select_and_scatter op @@ -476,74 +183,171 @@ not need additional conversion functions associated with `select`. But the `scatter` function needs be accompanied with `input_conversion` and `output_conversion` functions. -#### Semantics - -Scatters the values from the `source` tensor using `scatter` based on the -outcome of `reduce_window` of the `input` tensor using `select` and produces -a `result` tensor. - -More formally: -... - -* `result[result_index] = reduce([source_values], [init_value], [0], - input_conversion, scatter, output_conversion)` - where: - ... - -#### Inputs - -| Label | Name | Type | -|-------|---------------------|----------| -| (I8) | `input_conversion` | function | -| (I8) | `scatter` | function | -| (I8) | `output_conversion` | function | - #### Constraints * (C1) `element_type(operand) = element_type(source)`. * (C3) `element_type(init_value) = element_type(operand)`. * (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. -* (C?) `input_conversion` has type `tensor -> (tensor)` where - `Ei = element_type(operand)`. * (C10) `scatter` has type `(tensor, tensor) -> tensor` where - `E = element_type(output_types(input_conversion))`. -* (C?) `output_conversion` has type `tensor -> (tensor)` where - `E'i = element_type(result)`. -* (C?) `element_type(output_types(input_conversion)) = - element_type(input_types(output_conversion))`. -* (C11) `shape(operand) = shape(result)`. + `is_integer(element_type(operand)) = is_integer(element_type(E]` or + `is_float(element_type(operand)) = is_float(element_type(E]` or + `is_complex(element_type(operand)) = is_complex(element_type(E]` or + `is_quantized(element_type(operand)) = is_quantized(element_type(E]`. -## [11 Aug'23] Revised proposal +### Action Plan + +I propose to follow the action plan (order matters): + +* Update the specification of ReduceOp, ReduceWindowOp, and SelectAndScatterOp + op, taking the accumulation type into account, via [open + pr](https://github.com/openxla/stablehlo/pull/1538). +* Finalize the quantized specification of AllReduceOp, BatchNormTrainingOp, + BatchNormGradOp and ReduceScatterOp, whose semantics depend on ReduceOp, + via [open ticket](https://github.com/openxla/stablehlo/issues/1666). +* Spec the behavior of `precision_config` in DotGeneralOp. [open +issue](https://github.com/openxla/stablehlo/issues/755) +* Consider adding `precision_config` in reduction op. 
`precision_config`, +currently used for `dot_general` and `convolution`, to override the precision +specified by the input parameters, allowing the choice of low precision vs high +precision computation. We should consider adding `precision_config` to all +reduction based op as well. [need a ticket for this] +* Consider adding `accumulation_type` to `dot_general`/`convolution op`. The +attribute seems beneficial for ops like `dot_general` and `convolution` which +does not have an explicit reduction function. [need a ticket for this item]. + +## Summary of previous proposals + +For completeness of the presentation, let me provide the proposals which are +evaluated previously and help shape the current proposal. + +### Re-scale input to accumulation type + +This option is the simplest from the POV for specification of quantized `reduce` +op. This is adding `stablehlo.uniform_quantize`ops before and after reduce op +which operates on the "accumulator" type. + +```mlir +%widen = "stablehlo.uniform_quantize"(%input) + : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> + +%reduce = "stablehlo.reduce"(%widen) { + ^reduce_computation(%lhs: !quant.uniform, %rhs: !qunat.uniform): + // reduce_computation_block + } + : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> + +%narrowed = "stablehlo.uniform_quantize"(%reduce) + : (tensor<... x !quant.uniform>) -> tensor<... x !quant.uniform> +``` + +#### Tradeoffs + +* (+) An advantage of this option is that we only need minor changes to the + specification (i.e. to allow quantized types). +* (-) The compiler must pattern match 3 operations and map them into some + internal representation before their compilation or execution. +* (-) The compiler must ensure that the `stablehlo.uniform_quantize` (or + `stablehlo.convert` in the case of `bf16` or `f16`) is not folded before the + backend matches the pattern. + [for more information](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906) + +This proposal should be avoided because it is hard to control the transformation +which might disrupt the pattern to be matched. + +### Introduce on-the-fly type conversions + +Proposes addition two regions in reduce op to (1) convert the input type to the +type of the `body` function argument and (2) convert the result type of the +`body` function to the output type. Following is the code snippet with the +proposed syntax of reduce op: + +```mlir +%result = "stablehlo.reduce"(%input, %init_value) ({ + ^input_conversion( + %input: tensor>): + %input_rescaled = "stablehlo.uniform_quantize"(%input) + : (tensor>) + -> tensor> + "stablehlo.return"(%input_rescaled) + : (tensor>) -> () -### Context + }, { + ^reduce_computation( + %lhs: tensor>, + %rhs: tensor>): + %add = "stablehlo.add"(%lhs, %rhs) + : (tensor>, + tensor>) + -> tensor> + "stablehlo.return"(%add) + : (tensor>) -> () + }, { + ^output_conversion( + %intermediate_result: tensor>): + %output_rescaled = "stablehlo.uniform_quantize"(%intermediate_result) + : (tensor>) + -> tensor> + "stablehlo.return"(%output_rescaled) + : (tensor>) -> () + }) { + dimensions = dense<...> : tensor<1xi64> + } : (tensor<... x !quant.uniform>, + tensor<... x !quant.uniform>) + -> tensor<... x !quant.uniform> +``` -Option #2 should be avoided because it is hard to control the transformation -which might disrupt the pattern to be matched. The option #1 sounds good except -that the extra input/output conversion blocks are surplus information. 
The -specification would benefit if the intent of the conversion blocks can be -expressed precisely. The conversion blocks provides a way to capture the -accumulation type needed to compute the accumulative operation on. +Here we will informally propose the semantics of the additional functions +`input_conversion` and `output_conversion` introduced. -The revised proposal is: +```python ++----------+ +--------+ +--------+ +----------+ +--------+ +--------+ +|init_value| |input[0]| |input[1]| |init_value| |input[2]| |input[3]| ++----------+ +--------+ +--------+ +----------+ +--------+ +--------+ + | | | | | | ++----------+ +--------+ +--------+ +----------+ +--------+ +--------+ +|input | |input | |input | |input | |input | |input | +|convert | |convert | |convert | |convert | |convert | |convert | ++----------+ +--------+ +--------+ +----------+ +--------+ +--------+ + \ / / \ / / + +-------+ / +-------+ / + |compute| / |compute| / + +-------+ / +-------+ / + \ / \ / + +-------+ +-------+ + |compute| |compute| + +-------+ +-------+ + \___________ ___________/ + \ / + +-------+ + |compute| + +-------+ + | + +-------+ + |output | + |convert| + +-------+ +``` -* To capture the accumulation type via an additional StableHLO attribute like - `accumulation_element_type`. -* The attribute seems beneficial for other ops as well like `dot_general` and - `convolution`. -* `precision_config`, currently used for `dot_general` and `convolution`, is - used to override the precision specified by the input parameters, allowing the - choice of low precision vs high precision computation. We should consider - adding `precision_config` to all reduction based op as well. +### Tradeoffs -### Few implementation details +* (+) Enables programmers to program at (almost) baremetal. If the hardware + can support reduction computation in wider type (e.g. in the SIMD + instruction set, we typically do widening/compute/narrowing within the + kernel to save the memory bandwidth), the programmer can explicitly request + for that. +* (-) The disadvantage of this representation is that the syntax is more + verbose and requires significant changes to the specification. +* (-) The extra input/output conversion blocks are surplus information. The +intent of conversion blocks is to capture the accumulation type needed to +compute the accumulative operation on. The specification would benefit if the +intent can be expressed succinctly. -#### On StableHLO side +### Introduce accumulation type attribute -The reduce syntax to be augmented with a optional [type -attribute](https://github.com/llvm/llvm-project/blob/51a57074bc63842970c4c160b05c1a7e42db7523/mlir/include/mlir/IR/OpBase.td#L1466) -as follows: +Instead of using additional input and output conversion blocks, use a type +attribute `accumulation type` to capture the accumulation type. As an example, ```mlir %0 = stablehlo.reduce(%arg0 init: %arg1) across dimensions = [0] { @@ -558,48 +362,11 @@ as follows: // OptionalAttr>:$accumulation_type ``` -Note that the main difference between this option and the option #1 is that the -input and output conversion blocks are no longer used as their intent is -specified via the `accumulation_type` attribute. However, the reducer block -still needs to express the computation in accumulation type only. - -**Why optional attribute?** - -* At times, it might be desirable not to hard-code the accumulation type. 
For - example, when we would like to write a generic code and let the downstream - compilation tools to decide the exact accumulation type based on the hardware - of choice. -* It allows the stablehlo, used in various existing pipelines, to remain - largely unaffected by this change. - -Next, the StableHLO specification should be updated with the syntax and -semantics aspects of this attribute. - -#### On StableHLO Consumers side - -The consumers can pattern match the op taking the accumulation type in account -if the targeted hardware supports accumulation at higher type. -There are still to explore things about maintaining StableHLO-HLO parity which -needs to be addresses as well. - -### Action Plan - -I propose to follow the action plan (order matters): +Note that the main difference between this option and the previous option is +that the input and output conversion blocks are no longer used and their intent +is specified via the `accumulation_type` attribute. However, the reducer block +needs to express the computation in accumulation type only. -* Update the specification of ReduceOp, ReduceWindowOp, and SelectAndScatterOp - op, taking the accumulation type into account, via [open - pr](https://github.com/openxla/stablehlo/pull/1538). -* Finalize the quantized specification of AllReduceOp, BatchNormTrainingOp, - BatchNormGradOp and ReduceScatterOp, whose semantics depend on ReduceOp, - via [open ticket](https://github.com/openxla/stablehlo/issues/1666). -* Add implementation for additional attribute in the above ops. This includes -updating the tablegen spec/verifiers/type inferencers. [Need a ticket for this]. -* Address the disparity between StableHLO and HLO because of the introduction of -this new attribute in StableHLO: Should/How XLA should consume this additional -attribute? [Need a ticket for this]. -* Spec the behavior of `precision_config` in DotGeneralOp. [open -issue](https://github.com/openxla/stablehlo/issues/755) -* Consider adding `precision_config` in reduction op. [need a ticket for this -* Consider adding `accumulation_type` to `dot_general`/`convolution op`. -[need a ticket for this item]. -item]. +This options is discarded because, for reduce op, the additional attribute seems +redundant and can be inferred based on the differences in element type of +operand and reduction block arguments (as described in the current proposal). From 538cb8f5bc0a683da203e0c560b14303ebbb6ce6 Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Tue, 29 Aug 2023 20:46:53 +0000 Subject: [PATCH 08/11] review iteration: various typo fixes --- rfcs/20230622-quantized-reduction.md | 96 ++++++++++++++-------------- 1 file changed, 48 insertions(+), 48 deletions(-) diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md index 248d97c097a..3ec7a8a26af 100644 --- a/rfcs/20230622-quantized-reduction.md +++ b/rfcs/20230622-quantized-reduction.md @@ -42,14 +42,14 @@ The RFC introduces the following proposal, emerged out of discussion in the , along with their tradeoffs. The proposal allows the reducer block to express the computation in a different -element type (preferably higher accumulation type) than the one used in reduce +element type (preferably wider accumulation type) than the one used in reduce op's ops arguments and return type. For illustrative purposes, in the following -example, the operand element type `tensor>` is different from the element type for - reduction region's block arguments. 
Similarly, the element type of the - reduce op's result `!quant.uniform>` is - different from that of block return (`tensor>`). +example, the operand element type +`tensor>` is different from the +element type for reduction region's block arguments. Similarly, the element +type of the reduce op's result +`!quant.uniform>` is different from that of +block return (`tensor>`). ```mlir %result = "stablehlo.reduce"(%input, %init_value) ({ @@ -71,32 +71,32 @@ example, the operand element type `tensor, ..., tensor, tensor, ...,` `tensor) -> (tensor, ..., tensor)` where - `is_integer(element_type(inputs[i])) = is_integer(element_type(Ei]` or - `is_float(element_type(inputs[i])) = is_float(element_type(Ei]` or - `is_complex(element_type(inputs[i])) = is_complex(element_type(Ei]` or - `is_quantized(element_type(inputs[i])) = is_quantized(element_type(Ei]`. + `is_integer(element_type(inputs[i])) = is_integer(element_type(E[i]))` or + `is_float(element_type(inputs[i])) = is_float(element_type(E[i]))` or + `is_complex(element_type(inputs[i])) = is_complex(element_type(E[i]))` or + `is_quantized(element_type(inputs[i])) = is_quantized(element_type(E[i]))`. * (C?) `shape(results...) = shape(inputs...)` except that the dimension sizes of `inputs...` corresponding to `dimensions` are not included. @@ -170,10 +170,10 @@ portions of the spec which needs modification. * (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. * (C?) `body` has type `tensor, ..., tensor, tensor, ...,` `tensor) -> (tensor, ..., tensor)` where - `is_integer(element_type(inputs[i])) = is_integer(element_type(Ei]` or - `is_float(element_type(inputs[i])) = is_float(element_type(Ei]` or - `is_complex(element_type(inputs[i])) = is_complex(element_type(Ei]` or - `is_quantized(element_type(inputs[i])) = is_quantized(element_type(Ei]`. + `is_integer(element_type(inputs[i])) = is_integer(element_type(E[i]))` or + `is_float(element_type(inputs[i])) = is_float(element_type(E[i]))` or + `is_complex(element_type(inputs[i])) = is_complex(element_type(E[i]))` or + `is_quantized(element_type(inputs[i])) = is_quantized(element_type(E[i]))`. ### Revised specification of select_and_scatter op @@ -190,10 +190,10 @@ not need additional conversion functions associated with `select`. But the * (C3) `element_type(init_value) = element_type(operand)`. * (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. * (C10) `scatter` has type `(tensor, tensor) -> tensor` where - `is_integer(element_type(operand)) = is_integer(element_type(E]` or - `is_float(element_type(operand)) = is_float(element_type(E]` or - `is_complex(element_type(operand)) = is_complex(element_type(E]` or - `is_quantized(element_type(operand)) = is_quantized(element_type(E]`. + `is_integer(element_type(operand)) = is_integer(element_type(E))` or + `is_float(element_type(operand)) = is_float(element_type(E))` or + `is_complex(element_type(operand)) = is_complex(element_type(E))` or + `is_quantized(element_type(operand)) = is_quantized(element_type(E))`. ### Action Plan @@ -204,18 +204,18 @@ I propose to follow the action plan (order matters): op, taking the accumulation type into account, via [open pr](https://github.com/openxla/stablehlo/pull/1538). * Finalize the quantized specification of AllReduceOp, BatchNormTrainingOp, - BatchNormGradOp and ReduceScatterOp, whose semantics depend on ReduceOp, - via [open ticket](https://github.com/openxla/stablehlo/issues/1666). 
+ BatchNormGradOp and ReduceScatterOp, whose semantics depend on ReduceOp, + via [open ticket](https://github.com/openxla/stablehlo/issues/1666). * Spec the behavior of `precision_config` in DotGeneralOp. [open issue](https://github.com/openxla/stablehlo/issues/755) * Consider adding `precision_config` in reduction op. `precision_config`, -currently used for `dot_general` and `convolution`, to override the precision -specified by the input parameters, allowing the choice of low precision vs high -precision computation. We should consider adding `precision_config` to all -reduction based op as well. [need a ticket for this] + currently used for `dot_general` and `convolution`, to override the precision + specified by the input parameters, allowing the choice of low precision vs + high precision computation. We should consider adding `precision_config` to + all reduction based op as well. [need a ticket for this] * Consider adding `accumulation_type` to `dot_general`/`convolution op`. The -attribute seems beneficial for ops like `dot_general` and `convolution` which -does not have an explicit reduction function. [need a ticket for this item]. + attribute seems beneficial for ops like `dot_general` and `convolution` which + does not have an explicit reduction function. [need a ticket for this item]. ## Summary of previous proposals @@ -340,9 +340,9 @@ Here we will informally propose the semantics of the additional functions * (-) The disadvantage of this representation is that the syntax is more verbose and requires significant changes to the specification. * (-) The extra input/output conversion blocks are surplus information. The -intent of conversion blocks is to capture the accumulation type needed to -compute the accumulative operation on. The specification would benefit if the -intent can be expressed succinctly. + intent of conversion blocks is to capture the accumulation type needed to + compute the accumulative operation on. The specification would benefit if the + intent can be expressed succinctly. ### Introduce accumulation type attribute From 1a0eabd8c30283934c766c6e93d93901a976b95d Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Mon, 11 Sep 2023 18:25:37 +0000 Subject: [PATCH 09/11] elaborate on the why #369 is not entirely part of the current proposal --- rfcs/20230622-quantized-reduction.md | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md index 3ec7a8a26af..e0418cddb52 100644 --- a/rfcs/20230622-quantized-reduction.md +++ b/rfcs/20230622-quantized-reduction.md @@ -95,8 +95,18 @@ partially addresses for reduce op in that it allows the input or init value to differ from the corresponding block arguments w.r.t the precision of floating-point types. However, the mixed precision implementation in HLO seems more detailed in the -sense that even allows `inputs` and `init_values` to differ in floating-point -precision. My proposal would be to treat the above ticket separately. +following sense: + +* [Decide on mixed precision](https://github.com/openxla/stablehlo/issues/369) +allows `inputs` and `init_values` to differ in floating-point precision. +Whereas, the current proposal considers them to have the same element type. +* [Decide on mixed precision](https://github.com/openxla/stablehlo/issues/369) +allows the element type of block arguments to differ from that of the block +return value. The current proposal considers them to have the same element type. 
+* There are other ops (than reduce) which need support for mixed precision (here
+is the [list of ops](https://github.com/tensorflow/tensorflow/blob/1d69ba72834b963b72075a82c10959f6bb74e473/tensorflow/compiler/xla/service/hlo_verifier.cc#L1681-L1714)).
+
+Having said that, my proposal would be to treat the above ticket separately.
 
 ## Appendix
 

From b5595da2b4aa9fcdb35acd5d1c991f6764fa598e Mon Sep 17 00:00:00 2001
From: Sandeep Dasgupta 
Date: Wed, 4 Oct 2023 22:27:24 +0000
Subject: [PATCH 10/11] make it clear in the rfc that the proposed semantics is
 optional

---
 rfcs/20230622-quantized-reduction.md | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md
index e0418cddb52..739b80eec57 100644
--- a/rfcs/20230622-quantized-reduction.md
+++ b/rfcs/20230622-quantized-reduction.md
@@ -41,15 +41,14 @@ The RFC introduces the following proposal, emerged out of discussion in the
 [thread](https://github.com/openxla/stablehlo/pull/1538#issuecomment-1599476906)
 , along with their tradeoffs.
 
-The proposal allows the reducer block to express the computation in a different
-element type (preferably wider accumulation type) than the one used in reduce
-op's ops arguments and return type. For illustrative purposes, in the following
-example, the operand element type
-`tensor>` is different from the
-element type for reduction region's block arguments. Similarly, the element
-type of the reduce op's result
-`!quant.uniform>` is different from that of
-block return (`tensor>`).
+The proposal optionally allows the reducer block to express the computation in a
+different element type (preferably wider accumulation type) than the one used in
+reduce op's arguments and return type. For illustrative purposes, in the
+following example, the operand element type `tensor>` is different from the element type for reduction
+region's block arguments. Similarly, the element type of the reduce op's
+result `!quant.uniform>` is different from
+that of block return (`tensor>`).
 
 ```mlir
 %result = "stablehlo.reduce"(%input, %init_value) ({
@@ -71,11 +70,14 @@ block return (`tensor>`).
 
 ### Semantics
 
-If (1) the input operand type is different from the reduction block
-argument type or (2) the op result type is different from the reduction block
-return type, there will be implicit type conversion defined by either
+If (1) the input operand type is different from the reduction block argument
+type or (2) the op result type is different from the reduction block return
+type, there will be implicit type conversion defined by either
 `stablehlo.convert`, `stablehlo.uniform_quantize`, or
-`stablehlo.uniform_dequantize`. For example,
+`stablehlo.uniform_dequantize`. When the types are not different, i.e., when (1)
+and (2) do not hold, no implicit conversion is needed.
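+
+To make the placement of these implicit conversions concrete, the following is a
+minimal, non-normative Python sketch of a 1-dimensional reduction, reusing the
+`reduce_implicit_convert` helper from the revised reduce specification. The
+`reduce_1d` name and the `input_type`/`block_type`/`result_type` parameters are
+illustrative only, and the implementation-defined ordering and interspersing of
+`init_value` is ignored.
+
+```python
+def reduce_1d(elements, init_value, body, input_type, block_type, result_type):
+  # Widen the init value and every input element to the reducer block's
+  # element type before the block sees them.
+  acc = reduce_implicit_convert(init_value, input_type, block_type)
+  for x in elements:
+    acc = body(acc, reduce_implicit_convert(x, input_type, block_type))
+  # Narrow the accumulated value once, to the op's result element type.
+  return reduce_implicit_convert(acc, block_type, result_type)
+```
+
+When the element types coincide, `reduce_implicit_convert` degenerates to the
+identity, which matches the "no implicit conversion" case described above.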
+ +For example, | Implicit type conversion op | element type of operand or block return | element type of block argument or op return | |-----------------------------------|-----------------------------------------|---------------------------------------------| From 4cb20669bacdd7634d16b622a6cc132e475c4079 Mon Sep 17 00:00:00 2001 From: Sandeep Dasgupta Date: Mon, 9 Oct 2023 16:04:41 +0000 Subject: [PATCH 11/11] typo fixes --- rfcs/20230622-quantized-reduction.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfcs/20230622-quantized-reduction.md b/rfcs/20230622-quantized-reduction.md index 739b80eec57..678390ec4db 100644 --- a/rfcs/20230622-quantized-reduction.md +++ b/rfcs/20230622-quantized-reduction.md @@ -72,7 +72,7 @@ that of block return (`tensor>`). If (1) the input operand type is different from the reduction block argument type or (2) the op result type is different from the reduction block return -type, there will be implicit type conversion defined by either +type, there will be an implicit type conversion defined by either `stablehlo.convert`, `stablehlo.uniform_quantize`, or `stablehlo.uniform_dequantize`. When the types are not differnet, i.e., when (1) and (2) does not hold true, then no implicit convertion is needed. @@ -180,7 +180,7 @@ portions of the spec which needs modification. * (C?) `element_type(inputs...) = element_type(init_values...)`. * (C?) `baseline_element_type(inputs...) = baseline_element_type(results...)`. -* (C?) `body` has type `tensor, ..., tensor, tensor, ...,` +* (C?) `body` has type `(tensor, ..., tensor, tensor, ...,` `tensor) -> (tensor, ..., tensor)` where `is_integer(element_type(inputs[i])) = is_integer(element_type(E[i]))` or `is_float(element_type(inputs[i])) = is_float(element_type(E[i]))` or