From 02654f7370638889b989b4d776d35c3d47c87cdd Mon Sep 17 00:00:00 2001
From: Chris B
Date: Fri, 30 Aug 2024 16:18:46 -0500
Subject: [PATCH] [HLSL][Doc] Document multi-argument resolution (#104474)

This updates the expected differences document to capture the
difference in multi-argument overload resolution between Clang and DXC.

Fixes #99530
---
 clang/docs/HLSL/ExpectedDifferences.rst | 121 +++++++++++++++++++++---
 1 file changed, 109 insertions(+), 12 deletions(-)

diff --git a/clang/docs/HLSL/ExpectedDifferences.rst b/clang/docs/HLSL/ExpectedDifferences.rst
index 4782eb3cda754a..e143c5b71575aa 100644
--- a/clang/docs/HLSL/ExpectedDifferences.rst
+++ b/clang/docs/HLSL/ExpectedDifferences.rst
@@ -54,6 +54,19 @@ HLSL 202x based on proposal
 and
 `0008 `_.

+The largest difference between Clang and DXC's overload resolution is the
+algorithm used for identifying best-match overloads. There are more details
+about the algorithmic differences in the :ref:`multi_argument_overloads` section
+below. There are three high-level differences that should be highlighted:
+
+* **There should be no cases** where DXC and Clang both successfully
+  resolve an overload where the resolved overload is different between the two.
+* There are cases where Clang will successfully resolve an overload that DXC
+  wouldn't because we've trimmed the overload set in Clang to remove ambiguity.
+* There are cases where DXC will successfully resolve an overload that Clang
+  will not for two reasons: (1) DXC only generates partial overload sets for
+  builtin functions and (2) DXC resolves cases that probably should be ambiguous.
+
 Clang's implementation extends standard overload resolution rules to HLSL
 library functionality. This causes subtle changes in overload resolution
 behavior between Clang and DXC. Some examples include:
@@ -71,18 +84,23 @@ behavior between Clang and DXC. Some examples include:
     uint U;
     int I;
     float X, Y, Z;
-    double3 A, B;
+    double3 R, G;
   }

-  void twoParams(int, int);
-  void twoParams(float, float);
+  void takesSingleDouble(double);
+  void takesSingleDouble(vector);
+
+  void scalarOrVector(double);
+  void scalarOrVector(vector);

   export void call() {
-    halfOrInt16(U); // DXC: Fails with call ambiguous between int16_t and uint16_t overloads
-                    // Clang: Resolves to halfOrInt16(uint16_t).
-    halfOrInt16(I); // All: Resolves to halfOrInt16(int16_t).
     half H;
+    halfOrInt16(I); // All: Resolves to halfOrInt16(int16_t).
+
   #ifndef IGNORE_ERRORS
+    halfOrInt16(U); // All: Fails with call ambiguous between int16_t and uint16_t
+                    // overloads
+
     // asfloat16 is a builtin with overloads for half, int16_t, and uint16_t.
     H = asfloat16(I); // DXC: Fails to resolve overload for int.
                       // Clang: Resolves to asfloat16(int16_t).
@@ -94,21 +112,28 @@ behavior between Clang and DXC. Some examples include:
     takesDoubles(X, Y, Z); // Works on all compilers

   #ifndef IGNORE_ERRORS
-    fma(X, Y, Z); // DXC: Fails to resolve no known conversion from float to double.
+    fma(X, Y, Z); // DXC: Fails to resolve no known conversion from float to
+                  // double.
                   // Clang: Resolves to fma(double,double,double).
-  #endif

-    double D = dot(A, B); // DXC: Resolves to dot(double3, double3), fails DXIL Validation.
+    double D = dot(R, G); // DXC: Resolves to dot(double3, double3), fails DXIL Validation.
                           // FXC: Expands to compute double dot product with fmul/fadd
-                          // Clang: Resolves to dot(float3, float3), emits conversion warnings.
+                          // Clang: Fails to resolve as ambiguous against
+                          // dot(half, half) or dot(float, float)
+  #endif

   #ifndef IGNORE_ERRORS
     tan(B); // DXC: resolves to tan(float).
             // Clang: Fails to resolve, ambiguous between integer types.

-    twoParams(I, X); // DXC: resolves twoParams(int, int).
-                     // Clang: Fails to resolve ambiguous conversions.
   #endif
+
+    double D;
+    takesSingleDouble(D); // All: Fails to resolve ambiguous conversions.
+    takesSingleDouble(R); // All: Fails to resolve ambiguous conversions.
+
+    scalarOrVector(D); // All: Resolves to scalarOrVector(double).
+    scalarOrVector(R); // All: Fails to resolve ambiguous conversions.
   }

 .. note::
@@ -119,3 +144,75 @@ behavior between Clang and DXC. Some examples include:
   diagnostic notifying the user of the conversion rather than silently altering
   precision relative to the other overloads (as FXC does) or generating code
   that will fail validation (as DXC does).
+
+.. _multi_argument_overloads:
+
+Multi-Argument Overloads
+------------------------
+
+In addition to the differences in single-element conversions, Clang and DXC
+differ dramatically in multi-argument overload resolution. C++ multi-argument
+overload resolution behavior (or something very similar) is required to
+implement
+`non-member operator overloading `_.
+
+Clang adopts the C++-inspired language from the
+`draft HLSL specification `_,
+where an overload ``f1`` is a better candidate than ``f2`` if, for all arguments,
+its conversion sequence is not worse than the corresponding sequence of ``f2``
+and for at least one argument it is better.
+
+.. code-block:: c++
+
+  cbuffer CB {
+    int I;
+    float X;
+    float4 V;
+  }
+
+  void twoParams(int, int);
+  void twoParams(float, float);
+  void threeParams(float, float, float);
+  void threeParams(float4, float4, float4);
+
+  export void call() {
+    twoParams(I, X); // DXC: resolves twoParams(int, int).
+                     // Clang: Fails to resolve ambiguous conversions.
+
+    threeParams(X, V, V); // DXC: resolves threeParams(float4, float4, float4).
+                          // Clang: Fails to resolve ambiguous conversions.
+  }
+
+In the examples above, ``twoParams`` called with mixed parameters produces
+implicit conversion sequences that are { ExactMatch, FloatingIntegral } and {
+FloatingIntegral, ExactMatch }. Each candidate has an argument whose conversion
+is worse than in the other sequence, so the overload is ambiguous.
+
+In the ``threeParams`` example the sequences are { ExactMatch, VectorTruncation,
+VectorTruncation } and { VectorSplat, ExactMatch, ExactMatch }. Again, each
+candidate has at least one parameter with a worse conversion than the other
+sequence, so the overload is ambiguous.
+
+.. note::
+
+  DXC's behavior described below is not officially documented; it is gleaned
+  from observation and a bit of reading the source.
+
+DXC's approach for determining the best overload produces an integer score value
+for each implicit conversion sequence for each argument expression. Scores for
+casts are based on a bitmask construction that is complicated to reverse
+engineer. It seems that:
+
+* Exact match is 0
+* Dimension increase is 1
+* Promotion is 2
+* Integral -> Float conversion is 4
+* Float -> Integral conversion is 8
+* Cast is 16
+
+The masks are OR'd together to produce a score for the cast.
+
+The scores of the conversion sequences are then summed to generate a score for
+the overload candidate. The overload candidate with the lowest score is the best
+candidate. If more than one overload shares the lowest score, the call is
+ambiguous.
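
The contrast between the two selection rules described in the patch can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not code from Clang or DXC: the names `Conv`, `conversionRank`, `assumedScore`, `resolvePairwise`, `resolveBySum`, and `threeParamsCandidates` are hypothetical, the rank ordering of non-exact conversions is arbitrary, and the score used for vector truncation is a placeholder because the observed score list does not include one. Only the `threeParams(X, V, V)` call is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Conversion kinds named in the threeParams discussion. The enumerator order
// doubles as an illustrative rank (lower is better); only "ExactMatch beats
// everything else" matters for this example.
enum class Conv { ExactMatch, VectorSplat, VectorTruncation };

int conversionRank(Conv C) { return static_cast<int>(C); }

// Per-conversion score for the summed approach. VectorSplat uses the observed
// "dimension increase is 1"; the VectorTruncation value is an ASSUMPTION made
// for this sketch, since the observed list gives no score for it.
int assumedScore(Conv C) {
  switch (C) {
  case Conv::ExactMatch:       return 0;
  case Conv::VectorSplat:      return 1;
  case Conv::VectorTruncation: return 16; // placeholder, not observed
  }
  return 0;
}

using Sequence = std::vector<Conv>; // one conversion per argument

// Pairwise rule: A is better than B if no argument converts worse and at
// least one argument converts better.
bool isBetter(const Sequence &A, const Sequence &B) {
  assert(A.size() == B.size());
  bool StrictlyBetter = false;
  for (std::size_t I = 0; I < A.size(); ++I) {
    if (conversionRank(A[I]) > conversionRank(B[I]))
      return false;
    if (conversionRank(A[I]) < conversionRank(B[I]))
      StrictlyBetter = true;
  }
  return StrictlyBetter;
}

// Returns the index of the candidate that is better than every other
// candidate, or -1 if no such candidate exists (ambiguous call).
int resolvePairwise(const std::vector<Sequence> &Cands) {
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    bool BeatsAll = true;
    for (std::size_t J = 0; J < Cands.size(); ++J)
      if (I != J && !isBetter(Cands[I], Cands[J]))
        BeatsAll = false;
    if (BeatsAll)
      return static_cast<int>(I);
  }
  return -1;
}

// Summed rule: add up the per-argument scores and pick the lowest total.
// Returns -1 if the lowest total is shared by more than one candidate.
int resolveBySum(const std::vector<Sequence> &Cands) {
  int Best = -1, BestScore = 0;
  bool Tie = false;
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    int Total = 0;
    for (Conv C : Cands[I])
      Total += assumedScore(C);
    if (Best < 0 || Total < BestScore) {
      Best = static_cast<int>(I);
      BestScore = Total;
      Tie = false;
    } else if (Total == BestScore) {
      Tie = true;
    }
  }
  return Tie ? -1 : Best;
}

// Conversion sequences for threeParams(X, V, V) against the two overloads:
// index 0 is (float, float, float), index 1 is (float4, float4, float4).
std::vector<Sequence> threeParamsCandidates() {
  return {
      {Conv::ExactMatch, Conv::VectorTruncation, Conv::VectorTruncation},
      {Conv::VectorSplat, Conv::ExactMatch, Conv::ExactMatch},
  };
}
```

Under these assumptions, `resolvePairwise(threeParamsCandidates())` reports ambiguity (-1), because each candidate is worse than the other on at least one argument, while `resolveBySum(threeParamsCandidates())` selects index 1, the `float4` overload, because summing collapses the per-argument trade-offs into a single number. That matches the outcomes the patch documents for Clang and DXC on this call, but the sketch should not be read as a reconstruction of DXC's actual bitmask scoring.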