-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[aievec] Add new shuffle ops #1516
Conversation
c02363e
to
362ad18
Compare
os << emitter.getOrCreateName(rhs); | ||
os << ", "; | ||
} | ||
os << "eShuffleMode::shuffle_T" << &stringifyEnum(mode).data()[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we can do substr(1)
here, which is easier to understand...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! You caught me being lazy 😂
return emitError() << "shuffle mode '" << stringifyEnum(mode) | ||
<< "' requires vectors of " << modeBitWidth | ||
<< "-bit elements"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check if the operands and results are in 512-bit vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's being checked by the type constraints in the op definition, this exclusively verifies whether the shuffle mode is "compatible" with the element type and number of operands. It's up for debate whether we want to impose this restriction or not, since the hardware is agnostic to the mode (it just shuffles bytes around), but I'd rather be conservative for the time being.
@@ -38,7 +38,7 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness" | |||
// CHECK: %[[C16:.*]] = arith.constant 16 : index | |||
// CHECK: %[[C0:.*]] = arith.constant 0 : index | |||
// CHECK: %[[T0:.*]] = aievec.upd %[[A1]][%[[C0:.*]]] {index = 0 : i8, offset = 0 : i32} : memref<?xi8>, vector<64xi8> | |||
// CHECK: %[[T1:.*]] = aievec.shuffle %[[T0:.*]] {mode = 0 : i32} : vector<64xi8>, vector<64xi8> | |||
// CHECK: %[[T1:.*]] = aievec.legacyshuffle %[[T0:.*]] {mode = 0 : i32} : vector<64xi8>, vector<64xi8> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the mode=0 (shuffle_T8_64x2_lo
), I'm a bit confused why this can work without a second operand...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Shuffle" always takes two arguments. When translating to C++, the single operand "shuffle" C++ intrinsic calls a two operand version with "undef" in the rhs. For the new shuffle
, we will have to inject the undef
when lowering to LLVM IR
. With the new constraints, I'm just trying to keep things semantically consistent.
@@ -80,7 +80,7 @@ func.func @conv2d (%A: memref<18x288xi8>, %B: memref<48xi8>, %C: memref<16x256xi | |||
// CHECK: %[[C8:.*]] = arith.constant 8 : i32 | |||
// CHECK: %[[C0:.*]] = arith.constant 0 : index | |||
// CHECK: %[[T0:.*]] = aievec.upd %[[A1]][%[[C0]]] {index = 0 : i8, offset = 0 : i32} : memref<48xi8>, vector<64xi8> | |||
// CHECK: %[[T1:.*]] = aievec.shuffle %[[T0]] {mode = 0 : i32} : vector<64xi8>, vector<64xi8> | |||
// CHECK: %[[T1:.*]] = aievec.legacyshuffle %[[T0]] {mode = 0 : i32} : vector<64xi8>, vector<64xi8> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this one too...
AIEVec_ShuffleModeAttr:$mode)>, | ||
Results<(outs AnyVector:$result)> { | ||
let summary = "AIE2 shuffle"; | ||
let description = [{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice and clear explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work! It's great to see the symbolic mode in the textual representation.
|
||
Shuffle Mode | Operands | Types Supported | ||
:------------------:|:------------------:|:------------------: | ||
t8_8x4 | `lhs` | `vector<2x32xi8>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entries in the 'Types Supported' seem inconsistent.
I would expect to see only 1D vectors. So for the entry here, I would expect vector<64xi8>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I'd like to discuss. I stumbled upon this "issue" when looking over each of the shuffle modes. For some of them, like this one, it performs the transposition on the two halves of the vector, as if we had two (in this case) 8x4 matrices. I'm inclined to treat it as a flat 64xi8
and this would be yet another set of "special" cases. The ISA is not particularly semantically consistent. It does what it does, except when it doesn't 😉
t8_64x2_hi | ^ | ^ | ||
t8_2x64_lo | ^ | ^ | ||
t8_2x64_hi | ^ | ^ | ||
t16_4x2 | `lhs` | `vector<4x8xi16>` or `vector<4x8xbf16>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 3rd column should read vector<8xi16>
or vector<8xbf16>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anything, vector<32xi16>
/vector<32xbf16>
. See above.
b4b19db
to
27bdc9b
Compare
65aaaf3
to
61e42bf
Compare
This replaces the old `aievec.shuffle` op with a new one with a better syntax that supports all cases supported by the intrinsics, and has strong type guarantees. For legacy purpose, we leave the old shuffle instruction renamed as `aievec.legacyshuffle`.
61e42bf
to
871a5f4
Compare
This replaces the old
aievec.shuffle
op with a new one with a better syntax that supports all cases supported by the intrinsics, and has strong type guarantees.For legacy purposes, we leave the old shuffle instruction renamed as
aievec.legacyshuffle
.