-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[aievec] to-llvm flow for aievec.ext op with I128.I512 extract intrinsic #1490
[aievec] to-llvm flow for aievec.ext op with I128.I512 extract intrinsic #1490
Conversation
// CHECK-SAME: (vector<16xi32>) -> vector<4xi32> | ||
// CHECK-NEXT: %[[RES0:.*]] = llvm.bitcast %[[EXT0]] : vector<4xi32> to vector<4xf32> | ||
// CHECK-NEXT: %[[UNDEF1:.*]] = "xllvm.intr.aie2.v16int32"() : () -> vector<16xi32> | ||
// CHECK-NEXT: %[[CST48:.*]] = llvm.mlir.constant(48 : i32) : i32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? If I understand correctly, the process to extract a subvector from a given position is to shift the whole vector by the number of bits up to the position, and then extracting the lowest part. For 16-bit
element vectors and an extraction from position 3, that's 3 x 16 = 48 bit
, but for 32-bit
element vectors, shouldn't it be 3 x 32 = 96 bit
? Am I misunderstanding something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shift
amount for the shift
op/intrinsic is in the number of bytes. The I128.I512
extract intrinsic is special. It always extracts the lowest 128-bit vector from the source 512-bit vector. For index=0
vector extraction, we actually don't need any vector shift beforehand, so I have updated the conversion pattern for this scenario. As for index=3
vector extraction, the shift amount is 48 bytes = 384 bits = 3*128 bits
. After the right shift of 48 bytes, the I128.I512
intrinsic extracts the lowest 128-bit vector, which is correct. I have updated the aievec.ext/shift
op descriptions, to explicitly clarify the shift amount in bytes and the extraction index to be either 0--1 or 0--3. Let me know if there is anything else I can do :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aaaah! Got it, I was trying to make the maths work in my head and failing, thanks for the clarification 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
This PR add the support for aievec.ext op going through the to-llvm flow with the
extract.I128.I512
intrinsic op.Changes:
extract.I128.I512
intrinsic ops in XLLVM dialect.extract.I128.I512
intrinsic.