Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aievec] to-llvm flow for aievec.ext op with I128.I512 extract intrinsic #1490

Merged
merged 7 commits into from
May 17, 2024

Conversation

jamestcl-amd
Copy link
Collaborator

This PR add the support for aievec.ext op going through the to-llvm flow with the extract.I128.I512 intrinsic op.

Changes:

  • Update the extract.I128.I512 intrinsic ops in XLLVM dialect.
  • Add aievec-to-llvm conversion pattern/tests for the aievec.ext op for extract.I128.I512 intrinsic.
  • Add test for translating to external llvm.

// CHECK-SAME: (vector<16xi32>) -> vector<4xi32>
// CHECK-NEXT: %[[RES0:.*]] = llvm.bitcast %[[EXT0]] : vector<4xi32> to vector<4xf32>
// CHECK-NEXT: %[[UNDEF1:.*]] = "xllvm.intr.aie2.v16int32"() : () -> vector<16xi32>
// CHECK-NEXT: %[[CST48:.*]] = llvm.mlir.constant(48 : i32) : i32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? If I understand correctly, the process to extract a subvector from a given position is to shift the whole vector by the number of bits up to the position, and then extracting the lowest part. For 16-bit element vectors and an extraction from position 3, that's 3 x 16 = 48 bit, but for 32-bit element vectors, shouldn't it be 3 x 32 = 96 bit? Am I misunderstanding something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shift amount for the shift op/intrinsic is in the number of bytes. The I128.I512 extract intrinsic is special. It always extracts the lowest 128-bit vector from the source 512-bit vector. For index=0 vector extraction, we actually don't need any vector shift beforehand, so I have updated the conversion pattern for this scenario. As for index=3 vector extraction, the shift amount is 48 bytes = 384 bits = 3*128 bits. After the right shift of 48 bytes, the I128.I512 intrinsic extracts the lowest 128-bit vector, which is correct. I have updated the aievec.ext/shift op descriptions, to explicitly clarify the shift amount in bytes and the extraction index to be either 0--1 or 0--3. Let me know if there is anything else I can do :).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaaah! Got it, I was trying to make the maths work in my head and failing, thanks for the clarification 🙂

Copy link
Collaborator

@jsetoain jsetoain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jamestcl-amd jamestcl-amd added this pull request to the merge queue May 16, 2024
Merged via the queue into Xilinx:main with commit 3dc49bf May 17, 2024
51 checks passed
@jamestcl-amd jamestcl-amd deleted the aievec_ext_llvm_128_512 branch May 17, 2024 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants