data-tiling: introduce `upper_bound_tile_size` op to defer padding-size choice to MaterializeEncoding. (#14349)

This fixes #11632 by introducing a materializable `upper_bound_tile_size` op instead of hardcoding a fixed padding amount at Flow, and fixes it in sufficient generality to also solve the problem for narrow matmuls. Let's explain that in more detail, as it is an important part of what this PR does.

For each combination of element types and each target, the MaterializeEncoding pass selects appropriate matmul tile shapes. Input tensors get padded to the next multiple of the tile size. The padding increases the inherent arithmetic cost of the problem at hand. When, along some dimension, the original tensor size is smaller than the tile size, this can result in particularly large overhead. The extreme case, which is also a very common case, is matrix-times-vector multiplication. The "vector" right-hand side is really a matrix with one dimension size equal to 1, so if the general matmul tile shape along that dimension is 8 or 16, as is usually the case, that can be an 8x or 16x increase in the inherent arithmetic cost of the matmul op.

The solution to that is to adjust MaterializeEncoding tile shapes for narrow dimensions. We had some logic in place to deal with that, but #11632 left it moot: the flow-level padding of everything to the next multiple of 16 meant that our logic there never really had a chance to kick in. With #11632 being fixed, this PR was the opportunity to also fix that along the way, and to ensure that the solution to #11632 worked in that respect as well. As matrix-times-vector products were the common case that suffered the most from #11632, it would have been too bad to "solve" #11632 without addressing that. By the way, matrix-times-vector is only the extreme case; other narrow cases matter too. When, e.g. on AVX-512, the general matmul tile size is 16, even width-8 matmuls (MxKx8) suffered a 2x widening.
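To make the overhead concrete, here is a back-of-the-envelope sketch (plain Python for illustration, not IREE code; the function names are made up) of how padding a size-1 dimension up to a general 16-wide tile multiplies the inherent matmul cost:

```python
def next_multiple(n, tile):
    # Round n up to the next multiple of tile (the padded dimension size).
    return ((n + tile - 1) // tile) * tile

def padded_matmul_flops(m, n, k, tile_m, tile_n):
    # Inherent multiply-add count of an MxKxN matmul after padding
    # the M and N dimensions to multiples of the tile sizes.
    return 2 * next_multiple(m, tile_m) * next_multiple(n, tile_n) * k

# Matrix-times-vector: M=1000, K=1000, N=1.
general = padded_matmul_flops(1000, 1, 1000, 16, 16)  # N padded from 1 to 16
narrow = padded_matmul_flops(1000, 1, 1000, 16, 1)    # tile clamped to N=1
print(general // narrow)  # 16: the blow-up that narrow-aware tiles avoid
```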
So the solution in this PR is to address all narrow cases, defined as whenever a tensor dimension size is less than the general tile size. The difficulty was that when MaterializeEncoding runs on a dispatch function, it runs on an already-padded tensor. Even as this PR introduces `upper_bound_tile_size`, that only makes it possible to select the right padding amount; there is still a `tensor.pad` op, and it is still getting in the way of knowing the actual, original tensor shape for the purpose of adjusting tile shapes for narrow cases. Moreover, as `MaterializeEncoding` is a type-converter pass, it can't just walk from a Value up to its defining op to find the pre-padding tensor. There are no values there, only types. So the information about the pre-padding tensor shape has to be part of the tensor type that `MaterializeEncoding` sees, that is, the padded tensor type. The solution to that problem in this PR is to add an `original_type` field to `EncodingAttr`.

Fixes #11632. Fixes a compiler issue encountered in #14398, but not the originally reported runtime crash by itself. This now also includes the removal of a now-useless VMVX pass, which was originally split into #14383.
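As a rough sketch of the idea (hypothetical Python, not the actual C++ pass logic): once the encoding carries the original, pre-padding type, the materialization can clamp the general tile shape along any dimension where the original tensor is narrower than the tile:

```python
def clamp_tile_shape(general_tile, original_shape):
    # Shrink the general tile size along each dimension where the
    # pre-padding tensor (as recorded in the encoding's original_type)
    # is narrower than the tile; other dimensions keep the general size.
    return tuple(min(t, d) for t, d in zip(general_tile, original_shape))

print(clamp_tile_shape((16, 16), (1000, 1)))  # (16, 1): matrix-times-vector
print(clamp_tile_shape((16, 16), (500, 8)))   # (16, 8): width-8 matmul
```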