
Odd variable/buffer names in conceptual_stmt. #7924

Open

mcourteaux opened this issue Oct 28, 2023 · 5 comments

Comments
@mcourteaux
Contributor

I get a lot of .0, .1, .2, .4, ... suffixes on buffer names in the conceptual Stmt, which it would be nice to clean up. I haven't taken the time to figure out where they come from.

A few examples:

[screenshot]

When using rfactor() for a summation over a 2D RDom:

[screenshot]
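For reference, a minimal sketch of the kind of schedule in question (hypothetical code, not the one behind the screenshot): a summation over a 2D RDom whose update is split with rfactor().

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Var x("x"), y("y"), u("u");
    Func input("input"), total("total");

    input(x, y) = cast<float>(x + y);
    input.compute_root();

    // Summation over a 2D reduction domain.
    RDom r(0, 256, 0, 256, "r");
    total() = 0.0f;
    total() += input(r.x, r.y);

    // rfactor() creates an intermediate Func of per-row partial sums; the
    // allocation for this intermediate is the kind of buffer that shows up
    // with a numeric suffix in the printed Stmt.
    Func partial = total.update().rfactor(r.y, u);
    partial.compute_root();

    total.compile_to_lowered_stmt("total.stmt", {}, Text);
    return 0;
}
```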

A bit of everything:

[screenshot]

[screenshot]

In the above screenshot, the red arrows mark all the places where the buffers carry the numeric suffix, while the producer/consumer names are clean, as indicated by the green arrows.

@antonysigma
Contributor

Re: the .0, .1 notation. Do you mean the Tuple datatype? I also see these suffixes in the IR of the complex-number datatype implemented in apps/fft/complex.h.
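For concreteness, here is a small sketch (not taken from the issue) of a Tuple-valued Func; its components are realized as separate buffers in the lowered Stmt, which is one place these suffixes come from.

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Func f("f");
    Var x("x");

    // A two-component Tuple, e.g. the real/imaginary pair that
    // apps/fft/complex.h wraps in its complex-number helper.
    f(x) = Tuple(cast<float>(x), cast<float>(2 * x));

    // In the printed Stmt the two components are realized as separate
    // buffers, typically named f.0 and f.1.
    f.compile_to_lowered_stmt("f.stmt", {}, Text);
    return 0;
}
```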

See also the pinned issue "Halide development roadmap" (#5055). Quoting the key summary:

Generators with multiple outputs (there's a trade-off between tuples, extra channels, compute_with)

@abadams
Member

abadams commented Oct 29, 2023

I think it's injected here: https://github.com/halide/Halide/blob/main/src/FuseGPUThreadLoops.cpp#L480

and I think the reason is that when allocations are hoisted to the GPU block level, they might have name collisions with other allocations that now share the same scope, and this numbering makes collisions impossible.
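As a rough illustration of that mechanism (a sketch only, not the actual FuseGPUThreadLoops.cpp code), the numbering is applied unconditionally to every hoisted allocation:

```cpp
#include <iostream>
#include <string>
#include <vector>

struct HoistedAlloc {
    std::string name;
    // ... element type, size, etc. omitted
};

// Every allocation hoisted to the GPU block level gets a running counter
// appended, so two allocations that originally had the same name can no
// longer collide once they share a scope.
void rename_hoisted(std::vector<HoistedAlloc> &allocs) {
    int counter = 0;
    for (auto &a : allocs) {
        // Appended unconditionally, even when the name is already unique,
        // which is why clean names like "partial" end up printed as "partial.0".
        a.name += "." + std::to_string(counter++);
    }
}

int main() {
    std::vector<HoistedAlloc> allocs = {{"partial"}, {"partial"}, {"sum"}};
    rename_hoisted(allocs);
    for (const auto &a : allocs) std::cout << a.name << "\n";
    // prints: partial.0, partial.1, sum.2
}
```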

@mcourteaux
Contributor Author

mcourteaux commented Oct 30, 2023

Aha, that sounds reasonable. Any idea why the rfactor() intermediate Func allocates a float[1] array for every float, instead of a single float array for all of them?

Could it be that the Func is computed at a deeper nesting level, but the GPU lowering pass moves allocations up to the innermost GPU-thread loop, and the Allocate node therefore gets duplicated outside of the loop? I haven't really investigated, and I'm also not sure whether my vague theory would hold for loops with non-statically-known bounds. EDIT: I doubt it; the schedule suggests this is not the case.

@mcourteaux
Contributor Author

I think the reason is that when allocations are hoisted to the gpu block level, they might have name collisions with other allocations that now have the same scope

So it should be fine to only add those numbers when there actually is a collision? Maybe one day I'll feel like implementing that cheap change with a std::map<std::string, int> and making a PR.
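A minimal sketch of what that collision-only scheme could look like (a hypothetical helper, not existing Halide code): keep the first occurrence of each name untouched and only suffix later occurrences. A real implementation would also have to avoid clashing with names that already end in ".N".

```cpp
#include <iostream>
#include <map>
#include <string>

// Returns `name` unchanged the first time it is seen, and "name.<k>" for the
// k-th repeat, tracking counts in the caller-provided map.
std::string uniquify(const std::string &name, std::map<std::string, int> &seen) {
    int &count = seen[name];  // value-initialized to 0 on first lookup
    std::string result = (count == 0) ? name : name + "." + std::to_string(count);
    ++count;
    return result;
}

int main() {
    std::map<std::string, int> seen;
    std::cout << uniquify("partial", seen) << "\n";  // partial
    std::cout << uniquify("partial", seen) << "\n";  // partial.1
    std::cout << uniquify("sum", seen) << "\n";      // sum
}
```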

@mcourteaux
Contributor Author

I can't assign people (or add labels, or ask for reviews or anything), but I'd assign myself if I could.
