
How to use and benchmark Halide autoscheduler? #8432

Open
Rehanchy opened this issue Oct 1, 2024 · 0 comments

Rehanchy commented Oct 1, 2024

Hi Halide developers,

I was trying to use the Halide autoscheduler to generate a schedule for matmul by following the old tutorial. (By the way, it is quite dated; the arguments last_level_cache_size and balance no longer seem to be in use.)

I found that the schedule produced by the autoscheduler does not perform well, so I would like to check whether I am using the autoscheduler correctly.

My generator (matmul_generator.cpp) looks like this:

```cpp
class MatMulGenerator : public Halide::Generator<MatMulGenerator> {
public:
    Input<Buffer<double>> A{"A", 2};   // Input matrix A (m x l)
    Input<Buffer<double>> B{"B", 2};   // Input matrix B (l x n)
    Output<Buffer<double>> C{"C", 2};  // Output matrix C (m x n)

    void generate() {
        Var x("x"), y("y");
        Func result("result");
        RDom r(0, A.dim(1).extent());

        result(x, y) = Halide::Expr(0.0);
        result(x, y) += A(x, r.x) * B(r.x, y);
        C(x, y) = result(x, y);
    }

    void schedule() {
        if (using_autoscheduler()) {
            A.set_estimates({{0, 4096}, {0, 4096}});
            B.set_estimates({{0, 4096}, {0, 4096}});
            C.set_estimates({{0, 4096}, {0, 4096}});
        } else {
            C.compute_root();
        }
    }
};

HALIDE_REGISTER_GENERATOR(MatMulGenerator, matmul_generator)
```

Then I use these commands to build the generator and produce the scheduled pipeline, following the tutorial:

```bash
g++ matmul_generator.cpp /path/to/GenGen.cpp -g -std=c++17 -fno-rtti \
    -I/path/to/halide/include -L/path/to/halide/lib -lHalide -lpthread -ldl \
    -o matmul_generator

./matmul_generator -o . -g matmul_generator -f matmul_autoschedule_true \
    -e static_library,h,schedule \
    -p /path/to/halide/lib/libautoschedule_adams2019.so \
    target=host autoscheduler=Adams2019 autoscheduler.parallelism=8
```

In another .cpp file, I call the scheduled matrix multiplication like this:

```cpp
matmul_autoschedule_true(A.raw_buffer(), B.raw_buffer(), C.raw_buffer());
```
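For context, the surrounding caller is essentially this sketch (the buffer sizes and the double element type mirror the generator above; Halide::Runtime::Buffer converts implicitly to halide_buffer_t *, so passing the buffers directly is equivalent to calling .raw_buffer()):

```cpp
// Caller sketch. The header/library names follow from -f matmul_autoschedule_true.
#include "matmul_autoschedule_true.h"
#include "HalideBuffer.h"

#include <cstdio>

int main() {
    constexpr int N = 4096;
    Halide::Runtime::Buffer<double> A(N, N), B(N, N), C(N, N);

    // Fill the inputs with something non-trivial.
    A.for_each_element([&](int x, int y) { A(x, y) = x + y; });
    B.for_each_element([&](int x, int y) { B(x, y) = x - y; });

    // Halide::Runtime::Buffer converts implicitly to halide_buffer_t *,
    // so this is equivalent to passing A.raw_buffer(), etc.
    int err = matmul_autoschedule_true(A, B, C);
    if (err != 0) {
        std::fprintf(stderr, "pipeline returned %d\n", err);
        return 1;
    }
    std::printf("C(0, 0) = %f\n", C(0, 0));
    return 0;
}
```

I link it against the generated static library with something like `g++ caller.cpp matmul_autoschedule_true.a -I/path/to/halide/include -lpthread -ldl -o caller`.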

I also have a question about how to benchmark the Halide autoscheduler's performance on a given kernel. I know that in test/performance/matrix_multiplication.cpp, out.realize(output); is called twice, because the first call includes the code-generation (JIT compilation) overhead, so Halide's performance should be measured with the second call.
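Concretely, the timing pattern I have in mind is roughly this sketch (time_best_seconds is just a hypothetical helper for illustration, not something from the Halide tree; tools/halide_benchmark.h in the repository provides a similar utility):

```cpp
// Timing sketch: one warm-up call, then report the best of several timed runs.
#include <algorithm>
#include <chrono>
#include <functional>
#include <limits>

double time_best_seconds(const std::function<void()> &op, int samples = 10) {
    op();  // warm-up: for the JIT path (realize) this absorbs the compilation cost
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < samples; i++) {
        auto t0 = std::chrono::steady_clock::now();
        op();
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}

// Usage with the AOT-compiled pipeline:
//   double t = time_best_seconds([&]() {
//       matmul_autoschedule_true(A.raw_buffer(), B.raw_buffer(), C.raw_buffer());
//   });
```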

To summarize, my questions are:

  1. Is my way of using the autoscheduler correct?
  2. I have a minor concern that when benchmarking Halide with the second realize call, the cache is not cold, which may lead to overestimating performance.
  3. When using the autoscheduler and calling the kernel like this: matmul_autoschedule_true(A.raw_buffer(), B.raw_buffer(), C.raw_buffer());, does the function itself include a code-generation phase that could lead to underestimating performance?

Thanks a lot!
