MLIR code generation from Julia SSA IR. #77

jumerckx · 2024-06-09T22:09:57Z

This PR contains an experimental way to generate MLIR code from Julia code. Similar in spirit to MLIR-python-extras, and initially based off of Pangoraw's Brutus example.
Currently it probably only works for Julia 1.11 as it uses some Core.Compiler functions that are in flux.

Example

# Regular Julia code:
struct Point{T}
    x::T
    y::T
end

struct Line{T}
    p1::Point{T}
    p2::Point{T}
end

sq_distance(l::Line) = (l.p1.x - l.p2.x)^2 + (l.p1.y - l.p2.y)^2

# Convert this to MLIR:
op2 = cg(Tuple{Point{i64},Point{i64}}) do a, b
    l = Line(a, b)
    d_2 = sq_distance(l)

    Point(((a.x, a.y) .- d_2)...)
end

generates:

  func.func @f(%arg0: i64, %arg1: i64, %arg2: i64, %arg3: i64) -> (i64, i64) {
    %0 = arith.subi %arg0, %arg2 : i64
    %1 = arith.muli %0, %0 : i64
    %2 = arith.subi %arg1, %arg3 : i64
    %3 = arith.muli %2, %2 : i64
    %4 = arith.addi %1, %3 : i64
    %5 = arith.subi %arg0, %4 : i64
    %6 = arith.subi %arg1, %4 : i64
    return %5, %6 : i64, i64
  }

I'm leaving out some details but the full code is included in examples/main.jl.

It is possible to customise Julia function <--> MLIR operation mappings by defining "intrinsic" functions (this is an awfully vague name, any alternatives?).
For example, an intrinsic function that maps addition onto arith.addi:

@intrinsic Base.:+(a::T, b::T) where {T<:MLIRInteger} = T(Dialects.arith.addi(a, b)|>result)

Again, more details can be found in examples/definitions.jl.

Internal Details

Julia code is first lowered to SSA IR using a custom AbstractInterpreter (Generate.MLIRInterpreter) that allows types other than Bool to be used for conditional gotos. This AbstractInterpreter also overwrites the inlining policy to inline everything except calls to intrinsics. (see src/Generate/absint.jl)

With this SSA IR in hand, a source2source transformation is applied that replaces all control flow statements to builder functions that build corresponding MLIR unstructured control flow operations. (see src/Generate/transform.jl)

Finally, the transformed IR is executed and produces an MLIR region.

The code generation can be nested to support generating MLIR operations containing regions, the last example in examples/main.jl shows how an scf.for operation can be generated.

Some Notes

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.
TTFG (time-to-first-generate MLIR code) is long. I haven't worked on optimising this but I believe the biggest slowdowns have to do with the AbstractInterpreter caching. CI is way slower because of this...
This PR is an updated version of what I did for my Master's thesis. The internal details compared to my thesis have changed a bit (for the better), but for my thesis, I wrote a lot more examples including generating linalg operations from einsum expressions, or generating GPU kernels. The code can be found here

vchuravy · 2024-06-10T00:42:24Z

Really nice! I like the intrinisc approach a lot!

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

In the original Brutus we had a simple worklist to allow for function calls.

https://github.com/JuliaLabs/brutus/blob/c45ec5e465c0de01dc771e3facee7479fd2ac8ef/Brutus/src/codegen.jl#L70-L86

mofeing · 2024-06-10T09:49:10Z

This looks awesome!!

Is the full set of Julia SSA IR mappable to MLIR?

It is possible to customise Julia function <--> MLIR operation mappings by defining "intrinsic" functions (this is an awfully vague name, any alternatives?).

intrinsic looks nice to me. If you need other names, maybe primitive?

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

But it would be okay if we added an @intrinsic for these cases right?

TTFG (time-to-first-generate MLIR code) is long. I haven't worked on optimising this but I believe the biggest slowdowns have to do with the AbstractInterpreter caching. CI is way slower because of this...

Would it be posible to run the AbstractInterpreter on some warmup codes during precompilation and save the cache for later?

jumerckx · 2024-06-10T17:17:36Z

In the original Brutus we had a simple worklist to allow for function calls.

Aha, thanks for the link, this should be doable. How does this handle function names? e.g. when calling +(::Int, ::Int) and +(::Float64, Float64) in the same function?

Is the full set of Julia SSA IR mappable to MLIR?

Not completely yet. I've ignored PhiC and Upsilon nodes, and PiNodes don't generate any MLIR.

For full Julia compilation like Brutus, the current intrinsic system also doesn't suffice because Julia builtins cannot be specialized.

julia> Base.abs_float(x) = x
ERROR: cannot add methods to a builtin function
Stacktrace:
 [1] top-level scope
   @ REPL[6]:1

Tapir.jl handles this by replacing any call to a Core.IntrinsicFunction with a call to a function that calls the builtin. (code).

The current system also doesn't allow to map existing Julia types onto MLIR operations. For example, defining an intrinsic for Base.:+(::Int, ::Int) won't work because it literally redefines this method so you can't add integers together anymore.

To fix this, intrinsic definitions should exist in a separate context. CassetteOverlay.jl could be used to achieve something like this, but during my thesis I depended on it and ran into strange errors and found the package to be quite fiddly.

Alternatively, we could again take inspiration from Tapir.jl and use something similar to their rrule (relevant code).
I.e., use a function like intrinsic(<:Tuple{Base.:+, ::Int, ::Int}) = ... instead of redefining the method. The source transformation should then replace each call to a function with a call to intrinsic with the correct signature.

Would it be posible to run the AbstractInterpreter on some warmup codes during precompilation and save the cache for later?

Yes! One annoyance is that redefining intrinsics can spoil the cache because there's no backedge from e.g. Generate.is_intrinsic(::Type{<:Tuple{typeof(+), T, T}}) to Base.:+(::T, ::T). Tapir.jl faces the same limitation.
I spent some time trying to add these backedges manually but quickly found myself out of my depth.
Still, this should speed up things a lot for the initial experience. I haven't done this before so would have to find how exactly to do this properly.

But it would be okay if we added an @intrinsic for these cases right?

Not sure I'm following here. Currently, the ability to define intrinsics can't cope with recursion and dynamic dispatch. Full inlining is required to expose all control flow that's otherwise hidden in function calls. All control flow needs to be known upfront because all basic blocks are created at the start of MLIR generation.
With a worklist algorithm that generates calls instead of inlining everything, this should be a non-issue, though.

vchuravy · 2024-06-10T17:49:41Z

We used https://github.com/JuliaLabs/brutus/blob/c45ec5e465c0de01dc771e3facee7479fd2ac8ef/lib/Codegen/Codegen.cpp#L42

So the name included the signature.
https://github.com/JuliaLabs/brutus/blob/c45ec5e465c0de01dc771e3facee7479fd2ac8ef/test/Codegen/translate/caller.jl#L11

mofeing · 2024-06-29T20:20:19Z

examples/generate/main.jl

+# isbitstype structs can be used as arguments or return types.
+# They will be unpacked into their constituent fields that are convertible to MLIR types.
+# e.g. a function `Point{i64}` taking and returning a point, will be converted to a MLIR
+# function that takes two `i64` arguments and returns two `i64` values.


Maybe we can do something like toy.struct? https://mlir.llvm.org/docs/Tutorials/Toy/Ch-7/

Yes, but then we're going down the path of having a custom Julia dialect that should probably be included in MLIR_jll?
(not opposed to that though, I think it's a good next step once this becomes more useable)

mmm not necessarily. I think it was agreed that the Julia dialect should be implemented with IRDL and loaded dynamically.

I'm ok with the current solution while Julia dialect lands. It would be nice to use tuple types instead, but seem like there are no ops for their manipulation https://mlir.llvm.org/docs/Rationale/Rationale/#tuple-types

A custom dialect is likely needed for GC allocations

mofeing · 2024-06-29T20:28:55Z

src/Generate/intrinsic.jl

+function intrinsic_(expr)
+    dict = splitdef(expr)
+    # TODO: this can probably be fixed:
+    length(dict[:kwargs]) == 0 || error("Intrinsic functions can't have keyword arguments\nDefine a regular function with kwargs that calls the intrinsic instead.")


Maybe we can turn the Core.kwcall method into a intrinsic instead.

jumerckx · 2024-06-30T08:21:41Z

Really nice! I like the intrinisc approach a lot!

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

In the original Brutus we had a simple worklist to allow for function calls.

JuliaLabs/brutus@c45ec5e/Brutus/src/codegen.jl#L70-L86

I'm implementing function calls but I'm having trouble understanding the difference between invokes and calls.
In the Brutus code, it seems like only invokes are being added to the worklist. But when I look at IRCode for some functions, I also see calls that would need to be added to the worklist.
Based on what I found online, "calls" are dynamic function calls while "invokes" are static. But why is sitofp a call instruction instead of an invoke, AFAICT, the method can be statically decided...

julia> only(Base.code_ircode(sin, Tuple{Int}))
1627 1 ─ %1 = Base.sitofp(Float64, _2)::Float64                                                                                                               │╻╷╷ float
1629 2 ─ %2 = invoke Base.Math.sin(%1::Float64)::Float64                                                                                                      │   
     └──      return %2                                                                                                                                       │   
      => Float64

vchuravy · 2024-06-30T11:14:28Z

Based on what I found online, "calls" are dynamic function calls while "invokes" are static.

This is generally correct, but intrinsics and builtins are calls and not invokes. But they don't need to be Bart of the worklist since we can lower them directly to an instruction.

jumerckx · 2024-06-30T16:28:38Z

I implemented the worklist + function invocations:

fibonacci(n) = n < 2 ? n : fibonacci(n-1) + fibonacci(n-2)
cg(fibonacci, Tuple{i64})

module {
  func.func private @"Tuple{typeof(fibonacci), i64}"(%arg0: i64) -> i64 {
    %c2_i64 = arith.constant 2 : i64
    %0 = arith.cmpi slt, %arg0, %c2_i64 : i64
    cf.cond_br %0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    return %arg0 : i64
  ^bb2:  // pred: ^bb0
    %c1_i64 = arith.constant 1 : i64
    %1 = arith.subi %arg0, %c1_i64 : i64
    %2 = call @"Tuple{typeof(fibonacci), i64}"(%1) : (i64) -> i64
    %c2_i64_0 = arith.constant 2 : i64
    %3 = arith.subi %arg0, %c2_i64_0 : i64
    %4 = call @"Tuple{typeof(fibonacci), i64}"(%3) : (i64) -> i64
    %5 = arith.addi %2, %4 : i64
    return %5 : i64
  }
}

vchuravy · 2024-06-30T17:34:19Z

One thing you might want to guard against is redefinition. And other names pacing issues. Right now your function name has the potential for collision since there may be multiple function f either across modules or across world (nee redefinition)

jumerckx · 2024-07-01T12:34:03Z

examples/generate/main.jl

+    p2::Point{T}
+end
+
+@inline Base.literal_pow(::typeof(^), x::MLIRInteger, ::Val{2}) = x*x


This definition didn't use to be necessary when everything was forced to be fully inlined.
I assume constant propagation doesn't travel accross function boundaries or something?

Enabling aggressive constant propagation in the inference parameters fixes this (9fdffa7)

jumerckx · 2024-07-16T15:15:43Z

With the worklist for function calls, I'm running into problems with captured variables in functions.
For example:

function h(x)
    f(x) + f(x+1)
end

function f(x)
    @noinline g() = x+1
    g()
end

2 1 ─ %1 = %new(var"#g#50"{Int64}, _2)::var"#g#50"{Int64}                │╻ f
  │   %2 = invoke %1()::Int64                                            ││
  │   %3 = (Core.Intrinsics.add_int)(_2, 1)::Int64                       │╻ +
  │   %4 = %new(var"#g#50"{Int64}, %3)::var"#g#50"{Int64}                │╻ f
  │   %5 = invoke %4()::Int64                                            ││
  │   %6 = (Core.Intrinsics.add_int)(%2, %5)::Int64                      │╻ +
  └──      return %6                                                     │ 
   => Int64

When generating MLIR code for h, the two invocations to #g#50 have the same Methodinstance. The two func.call operations that are generated will call the same MLIR function, leading to incorrect code.
This can be solved by generating a different MLIR func.func for each function instance, but this introduces quite some complexity.

Another approach could be to insert an extra argument x in the generated MLIR func for #g#50. At function call sites, all captured variables need to be passed explicitly.

Any more ideas on how to tackle this? I think it's important that these kind of higher-order functions work because they are useful to model more complex MLIR operations with nested regions.

vchuravy · 2024-07-16T16:16:00Z

Any more ideas on how to tackle this?

This is why the Julia calling convention is that there is a hidden #self# argument as the first argument that is the called function.
If that argument is a "ghost type" it disappears, but it contains the closure.

src/Generate/transform.jl

vchuravy · 2024-07-17T12:28:44Z

I would recommend passing the first argument through for now and optimizing this later.

Julia does use two calling covnentions the first one is boxed values and the second one is unboxed.
The first calls the latter, and only the latter removes ghost types.

commit 0a3c2b7 Merge: 1638254 5c35af3 Author: jumerckx <julesmerckx12@gmail.com> Date: Sat Jun 1 23:10:34 2024 +0200 Merge remote-tracking branch 'upstream/main' into jm/MLIRValueTrait commit 1638254 Author: jumerckx <julesmerckx12@gmail.com> Date: Wed Mar 27 20:08:28 2024 +0100 rename `value(::AffineExpr)`, use IR.Value type instead of API.MlirValue in jl-generator commit 912d8f5 Author: jumerckx <31353884+jumerckx@users.noreply.github.com> Date: Wed Mar 20 21:12:01 2024 +0100 Apply suggestions from code review Co-authored-by: Sergio Sánchez Ramírez <15837247+mofeing@users.noreply.github.com> commit c57464a Author: jumerckx <31353884+jumerckx@users.noreply.github.com> Date: Wed Mar 20 21:10:51 2024 +0100 Update deps/tblgen/jl-generators.cc Co-authored-by: Sergio Sánchez Ramírez <15837247+mofeing@users.noreply.github.com> commit 078f15e Author: jumerckx <31353884+jumerckx@users.noreply.github.com> Date: Wed Mar 20 21:10:08 2024 +0100 Update src/IR/Value.jl Co-authored-by: Sergio Sánchez Ramírez <15837247+mofeing@users.noreply.github.com> commit 0b5b443 Author: jumerckx <julesmerckx12@gmail.com> Date: Sun Mar 17 22:18:30 2024 +0100 MLIRValueTrait changes

…finitions

* don't generate argument values for captured values in top-level functions, these are directly passed to the code generation

* use `Base._reinterpret` instead of `reinterpret`

mofeing · 2024-10-11T17:08:22Z

We had some discussions on Reactant.jl about how we could add support for control-flow and we use an AbstractInterpreter similar to the one in this PR.

@jumerckx @vchuravy is the interpreter in here able to detect ifs and for loops and lower them to the scf dialect? I believe this might be critical for Brutus.

jumerckx · 2024-10-11T17:37:54Z

@jumerckx @vchuravy is the interpreter in here able to detect ifs and for loops and lower them to the scf dialect? I believe this might be critical for Brutus.

No, this abstract interpreter runs on (typed) IR at which point control flow has already become unstructured.
Generating structured loops can only happen starting from the julia AST as far as I'm aware.

Alternatively you could detect loops in the unstructured control flow using something like Valentin's Loops.jl.

I think having something like Scala-virtualized where behaviour of language constructs can be specialized in addition to regular function specialization would be very cool to have in Julia and that would solve this problem but that's a whole another can of worms...

I'm also curious how reactant development is going 👀. For now I'm focusing on some other work but in the future I might try get this pr in a better place in the context of my PhD.

mofeing · 2024-10-11T18:53:10Z

No, this abstract interpreter runs on (typed) IR at which point control flow has already become unstructured. Generating structured loops can only happen starting from the julia AST as far as I'm aware.

Alternatively you could detect loops in the unstructured control flow using something like Valentin's Loops.jl.

Yeah, that's what @wsmoses also proposed: due a "raising" from unstructured control flow to scf. But I guess there can be edge cases where this doesn't work.

I think having something like Scala-virtualized where behaviour of language constructs can be specialized in addition to regular function specialization would be very cool to have in Julia and that would solve this problem but that's a whole another can of worms...

That would be a deal-breaker! I guess we could possibly experiment with have some form of structured control flow in the Juliar dialect before proposing it in base Julia.

I'm also curious how reactant development is going 👀. For now I'm focusing on some other work but in the future I might try get this pr in a better place in the context of my PhD.

Uuu it's going great. We haven't abandoned MLIR.jl, it's just that we are very focused in Reactant at the moment and we're using a local copy of it instead of using upstream (we'll change to using upstream MLIR.jl at some moment).

We're discussing stuff in @wsmoses's group Slack, where we have a channel for Reactant and weekly meetings. I can send you an invitation if you want.

wsmoses · 2024-10-11T18:55:53Z

@jumerckx you’re welcome to join the reactant party, if you like!

wsmoses · 2024-10-11T18:59:41Z

And @mofeing honestly lowering to a CFG is fine. We can always use the infra built in polygeist for raising the control flow back (which recently has demonstrated from llvm all the way to linalg).

The sole question here imo is type stability, at one point in the future we can deal with type unstable IR but even when we add proper control flow first, we’re going to want to limit ourselves to concrete types.

Currently this is resolved by tracing so everything definitooballt is a mlir tensor type we know about. It’s fine to equally emit branches or scf on top of. Other types might be doable but it requires a lot more design work to make sure everything fits together

jumerckx · 2024-10-11T19:19:54Z

@jumerckx you’re welcome to join the reactant party, if you like!

Thanks, would love to join the discussions!

wsmoses · 2024-10-11T19:20:49Z

Send me your email and I’ll send a slack invite!

mofeing reviewed Jun 29, 2024

View reviewed changes

jumerckx commented Jul 1, 2024

View reviewed changes

mofeing mentioned this pull request Jul 7, 2024

Remove Cassette EnzymeAD/Reactant.jl#37

Closed

vchuravy reviewed Jul 17, 2024

View reviewed changes

src/Generate/transform.jl Outdated Show resolved Hide resolved

jumerckx force-pushed the jm/generate branch from a8fb5b1 to 7cac97f Compare July 18, 2024 10:08

jumerckx added 13 commits August 26, 2024 15:45

LLVM 14 dialect bindings using ValueTrait

a8de885

LLVM 15/16/17 dialect bindings using ValueTrait

107a1a9

abstractinterpreter for inlining everything but "intrinsics"

710c3b1

WIP: commit before rebasing on jm/MLIRValueTrait

3cc3099

unpack: the reason for ValueTrait

8fd194e

code generation works!

1e0817a

remove verbose logging and add tests

6ef7f63

worklist algorithm to support function invocations

f22e0fc

fix tests: nested code generation was broken

a57beab

make code generation test invariant to different order of function de…

d4f2df1

…finitions

enable aggressive constant propagation

4e372ea

include worldage and module in generated function name

0ee75dc

jumerckx added 5 commits August 26, 2024 15:45

handle captured variables in generated functions

8e67140

* handle const function objects.

8846d7d

* don't generate argument values for captured values in top-level functions, these are directly passed to the code generation

small fix

444584f

execute erroneous code to get the actual errors

5c98682

* fix bug with destinations of goto statements

cf22c54

* use `Base._reinterpret` instead of `reinterpret`

jumerckx force-pushed the jm/generate branch from 8ccd86b to cf22c54 Compare August 26, 2024 13:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MLIR code generation from Julia SSA IR. #77

MLIR code generation from Julia SSA IR. #77

jumerckx commented Jun 9, 2024

vchuravy commented Jun 10, 2024

mofeing commented Jun 10, 2024 •

edited

Loading

jumerckx commented Jun 10, 2024

vchuravy commented Jun 10, 2024

mofeing Jun 29, 2024

jumerckx Jun 29, 2024 •

edited

Loading

mofeing Jun 29, 2024 •

edited

Loading

vchuravy Jun 30, 2024

mofeing Jun 29, 2024

jumerckx commented Jun 30, 2024

vchuravy commented Jun 30, 2024

jumerckx commented Jun 30, 2024

vchuravy commented Jun 30, 2024

jumerckx Jul 1, 2024

jumerckx Jul 13, 2024

jumerckx commented Jul 16, 2024 •

edited

Loading

vchuravy commented Jul 16, 2024

vchuravy commented Jul 17, 2024

mofeing commented Oct 11, 2024

jumerckx commented Oct 11, 2024 •

edited

Loading

mofeing commented Oct 11, 2024

wsmoses commented Oct 11, 2024

wsmoses commented Oct 11, 2024

jumerckx commented Oct 11, 2024

wsmoses commented Oct 11, 2024

MLIR code generation from Julia SSA IR. #77

Are you sure you want to change the base?

MLIR code generation from Julia SSA IR. #77

Conversation

jumerckx commented Jun 9, 2024

Example

Internal Details

Some Notes

vchuravy commented Jun 10, 2024

mofeing commented Jun 10, 2024 • edited Loading

jumerckx commented Jun 10, 2024

vchuravy commented Jun 10, 2024

mofeing Jun 29, 2024

Choose a reason for hiding this comment

jumerckx Jun 29, 2024 • edited Loading

Choose a reason for hiding this comment

mofeing Jun 29, 2024 • edited Loading

Choose a reason for hiding this comment

vchuravy Jun 30, 2024

Choose a reason for hiding this comment

mofeing Jun 29, 2024

Choose a reason for hiding this comment

jumerckx commented Jun 30, 2024

vchuravy commented Jun 30, 2024

jumerckx commented Jun 30, 2024

vchuravy commented Jun 30, 2024

jumerckx Jul 1, 2024

Choose a reason for hiding this comment

jumerckx Jul 13, 2024

Choose a reason for hiding this comment

jumerckx commented Jul 16, 2024 • edited Loading

vchuravy commented Jul 16, 2024

vchuravy commented Jul 17, 2024

mofeing commented Oct 11, 2024

jumerckx commented Oct 11, 2024 • edited Loading

mofeing commented Oct 11, 2024

wsmoses commented Oct 11, 2024

wsmoses commented Oct 11, 2024

jumerckx commented Oct 11, 2024

wsmoses commented Oct 11, 2024

mofeing commented Jun 10, 2024 •

edited

Loading

jumerckx Jun 29, 2024 •

edited

Loading

mofeing Jun 29, 2024 •

edited

Loading

jumerckx commented Jul 16, 2024 •

edited

Loading

jumerckx commented Oct 11, 2024 •

edited

Loading