
Optimization and analysis


Harlan is a moderately high-level language for GPU computing. As such, it relies on some fairly aggressive compiler optimizations to get good performance. This page catalogues the optimizations we think we will need, why we think we need them, and a rough idea of how to implement them.

Loop Invariant Code Motion

This is a standard optimization that moves computations whose results do not change between iterations out of the loop. It will help Harlan in examples like this one:


(kernel ((i (iota 10)))
  (kernel ((j (iota 10)))
    ...))

The remove-nested-kernels pass turns this into something like the following (with some extra temporaries):


(for (i 0 10)
  (kernel ((j (iota 10)))
    ...))

Lifting the complex expressions gives us something like...


(for (i 0 10)
  (let t_1 (iota 10))
  (kernel ((j t_1))
    ...))

At this point, we recompute t_1 on every iteration of the loop, and we have to copy it over to the GPU for each kernel invocation.

Loop invariant code motion would transform this into something like...


(let t_1 (iota 10))
(for (i 0 10)
  (kernel ((j t_1))
    ...))

Now we compute t_1 only once, and the later GPU scheduling code will hopefully be smart enough to insert only one transfer instead of many.

This pass should run right after remove-nested-kernels, because at that point all the loops are explicit but iota has not yet been expanded.
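
As a rough sketch of what the pass itself might do, the Python fragment below hoists let bindings whose right-hand side does not mention the loop variable out of a for loop, using nested lists to stand in for Harlan's s-expression IR. The IR shapes and the helper names (free_vars, hoist_invariant) are assumptions made for illustration, not the compiler's actual data structures; a real pass would also need to check that the hoisted expression is pure and not rebound inside the loop.

def free_vars(expr):
    """Collect the variable names referenced by an expression."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, list):
        # Skip the operator in head position; gather from the arguments.
        return set().union(*map(free_vars, expr[1:]))
    return set()  # literals reference nothing

def hoist_invariant(stmt):
    """Hoist 'let' bindings whose right-hand side does not mention the
    loop variable out of a 'for' loop; returns a list of statements."""
    if not (isinstance(stmt, list) and stmt and stmt[0] == 'for'):
        return [stmt]
    _, (loop_var, lo, hi), *body = stmt
    hoisted, kept = [], []
    for s in body:
        if (isinstance(s, list) and s and s[0] == 'let'
                and loop_var not in free_vars(s[2])):
            hoisted.append(s)   # invariant: compute it once, before the loop
        else:
            kept.append(s)
    return hoisted + [['for', [loop_var, lo, hi]] + kept]

loop = ['for', ['i', 0, 10],
        ['let', 't_1', ['iota', 10]],
        ['kernel', [['j', 't_1']], '...']]
# Produces the 'let' followed by a 'for' that no longer rebuilds t_1.
print(hoist_invariant(loop))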

GPU Liveness Analysis

This is a variant of liveness analysis that tracks separate CPU-live and GPU-live properties for each variable assignment. We can use this information to decide where to place data transfers intelligently.
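
To make that concrete, here is a minimal sketch of such an analysis in Python, under the assumption that we are looking at a straight-line sequence of statements, each of which records the variables it reads and writes and the device it runs on. The Stmt record and the device_liveness helper are illustrative stand-ins, not Harlan's actual IR or analysis.

from dataclasses import dataclass, field

@dataclass
class Stmt:
    device: str                              # 'cpu' or 'gpu'
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

def device_liveness(stmts):
    """Walk the statements backwards, tracking which variables are still
    needed on the CPU and which on the GPU. Returns the (cpu_live,
    gpu_live) sets that hold just after each statement."""
    cpu_live, gpu_live = set(), set()
    after = [None] * len(stmts)
    for idx in reversed(range(len(stmts))):
        s = stmts[idx]
        after[idx] = (set(cpu_live), set(gpu_live))
        # A definition kills the variable on both devices (a transfer can
        # forward the fresh value), and a read makes it live on the
        # device that performs the read.
        cpu_live -= s.writes
        gpu_live -= s.writes
        if s.device == 'cpu':
            cpu_live |= s.reads
        else:
            gpu_live |= s.reads
    return after

prog = [Stmt('cpu', writes={'t_1'}),       # (let t_1 (iota 10)) on the host
        Stmt('gpu', reads={'t_1', 'j'})]   # the kernel body reads t_1
print(device_liveness(prog)[0])
# t_1 is GPU-live right after its CPU-side definition, so the scheduler
# knows a single host-to-device copy there is enough.

In the loop invariant code motion example above, this is exactly the fact the GPU scheduling code needs in order to insert one transfer for t_1 rather than one per kernel launch.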