
Parallelize compute_chunks_operands #294

Open
sragss opened this issue Apr 15, 2024 · 9 comments
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@sragss
Collaborator

sragss commented Apr 15, 2024

For a 64-core machine at a cycle count of ~16M, Jolt spends ~1.8% of its time in a segment called compute_chunks_operands here.

This segment allocates and computes C chunks for each instruction. For example, for the EQ instruction we split the input operands X and Y into 4 chunks of 8 bits each (chunk width = WORD_SIZE / C) and can then compute EQ over each chunk individually, as in the sketch below.
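To make the arithmetic concrete, here is a minimal, self-contained sketch of that chunking for WORD_SIZE = 32 and C = 4; the function name and most-significant-first chunk ordering are illustrative and not taken from the Jolt codebase.

```rust
// Hypothetical illustration of splitting a 32-bit operand into C = 4 chunks
// of WORD_SIZE / C = 8 bits each; names and ordering are not Jolt's.
const WORD_SIZE: usize = 32;
const C: usize = 4;
const CHUNK_BITS: usize = WORD_SIZE / C; // 8

fn chunk_operand(x: u64) -> [u64; C] {
    let mask = (1u64 << CHUNK_BITS) - 1;
    core::array::from_fn(|i| {
        // Most-significant chunk first.
        let shift = CHUNK_BITS * (C - 1 - i);
        (x >> shift) & mask
    })
}

fn main() {
    // 0xDEADBEEF splits into the four bytes 0xDE, 0xAD, 0xBE, 0xEF.
    assert_eq!(chunk_operand(0xDEADBEEF), [0xDE, 0xAD, 0xBE, 0xEF]);
}
```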

Idea for acceleration: split chunks_x and chunks_y into mutable slices, iterate over them in parallel, and write the computed values directly into the slice indices (see the sketch below).
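Here is a rough sketch of that approach, assuming rayon (already a Jolt dependency) and reusing the hypothetical chunk_operand helper from the sketch above in place of the real per-instruction chunking logic:

```rust
use rayon::prelude::*;

const C: usize = 4;

// Hypothetical stand-in for the per-instruction chunking (see sketch above).
fn chunk_operand(x: u64) -> [u64; C] {
    let mask = (1u64 << 8) - 1;
    core::array::from_fn(|i| (x >> (8 * (C - 1 - i))) & mask)
}

fn compute_chunks_operands(operands: &[(u64, u64)]) -> (Vec<u64>, Vec<u64>) {
    let n = operands.len();
    let mut chunks_x = vec![0u64; n * C];
    let mut chunks_y = vec![0u64; n * C];

    // Hand out disjoint mutable slices of length C and fill them in parallel,
    // writing results directly into the preallocated buffers.
    chunks_x
        .par_chunks_mut(C)
        .zip(chunks_y.par_chunks_mut(C))
        .zip(operands.par_iter())
        .for_each(|((cx, cy), &(x, y))| {
            cx.copy_from_slice(&chunk_operand(x));
            cy.copy_from_slice(&chunk_operand(y));
        });

    (chunks_x, chunks_y)
}
```

Because par_chunks_mut hands out non-overlapping &mut slices, no locking is needed; each rayon worker writes straight into its own region of the output vectors.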

It may be helpful to review the tracing strategy for performance testing.

sragss added the good first issue (Good for newcomers) and help wanted (Extra attention is needed) labels on Apr 15, 2024
@sragss
Collaborator Author

sragss commented Apr 16, 2024

Looks like the else branch of the loop can be removed as well: https://github.com/a16z/jolt/blob/main/jolt-core/src/jolt/vm/mod.rs#L533

@githubsands

githubsands commented Apr 16, 2024

Hey @sragss I'm open to taking on this issue.

@sragss
Collaborator Author

sragss commented Apr 16, 2024

Please! Feel free to ask questions here.

@moodlezoup
Collaborator

@githubsands any update on this?

@lognorman20

Is this issue still open?

@codercody

Hi @sragss, I'm having difficulty reproducing the 1.8% (I'm only getting 0.09%). Right now I run

cargo run -p jolt-core --release -- trace --name sha2-chain --format chrome --pcs hyrax

and then get stats in Perfetto using this SQL query:

select
  name, dur, total_dur, (dur * 100. / total_dur) as dur_pct
from slices
cross join (
  select dur as total_dur
  from slices
  where name = 'Example_E2E'
)
where name = 'compute_chunks_operands';

name | dur | total_dur | dur_pct
-- | -- | -- | --
compute_chunks_operands | 470260292 | 496995499417 | 0.09462063389942933

So this is only 0.09%. I'm running on an 8-core M1 with a cycle count of 242. I had a couple of questions:

  • What specific command did you use to generate your benchmarks?
  • Is there a better way to use perfetto for checking how much time is spent in each segment?

@sragss
Collaborator Author

sragss commented Jul 26, 2024

242 cycle count is too small to get an idea of relative asymptotic performance. Can you try a bigger example – maybe in the 128k-512k range?

Also, I thought sha2-chain was around 3M cycles. Did you modify it to fit in a smaller amount of RAM?

@codercody

Sorry, silly mistake. When I looked up how to find the cycle count, the thing it actually pointed me to was the battery cycle count 🤦 I didn't realize you were referring to trace length - 3,632,556 in my case. I ran it on the original sha2-chain.

But re: the questions above, what are the steps for reproducing your benchmarks? Or do you have any ideas why I might be getting a much lower percentage?

@mahmudsudo

Hi, can I take on this issue?
