This repository contains the implementation of the Fast Multiplication algorithm in Bitcoin for two 254-bit numbers.
You can run all the tests by simply writing:
cargo test
However, more concretely, to verify the performance and the number of operations, you can run the following command. We also specify where you can find the corresponding unit test in the project.
Command | Description | Location |
---|---|---|
cargo test -- --nocapture test_254_bit_windowed_widening_optimized_mul |
Test our widening multiplication algorithm | test.rs |
cargo test -- --nocapture test_254_bit_narrow_mul_w_width |
Test our narrow multiplication algorithm | test.rs |
cargo test -- --nocapture test_254_bit_windowed_lazy_widening_mul |
Test BitVM's widening multiplication algorithm (extended by us) | test.rs |
cargo test -- --nocapture test_254_bit_naive_widening_mul |
Test BitVM's narrow multiplication algorithm (a bit optimized by us) | test.rs |
cargo test -- --nocapture test_255_bit_cmpeq_widening_mul |
Test cmpeq 's widening multiplication algorithm |
test.rs |
cargo test -- --nocapture --ignored debug_mul_performance_comparison |
Compare the performance of several multiplication algorithms used | test.rs |
The two primary optimizations used are:
- Using the windowed method with
w=4
for multiplication. - Improving the doubling step.
The windowed method is a well-known optimization for multiplication. It reduces the number of additions with an additional
cost to generate the lookup table. Namely, we use the base 1<<w
for the windowed method and based on the decomposition
coefficient d
at each step, we add the corresponding value from the lookup table. The lookup table is generated by
precomputing the values of d*z
for all d
in the range {0, 1, ..., 1<<w-1}
and given integer z
. This way, we only have roughly
b/(1<<w)
additions, where b
is the number of bits in the number, while the number of doubling steps remains the same.
The doubling step was easy to optimize, though: we noticed that the original implementation was not optimal since
it implemented double(a)
as add(a, a)
. However, we can do better by not zipping the same number with itself, but
simply duplicating the limb at each step and carrying the overflow. This way, we can significantly reduce the number of operations
since the doubling step is used 254 times in the multiplication algorithm.