Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some data-races into mirage-crypto #186

Closed
wants to merge 4 commits into from

Conversation

dinosaure
Copy link
Member

This PR is a draft about a possible usage of mirage-crypto into a parallel context. I noticed that we have some issues about some global states & buffers and when it come to initiate, for instance, multiple TLS connections on multiple cores, It's clearly a draft when we must check the performance implication of these changes. Moreover, it corresponds to a specific execution path.

At least, if people get some issues like MAC mismatch or something else where their codes can safely be executed in a cooperative way (lwt, async, etc.), it's probably due to something found here.

@hannesm
Copy link
Member

hannesm commented Oct 6, 2023

I don't quite understand the first commit -- why should there be a RNG per-domain (which clearly is not OCaml 4 compatible), instead of a global one?

@hannesm
Copy link
Member

hannesm commented Oct 6, 2023

The second commit I do understand, but wonder whether there is a "OCaml 4 and OCaml 5"-compatible path for having "Domain-local" variables (something like __thread in C)!? No, I won't go into conditional compilation.

This way, the performance with a single domain would be preserved, and for each other domain there'll be an overhead of 32 bytes being allocated.

@dinosaure
Copy link
Member Author

dinosaure commented Oct 8, 2023

As discussed with @hannesm, the idea behind the rng is to be global to any domains. So we decided to use Atomic instead of Domain.DLS. Moreover, the Atomic module is available for OCaml 4 which is good.

PS: Atomic is available since OCaml 4.12.

@hannesm
Copy link
Member

hannesm commented Oct 8, 2023

Thanks. I'm indeed fine with the 4.12+ support (and dropping older compilers).

I'm still unsure what the path for the globals should be (i.e. do some performance evaluation and figure it out)? Maybe domain_shims is a path forward, but I suspect (since it uses Thread), the MirageOS support would hate it...

@dinosaure
Copy link
Member Author

I launched some benchmark on aes-ctr-128 (due to this change), aes-gcm-128 (due to this change) and aes-ghash-128 (due to this change) - des-ctr-128 is missing however. This my result compared to ~main (46e71a9):

accel: XOR AES GHASH

* [aes-128-gcm]
       16:  65.256904 MB/s  (8786342 iters in 2.054 s)
       64:  238.578844 MB/s  (7941960 iters in 2.032 s)
      256:  808.170459 MB/s  (6647521 iters in 2.008 s)
     1024:  1839.861906 MB/s  (3757449 iters in 1.994 s)
     8192:  3336.179118 MB/s  (803729 iters in 1.882 s)

* [aes-128-ghash]
       16:  69.636869 MB/s  (9507976 iters in 2.083 s)
       64:  258.362054 MB/s  (8637182 iters in 2.040 s)
      256:  963.694438 MB/s  (7905357 iters in 2.003 s)
     1024:  3014.782106 MB/s  (6306502 iters in 2.043 s)
     8192:  6788.888416 MB/s  (1744836 iters in 2.008 s)

* [aes-128-ctr]
       16:  242.536654 MB/s  (33095815 iters in 2.082 s)
       64:  878.416635 MB/s  (29508578 iters in 2.050 s)
      256:  3037.865836 MB/s  (25005688 iters in 2.010 s)
     1024:  5293.985044 MB/s  (10842553 iters in 2.000 s)
     8192:  7870.457790 MB/s  (2016033 iters in 2.001 s)

~parallel

* [aes-128-gcm]
       16:  44.885331 MB/s  (5944066 iters in 2.021 s)
       64:  161.671535 MB/s  (5299711 iters in 2.001 s)
      256:  578.412348 MB/s  (4819811 iters in 2.034 s)
     1024:  1543.478945 MB/s  (3240493 iters in 2.050 s)
     8192:  3249.304875 MB/s  (816116 iters in 1.962 s)

* [aes-128-ghash]
       16:  46.297067 MB/s  (6525604 iters in 2.151 s)
       64:  176.591256 MB/s  (5665401 iters in 1.958 s)
      256:  680.229924 MB/s  (5569479 iters in 1.999 s)
     1024:  2202.700324 MB/s  (4731711 iters in 2.098 s)
     8192:  6202.747909 MB/s  (1608705 iters in 2.026 s)

* [aes-128-ctr]
       16:  210.024114 MB/s  (27641298 iters in 2.008 s)
       64:  746.705936 MB/s  (24780138 iters in 2.026 s)
      256:  2736.762543 MB/s  (22477390 iters in 2.005 s)
     1024:  5268.239211 MB/s  (10759462 iters in 1.994 s)
     8192:  7825.310160 MB/s  (1991428 iters in 1.988 s)

I made a better representation where I calculated reg (reg = (new - old) / old * 100):

encrypt/decrypt on a 16-bytes block

aes-128-gcm aes-128-ghash aes-128-ctr
main 65.256904 MB/s 69.636869 MB/s 242.536654 MB/s
parallel 44.885331 MB/s 46.297067 MB/s 210.024114 MB/s
reg -32% -33% -13%

encrypt/decrypt on a 64-bytes block

aes-128-gcm aes-128-ghash aes-128-ctr
main 238.578844 MB/s 258.362054 MB/s 878.416635 MB/s
parallel 161.671535 MB/s 176.591256 MB/s 746.705936 MB/s
reg -32% -31% -15%

encrypt/decrypt on a 256-bytes block

aes-128-gcm aes-128-ghash aes-128-ctr
main 808.170459 MB/s 963.694438 MB/s 3037.865836 MB/s
parallel 578.412348 MB/s 680.229924 MB/s 2736.762543 MB/s
reg -28% -29% -9%

encrypt/decrypt on a 1024-bytes block

aes-128-gcm aes-128-ghash aes-128-ctr
main 1839.861906 MB/s 3014.782106 MB/s 5293.985044 MB/s
parallel 1543.478945 MB/s 2202.700324 MB/s 5268.239211 MB/s
reg -16% -26% -0.4%

encrypt/decrypt on a 8192-bytes block

aes-128-gcm aes-128-ghash aes-128-ctr
main 3336.179118 MB/s 6788.888416 MB/s 7870.457790 MB/s
parallel 3249.304875 MB/s 6202.747909 MB/s 7825.310160 MB/s
reg -2% -8% -0.5%

hannesm added a commit to hannesm/mirage-crypto that referenced this pull request Jan 24, 2024
Instead, use a temporary buffer. Contradicts mirage#186
@dinosaure dinosaure marked this pull request as ready for review January 24, 2024 11:17
@dinosaure
Copy link
Member Author

dinosaure commented Jan 24, 2024

My last commit introduces a Domain_shims module which simulate Domain.DLS to allocate "global" buffers per domains. By this way, we can parallelize some cipher computations safely. A little run of bench/speed‧exe shows that we don't have big regression with my last patch on OCaml 4.14.1 and OCaml 5.

EDIT: don't take in account my last commit. It seems that even if we globalize these buffers to domains, we still have some race conditions with our HTTPS implementation.

@hannesm hannesm mentioned this pull request Jan 24, 2024
hannesm added a commit that referenced this pull request Feb 3, 2024
* Fortuna.add: don't allocate a 2 byte cstruct on each call

Instead, use a temporary buffer. Contradicts #186

* minor fix
@hannesm
Copy link
Member

hannesm commented Mar 19, 2024

since we are now using bytes/string (esp. with #205 #214), we should revisit this PR -- my gut feeling is that having temporary global small string/bytes isn't observable.

if this turns out to be true, let's just remove all the global buffers (no need to mess around with domainslib or thread/domain-local storage). WDYT? //cc @reynir @palainp

@hannesm hannesm mentioned this pull request Mar 19, 2024
23 tasks
Copy link
Member

@reynir reynir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @hannesm. I reviewed the global buffers and they seem to be in the range of 16-32 bytes. It's likely they can fit in the minor heap then and I suspect there would be very little cost to use a new buffer every time.

Comment on lines 50 to 54
let register_source name =
let n = List.length !_sources in
let n = List.length (Atomic.get _sources) in
let source = (n, name) in
_sources := source :: !_sources;
Atomic.set _sources (source :: (Atomic.get _sources));
source
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this suffers from race conditions if register_sources is called concurrently. Maybe use Atomic.compare_and_set?

Suggested change
let register_source name =
let n = List.length !_sources in
let n = List.length (Atomic.get _sources) in
let source = (n, name) in
_sources := source :: !_sources;
Atomic.set _sources (source :: (Atomic.get _sources));
source
let register_source name =
let rec loop sources =
let n = List.length (Atomic.get _sources) in
let source = (n, name) in
if Atomic.compare_and_set _sources (source :: sources) sources then
source
else loop (Atomic.get _sources)
in
loop (Atomic.get _sources)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please bear in mind that I haven't read up on the OCaml 5 memory model yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just move to use a set, and accept any concurrent updates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using a set is now in #218 (certainly there may still be races - but do we care?)

@hannesm
Copy link
Member

hannesm commented Mar 19, 2024

FWIW, another issue with mirage-crypto is the DES code which is not reentrant-safe (since it uses in C some global data for the key derivation stuff).

I'm not sure whether it is worth to rewrite the code, though.

This was referenced Mar 19, 2024
@hannesm
Copy link
Member

hannesm commented Mar 19, 2024

with #219 we do no longer use global buffers - remaining is: DES C code (key derivation); rng code where mutable global values are inside.

@hannesm hannesm mentioned this pull request Mar 19, 2024
2 tasks
@hannesm
Copy link
Member

hannesm commented Mar 19, 2024

This has been a very good starting point for discussions. Since some code has been revised so far (#219, #218), I'll close this specific PR. I also opened #220 for tracking the current state.

@hannesm hannesm closed this Mar 19, 2024
@dinosaure
Copy link
Member Author

Thanks, I will try to retest mirage-cryptowith parallelism and see if it's work!

@dinosaure dinosaure deleted the parallel branch March 20, 2024 08:16
@dinosaure
Copy link
Member Author

I completely forget but a part of this PR is to use Atomic for the RNG. And for the parallelism context, it still required to protect our initialization with an atomic value. Should I make a new PR with this specific part?

@hannesm
Copy link
Member

hannesm commented Mar 25, 2024

I completely forget but a part of this PR is to use Atomic for the RNG. And for the parallelism context, it still required to protect our initialization with an atomic value. Should I make a new PR with this specific part?

Yes, that'd be great.

hannesm added a commit to hannesm/opam-repository that referenced this pull request Jun 29, 2024
CHANGES:

### Breaking changes

* mirage-crypto: Poly1305 API now uses string (mirage/mirage-crypto#203 @hannesm)
* mirage-crypto: Poly1305 no longer has type alias "type mac = string"
  (mirage/mirage-crypto#232 @hannesm)
* mirage-crypto: the API uses string instead of cstruct (mirage/mirage-crypto#214 @reynir @hannesm)
* mirage-crypto: Hash module has been removed. Use digestif if you need hash
  functions (mirage/mirage-crypto#213 @hannesm)
* mirage-crypto: the Cipher_block and Cipher_stream modules have been removed,
  its contents is inlined:
  Mirage_crypto.Cipher_block.S -> Mirage_crypto.Block
  Mirage_crypto.Cipher_stream.S -> Mirage_crypto.Stream
  Mirage_crypto.Cipher_block.AES.CTR -> Mirage_crypto.AES.CTR
  (mirage/mirage-crypto#225 @hannesm, suggested in mirage/mirage-crypto#224 by @reynir)
* mirage-crypto-pk: s-expression conversions for private and public keys (Dh,
  Dsa, Rsa) have been removed. You can use PKCS8 for encoding and decoding
  `X509.{Private,Public}_key.{en,de}code_{der,pem}` (mirage/mirage-crypto#208 @hannesm)
* mirage-crypto-pk: in the API, Cstruct.t is no longer present. Instead,
  string is used (mirage/mirage-crypto#211 @reynir @hannesm)
* mirage-crypto-rng: the API uses string instead of Cstruct.t. A new function
  `generate_into : ?g -> bytes -> ?off:int -> int -> unit` is provided
  (mirage/mirage-crypto#212 @hannesm @reynir)
* mirage-crypto-ec: remove NIST P224 support (mirage/mirage-crypto#209 @hannesm @Firobe)
* mirage-crypto: in Uncommon.xor_into the arguments ~src_off and ~dst_off are
  required now (mirage/mirage-crypto#232 @hannesm), renamed to unsafe_xor_into
  (98f01b14f5ebf98ba0e7e9c2ba97ec518f90fddc)
* mirage-crypto-pk, mirage-crypto-rng: remove type alias "type bits = int"
  (mirage/mirage-crypto#236 @hannesm)

### Bugfixes

* mirage-crypto (32 bit systems): CCM with long adata (mirage/mirage-crypto#207 @reynir)
* mirage-crypto-ec: fix K_gen for bitlen mod 8 != 0 (reported in mirage/mirage-crypto#105 that
  P521 test vectors don't pass, re-reported mirage/mirage-crypto#228, fixed mirage/mirage-crypto#230 @Firobe)
* mirage-crypto-ec: zero out bytes allocated for Field_element.zero (reported
  mirleft/ocaml-x509#167, fixed mirage/mirage-crypto#226 @dinosaure)

### Data race free

* mirage-crypto (3DES): avoid global state in key derivation (mirage/mirage-crypto#223 @hannesm)
* mirage-crypto-rng: use atomic instead of reference to be domain-safe (mirage/mirage-crypto#221
  @dinosaure @reynir @hannesm)
* mirage-crypto, mirage-crypto-rng, mirage-crypto-pk, mirage-crypto-ec:
  avoid global buffers, use freshly allocated strings/bytes instead, avoids
  data races (mirage/mirage-crypto#186 mirage/mirage-crypto#219 @dinosaure @reynir @hannesm)

### Other changes

* mirage-crypto: add {de,en}crypt_into functions (and unsafe variants) to allow
  less buffer allocations (mirage/mirage-crypto#231 @hannesm)
* mirage-crypto-rng-miou: new package which adds rng support with miou
  (mirage/mirage-crypto#227 @dinosaure)
* PERFORMANCE mirage-crypto: ChaCha20/Poly1305 use string instead of Cstruct.t,
  ChaCha20 interface unchanged, performance improvement roughly 2x
  (mirage/mirage-crypto#203 @hannesm @reynir)
* mirage-crypto-ec, mirage-crypto-pk, mirage-crypto-rng: use digestif for
  hashes (mirage/mirage-crypto#212 mirage/mirage-crypto#215 @reynir @hannesm)
* mirage-crypto-rng: use a set for entropy sources instead of a list
  (mirage/mirage-crypto#218 @hannesm)
* mirage-crypto-rng-mirage: provide a module type S (for use instead of
  mirage-random in mirage) (mirage/mirage-crypto#234 @hannesm)
avsm pushed a commit to avsm/opam-repository that referenced this pull request Sep 5, 2024
CHANGES:

### Breaking changes

* mirage-crypto: Poly1305 API now uses string (mirage/mirage-crypto#203 @hannesm)
* mirage-crypto: Poly1305 no longer has type alias "type mac = string"
  (mirage/mirage-crypto#232 @hannesm)
* mirage-crypto: the API uses string instead of cstruct (mirage/mirage-crypto#214 @reynir @hannesm)
* mirage-crypto: Hash module has been removed. Use digestif if you need hash
  functions (mirage/mirage-crypto#213 @hannesm)
* mirage-crypto: the Cipher_block and Cipher_stream modules have been removed,
  its contents is inlined:
  Mirage_crypto.Cipher_block.S -> Mirage_crypto.Block
  Mirage_crypto.Cipher_stream.S -> Mirage_crypto.Stream
  Mirage_crypto.Cipher_block.AES.CTR -> Mirage_crypto.AES.CTR
  (mirage/mirage-crypto#225 @hannesm, suggested in mirage/mirage-crypto#224 by @reynir)
* mirage-crypto-pk: s-expression conversions for private and public keys (Dh,
  Dsa, Rsa) have been removed. You can use PKCS8 for encoding and decoding
  `X509.{Private,Public}_key.{en,de}code_{der,pem}` (mirage/mirage-crypto#208 @hannesm)
* mirage-crypto-pk: in the API, Cstruct.t is no longer present. Instead,
  string is used (mirage/mirage-crypto#211 @reynir @hannesm)
* mirage-crypto-rng: the API uses string instead of Cstruct.t. A new function
  `generate_into : ?g -> bytes -> ?off:int -> int -> unit` is provided
  (mirage/mirage-crypto#212 @hannesm @reynir)
* mirage-crypto-ec: remove NIST P224 support (mirage/mirage-crypto#209 @hannesm @Firobe)
* mirage-crypto: in Uncommon.xor_into the arguments ~src_off and ~dst_off are
  required now (mirage/mirage-crypto#232 @hannesm), renamed to unsafe_xor_into
  (98f01b14f5ebf98ba0e7e9c2ba97ec518f90fddc)
* mirage-crypto-pk, mirage-crypto-rng: remove type alias "type bits = int"
  (mirage/mirage-crypto#236 @hannesm)

### Bugfixes

* mirage-crypto (32 bit systems): CCM with long adata (mirage/mirage-crypto#207 @reynir)
* mirage-crypto-ec: fix K_gen for bitlen mod 8 != 0 (reported in mirage/mirage-crypto#105 that
  P521 test vectors don't pass, re-reported mirage/mirage-crypto#228, fixed mirage/mirage-crypto#230 @Firobe)
* mirage-crypto-ec: zero out bytes allocated for Field_element.zero (reported
  mirleft/ocaml-x509#167, fixed mirage/mirage-crypto#226 @dinosaure)

### Data race free

* mirage-crypto (3DES): avoid global state in key derivation (mirage/mirage-crypto#223 @hannesm)
* mirage-crypto-rng: use atomic instead of reference to be domain-safe (mirage/mirage-crypto#221
  @dinosaure @reynir @hannesm)
* mirage-crypto, mirage-crypto-rng, mirage-crypto-pk, mirage-crypto-ec:
  avoid global buffers, use freshly allocated strings/bytes instead, avoids
  data races (mirage/mirage-crypto#186 mirage/mirage-crypto#219 @dinosaure @reynir @hannesm)

### Other changes

* mirage-crypto: add {de,en}crypt_into functions (and unsafe variants) to allow
  less buffer allocations (mirage/mirage-crypto#231 @hannesm)
* mirage-crypto-rng-miou: new package which adds rng support with miou
  (mirage/mirage-crypto#227 @dinosaure)
* PERFORMANCE mirage-crypto: ChaCha20/Poly1305 use string instead of Cstruct.t,
  ChaCha20 interface unchanged, performance improvement roughly 2x
  (mirage/mirage-crypto#203 @hannesm @reynir)
* mirage-crypto-ec, mirage-crypto-pk, mirage-crypto-rng: use digestif for
  hashes (mirage/mirage-crypto#212 mirage/mirage-crypto#215 @reynir @hannesm)
* mirage-crypto-rng: use a set for entropy sources instead of a list
  (mirage/mirage-crypto#218 @hannesm)
* mirage-crypto-rng-mirage: provide a module type S (for use instead of
  mirage-random in mirage) (mirage/mirage-crypto#234 @hannesm)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants