Skip to content

Remove all cryptoxide calls from fake_full_tx()#286

Merged
zuzunker merged 1 commit intomasterfrom
tx-builder-optimization-remove-unncessary-crypto
Dec 12, 2021
Merged

Remove all cryptoxide calls from fake_full_tx()#286
zuzunker merged 1 commit intomasterfrom
tx-builder-optimization-remove-unncessary-crypto

Conversation

@rooooooooob
Copy link
Copy Markdown
Contributor

Profiling the tx builder showed that even after we removed the
fake_key_root construction in #214 due to terrible performance in asm.js
(#213) that the cryptoxide calls within fake_full_tx() were up a
ridiculous amount of the runtime. I investigated this as we had a report
that it was taking up to hundreds of milliseconds (in rust) just to make a tx and
I wanted to see if there would be any significant improvement by
migrating to an idiomatic rust API as discussed in #276 but from a
purely performance perspective in a release build (opt-level=3) this seemed to
be pretty minor, even with the unnecessary cloning forced by this
inside of the implementation.

This will need to be slightly modified once #273 gets merged as there are
some conflicts.

Previous perf results from building 100,000 simple tx bodies (no witnesses) and serializing
them:

-   53.64%        [.] <&cryptoxide::curve25519::Fe as
core::ops::arith::Mul>::mul
    - 53.63% 0xffffffffffffffff
      + 23.03% cardano_serialization_lib::tx_builder::min_fee
      - 21.32%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          - 15.21%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
              cardano_serialization_lib::tx_builder::min_fee
            + cardano_serialization_lib::tx_builder::fake_full_tx
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      - 7.78% ser_lib_perf::main
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::build
          + 1.67%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
      + 1.46% std::rt::lang_start_internal
-   19.40%        [.] cryptoxide::curve25519::Fe::square
    - 0xffffffffffffffff
      - 8.34% cardano_serialization_lib::tx_builder::min_fee
            cardano_serialization_lib::tx_builder::TransactionBuilder::build
            cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size
          + cardano_serialization_lib::tx_builder::fake_full_tx
      - 7.41%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          + 5.58%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
          + 1.83%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      + 2.77% ser_lib_perf::main
      + 0.88% std::rt::lang_start_internal
+    7.80%        [.] cryptoxide::curve25519::GePrecomp::maybe_set
+    5.13%        [.] cryptoxide::curve25519::Fe::invert
+    3.98%        [.] <&cryptoxide::curve25519::GeP3 as
core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add
+    2.84%        [.] cryptoxide::curve25519::GePrecomp::select
+    1.22%        [.] cryptoxide::curve25519::ge_scalarmult_base
+    1.00%        [.]
cryptoxide::sha2::impl512::reference::digest_block_u64
      0.47%        [.] cryptoxide::curve25519::GeP2::dbl
      0.13%        [.] cryptoxide::curve25519::sc_muladd
      0.12%        [.] cryptoxide::curve25519::Fe::to_bytes
      0.12%        [.] cryptoxide::curve25519::sc_reduce
      0.10%        [.] cbor_event::se::Serializer<W>::write_type
      0.07%        [.]
cardano_serialization_lib::tx_builder::fake_full_tx

which shows that almost the entire runtime was spent on cryptoxide calls
which were only necessary for fake_full_tx().

Running it again afterwards there is no overwhelming bottleneck anymore
and the remaining runtime was fairly distributed.

Profiling the tx builder showed that even after we removed the
fake_key_root construction in #214 due to terrible performance in asm.js
(#213) that the cryptoxide calls within `fake_full_tx()` were up a
ridiculous amount of the runtime. I investigated this as we had a report
that it was taking up to hundreds of milliseconds just to make a tx and
I wanted to see if there would be any significant improvement by
migrating to an idiomatic rust API as discussed in #276 but from a
purely performance perspective in a release build (opt-level=3) this seemed to
be pretty minor, even with the unnecessary cloning forced by this
inside of the implementation.

This will need to be slightly modified once #273 gets merged as there are
some conflicts.

Previous perf results from building 100,000 simple txs and serializing
them:
```
-   53.64%        [.] <&cryptoxide::curve25519::Fe as
core::ops::arith::Mul>::mul
    - 53.63% 0xffffffffffffffff
      + 23.03% cardano_serialization_lib::tx_builder::min_fee
      - 21.32%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          - 15.21%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
              cardano_serialization_lib::tx_builder::min_fee
            + cardano_serialization_lib::tx_builder::fake_full_tx
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      - 7.78% ser_lib_perf::main
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::build
          + 1.67%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
      + 1.46% std::rt::lang_start_internal
-   19.40%        [.] cryptoxide::curve25519::Fe::square
    - 0xffffffffffffffff
      - 8.34% cardano_serialization_lib::tx_builder::min_fee
            cardano_serialization_lib::tx_builder::TransactionBuilder::build
            cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size
          + cardano_serialization_lib::tx_builder::fake_full_tx
      - 7.41%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          + 5.58%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
          + 1.83%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      + 2.77% ser_lib_perf::main
      + 0.88% std::rt::lang_start_internal
+    7.80%        [.] cryptoxide::curve25519::GePrecomp::maybe_set
+    5.13%        [.] cryptoxide::curve25519::Fe::invert
+    3.98%        [.] <&cryptoxide::curve25519::GeP3 as
core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add
+    2.84%        [.] cryptoxide::curve25519::GePrecomp::select
+    1.22%        [.] cryptoxide::curve25519::ge_scalarmult_base
+    1.00%        [.]
cryptoxide::sha2::impl512::reference::digest_block_u64
      0.47%        [.] cryptoxide::curve25519::GeP2::dbl
      0.13%        [.] cryptoxide::curve25519::sc_muladd
      0.12%        [.] cryptoxide::curve25519::Fe::to_bytes
      0.12%        [.] cryptoxide::curve25519::sc_reduce
      0.10%        [.] cbor_event::se::Serializer<W>::write_type
      0.07%        [.]
cardano_serialization_lib::tx_builder::fake_full_tx
```

which shows that almost the entire runtime was spent on cryptoxide calls
which were only necessary for `fake_full_tx()`.

Running it again afterwards there is no overwhelming bottleneck anymore
and the remaining runtime was fairly distributed.
@rooooooooob rooooooooob requested a review from zuzunker December 10, 2021 01:00
@rooooooooob
Copy link
Copy Markdown
Contributor Author

I should note that those inner calls when I expanded the top few are inclusive of the underlying cryptoxide calls. The cryptoxide calls were something like 95%+ of the runtime. For exclusive samples not a single non-cryptoxide function took up more than a fraction of a percent of total samples.

This is also only the time for the transaction body creation (full TransactionBuilder flow + also serializing the resulting tx body and summing the total lengths of the generated CBOR to print out at the end so I could assure myself that the tx building or any other part wasn't being optimized out since this was a release build (with debug symbols) that I was profiling). This seems to line up with the massive difference in regular release builds (albeit in asm.js but I bet the difference is there in other compilation targets but they're just faster in general so no one notices if they're only building 1 tx) when someone removed the other crpyto code (I assume it was the worst offender) in #214

@zuzunker zuzunker added this to the 10.0.0 milestone Dec 12, 2021
@zuzunker zuzunker merged commit e0b33c6 into master Dec 12, 2021
@zuzunker zuzunker deleted the tx-builder-optimization-remove-unncessary-crypto branch December 12, 2021 23:17
@zuzunker zuzunker restored the tx-builder-optimization-remove-unncessary-crypto branch January 10, 2022 22:45
@zuzunker zuzunker deleted the tx-builder-optimization-remove-unncessary-crypto branch January 10, 2022 22:46
zuzunker pushed a commit that referenced this pull request Jan 10, 2022
Profiling the tx builder showed that even after we removed the
fake_key_root construction in #214 due to terrible performance in asm.js
(#213) that the cryptoxide calls within `fake_full_tx()` were up a
ridiculous amount of the runtime. I investigated this as we had a report
that it was taking up to hundreds of milliseconds just to make a tx and
I wanted to see if there would be any significant improvement by
migrating to an idiomatic rust API as discussed in #276 but from a
purely performance perspective in a release build (opt-level=3) this seemed to
be pretty minor, even with the unnecessary cloning forced by this
inside of the implementation.

This will need to be slightly modified once #273 gets merged as there are
some conflicts.

Previous perf results from building 100,000 simple txs and serializing
them:
```
-   53.64%        [.] <&cryptoxide::curve25519::Fe as
core::ops::arith::Mul>::mul
    - 53.63% 0xffffffffffffffff
      + 23.03% cardano_serialization_lib::tx_builder::min_fee
      - 21.32%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          - 15.21%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
              cardano_serialization_lib::tx_builder::min_fee
            + cardano_serialization_lib::tx_builder::fake_full_tx
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      - 7.78% ser_lib_perf::main
          + 6.11%
cardano_serialization_lib::tx_builder::TransactionBuilder::build
          + 1.67%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
      + 1.46% std::rt::lang_start_internal
-   19.40%        [.] cryptoxide::curve25519::Fe::square
    - 0xffffffffffffffff
      - 8.34% cardano_serialization_lib::tx_builder::min_fee
            cardano_serialization_lib::tx_builder::TransactionBuilder::build
            cardano_serialization_lib::tx_builder::TransactionBuilder::build_and_size
          + cardano_serialization_lib::tx_builder::fake_full_tx
      - 7.41%
cardano_serialization_lib::tx_builder::TransactionBuilder::add_change_if_needed
          + 5.58%
cardano_serialization_lib::tx_builder::TransactionBuilder::fee_for_output
          + 1.83%
cardano_serialization_lib::tx_builder::TransactionBuilder::min_fee
(inlined)
      + 2.77% ser_lib_perf::main
      + 0.88% std::rt::lang_start_internal
+    7.80%        [.] cryptoxide::curve25519::GePrecomp::maybe_set
+    5.13%        [.] cryptoxide::curve25519::Fe::invert
+    3.98%        [.] <&cryptoxide::curve25519::GeP3 as
core::ops::arith::Add<&cryptoxide::curve25519::GePrecomp>>::add
+    2.84%        [.] cryptoxide::curve25519::GePrecomp::select
+    1.22%        [.] cryptoxide::curve25519::ge_scalarmult_base
+    1.00%        [.]
cryptoxide::sha2::impl512::reference::digest_block_u64
      0.47%        [.] cryptoxide::curve25519::GeP2::dbl
      0.13%        [.] cryptoxide::curve25519::sc_muladd
      0.12%        [.] cryptoxide::curve25519::Fe::to_bytes
      0.12%        [.] cryptoxide::curve25519::sc_reduce
      0.10%        [.] cbor_event::se::Serializer<W>::write_type
      0.07%        [.]
cardano_serialization_lib::tx_builder::fake_full_tx
```

which shows that almost the entire runtime was spent on cryptoxide calls
which were only necessary for `fake_full_tx()`.

Running it again afterwards there is no overwhelming bottleneck anymore
and the remaining runtime was fairly distributed.

(cherry picked from commit e0b33c6)
zuzunker added a commit that referenced this pull request Jan 10, 2022
Remove all cryptoxide calls from fake_full_tx() (#286)
@zuzunker zuzunker mentioned this pull request Feb 6, 2022
40 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants