Further optimize intermediates_to_table_indices #1457

@andyleiserson

intermediates_to_table_indices works as follows:

  • It calls bits_to_table_indices, which takes three u128s, each holding the values of one of the three intermediates for 128 multiplications (one bit per multiplication), and returns four u128s with one table index in each nibble.
  • It then reorders those nibbles into bytes as its output. (Originally, the table lookup was done here, but additional optimization moved the table lookup elsewhere.)
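To make the two steps concrete, here is a naive bit-at-a-time reference sketch. It is not the project's code: the function names are hypothetical, and the layout is an assumption (multiplication `i` lands in word `i / 32`, nibble `i % 32`, with the index formed as `b0 | b1 << 1 | b2 << 2`, and each nibble is then widened to its own byte). The real functions may order the nibbles and bytes differently.

```rust
/// Step 1 (naive reference): pack one 3-bit index per multiplication into nibbles.
/// ASSUMED layout: word `i / 32`, nibble `i % 32` holds the index for multiplication `i`.
fn naive_bits_to_nibbles(b0: u128, b1: u128, b2: u128) -> [u128; 4] {
    let mut words = [0u128; 4];
    for i in 0..128 {
        // Gather bit `i` of each intermediate into a 3-bit table index.
        let index = ((b0 >> i) & 1) | (((b1 >> i) & 1) << 1) | (((b2 >> i) & 1) << 2);
        words[i / 32] |= index << (4 * (i % 32));
    }
    words
}

/// Step 2 (naive reference): widen each nibble into its own byte.
/// ASSUMED output order: byte `i` holds the index for multiplication `i`.
fn naive_nibbles_to_bytes(words: [u128; 4]) -> [u8; 128] {
    let mut out = [0u8; 128];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = ((words[i / 32] >> (4 * (i % 32))) & 0xF) as u8;
    }
    out
}
```

Under these assumptions, calling `naive_nibbles_to_bytes(naive_bits_to_nibbles(b0, b1, b2))` mirrors the overall shape of intermediates_to_table_indices, which is useful mainly as a correctness oracle when experimenting with faster variants.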

It appears that bits_to_table_indices compiles to fewer than 200 instructions (fully unrolled, with no loops or branches), while the rearranging of nibbles compiles to more than 1,000 instructions (again fully unrolled, with no loops or branches). Implementing a single transpose-like operation covering both steps would probably be more efficient.
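One possible shape for such a fused pass is sketched below. This is only an illustration under stated assumptions, not a proposed patch: the function names are hypothetical, the output layout (byte `i` holds `b0 | b1 << 1 | b2 << 2` for multiplication `i`) is assumed rather than taken from the codebase, and no claim is made about the resulting instruction count. It skips the nibble-packed intermediate entirely and spreads 8 input bits into 8 output bytes at a time using a multiply-and-mask (SWAR) trick.

```rust
/// Spread the 8 bits of `x` into the low bit of 8 consecutive bytes (SWAR).
/// Byte `k` of the result (little-endian) is 1 if bit `k` of `x` is set, else 0.
fn spread_bits_to_bytes(x: u8) -> u64 {
    // Replicate `x` into every byte, then keep only bit `k` in byte `k`.
    let picked = u64::from(x).wrapping_mul(0x0101_0101_0101_0101) & 0x8040_2010_0804_0201;
    // Each byte is now 0 or a single power of two. Adding 0x7F sets bit 7 exactly
    // when the byte was non-zero, and never carries into the neighbouring byte.
    ((picked + 0x7F7F_7F7F_7F7F_7F7F) >> 7) & 0x0101_0101_0101_0101
}

/// Hypothetical fused pass: three bit-packed intermediates in, one index byte
/// per multiplication out. ASSUMED layout: output byte `i` is
/// `b0 bit i | (b1 bit i) << 1 | (b2 bit i) << 2`.
fn fused_bits_to_index_bytes(b0: u128, b1: u128, b2: u128) -> [u8; 128] {
    let mut out = [0u8; 128];
    for (chunk, dst) in out.chunks_exact_mut(8).enumerate() {
        let shift = 8 * chunk;
        // Take the next 8 bits of each intermediate and spread them to bytes.
        let s0 = spread_bits_to_bytes((b0 >> shift) as u8);
        let s1 = spread_bits_to_bytes((b1 >> shift) as u8);
        let s2 = spread_bits_to_bytes((b2 >> shift) as u8);
        // Combine the three 0/1 bytes into a 3-bit index per byte.
        dst.copy_from_slice(&(s0 | (s1 << 1) | (s2 << 2)).to_le_bytes());
    }
    out
}

/// Bit-at-a-time reference, used only to check the fused version.
fn reference_index_bytes(b0: u128, b1: u128, b2: u128) -> [u8; 128] {
    let mut out = [0u8; 128];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = (((b0 >> i) & 1) | (((b1 >> i) & 1) << 1) | (((b2 >> i) & 1) << 2)) as u8;
    }
    out
}
```

Whether this beats the current two-step code would need to be measured; a SIMD transpose or a different output ordering might do better. The reference function is there so that any candidate can be checked against the same expected bytes.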

Labels: performance (This affects protocol performance)