[FEATURE] Investigate Cython for Hemera loop performance #36

@cmorman89

[FEATURE] Investigate Rust for Hemera loop performance

Python loops are extremely slow, largely because of dynamic typing: every iteration pays for type checks and object dispatch at runtime. In contrast, Rust and C are statically typed and compiled ahead of time. Originally, I was going to use C via Cython, but I've decided to use Rust for its memory safety instead.

Using Rust for the loops in Hemera has already shown greatly improved performance.

Additional Context

As can be seen below, the _generate_string_buffer method is the most time-consuming method in the Hemera terminal effects module. This method is responsible for converting the delta frame to its color-formatted str representation before printing to the terminal.

Line profiling and cProfile have shown that the loop iteration takes ~75% of the total time, a significant bottleneck considering that the remaining 25% is spent on heavily penalized system calls for writing to the terminal. The ratio should be the reverse.

The line profile output shows just how challenging a task this is for Python: the loop had to iterate over 6,472,336 subpixel pairs by the 197th frame, within the 2.95 s total time. Several optimizations have been made to minimize the number of memory retrievals and even conditional checks, with optimization results listed below. However, even with these numbers, Rust should still be able to provide a significant performance boost.
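As a rough illustration of how a per-pixel loop like this can be measured with cProfile from the standard library, here is a minimal sketch. The `scan_frame` function is a hypothetical stand-in for the real loop, not Hemera code, and the frame dimensions are illustrative only.

```python
import cProfile
import io
import pstats


def scan_frame(h, w):
    """Hypothetical stand-in for a per-pixel loop (not the Hemera method)."""
    hits = 0
    for y in range(h):
        for x in range(w):
            # Cheap per-pixel check, loosely mimicking "is this pixel printable?"
            if (x + y) % 32 == 0:
                hits += 1
    return hits


# Profile a single call and capture the report as text
profiler = cProfile.Profile()
result = profiler.runcall(scan_frame, 110, 480)

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # Show only the five most expensive entries
report = stream.getvalue()
print(report)
```

cProfile reports time per function, so it shows that the method as a whole is expensive; a line-level profiler is still needed to attribute the cost to individual statements, as in the benchmark output at the end of this issue.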


Addendum

Current Optimizations in Processing Subpixel Pairs:

| Stage | Pixel Pairs Processed | Reduction |
| --- | --- | --- |
| Total input subpixel pairs | 10,423,270 | 0% |
| Only pairs in changed rows | 6,472,336 | 37.9% |
| Only changed subpixel pairs | 203,758 | 96.1% |

Current Optimizations in ANSI Code Generation:

| ANSI Code | Count of Codes Issued | Count without Change Detection | Reduction |
| --- | --- | --- | --- |
| Cursor Movement | 43,202 | 6,472,336 | 99.3% |
| Foreground Color | 66,449 | 6,472,336 | 99.0% |
| Background Color | 54,511 | 6,472,336 | 99.2% |
| All ANSI Codes | 164,162 | 19,417,008 | 99.2% |

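For reference, the reduction column in the ANSI table can be reproduced as one minus the ratio of codes issued to the count without change detection. The sketch below uses only the counts from the table; the helper name `reduction_pct` is made up for illustration.

```python
def reduction_pct(issued: int, baseline: int) -> float:
    """Percentage of codes avoided relative to the baseline, to one decimal."""
    return round((1 - issued / baseline) * 100, 1)


# Without change detection, each code type would be issued once per
# subpixel pair in the changed rows.
pairs_in_changed_rows = 6_472_336

issued = {
    "Cursor Movement": 43_202,
    "Foreground Color": 66_449,
    "Background Color": 54_511,
}

for name, count in issued.items():
    print(f"{name}: {reduction_pct(count, pairs_in_changed_rows)}%")

# The "All ANSI Codes" row sums the three code types on both sides
total_issued = sum(issued.values())
total_baseline = len(issued) * pairs_in_changed_rows
print(f"All ANSI Codes: {reduction_pct(total_issued, total_baseline)}%")
```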
Code to optimize with Rust:
        # Iterate frame
        for y in range(h):
            row_buffer.clear()

            # Check if row has changes
            if row_sums[y] > empty_sum:
                for x in range(w):
                    sum_color = sum_frame[y, x]

                    # If the current pixel is printable, get the fg color and calculate the bg color
                    if sum_color != empty_sum:
                        fg_color = delta_frame[y, x]
                        bg_color = sum_color - fg_color
                        # bg_color = delta_frame[1, y, x]

                        # Move the cursor only when the previous subpixel was empty; contiguous printed pixels need no repositioning
                        if last_subpixel_sum == empty_sum:
                            row_buffer.append(f"\033[{y + 1};{x + 1}H")

                        # Only write color change sequences when necessary (skip if same as last)
                        # Foreground color check/caching
                        if fg_color != last_ansi_fg_color:
                            row_buffer.append(ansi_fg[fg_color])
                            last_ansi_fg_color = fg_color
                        # Background color check/caching
                        if bg_color != last_ansi_bg_color:
                            row_buffer.append(ansi_bg[bg_color])
                            last_ansi_bg_color = bg_color

                        # Add the printed character
                        row_buffer.append("▀")

                    # Cache the last sum
                    last_subpixel_sum = sum_color

                # Add the row buffer for the changed row
                buffer.write("".join(row_buffer) + "\n")

        # Output the accumulated buffer to stdout
        self.write_to_term(buffer.getvalue())
        self.flush_to_term()
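Most of the loop's iterations land on empty pixels, so one possible way to shrink the Python-level work (independent of the Rust port, and not what Hemera currently does) is to let NumPy locate the non-empty pixels first and iterate only over those. The following is a minimal sketch under several assumptions: `render_changed`, `ANSI_FG`, and `ANSI_BG` are hypothetical names, the lookup tables are assumed to be 256-color escape sequences indexed by color value, and cursor moves are decided by position adjacency rather than by the `last_subpixel_sum` cache used above.

```python
import numpy as np

# Hypothetical lookup tables: 256-color SGR sequences indexed by color value
ANSI_FG = [f"\033[38;5;{i}m" for i in range(256)]
ANSI_BG = [f"\033[48;5;{i}m" for i in range(256)]


def render_changed(delta_frame: np.ndarray, sum_frame: np.ndarray) -> str:
    """Build the ANSI string by visiting only pixels with a non-zero sum."""
    # np.nonzero returns indices in row-major order, matching the y/x loop
    ys, xs = np.nonzero(sum_frame)
    fgs = delta_frame[ys, xs].astype(np.int64)
    bgs = sum_frame[ys, xs].astype(np.int64) - fgs

    out = []
    last_y, last_x = -1, -2
    last_fg, last_bg = None, None
    for y, x, fg, bg in zip(ys.tolist(), xs.tolist(), fgs.tolist(), bgs.tolist()):
        # Reposition unless this pixel directly follows the last printed one
        if y != last_y or x != last_x + 1:
            out.append(f"\033[{y + 1};{x + 1}H")
        # Emit color codes only when the color actually changes
        if fg != last_fg:
            out.append(ANSI_FG[fg])
            last_fg = fg
        if bg != last_bg:
            out.append(ANSI_BG[bg])
            last_bg = bg
        out.append("▀")
        last_y, last_x = y, x
    return "".join(out)


# Tiny example frame: two adjacent pixels in row 0, one pixel in row 2
delta = np.array([[0, 5, 5, 0], [0, 0, 0, 0], [7, 0, 0, 0]], dtype=np.uint8)
sums = np.array([[0, 7, 7, 0], [0, 0, 0, 0], [7, 0, 0, 0]], dtype=np.uint16)
rendered = render_changed(delta, sums)
```

With the counts from the tables above, this would bring the Python-level iterations down from roughly 6.4 million to the 203,758 changed subpixel pairs, at the cost of a few extra array allocations per frame. Whether that is a net win would need to be profiled.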
Benchmarks Before Rust Optimization:
Total time: 2.94979 s
File: /home/charles/projects/nyx-engine/nyx/hemera_term_fx/hemera_term_fx.py
Function: _generate_string_buffer at line 134

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   134                                               def _generate_string_buffer(self, delta_frame: np.ndarray):
   135                                                   """Convert the delta frame to its color-formatted `str` representation form before printing
   136                                                   to the terminal.
   137                                                   Args:
   138                                                       delta_frame (np.ndarray): The delta frame to process and print.
   139                                                   """
   140                                                   # Calculate sum of fg + bg
   141       197    4969184.0  25224.3      0.2          delta_frame, sum_frame = self.sum_bg(delta_frame)
   142                                                   # Calculate the sum of each row
   143       197    6658238.0  33798.2      0.2          row_sums = np.sum(sum_frame, axis=1)
   144                                                   # Start the string buffer
   145       197     349312.0   1773.2      0.0          buffer = io.StringIO()
   146                                                   # Initialize loop variables outside the loop
   147       197     355082.0   1802.4      0.0          empty_pixel = np.uint8(0)
   148       197     205343.0   1042.4      0.0          empty_sum = np.uint16(0)
   149       197      30633.0    155.5      0.0          fg_color, bg_color = empty_pixel, empty_sum
   150       197      17450.0     88.6      0.0          last_ansi_fg_color, last_ansi_bg_color = empty_pixel, empty_sum
   151       197      16754.0     85.0      0.0          sum_color, last_subpixel_sum = empty_sum, empty_sum
   152       197      59706.0    303.1      0.0          h, w = delta_frame.shape
   153       197      28575.0    145.1      0.0          row_buffer = []
   154       197      25603.0    130.0      0.0          ansi_fg = self.ansi_fg
   155       197      21032.0    106.8      0.0          ansi_bg = self.ansi_bg
   156                                           
   157                                                   # Iterate frame
   158     21867    2102357.0     96.1      0.1          for y in range(h):
   159     21670    4101178.0    189.3      0.1              row_buffer.clear()
   160                                           
   161                                                       # Check if row has changes
   162     21670    4708098.0    217.3      0.2              if row_sums[y] > empty_sum:
   163   6472336  556593912.0     86.0     18.9                  for x in range(w):
   164   6458880  936312570.0    145.0     31.7                      sum_color = sum_frame[y, x]
   165                                           
   166                                                               # If the current pixel is printable, get the fg color and calculate the bg color
   167   6458880  660925447.0    102.3     22.4                      if sum_color != empty_sum:
   168    203758   35263674.0    173.1      1.2                          fg_color = delta_frame[y, x]
   169    203758   32254690.0    158.3      1.1                          bg_color = sum_color - fg_color
   170                                                                   # bg_color = delta_frame[1, y, x]
   171                                           
   172                                                                   # Skip cursor movement if it's the same row/column as the last printed pixel
   173    203758   21897722.0    107.5      0.7                          if last_subpixel_sum == empty_sum:
   174     43202   14534454.0    336.4      0.5                              row_buffer.append(f"\033[{y + 1};{x + 1}H")
   175                                           
   176                                                                   # Only write color change sequences when necessary (skip if same as last)
   177                                                                   # Foreground color check/caching
   178    203758   21848698.0    107.2      0.7                          if fg_color != last_ansi_fg_color:
   179     66449   10345818.0    155.7      0.4                              row_buffer.append(ansi_fg[fg_color])
   180     66449    6274707.0     94.4      0.2                              last_ansi_fg_color = fg_color
   181                                                                   # Background color check/caching
   182    203758   20488517.0    100.6      0.7                          if bg_color != last_ansi_bg_color:
   183     54511    8241801.0    151.2      0.3                              row_buffer.append(ansi_bg[bg_color])
   184     54511    5274209.0     96.8      0.2                              last_ansi_bg_color = bg_color
   185                                           
   186                                                                   # Add the printed character
   187    203758   19928811.0     97.8      0.7                          row_buffer.append("▀")
   188                                           
   189                                                               # Cache the last sum
   190   6458880  526746257.0     81.6     17.9                      last_subpixel_sum = sum_color
   191                                           
   192                                                           # Add the row buffer for the changed row
   193     13456   10964123.0    814.8      0.4                  buffer.write("".join(row_buffer) + "\n")
   194                                           
   195                                                   # Output the accumulated buffer to stdout
   196       197   37649331.0 191113.4      1.3          self.write_to_term(buffer.getvalue())
   197       197     601186.0   3051.7      0.0          self.flush_to_term()
