Block to cpu move to _execute_section_block #675

dkazanc · 2026-01-19T12:53:01Z

This moves block.to_cpu() from writing doing the transfer in dataset_store, which doesn't seem the right place. I've added some logic around how to recognise that the block needs to be moved. Also to be able to clock with the monitor the device to host transfer time.
The resulting csv file reflects the intended behavior. Few tests to fix.

Checklist

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have made corresponding changes to the documentation

dkazanc · 2026-01-20T10:27:09Z

@yousefmoazzam I think it is ready but I couldn't get my head around one last failing test: test_execute_section_for_block. Can you have a look please? cheers

yousefmoazzam · 2026-01-20T11:22:48Z

httomo/runner/task_runner.py

+        self, method: MethodWrapper, block: DataSetBlock, convert_gpu_block_to_cpu: bool
    ) -> DataSetBlock:
        start = time.perf_counter_ns()
        block = method.execute(block)
-        end = time.perf_counter_ns()
+
        if block.is_last_in_chunk:
            self.append_side_outputs(method.get_side_output())
+
+        if convert_gpu_block_to_cpu:
+            with catchtime() as t:
+                block.to_cpu()
+            method.gpu_time.device2host += t.elapsed
+
+        end = time.perf_counter_ns()


Hmm, having a parameter about block transferring in a method called _execute_method() feels a bit odd. I'd naively expect a method called _execute_method() to do stuff about method execution only, and not do data transfer things but rather have the transfer happen somewhere else.

The idea behind this change I believe is to transfer a GPU block at the end of a section to CPU, before the block is given to the dataset store, is that correct? If so, it seems like the transfer might make more sense to go into a method handling section execution, the _execute_section_block() method maybe.

If the data transfer is done there instead of here, then there's no need to pass this convert_gpu_block_to_cpu parameter, and the tests actually don't need any changes to keep passing I think.

This approach also has the advantage that the data transfer from GPU to CPU after the method completes doesn't accidentally get put into the clocking for the method execution (the clocking of the method execution is in _execute_method(), whereas the data transfer from GPU to CPU would happen in _execute_section_block()), which is probably preferable. Currently, it looks like the data transfer form GPU to CPU is included in the method execution time, which we may not want.

Therefore, I'd tentatively suggest moving the following block of code:

if convert_gpu_block_to_cpu: with catchtime() as t: block.to_cpu() method.gpu_time.device2host += t.elapsed

to the _execute_section_block() method after line 254, and see how that feels.

thanks for the insight @yousefmoazzam . I originally implemented exactly as you suggest, so that block to cpu transfer happens in _execute_section_block(). The problem was, however, is to make report_method_block to work correctly. Outside of _execute_method I had no access to the clocked time that is related to block transfers. So it was then I decided to pass this flag to _execute_method and make it aware of the last block processing.

So with that: method.gpu_time.device2host += t.elapsed I'm adding the expected D2H time while being withing the monitor reach. Even though outside of the transfer function within the method.execute, it is still recorded. Also note that it is wrapped within start-end time for it to be a part of the whole business about I/O and method kernel execution for the block. That is how we normally record the transfers, and the numbers do seem to add up while I'm interpreting the CSV file.

I guess with your proposition the whole monitor business should be moved outside of _execute_method to _execute_section_block ? Would it something we should try then?

Ah I see, yeah I understand what you mean: the call to report_method_block() requires the GPU to CPU transfer time, and that call happens inside _execute_method(), so if the GPU to CPU transfer happens after _execute_method(), then that info can't be passed to report_method_block().

Yeah, you're right that my proposition would probably require moving report_method_block(). Maybe it's worth doing, but perhaps not right now - it might be better to do that reorganising of where things are clocked when we have a fuller picture of things.

Let me know what you'd like to do, but feel free to ignore my initial suggestion and defer it till later, I could understand not wanting to tackle this properly right now.

I've moved the code, is it something like this we'd prefer? Haven't fixed the tests yet. I guess it will be unfixing the fixed ones )

yousefmoazzam · 2026-01-20T16:26:32Z

httomo/runner/task_runner.py

+            start = time.perf_counter_ns()
            block = self._execute_method(method, block)
+
+            if convert_gpu_block_to_cpu:
+                with catchtime() as t:
+                    block.to_cpu()
+                method.gpu_time.device2host += t.elapsed
+
+            end = time.perf_counter_ns()


Yeah, this latest change in general looks like a reasonable way of moving the call to report_method_block().

If we'd like method execution time to not include the transfer time (I'm not sure if we wanted to make this change at all, or in the future but not yet), one small detail could be to move the end of the method clocking to be before the transfer from GPU to CPU.

dkazanc added 10 commits January 16, 2026 16:03

moves block to cpu from block writing

edba30e

some logic refinement to deal with calculate_stats

3ac5632

fix

fffdda9

moving block to cpu

2cb1616

fix2

5db4611

corr1

294f9f8

corr2

05ef4dc

fixing a bug due to data_checker name repetitions

11def43

Merge branch 'main' into block_to_cpu_move

89fa14b

fixing 2 tests

0dc5e13

dkazanc marked this pull request as ready for review January 20, 2026 10:25

dkazanc requested a review from yousefmoazzam January 20, 2026 10:27

remove gputimeinfo import

9cc3790

yousefmoazzam reviewed Jan 20, 2026

View reviewed changes

dkazanc and others added 3 commits January 20, 2026 14:07

Merge branch 'main' into block_to_cpu_move

7ef6e3f

Include new param in expected function calls in test

4a80b1b

moving monitor related code to execute section block

c820a63

yousefmoazzam reviewed Jan 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Block to cpu move to _execute_section_block #675

Block to cpu move to _execute_section_block #675

dkazanc commented Jan 19, 2026

Uh oh!

dkazanc commented Jan 20, 2026

Uh oh!

yousefmoazzam Jan 20, 2026 •

edited

Loading

Uh oh!

dkazanc Jan 20, 2026 •

edited

Loading

Uh oh!

yousefmoazzam Jan 20, 2026 •

edited

Loading

Uh oh!

dkazanc Jan 20, 2026

Uh oh!

yousefmoazzam Jan 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Block to cpu move to _execute_section_block #675

Are you sure you want to change the base?

Block to cpu move to _execute_section_block #675

Conversation

dkazanc commented Jan 19, 2026

Checklist

Uh oh!

dkazanc commented Jan 20, 2026

Uh oh!

yousefmoazzam Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dkazanc Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yousefmoazzam Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dkazanc Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

yousefmoazzam Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yousefmoazzam Jan 20, 2026 •

edited

Loading

dkazanc Jan 20, 2026 •

edited

Loading

yousefmoazzam Jan 20, 2026 •

edited

Loading