[nvptx-run] Add --verbose/-v #27

Open: vries wants to merge 1 commit into SourceryTools:master from vries:verbose-2
vries commented Oct 13, 2020

Contributor

No description provided.

Add a --verbose flag to nvptx-run, such that we have:
...
$ gcc ~/hello.c
$ nvptx-none-run -v ./a.out
Total device memory: 4242604032 (3.95 GiB)
Initial free device memory: 4222156800 (3.93 GiB)
Program args reservation (effective): 1048576 (1.00 MiB)
Set stack size limit: 131072 (128.00 KiB)
Stack size limit reservation (estimated): 1342177280 (1.25 GiB)
Stack size limit reservation (effective): 1423966208 (1.32 GiB)
Free device memory: 2797142016 (2.60 GiB)
Set heap size limit: 268435456 (256.00 MiB)
hello
...
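
The `report_val` helper that prints these lines is not shown in this conversation. As a rough illustration of the output format only, here is a minimal sketch with hypothetical names (`format_size`, `report_size`). It assumes integer arithmetic that truncates to two decimals, because the sample output above shows 2797142016 as 2.60 GiB and 1423966208 as 1.32 GiB, where rounding to nearest would give 2.61 and 1.33.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not the PR's report_val: format VAL into BUF as
   "<bytes> (<scaled> <unit>)".  The scaled value is truncated (not
   rounded) to two decimals, an assumption inferred from the sample
   output.  Returns what snprintf returns.  */
int
format_size (char *buf, size_t len, uint64_t val)
{
  static const char *const units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
  uint64_t unit = 1;
  size_t i = 0;

  /* Pick the largest binary unit for which the scaled value is >= 1,
     stopping at the last entry of UNITS.  */
  while (i + 1 < sizeof units / sizeof units[0] && val / unit >= 1024)
    {
      unit *= 1024;
      i++;
    }

  uint64_t whole = val / unit;
  /* Remainder is < 2^40, so multiplying by 100 cannot overflow.  */
  uint64_t hundredths = (val % unit) * 100 / unit;

  return snprintf (buf, len, "%" PRIu64 " (%" PRIu64 ".%02" PRIu64 " %s)",
                   val, whole, hundredths, units[i]);
}

/* Hypothetical wrapper: print "LABEL: <formatted size>" to STREAM.  */
void
report_size (FILE *stream, const char *label, uint64_t val)
{
  char buf[64];

  format_size (buf, sizeof buf, val);
  fprintf (stream, "%s: %s\n", label, buf);
}
```

Under these assumptions, a call such as `report_size (stderr, "Set stack size limit", 131072)` would produce a line in the same shape as the sample output.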
vries commented Oct 13, 2020

Contributor Author

Note: contains "[nvptx-run] Fix greedy option parsing" to avoid merge conflict.

vries changed the title from "Verbose 2" to "[nvptx-run] Add --verbose/-v" on Oct 13, 2020

tschwinge left a comment

Member

@vries, thanks. I have a few questions, please have a look.

Comment thread nvptx-run.c
Comment on lines +289 to +291

size_t free_mem;
size_t dummy;

Member

Should dummy move inside the if (verbose)?

Comment thread nvptx-run.c
r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 0);
fatal_unless_success (r, "could not set stack limit");

r = cuMemGetInfo (&free_mem, &dummy);

Member

Actually, doesn't dummy here (when given a better name) make obsolete the earlier cuDeviceTotalMem call?

Or is the distinction intentional: total memory available for allocation by the CUDA context versus total memory available on the device?

Comment thread nvptx-run.c
Comment on lines +294 to +295
/* Set stack size limit to 0 to get more accurate free_mem. */
r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 0);

Member

From the cuCtxSetLimit documentation (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g0651954dfb9788173e60a9af7201e65a), I can't easily tell the rationale here.

So, should we add more commentary for this, or point to an external URL if that makes sense?

Comment thread nvptx-run.c
Comment on lines +333 to +337
size_t free_mem_update;
r = cuMemGetInfo (&free_mem_update, &dummy);
fatal_unless_success (r, "could not get free memory");
report_val (stderr, "Program args reservation (effective)",
free_mem - free_mem_update);

Member

Doesn't this difference computation implicitly assume that nothing else is using the GPU concurrently? (Which is a wrong assumption?) Or, does every process/CUDA context always have available all the GPU memory -- I don't remember the details, and have not yet looked that up.
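
One concrete consequence of this concern, independent of how the CUDA question is answered: `free_mem` and `free_mem_update` are `size_t`, so if the second reading were ever larger than the first (for example because something else released memory in between), `free_mem - free_mem_update` would wrap around to a huge value instead of going negative. A minimal sketch of a guarded difference, using a hypothetical helper name that does not appear in the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: store BEFORE - AFTER in
   *DELTA and return true, or return false (leaving *DELTA untouched)
   when AFTER exceeds BEFORE and the unsigned subtraction would wrap.  */
bool
free_mem_delta (size_t before, size_t after, size_t *delta)
{
  if (after > before)
    return false;

  *delta = before - after;
  return true;
}
```

A caller could print the reservation only when the helper returns true and otherwise report that the figure is unavailable. This guards against the wrap-around only; a positive difference could still include allocations made by other users of the device, if free memory is in fact shared.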

Comment thread nvptx-run.c
Comment on lines +377 to +381
size_t free_mem_update;
r = cuMemGetInfo (&free_mem_update, &dummy);
fatal_unless_success (r, "could not get free memory");
report_val (stderr, "Stack size limit reservation (effective)",
free_mem - free_mem_update);

Member

Same concern as above.


2 participants