fix memory leak: free g_tid in Driver::runTest#32
Conversation
| // 0 represents the Data-Maintenance thread | ||
| for (int i = 0; i <= iUsers; i++) { | ||
| if (pthread_join(g_tid[i], NULL) != 0) { | ||
| free(g_tid); |
There was a problem hiding this comment.
This is destroying all the handles to all the threads if there is an issues joining one of them right?
There was a problem hiding this comment.
Yes sir, free(g_tid) frees the whole array, so we lose every pthread_t in one go, including threads we haven’t joined yet.
That’s intentional on this path: pthread_join failing is treated as fatal and we throw immediately afterward, so the normal driver run doesn’t continue. We still free(g_tid) so we don’t leak if the exception is ever caught or for static analysis.
I’ve added a short comment above that free in the error branch documenting this tradeoff.
There was a problem hiding this comment.
This feels a little heavy handed if threads are still running, or rather maybe what we should really do based should be based on the situation. Needs some more thought...
There was a problem hiding this comment.
I think what went wrong with my current approach is that if we free(g_tid) as soon as one pthread_join fails, we throw away every handle in one shot. Other threads can still be running or at least still need a join, so that’s heavy-handed (as pointed out by you also): we’ve lost the ability to join them even though we might still want to try.
|
@sahitya-chandra Since you are vying for a Google Summer of Code mentorship, I'm going to respond in an unconventional way. What I was looking for is discussion that shows understanding of what the program is meant to be doing and what it should be doing instead. Your code pushes after each of my comments indicates you think there is a binary response to my questions. |
|
You’re right @markwkm sir, I treated your notes as “fix the code” instead of stepping back and explaining what it does, first What this code is doing: |
I think the better approach would be to attempt |
I think that sounds better. Now if you're going to go through that effort, it may be worth identifying further the type of errors that maybe thrown and what is worth doing about them. |
|
@markwkm
So instead of treating every non-zero return the same way, I think the join loop should capture the actual return code per-thread and log which thread failed and with what error. If this approach looks reasonable, I'll update the branch with the implementation. |
I agree, this sound reasonable. |
|
Thanks sir, I will go ahead with the implementation now |
Handle thread join failure by freeing thread IDs and throwing an error.
Instead of treating all pthread_join failures the same, capture the return code per-thread and respond based on the error: - ESRCH (no such thread): log a warning and continue, since the thread is already gone and there is nothing left to wait for. - EINVAL/EDEADLK or any other error: log the error and flag as fatal, since these indicate a programming bug. After attempting to join every thread, free g_tid and only then throw if a fatal error was seen. This keeps all handles valid until every join has been tried.
4ced9a9 to
033bc38
Compare
| continue; | ||
|
|
||
| ostringstream osErr; | ||
| if (rc == ESRCH) { |
There was a problem hiding this comment.
I think readability is more important here, so don't use multiple if statements. Additional comments that reiterate your rationale for doing it this way would be good too.
There was a problem hiding this comment.
Thanks for the feedback sir. I think a switch statement would read much better here than chained if/else if blocks , one case per error code makes it easier to scan at a glance. I'll also add comments inside each case to explain why that error matters in this context, so the rationale is clear to anyone reading the code later
I'll push that update now
Use a switch statement instead of if/else chain for readability. Each case handles a distinct pthread_join error code: - ESRCH: thread already gone, non-fatal warning - EINVAL: thread not joinable, indicates a programming bug - EDEADLK: circular join detected, serious logic error g_tid is freed after all joins are attempted, then throws only if a fatal error was seen.
| throw new CThreadErr( | ||
| CThreadErr::ERR_THREAD_JOIN, "Driver::RunTest"); | ||
| int rc = pthread_join(g_tid[i], NULL); | ||
| if (rc == 0) |
There was a problem hiding this comment.
I think this can be part of the switch. Any reason not to?
There was a problem hiding this comment.
I had kept rc == 0 outside the switch as a simple fast path since there’s nothing to handle in the success case, but I agree folding it into the switch is cleaner overall.
I’ll update it.
| ostringstream osErr; | ||
| switch (rc) { | ||
| case 0: | ||
| continue; |
There was a problem hiding this comment.
Comment here, would be good too.
There was a problem hiding this comment.
Sure sir. I'll add a comment there. I think something like Thread joined successfully, nothing to report keeps it consistent with the other cases
| free(g_tid); | ||
| g_tid = NULL; | ||
|
|
||
| if (fatal_error) { |
There was a problem hiding this comment.
Noting so I don't forget later. If we make it this far, I'm wondering if throwing an error is necessary if we're past the expected test duration.
There was a problem hiding this comment.
If the test duration has already passed, I assume the useful work is done and the threads should be exiting on their own. Throwing at that point feels like it could mask an otherwise clean run. I think we could check against stop_time and just log the error instead of throwing if we're past the expected duration. Happy to tackle it in this PR if you'd prefer, otherwise I can do it as a follow-up.
There was a problem hiding this comment.
I like that idea to check against stop_time. Log a warning or error as appropriate depending on the condition and explain the situation.
There was a problem hiding this comment.
Thanks sir. I've pushed the update in commit 0c602a6. Now if pthread_join reports an error after stop_time has already passed, the loop logs a warning and returns normally instead of throwing, since the useful work of the run is already done and the threads are expected to be winding down. If we're still within the scheduled duration, it throws as before so the caller knows the run aborted early. I've added a comment above the check explaining the rationale.
If the scheduled test duration has already elapsed, the useful work of the run is complete and worker threads are expected to be winding down. A pthread_join failure at that point is likely just noise from a thread that has already exited, so propagating it as an error would mask an otherwise clean run. Log a warning and return normally in that case; if we are still within the scheduled duration, throw as before so the caller knows the run aborted early.
Summary
Fixes a memory leak in
src/Driver/Driver.cppwhereg_tidwas allocated withmallocbut never freed.Changes
g_tidafter all worker threads are joined at the end ofrunTest().g_tidbefore throwing whenpthread_joinfails so the allocation is not leaked on the error path.g_tid = NULLafter freeing to avoid use-after-free.#include <cstdlib>forfree().