fix memory leak: free g_tid in Driver::runTest by sahitya-chandra · Pull Request #32 · osdldbt/dbt5

sahitya-chandra · 2026-02-22T19:59:14Z

Summary

Fixes a memory leak in src/Driver/Driver.cpp where g_tid was allocated with malloc but never freed.

Changes

Free g_tid after all worker threads are joined at the end of runTest().
Free g_tid before throwing when pthread_join fails so the allocation is not leaked on the error path.
Set g_tid = NULL after freeing to avoid use-after-free.
Add #include <cstdlib> for free().

markwkm · 2026-02-23T21:23:45Z

 	// 0 represents the Data-Maintenance thread
 	for (int i = 0; i <= iUsers; i++) {
 		if (pthread_join(g_tid[i], NULL) != 0) {
+			free(g_tid);


This is destroying all the handles to all the threads if there is an issues joining one of them right?

Yes sir, free(g_tid) frees the whole array, so we lose every pthread_t in one go, including threads we haven’t joined yet.

That’s intentional on this path: pthread_join failing is treated as fatal and we throw immediately afterward, so the normal driver run doesn’t continue. We still free(g_tid) so we don’t leak if the exception is ever caught or for static analysis.

I’ve added a short comment above that free in the error branch documenting this tradeoff.

This feels a little heavy handed if threads are still running, or rather maybe what we should really do based should be based on the situation. Needs some more thought...

I think what went wrong with my current approach is that if we free(g_tid) as soon as one pthread_join fails, we throw away every handle in one shot. Other threads can still be running or at least still need a join, so that’s heavy-handed (as pointed out by you also): we’ve lost the ability to join them even though we might still want to try.

markwkm · 2026-03-24T17:47:10Z

@sahitya-chandra Since you are vying for a Google Summer of Code mentorship, I'm going to respond in an unconventional way. What I was looking for is discussion that shows understanding of what the program is meant to be doing and what it should be doing instead. Your code pushes after each of my comments indicates you think there is a binary response to my questions.

sahitya-chandra · 2026-03-24T18:51:15Z

You’re right @markwkm sir, I treated your notes as “fix the code” instead of stepping back and explaining what it does, first

What this code is doing: runTest starts the data-maintenance thread plus iUsers customer threads and stores their pthread_t values in g_tid. The join loop is meant to block until all of those threads have finished, so we don’t return from the test while work is still running.

sahitya-chandra · 2026-03-24T19:01:57Z

I think what went wrong with my current approach is that if we free(g_tid) as soon as one pthread_join fails, we throw away every handle in one shot. Other threads can still be running or at least still need a join, so that’s heavy-handed (as pointed out by you also): we’ve lost the ability to join them even though we might still want to try.

I think the better approach would be to attempt pthread_join on every slot (0 … iUsers) even if an earlier join fails, then free(g_tid) once, and only then throw if any join failed. That keeps handles valid until we’ve tried every join, avoids leaking g_tid, and reports failure after we’ve done what we can.

markwkm · 2026-04-01T15:43:42Z

I think the better approach would be to attempt pthread_join on every slot (0 … iUsers) even if an earlier join fails, then free(g_tid) once, and only then throw if any join failed. That keeps handles valid until we’ve tried every join, avoids leaking g_tid, and reports failure after we’ve done what we can.

I think that sounds better. Now if you're going to go through that effort, it may be worth identifying further the type of errors that maybe thrown and what is worth doing about them.

sahitya-chandra · 2026-04-01T18:57:08Z

@markwkm pthread_join can fail with three errors, and each one means something different in this program's context:

ESRCH - no thread with that ID exists. This is the most realistic failure here. It could happen if a pthread_t slot was never properly written by pthread_create (though right now both entryCustomerWorkerThread and entryDMWorkerThread call exit(1) on creation failure, so if we reach the join loop all slots should be valid). It could also mean the thread already terminated and was reaped. Either way the thread is gone, there's nothing to wait for, so this is more of a warning than a fatal condition.

EINVAL - the thread is not joinable (detached), or another thread is already trying to join it. This shouldn't happen because pthread_attr_init defaults to joinable and nothing in the code detaches these threads. If it does occur, it points to a programming bug.

EDEADLK - deadlock detected (a thread trying to join itself, or a circular join). Can't happen here since the main thread is joining worker threads that never join back. I think if it did happen, it's a serious logic error

So instead of treating every non-zero return the same way, I think the join loop should capture the actual return code per-thread and log which thread failed and with what error. ESRCH could be treated as a warning (the thread is already gone, nothing to clean up), while EINVAL or EDEADLK would indicate a real bug worth throwing for. After the loop finishes and g_tid is freed, we throw only if a genuinely unexpected error was seen.

If this approach looks reasonable, I'll update the branch with the implementation.

markwkm · 2026-04-03T17:56:39Z

So instead of treating every non-zero return the same way, I think the join loop should capture the actual return code per-thread and log which thread failed and with what error. ESRCH could be treated as a warning (the thread is already gone, nothing to clean up), while EINVAL or EDEADLK would indicate a real bug worth throwing for. After the loop finishes and g_tid is freed, we throw only if a genuinely unexpected error was seen.

If this approach looks reasonable, I'll update the branch with the implementation.

I agree, this sound reasonable.

sahitya-chandra · 2026-04-03T18:02:39Z

Thanks sir, I will go ahead with the implementation now

Handle thread join failure by freeing thread IDs and throwing an error.

Instead of treating all pthread_join failures the same, capture the return code per-thread and respond based on the error: - ESRCH (no such thread): log a warning and continue, since the thread is already gone and there is nothing left to wait for. - EINVAL/EDEADLK or any other error: log the error and flag as fatal, since these indicate a programming bug. After attempting to join every thread, free g_tid and only then throw if a fatal error was seen. This keeps all handles valid until every join has been tried.

markwkm · 2026-04-07T21:20:24Z

+			continue;
+
+		ostringstream osErr;
+		if (rc == ESRCH) {


I think readability is more important here, so don't use multiple if statements. Additional comments that reiterate your rationale for doing it this way would be good too.

Thanks for the feedback sir. I think a switch statement would read much better here than chained if/else if blocks , one case per error code makes it easier to scan at a glance. I'll also add comments inside each case to explain why that error matters in this context, so the rationale is clear to anyone reading the code later

I'll push that update now

Use a switch statement instead of if/else chain for readability. Each case handles a distinct pthread_join error code: - ESRCH: thread already gone, non-fatal warning - EINVAL: thread not joinable, indicates a programming bug - EDEADLK: circular join detected, serious logic error g_tid is freed after all joins are attempted, then throws only if a fatal error was seen.

markwkm · 2026-04-08T23:04:58Z

-			throw new CThreadErr(
-					CThreadErr::ERR_THREAD_JOIN, "Driver::RunTest");
+		int rc = pthread_join(g_tid[i], NULL);
+		if (rc == 0)


I think this can be part of the switch. Any reason not to?

I had kept rc == 0 outside the switch as a simple fast path since there’s nothing to handle in the success case, but I agree folding it into the switch is cleaner overall.

I’ll update it.

markwkm · 2026-04-13T17:52:13Z

+		ostringstream osErr;
+		switch (rc) {
+		case 0:
+			continue;


Comment here, would be good too.

Sure sir. I'll add a comment there. I think something like Thread joined successfully, nothing to report keeps it consistent with the other cases

markwkm · 2026-04-13T17:53:36Z

+	free(g_tid);
+	g_tid = NULL;
+
+	if (fatal_error) {


Noting so I don't forget later. If we make it this far, I'm wondering if throwing an error is necessary if we're past the expected test duration.

If the test duration has already passed, I assume the useful work is done and the threads should be exiting on their own. Throwing at that point feels like it could mask an otherwise clean run. I think we could check against stop_time and just log the error instead of throwing if we're past the expected duration. Happy to tackle it in this PR if you'd prefer, otherwise I can do it as a follow-up.

I like that idea to check against stop_time. Log a warning or error as appropriate depending on the condition and explain the situation.

Thanks sir. I've pushed the update in commit 0c602a6. Now if pthread_join reports an error after stop_time has already passed, the loop logs a warning and returns normally instead of throwing, since the useful work of the run is already done and the threads are expected to be winding down. If we're still within the scheduled duration, it throws as before so the caller knows the run aborted early. I've added a comment above the check explaining the rationale.

If the scheduled test duration has already elapsed, the useful work of the run is complete and worker threads are expected to be winding down. A pthread_join failure at that point is likely just noise from a thread that has already exited, so propagating it as an error would mask an otherwise clean run. Log a warning and return normally in that case; if we are still within the scheduled duration, throw as before so the caller knows the run aborted early.

markwkm reviewed Feb 23, 2026

View reviewed changes

markwkm marked this pull request as ready for review March 10, 2026 23:51

markwkm marked this pull request as draft March 10, 2026 23:51

markwkm marked this pull request as ready for review March 10, 2026 23:52

markwkm marked this pull request as draft March 10, 2026 23:52

sahitya-chandra requested a review from markwkm March 23, 2026 19:02

sahitya-chandra marked this pull request as ready for review March 23, 2026 19:18

markwkm requested changes Apr 1, 2026

View reviewed changes

sahitya-chandra requested a review from markwkm April 1, 2026 18:57

sahitya-chandra added 4 commits April 3, 2026 23:40

fix memory leak: free g_tid in Driver::runTest

93dc0b8

Fix thread join failure handling in Driver.cpp

710d87c

Handle thread join failure by freeing thread IDs and throwing an error.

Driver: free g_tid after joining all threads

247a56e

sahitya-chandra force-pushed the fix/driver-g_tid-memory-leak branch from 4ced9a9 to 033bc38 Compare April 3, 2026 18:10

markwkm requested changes Apr 7, 2026

View reviewed changes

sahitya-chandra requested a review from markwkm April 8, 2026 14:27

markwkm requested changes Apr 8, 2026

View reviewed changes

Driver: keep pthread_join handling in switch

51bee98

sahitya-chandra requested a review from markwkm April 9, 2026 06:05

markwkm reviewed Apr 13, 2026

View reviewed changes

Driver: add comment to success case in pthread_join switch

c97fbf7

sahitya-chandra requested a review from markwkm April 20, 2026 15:02

Conversation

sahitya-chandra commented Feb 22, 2026

Summary

Changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

markwkm Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

markwkm commented Mar 24, 2026

Uh oh!

sahitya-chandra commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sahitya-chandra commented Mar 24, 2026

Uh oh!

markwkm commented Apr 1, 2026

Uh oh!

sahitya-chandra commented Apr 1, 2026

Uh oh!

markwkm commented Apr 3, 2026

Uh oh!

sahitya-chandra commented Apr 3, 2026

Uh oh!

markwkm Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

markwkm Mar 23, 2026 •

edited

Loading

sahitya-chandra commented Mar 24, 2026 •

edited

Loading

markwkm Apr 7, 2026 •

edited

Loading