Skip to content

BCP Out performance#705

Closed
mmcnabb-vms wants to merge 7 commits into
FreeTDS:masterfrom
mmcnabb-vms:bcp_performance
Closed

BCP Out performance#705
mmcnabb-vms wants to merge 7 commits into
FreeTDS:masterfrom
mmcnabb-vms:bcp_performance

Conversation

@mmcnabb-vms
Copy link
Copy Markdown
Contributor

@mmcnabb-vms mmcnabb-vms commented Feb 22, 2026

This PR also includes the MS DATETIME2 bcp precision fix, from #704 , but otherwise should be independent of my other two BCP PRs.

  • Default to not fflush the TDSDUMP file after each line: not necessary unless we're debugging a segfault; otherwise this increases performance by a factor of 3 when logging which is quite nice to have if logging a big operation, or if logging is enabled for the test suite.
  • Buffer output for "bcp out"
  • Optimize int32-string conversion and bcp date formatting

Speed of my bcp out test case increased by nearly factor of 2 in Windows, and factor of 10 in VMS. have included individual performance gains in the commit message for each one. Profiling showed more than 30% of runtime spent in tds_strftime on both systems.

I tried applying the output buffer stream to the log file but it didn't help (replacing vfprintf with vasprintf/fwrite hurts).

Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
	MSVC profiler: reduce time spent in fwrite
	from 41% to 17.3% for a test case of 400MB of data.

Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
	Increase log performance by factor of 3.
	Set TDSDUMP_FFLUSH to anything, to flush after each line.
	Should only be needed when debugging segfaults.

Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
	MSVC profiler shows 5% gain in bcp -c from this change

Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
	MSVC profiler showed 35% in tds_strftime for a test case.
	Now 2.7% spent in _bcp_strftime.

Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
	Gain of about 1% from baseline.

Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Comment thread src/tds/tds_checks.c
break; /* found */
}
}
#endif /* ENABLE_EXTRA_CHECKS */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem right. The "extra checks" are expected to be expensive and we don't want to risk having in the final builds. It is this some kind of leftover of some middle development?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to let you define ENABLE_EXTRA_CHECKS for a .c file you're working on without having to turn it on for the whole project (and rebuild the whole project). Not really important but I found it useful while testing _bcp_strftime

Comment thread src/tds/convert.c
Comment thread include/freetds/convert.h
size_t tds_u32toafast_right(char out[10], uint32_t v);
size_t tds_u32toafast(char out[10], uint32_t v);
size_t tds_i32toafast(char out[11], int32_t v);
void tds_02dfast(char out[2], int v); /* v must be 0-99 (non negative) */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the caller and to avoid "surprises" why not saying that range is 0-63 and do a & 63 in tds_02dfast ?

I would also use an underscore before "fast", like tds_u32toa_fast_right.

Copy link
Copy Markdown
Contributor Author

@mmcnabb-vms mmcnabb-vms Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, although it might also be surprising to another coder who assumed 02d would mean 0-99 ! perhaps a more meaningful function name is possible.

Note that tds_u32toafast_right does use the full 0-99 capability of the underlying function

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

v = TDS_CLAMP(v, 0, 99);

??

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

v = TDS_CLAMP(v, 0, 99);

??

OK but for performance I think it would be best to to only have this on code paths where it is needed (e.g. tds_u32toafast_right does not need it since that call site guarantees the value is in range already). So perhaps it could go in tds_02dfast but not in tds02dfast. And include a TDS_LIKELY.

You would have noticed I had two functions tds02dfast and tds_02dfast, this is so that tds02dfast can be inlined in convert.c . It would be nice to have it inlinable everywhere but I wasn't sure about the support for having inline function bodies in header files across all the compilers and linkers that we need to support.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I was suggesting the v &= 63 for tds_02dfast, it's just a single machine instruction with no conditions.

Comment thread include/freetds/stream.h
Comment thread include/freetds/stream.h
Comment thread src/dblib/bcp.c
Comment thread src/tds/convert.c

/* more characters are required because we replace %z with up to 7 digits */
our_format = tds_new(char, strlen(format) + 1 + 5 + 1);
our_format = tds_new(char, strlen(format) + 7);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope any modern compiler could optimize the above without hints, otherwise I would use

our_format = tds_new(char, strlen(format) + (1 + 5 + 1));

just to be on the safer side.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. And maybe expand the comment a bit to explain what the 1 + 5 + 1 actually represents (we can work it out of course but it'd make for quicker reading of the code)

Comment thread src/tds/convert.c
Comment on lines +3267 to +3268
if (pad > 3)
memset(buf + 3, '0', pad - 3);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another hunk of code you first set all buffer and then use tds_u32toafast_right. I think filling all buffer with 0es should be faster with most modern compilers as the compiler will do the filling using less instructions.

Comment thread src/tds/log.c

static FILE* tdsdump_append(void);

static bool g_dumpfile_fflush = false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file is the only file using the g_ prefix, I would avoid extending this style.

Comment thread src/tds/stream.c
Comment thread src/tds/stream.c
Comment thread src/dblib/bcp.c
}

/** Fast strftime for default bcp format */
static int _bcp_strftime(TDS_UCHAR* buf, size_t maxsize, const char* format, const TDSDATEREC* dr, int prec)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that buf is always converted to char * I would declare this as char *. Maybe renaming cbuf to pbuf. I can do it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I was thinking in terms of avoiding the need for a cast at the call site.

Comment thread src/tds/convert.c
/* Fill buffer from the end so it comes out in the right order */
while (v >= 100u) {
uint32_t q = v / 100u;
uint32_t r = v - q * 100u;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not

uint32_t r = v % 100u;
```
?? Modern compiler should be able to optimize depending on specific processor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I have checked with VSI C on Itanium and both versions do compute r using a mod-100 "magic multiplication", although the generated assembly isn't exactly identical for the entire computation .

Comment thread src/tds/convert.c
p -= 2;
tds02dfast(p, v);
}
else if (v > 0u)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that 0 will be formatted as "" (empty string) ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, which is fine for the existing call sites. I left that case out to avoid an extra branch that wasn't needed but if you want to add in that case I doubt it would have an measurable performance penalty.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but you are using this function when converting a number to string so 0 will be converted to "" instead of "0". The code difference is just to remove the last if (v > 0u).

Comment thread src/tds/log.c
@freddy77
Copy link
Copy Markdown
Contributor

Merged part of the PR.

@freddy77
Copy link
Copy Markdown
Contributor

Merged log part

@freddy77
Copy link
Copy Markdown
Contributor

Merged, minor changes (mainly style).

@freddy77 freddy77 closed this Feb 28, 2026
@mmcnabb-vms mmcnabb-vms deleted the bcp_performance branch April 7, 2026 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants