BCP improvements#704
Conversation
| buflen = (int)tds_strftime((TDS_CHAR *)(*p_data), 256, | ||
| bcpdatefmt, &when, 3); | ||
| buflen = (int)tds_strftime((TDS_CHAR*)(*p_data), 256, | ||
| bcpdatefmt, &when, prec); |
There was a problem hiding this comment.
This fixes a bug in that MS datetime2 would be truncated to 3 digits of precision even if the value held higher precision.
A side-effect of this change is that the bcp out -c file will differ between 1.5 and 1.6 for MS datetime2 columns: with this change, it always prints 7 digits of precision even if they are just trailing zeroes.
This actually matches the behaviour of the Microsoft bcp client.
It would be possible to have our freebcp save space on disk by truncating any trailing zeroes here (including omitting the decimal point entirely if the value were an exact number of seconds), your thoughts on that? The file is still read back in correctly without trailing zeroes.
As an aside freebcp still does not generate an exactly identical output to the Microsoft bcp when talking to a MSSQL server: the Microsoft bcp also drops the leading zeroes on NUMERIC/DECIMAL/MONEY values between -1 and 1. (But ASE bcp retains that leading zero!).
Helps with the heap-tracing report Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
However, does not yet report on rows silently dropped from hostfile. Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
|
Rebased to master . Note some issues on #686 are still outstanding, for example |
| RETCODE bcp_exec(DBPROCESS * dbproc, DBINT * rows_copied); | ||
| DBBOOL bcp_getl(LOGINREC * login); | ||
| RETCODE bcp_options(DBPROCESS * dbproc, int option, BYTE * value, int valuelen); | ||
| RETCODE bcp_options(DBPROCESS * dbproc, int option, BYTE * value, size_t valuelen); |
There was a problem hiding this comment.
OK, leave this change out
freddy77
left a comment
There was a problem hiding this comment.
What was wrong with the previous code? Can you explain in the commit message?
freddy77
left a comment
There was a problem hiding this comment.
Did you test with MS dblib ?
What about Sybase behavior ?
From your comment it looks like the correct way would be to understand the reason of tds_bcp_fread failed and set the proper error instead.
freddy77
left a comment
There was a problem hiding this comment.
Why was it wrong before ?
Is this consistent with Sybase ? Microsoft ?
freddy77
left a comment
There was a problem hiding this comment.
It looks like you are changing the behavior here to then change again in the next commit.
|
|
||
| /* Validate that data does not exceed expected maximum (could cause | ||
| * wrong row length to be sent on wire) */ | ||
| if (hostcol->column_len > 0 && collen > hostcol->column_len) |
There was a problem hiding this comment.
What is the hostcol->column_len > 0 condition trying to check ?
There was a problem hiding this comment.
Checking size of data only makes sense when there is a known maximum size to check against.
hostcol->column_len can be -1 (for fixed types) or 0 (meaning no data should be sent for this column)
|
Merged some commits |
@freddy77 Which code is these comments referring to? It doesn't show me in the conversation & they don't appear on the other views. |
When I wrote them they were attached to specific commits. The interface is not showing the relationship apparently. |
Signed-off-by: Matt McNabb <matthew.mcnabb@vmssoftware.com>
In VMS we build with MSDBLIB enabled but the same binary appears to work correctly in all tests against both SQL Server, and ASE 16 . When running test suite in Linux or Windows I don't enable that option. I guess the Not sure how easy (and/or useful) it would be to try and get more accurate failure information out of |
|
@freddy77 waiting for anything here? Can rebase if desired. |
|
Sorry, just busy. I merged About diff --git a/src/tds/bulk.c b/src/tds/bulk.c
index f959e3958..96225900a 100644
--- a/src/tds/bulk.c
+++ b/src/tds/bulk.c
@@ -888,18 +888,18 @@ tds5_bcp_add_variable_columns(TDSBCPINFO *bcpinfo, tds_bcp_get_col_data get_col_
if (ncols && !bcpinfo->datarows_locking) {
TDS_UCHAR *poff = rowbuffer + row_pos;
- unsigned int pfx_top = offsets[ncols] >> 8;
+ unsigned int pfx_top = offsets[ncols] / 256u;
tdsdump_log(TDS_DBG_FUNC, "ncols=%u poff=%p [%u]\n", ncols, poff, offsets[ncols]);
- if (offsets[ncols] / 256 == offsets[ncols - 1] / 256)
+ if (offsets[ncols] / 256u == offsets[ncols - 1] / 256u)
*poff++ = ncols + 1;
/* this is some kind of run-length-prefix encoding */
while (pfx_top) {
unsigned int n_pfx = 1;
for (i = 0; i <= ncols ; ++i)
- if ((offsets[i] >> 8) < pfx_top)
+ if (offsets[i] / 256u < pfx_top)
++n_pfx;
*poff++ = n_pfx;
--pfx_top;
@@ -916,8 +916,8 @@ tds5_bcp_add_variable_columns(TDSBCPINFO *bcpinfo, tds_bcp_get_col_data get_col_
for (col = ncols; col-- > 0;) {
/* The DOL BULK format has a much simpler row table -- it's just a 2-byte length for every
* non-fixed column (does not have the extra "offset after the end" that the basic format has) */
- rowbuffer[row_pos++] = offsets[col] % 256;
- rowbuffer[row_pos++] = offsets[col] / 256;
+ rowbuffer[row_pos++] = offsets[col] % 256u;
+ rowbuffer[row_pos++] = offsets[col] / 256u;
tdsdump_log(TDS_DBG_FUNC, "[DOL BULK offset table] col=%u offset=%u\n", col, offsets[col]);
}There are already division by The others mainly they miss some more explanation and I would write some tests.
|
The line 902 change
The previous code didn't work at all, the logic was wrong in the attempted application. I also added a comment for suggested improvement: after the requested number of rows have been processed,
By "MS bcp" I mean bcp.exe that SQL Server Client SDK (v17.0 in my case). But as you suggest , my comment is not correct for that utility. My main point here is that "Attempt to bulk-copy an oversized row to the SQL Server" is neither correct nor useful for the case of failing a type conversion while reading the data file, so that should be changed. The ASE bcp utility gives the following error (among others) for the case of a missing field terminator: I think you are suggesting improving the error message further by analyzing why tds_bcp_fread failed in more detail? Maybe possible but that could be a future change. "Syntax error in the source field" seems very likely to be the cause of any failure to read the hostfile, other than lower level read errors.
Well, the error condition here is an invalid I agree that my message would not make sense to someone who's not a programmer, but this situation probably only arises when someone does call Not a big deal though and this could be left out, the programmer could do as I did and search the code base for the error message etc.
Say for example the data file uses a length prefix to specify that the next 8 bytes in the file are the value for this type, then we should be reading 8 bytes and not overriding it with an expected size calculated another way. It's never correct to read say 6 bytes from a length-prefixed field of length 8 and then just carry on, because whatever data was in the last 2 bytes will be tried to be read as the next field and things go off the rails quickly. It could give bizarre errors, or (even worse) end up succeeding but writing garbage into the database. My change will prioritize the length prefix if it exists, falling back to the fixed size for fixed types if there was no prefix. The previous code was the other way around. It's possible for a format file, or a call to The situation might also arise if we had a mismatch between After my change, a length-prefixed fixed type with the wrong length will carry on through to Not sure about testing this, as we currently don't have any "front door" freebcp tests with a format file. Perhaps we could add another round to
The rest of the diff lines in the commit are refactoring for readability -- I've separated the This is a plausible error scenario, for example the DBA narrows a VARCHAR max field width and then a user tries to
That explicitly catches the case described just above, after my fix to make sure These last three commits could be squashed, I separated them to hopefully make more sense at the review stage. (I guess this is what you referred to in your older comment about one commit modifying another commit) I'll try to come up with a test suite entry for this last case of data file char field exceeding database size. |
|
About fc14e3d, I look for possible error in documentation and I didn't find it, then I looked for possible messages from Sybase library and I didn't find it. At the end I wrote a test program and, surprise surprise, no errors is returned by either if (is_fixed_type(vartype) && varlen != 0)
varlen = tds_get_size_by_type(vartype); |
Not sure about this - the function documentation (in bcp.c comment block ahead of I think the existing logic is correct, just recommending a more appropriate error message for invalid input. Changing the function to set Although, looking through the usages of that value, I am not sure how Note - |
|
Recapping this one: everything is merged except for fc14e3d , and the next group of 3 commits regarding reading the hostfile correctly for a "bcp in". I think fc14e3d can be accepted as-is: it's an improvement of error message for when a user specifically calls |
|
@freddy77 as this is now a fair way behind master, I'll recommend closing this PR, and I can reformulate and resubmit the three "bcp in" behaviour commits as a new PR, with better documentation and test cases. I added format file testing to t0016 and can present a clearer overview of the problems and solutions now , as well as test cases for my suggested fixes . |
|
@mmcnabb-vms I kept a branch updated, see https://github.com/freddy77/freetds/tree/mmc_bcp_improve. |
I left out one change from my earlier pull request #685 , to do with validating hostfile data, because that territory is covered by other commits on Ucko PR#558 .
This PR should be processable independently of the BCP Defaults PR although I developed them both together and then separated into more manageably-sized PRs . If you merge/rebase these two PRs, there is one conflict: a common on bcp_defaults deletes some code from the start of
bcp_options()and a commit on this PR modifies the first line after the deleted text; git doesn't automatically resolve that.