-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[MDEV-31414] Implement optional lengths for string types #4551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Hi,
That is a nice guess! Our documentation claims it's 21,844 characters, though I think it's a bit outdated, since we support 4-byte encodings for some time. So yes, it's got to be UINT16_MAX/4 - header_length, where header_length is something like 3. |
FooBarrior
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We follow the TDD paradigm - every feature should be covered with tests (and every bugfix, if possible). Include both .test and .result files in the commit.
I'm fine with a hard constant of 16383 in scope of this small work, we can always extend it and make a function of charset any time we want, since 16383 is expected to be the minimum of the maximums.
sql/sql_type.cc
Outdated
| Support SQL Standard T081: "Optional string types maximum length" | ||
| Allows users to specify VARCHAR fields without a length | ||
| */ | ||
| def->length= 16383; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare a constant in the appropriate place, like MAX_VARCHAR_SIZE, or maybe better MAX_VARCHAR_ESTIMATED_SIZE, signifying the imprecise nature.
I tried to look for the existing one and turns out it's pretty widely used in a raw literal form in the sql directory. Let's be the first good guys who'll add it there.
There is a constant in innodb -- we can't refer to it from sql, because this will reverse the dependencies. (innodb depends from sql, not vice-versa)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I expected to have it defined somewhere as it's used in several places, including the error that was reported, but I didn't find any so I hardcoded it for now.
My thought process was to get early feedback to verify my approach.
I now understand the better way of doing this is to start with the test.
I'll add a definition for MAX_VARCHAR_ESTIMATED_SIZE.
We can later investigate other places where the value 16383 is hardcoded and reference the same variable in a future PR. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with a hard constant of 16383 in scope of this small work, we can always extend it and make a function of charset any time we want, since 16383 is expected to be the minimum of the maximums.
I noticed this comment in sql_type.cc line 2796 which is concerning
In MariaDB the limit for VARCHAR is 65535 bytes.
We could translate VARCHAR with no length to VARCHAR(65535), but
it would mean that for multi-byte character sets we'd have to translate
VARCHAR to MEDIUMTEXT, to guarantee 65535 characters.
Also we could translate VARCHAR to VARCHAR(16383), where 16383 is
the maximum possible length in characters in case of mbmaxlen=4
(e.g. utf32, utf16, utf8mb4). However, we'll have character sets with
mbmaxlen=5 soon (e.g. gb18030).
This was committed by Alexander Barkov 6 years ago, not sure if it is still relevant.
In the case of gb18030, the value of 16383 might not be the minimum after all. Is gb18030 currently supported?
sql/sql_type.cc
Outdated
| Support SQL Standard T081: "Optional string types maximum length" | ||
| Allows users to specify VARCHAR fields without a length | ||
| */ | ||
| def->length= 16383; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty surprised this change would work on its own. Fine by me, if it does, though. Please add the test with a new syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I am still familiarizing myself with the testing framework, I manually tested it by creating a table with a VARCHAR field without any parameters and creation was successful and running DESCRIBE returned 16383 as the result. This was just a sanity check for me locally.
I will add the .test and .results files. Is that sufficient to verify the code is behaving as expected?
|
You can try to make tha max value variable if you want. Though it may require some research. The problem is that we have (at least) three layers of charset settings:
One overrides another. At some point, Filed would get a deduced value of charset, and perhaps Create_field would get it, too. Though it's hard to say, when exactly. You'd have to test all three cases though:
Again, I'm NOT putting this as a requirement for this submission! |
That is very informative. To verify my understanding, we are okay with limiting the scope of this PR to the |
17e0e30 to
c6e985c
Compare
Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
This reverts commit 85217ec2774aeefaf077db2d81a56777bea91ada.
…o DB charset Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
Jira: https://jira.mariadb.org/browse/MDEV-31414
Adding support for SQL standard 2023's T081:
Optional string types maximum lengthThis PR allows the user to not specify a length for a
VARCHARfield. We default to the maximum possible value,which is65,535(0xFFFF) according to https://mariadb.com/docs/server/reference/data-types/string-data-types/varcharAfter trying
65,535, I got the error that size is too big, maximum is16383Questions:
16383reported as the maximum due to collation choice? I am guessing the max bytes is65,535but maybe due to collation every char requires 4 bytes and that's how we end up with16383? Is there a way to get the current char size for the server?m_has_explicit_lengthproperty inLex_length_and_dec_stin this case be set to true? (default is false here as no value was provided toLexduring parsing of theVARCHARclause.