Skip to content

Conversation

@OsamaNabih
Copy link

@OsamaNabih OsamaNabih commented Jan 17, 2026

Jira: https://jira.mariadb.org/browse/MDEV-31414
Adding support for SQL standard 2023's T081: Optional string types maximum length

This PR allows the user to not specify a length for a VARCHAR field. We default to the maximum possible value, which is 65,535 (0xFFFF) according to https://mariadb.com/docs/server/reference/data-types/string-data-types/varchar
After trying 65,535, I got the error that size is too big, maximum is 16383

Questions:

  1. Is 16383 reported as the maximum due to collation choice? I am guessing the max bytes is 65,535 but maybe due to collation every char requires 4 bytes and that's how we end up with 16383? Is there a way to get the current char size for the server?
  2. Should the m_has_explicit_length property in Lex_length_and_dec_st in this case be set to true? (default is false here as no value was provided to Lex during parsing of the VARCHAR clause.

@CLAassistant
Copy link

CLAassistant commented Jan 17, 2026

CLA assistant check
All committers have signed the CLA.

@FooBarrior
Copy link
Contributor

Hi,

Is 16383 reported as the maximum due to collation choice?

That is a nice guess!

Our documentation claims it's 21,844 characters, though I think it's a bit outdated, since we support 4-byte encodings for some time. So yes, it's got to be UINT16_MAX/4 - header_length, where header_length is something like 3.

Copy link
Contributor

@FooBarrior FooBarrior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We follow the TDD paradigm - every feature should be covered with tests (and every bugfix, if possible). Include both .test and .result files in the commit.

I'm fine with a hard constant of 16383 in scope of this small work, we can always extend it and make a function of charset any time we want, since 16383 is expected to be the minimum of the maximums.

sql/sql_type.cc Outdated
Support SQL Standard T081: "Optional string types maximum length"
Allows users to specify VARCHAR fields without a length
*/
def->length= 16383;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declare a constant in the appropriate place, like MAX_VARCHAR_SIZE, or maybe better MAX_VARCHAR_ESTIMATED_SIZE, signifying the imprecise nature.

I tried to look for the existing one and turns out it's pretty widely used in a raw literal form in the sql directory. Let's be the first good guys who'll add it there.

There is a constant in innodb -- we can't refer to it from sql, because this will reverse the dependencies. (innodb depends from sql, not vice-versa)

Copy link
Author

@OsamaNabih OsamaNabih Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I expected to have it defined somewhere as it's used in several places, including the error that was reported, but I didn't find any so I hardcoded it for now.
My thought process was to get early feedback to verify my approach.
I now understand the better way of doing this is to start with the test.

I'll add a definition for MAX_VARCHAR_ESTIMATED_SIZE.
We can later investigate other places where the value 16383 is hardcoded and reference the same variable in a future PR. What do you think?

Copy link
Author

@OsamaNabih OsamaNabih Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with a hard constant of 16383 in scope of this small work, we can always extend it and make a function of charset any time we want, since 16383 is expected to be the minimum of the maximums.

I noticed this comment in sql_type.cc line 2796 which is concerning

In MariaDB the limit for VARCHAR is 65535 bytes.
We could translate VARCHAR with no length to VARCHAR(65535), but
it would mean that for multi-byte character sets we'd have to translate
VARCHAR to MEDIUMTEXT, to guarantee 65535 characters.

Also we could translate VARCHAR to VARCHAR(16383), where 16383 is
the maximum possible length in characters in case of mbmaxlen=4
(e.g. utf32, utf16, utf8mb4). However, we'll have character sets with
mbmaxlen=5 soon (e.g. gb18030).

This was committed by Alexander Barkov 6 years ago, not sure if it is still relevant.
In the case of gb18030, the value of 16383 might not be the minimum after all. Is gb18030 currently supported?

sql/sql_type.cc Outdated
Support SQL Standard T081: "Optional string types maximum length"
Allows users to specify VARCHAR fields without a length
*/
def->length= 16383;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty surprised this change would work on its own. Fine by me, if it does, though. Please add the test with a new syntax.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I am still familiarizing myself with the testing framework, I manually tested it by creating a table with a VARCHAR field without any parameters and creation was successful and running DESCRIBE returned 16383 as the result. This was just a sanity check for me locally.

I will add the .test and .results files. Is that sufficient to verify the code is behaving as expected?

@FooBarrior
Copy link
Contributor

FooBarrior commented Jan 17, 2026

You can try to make tha max value variable if you want. Though it may require some research.

The problem is that we have (at least) three layers of charset settings:

  • system-wide
  • per-table
  • per-field

One overrides another. At some point, Filed would get a deduced value of charset, and perhaps Create_field would get it, too. Though it's hard to say, when exactly. Create_field has charset field in its parent Column_definition_attributes, you may check if it's set to the right encoding at different points in time.

You'd have to test all three cases though:

  • When a system-wide encoding is set
  • When it's overriden with table
  • When the latter is overriden with field charset.

Again, I'm NOT putting this as a requirement for this submission!

@github-actions github-actions bot added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Jan 18, 2026
@OsamaNabih
Copy link
Author

OsamaNabih commented Jan 20, 2026

You can try to make tha max value variable if you want. Though it may require some research.

The problem is that we have (at least) three layers of charset settings:

* system-wide

* per-table

* per-field

One overrides another. At some point, Filed would get a deduced value of charset, and perhaps Create_field would get it, too. Though it's hard to say, when exactly. Create_field has charset field in its parent Column_definition_attributes, you may check if it's set to the right encoding at different points in time.

You'd have to test all three cases though:

* When a system-wide encoding is set

* When it's overriden with table

* When the latter is overriden with field charset.

Again, I'm NOT putting this as a requirement for this submission!

That is very informative.
I was mainly focusing on the CREATE TABLE scenario so I only paid attention to the system-wide encoding, but your comment made me realize there are other scenarios where T081 applies as well.
For example ALTER, CAST, CREATE TYPE...etc
For some of these scenarios, we need to handle the situation where the table/field were already created with a different charset than the system-wide one.

To verify my understanding, we are okay with limiting the scope of this PR to the CREATE TABLE scenario?

@OsamaNabih OsamaNabih force-pushed the main branch 2 times, most recently from 17e0e30 to c6e985c Compare January 20, 2026 13:59
@OsamaNabih OsamaNabih closed this Jan 20, 2026
@OsamaNabih OsamaNabih reopened this Jan 20, 2026
Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
This reverts commit 85217ec2774aeefaf077db2d81a56777bea91ada.
…o DB charset

Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

3 participants