Skip to content

Set specific charset on new OPL tables#745

Closed
bldewolf wants to merge 1 commit into
openwebwork:developfrom
bldewolf:force-latin1
Closed

Set specific charset on new OPL tables#745
bldewolf wants to merge 1 commit into
openwebwork:developfrom
bldewolf:force-latin1

Conversation

@bldewolf
Copy link
Copy Markdown
Contributor

If our database defaults to the utf8 character set and we run OPL-update, it
will fail creating tables. It fails because the primary key of the new tables
is too large. This is because the primary key includes several VARCHAR
columns, which are 3x in size if they are utf8.

This commit changes the table creating queries to force the latin1 character
set instead of using the database default. This allows the tables to be
created and the OPL indexes loaded into the database.

If our database defaults to the utf8 character set and we run OPL-update, it
will fail creating tables.  It fails because the primary key of the new tables
is too large.  This is because the primary key includes several VARCHAR
columns, which are 3x in size if they are utf8.

This commit changes the table creating queries to force the latin1 character
set instead of using the database default.  This allows the tables to be
created and the OPL indexes loaded into the database.
@goehle
Copy link
Copy Markdown
Member

goehle commented Sep 15, 2016

As part of a broader localization effort we were going to try and move towards more utf8 encoding (see #712) with the idea that eventually WeBWorK and PG would support problems with non-latin character sets. Is there a way to fix this which supports utf8? In the WeBWorK database we manually set the key prefix to be 100 characters so that even in utf8 mode the prefix isn't too long. Are there varchar keys that are likely to be identical up to 100 characters?

@bldewolf
Copy link
Copy Markdown
Contributor Author

OPL-update dies specifically with this error:

ENGINE=MYISAMDBD::mysql::db do failed: Specified key was too long; max key length is 1000 bytes at ./OPL-update line 267.

After adding some debugging, it's when it creates this table:

ENGINE=MYISAMCREATE TABLE `OPL_author` ( 
        author_id int (15) NOT NULL auto_increment,
        institution tinyblob,
        lastname varchar (255) NOT NULL,
        firstname varchar (255) NOT NULL,
        email varchar (255),
        KEY author (lastname, firstname),
        PRIMARY KEY (author_id)
)

It looks like it's the only table that creates a key of two varchar(255)s.

@goehle
Copy link
Copy Markdown
Member

goehle commented Sep 15, 2016

Ok. We can definitely add code to limit the key prefix to 100 characters
then. Its highly unlikely that we will have authors with first and last
names that agree up to 100 characters.

Would you be willing to implement this? I think its just changing

KEY author (lastname, firstname)

to

KEY author (lastname(100), firstname(100))

instead of setting the table character set.

On Thu, Sep 15, 2016 at 1:48 PM, Brian De Wolf notifications@github.com
wrote:

OPL-update dies specifically with this error:

ENGINE=MYISAMDBD::mysql::db do failed: Specified key was too long; max key length is 1000 bytes at ./OPL-update line 267.

After adding some debugging, it's when it creates this table:

ENGINE=MYISAMCREATE TABLE OPL_author (
author_id int (15) NOT NULL auto_increment,
institution tinyblob,
lastname varchar (255) NOT NULL,
firstname varchar (255) NOT NULL,
email varchar (255),
KEY author (lastname, firstname),
PRIMARY KEY (author_id)
)

It looks like it's the only table that creates a key of two varchar(255)s.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#745 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABsVO_kq-TiiybeuDtrkYlLcrtYeat6Pks5qqYT7gaJpZM4J9YpI
.

@bldewolf
Copy link
Copy Markdown
Contributor Author

I was thinking it might be simpler to store the author in a single author varchar(255), like OPL_textbook does. A good portion of the existing author data doesn't follow a standard name format (multiple people, "modified by", etc), so splitting the last word off to be lastname doesn't make much sense.

To make sure this wouldn't break anything, I went looking for where this table gets used. Does this table actually get used anywhere? I can't find any references to it or its columns in webwork. Are there other projects that would use this data?

Also, the character set should probably be forced on these tables to utf8 eventually, if that's the direction webwork is going. Leaving it unset will take the DB's default. So, if OPL starts shipping in utf8, and someone's DB makes latin1 tables, they won't be able to load the OPL. I had this issue in openwebwork/webwork-open-problem-library#64, but in reverse: OPL contained non-utf8 characters and my DB made utf8 tables.

@jwj61
Copy link
Copy Markdown
Member

jwj61 commented Sep 19, 2016

There are several tags stored in the database which are not used in webwork, and author is one of them. For various reasons, those tags are wrong in many cases, and I don't think there is much demand for search by problem author, and favor dropping support for them completely. But, whenever I brought it up, someone would object, so they stay in this state of limbo.

@heiderich heiderich mentioned this pull request Dec 28, 2016
6 tasks
jutrembBDEB pushed a commit to CCDMD/webwork2_fr_qc that referenced this pull request Feb 12, 2017
@mgage
Copy link
Copy Markdown
Member

mgage commented Jun 21, 2017

In the interests of resolving this for the time being at least I will impliment
KEY author (lastname(100), firstname(100))
and test it to make sure things don't hand when creating the OPL database.

This doesn't resolve the fact that the formatting of author names is inconsistent and "first name" and "last name" may not mean much.

Any objections? suggestions?

@jwj61
Copy link
Copy Markdown
Member

jwj61 commented Jun 22, 2017

Since it is unused, you can make it even smaller if you like.

@mgage
Copy link
Copy Markdown
Member

mgage commented Jun 22, 2017

This issue is partially resolved by PR #778. I'll close this discussion for now.

@mgage mgage closed this Jun 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants