Set specific charset on new OPL tables#745
Conversation
If our database defaults to the utf8 character set and we run OPL-update, it will fail creating tables. It fails because the primary key of the new tables is too large. This is because the primary key includes several VARCHAR columns, which are 3x in size if they are utf8. This commit changes the table creating queries to force the latin1 character set instead of using the database default. This allows the tables to be created and the OPL indexes loaded into the database.
|
As part of a broader localization effort we were going to try and move towards more utf8 encoding (see #712) with the idea that eventually WeBWorK and PG would support problems with non-latin character sets. Is there a way to fix this which supports utf8? In the WeBWorK database we manually set the key prefix to be 100 characters so that even in utf8 mode the prefix isn't too long. Are there varchar keys that are likely to be identical up to 100 characters? |
|
OPL-update dies specifically with this error: After adding some debugging, it's when it creates this table: It looks like it's the only table that creates a key of two |
|
Ok. We can definitely add code to limit the key prefix to 100 characters Would you be willing to implement this? I think its just changing KEY author (lastname, firstname) to KEY author (lastname(100), firstname(100)) instead of setting the table character set. On Thu, Sep 15, 2016 at 1:48 PM, Brian De Wolf notifications@github.com
|
|
I was thinking it might be simpler to store the author in a single author varchar(255), like OPL_textbook does. A good portion of the existing author data doesn't follow a standard name format (multiple people, "modified by", etc), so splitting the last word off to be lastname doesn't make much sense. To make sure this wouldn't break anything, I went looking for where this table gets used. Does this table actually get used anywhere? I can't find any references to it or its columns in webwork. Are there other projects that would use this data? Also, the character set should probably be forced on these tables to utf8 eventually, if that's the direction webwork is going. Leaving it unset will take the DB's default. So, if OPL starts shipping in utf8, and someone's DB makes latin1 tables, they won't be able to load the OPL. I had this issue in openwebwork/webwork-open-problem-library#64, but in reverse: OPL contained non-utf8 characters and my DB made utf8 tables. |
|
There are several tags stored in the database which are not used in webwork, and author is one of them. For various reasons, those tags are wrong in many cases, and I don't think there is much demand for search by problem author, and favor dropping support for them completely. But, whenever I brought it up, someone would object, so they stay in this state of limbo. |
|
In the interests of resolving this for the time being at least I will impliment This doesn't resolve the fact that the formatting of author names is inconsistent and "first name" and "last name" may not mean much. Any objections? suggestions? |
|
Since it is unused, you can make it even smaller if you like. |
|
This issue is partially resolved by PR #778. I'll close this discussion for now. |
If our database defaults to the utf8 character set and we run OPL-update, it
will fail creating tables. It fails because the primary key of the new tables
is too large. This is because the primary key includes several VARCHAR
columns, which are 3x in size if they are utf8.
This commit changes the table creating queries to force the latin1 character
set instead of using the database default. This allows the tables to be
created and the OPL indexes loaded into the database.