UTF-8 Database connection - inconsistencies to review  #1126

@taniwallach

While reviewing #1125 I noticed that one of the patches fixes a SET NAMES call in lib/WeBWorK/Utils/ListingDB.pm that sets the character set of the MySQL connection.

It is not clear to me why this file needs to manually force the UTF-8 charset for the connection using SET NAMES. (bin/OPL-update also uses SET NAMES after establishing a UTF-8 connection.)

  • These questions run into an ugly issue: older versions of MySQL supported only the 3-byte utf8 character set, not the full 4-byte set, which MySQL calls utf8mb4. The control switch under discussion exists to provide a way to fall back to the 3-byte set on older servers that do not support utf8mb4.
  • Apparently the SET NAMES call was added by @heiderich to fix problems caused by non-ASCII characters in the library databases:
  • History
  • I suspect that a better approach would be to add the relevant options to the connection parameters in the getDB method of this file, namely mysql_enable_utf8mb4 => 1 and mysql_enable_utf8 => 1 (the latter for older versions of the DBD-mysql Perl module), as is done in lib/WeBWorK/DB/Driver/SQL.pm. This would also set the connection character set (which one depends on the version of the DBD-mysql module).
    • I suspect that the missing setting may have been the root issue, and that SET NAMES could only partially solve the problem: it changes the connection character set on the server side, but unlike mysql_enable_utf8 and mysql_enable_utf8mb4 it does not make DBD-mysql decode the text it returns into Perl character strings.
    • Any change of this sort would need to be tested.
  • For now, I doubt there is any harm in leaving this line where it is, but perhaps it should be modified to select one of the two MySQL UTF-8 character sets depending on the value of ENABLE_UTF8MB4:
```perl
# Select the MySQL connection character set to match ENABLE_UTF8MB4
# ($ce is the course environment, $dbh the open database handle).
if ($ce->{ENABLE_UTF8MB4}) {
    $dbh->do(q{SET NAMES 'utf8mb4'});
} else {
    $dbh->do(q{SET NAMES 'utf8'});
}
```
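The difference between the two character sets, and the decoding issue with SET NAMES, can be seen with a small standalone Perl sketch. It is not WeBWorK code, and the helper name needs_utf8mb4 is hypothetical. It prints how many bytes each sample character needs in UTF-8 and flags characters above U+FFFF, which MySQL's 3-byte utf8 cannot store. It then compares a raw UTF-8 byte string (roughly what DBD-mysql hands back when neither mysql_enable_utf8 nor mysql_enable_utf8mb4 is set) with the same text decoded into Perl characters.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: true if the string contains any character above
# U+FFFF. Such characters take 4 bytes in UTF-8 and therefore need utf8mb4.
sub needs_utf8mb4 {
    my ($str) = @_;
    return $str =~ /[\x{10000}-\x{10FFFF}]/ ? 1 : 0;
}

# Sample characters: a, é, €, and U+1D465 MATHEMATICAL ITALIC SMALL X.
for my $ch ("a", "\x{E9}", "\x{20AC}", "\x{1D465}") {
    my $bytes = length encode('UTF-8', $ch);
    printf "U+%04X: %d byte(s)%s\n", ord $ch, $bytes,
        needs_utf8mb4($ch) ? " (needs utf8mb4)" : "";
}

# The same text as raw UTF-8 bytes versus decoded Perl characters.
my $text  = "caf\x{E9}";              # "café"
my $raw   = encode('UTF-8', $text);   # undecoded bytes, as from an undecoded column
my $chars = decode('UTF-8', $raw);    # character string, as the utf8 attributes provide
printf "raw length: %d, decoded length: %d\n", length $raw, length $chars;
```

The raw byte string for "café" is one element longer than the decoded string, which is the kind of mismatch that shows up as garbled or double-encoded non-ASCII text.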

Several other files open their own connections to the database and were not modified to pass the UTF-8 support options as connection parameters. I suspect all of them should have these connection options added. They are:
- bin/load-OPL-global-statistics.pl
- bin/OPL-update
- bin/test_library_build.pl
- bin/update-OPL-statistics.pl
- lib/WeBWorK/Utils/CourseIntegrityCheck.pm
- lib/WeBWorK/Utils/DBUpgrade.pm
- lib/WeBWorK/Utils/LibraryStats.pm

There seem to be two options:

  1. Use the approach in lib/WeBWorK/DB/Driver/SQL.pm and set both mysql_enable_utf8 and mysql_enable_utf8mb4.
  2. Set only one of them, based on the value of ENABLE_UTF8MB4, as done in bin/OPL-update.
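Both options could be expressed as a single helper that the scripts listed above might share. This is a minimal sketch, not existing WeBWorK code: the names utf8_connect_attrs and connect_with_utf8 and the $mode argument are hypothetical, and $ce is assumed to be a hash reference with an ENABLE_UTF8MB4 key, like the course environment. The connect function is only defined, not called, because it needs DBI, DBD-mysql and a running server.

```perl
use strict;
use warnings;

# Hypothetical shared helper returning the DBD-mysql UTF-8 attributes.
#   'both'   : option 1, set both attributes (as in lib/WeBWorK/DB/Driver/SQL.pm)
#   'switch' : option 2, set only the one selected by ENABLE_UTF8MB4
sub utf8_connect_attrs {
    my ($ce, $mode) = @_;
    $mode //= 'both';

    if ($mode eq 'both') {
        return {
            mysql_enable_utf8mb4 => 1,
            mysql_enable_utf8    => 1,    # for older versions of DBD-mysql
        };
    }
    if ($mode eq 'switch') {
        return $ce->{ENABLE_UTF8MB4}
            ? { mysql_enable_utf8mb4 => 1 }
            : { mysql_enable_utf8    => 1 };
    }
    die "unknown mode '$mode'\n";
}

# Hypothetical connection wrapper: the attributes are passed to connect()
# so that they take effect when the connection is established.
sub connect_with_utf8 {
    my ($ce, $dsn, $user, $password, $mode) = @_;
    require DBI;                  # loaded only when a connection is made
    my $attrs = utf8_connect_attrs($ce, $mode);
    $attrs->{RaiseError} = 1;     # assumption: fail loudly on errors
    return DBI->connect($dsn, $user, $password, $attrs);
}
```

The DBD::mysql documentation notes that mysql_enable_utf8 only sets the connection character set when it is passed to connect(); if it is turned on after connecting, SET NAMES is needed to get the same effect. Passing the attributes at connect time is therefore what might make the separate SET NAMES calls unnecessary, though that would need to be tested.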
