Skip to content

WW 2.15 - update Docker config to use Ubuntu 18.04#980

Merged
taniwallach merged 21 commits into
openwebwork:WeBWorK-2.15from
taniwallach:docker_to_ubuntu_18lts
Aug 22, 2019
Merged

WW 2.15 - update Docker config to use Ubuntu 18.04#980
taniwallach merged 21 commits into
openwebwork:WeBWorK-2.15from
taniwallach:docker_to_ubuntu_18lts

Conversation

@taniwallach

@taniwallach taniwallach commented Aug 3, 2019

Copy link
Copy Markdown
Member

Changes were made, and this was rebased to move the OPL scripts into #985.

The docker-entrypoint.sh files assumes that the OPL-scripts from #985 are available, but that is the price of splitting the PR into two independent parts.


This replaces the Docker config changes from #972 and was influenced by feedback from @xcompass and @nmoller in that thread and elsewhere.

Changes:

  • Update Docker configuration to use Ubuntu 18.04 as the base image.
  • Revised the manner in which webwork2 and pg are downloaded, based on the suggested by @nmoller at https://gist.github.com/nmoller/81bd8e149e6aa2a7cf051e0bf248b2e2.
  • Install more Perl packages from Ubuntu and less from CPAN.
  • Includes sample Apache and SSL configuration files, and include many details about possible local configuration in docker-compose.yml. The sample files are in under the docker-config/ subdirectory, and docker-entrypoint.sh was moved into that subdirectory.
  • Some of the run-time configuration is done by setting specific environment variables in docker-compose.yml.
    • SSL can be used to turn SSL on in the running container.
    • ADD_LOCALES can be used to have the running container build additional locale files on startup.
    • ADD_PACKAGES can be used to have the running container install additional Ubuntu packages on startup.
    • PAPERSIZE can be used to set the default system papersize in the running container.
    • Such changes are not persistent.
  • By default the OPL will be installed in a local named Docker volume when a container is first started, and the OPL-update will be run.
    • Running OPL-update takes a very long time.
    • Later startups of the containers will used the OPL downloaded the first time.
    • An alternative approach is to modify docker-compose.yml to mount the OPL from an external directory, which allows for OPL maintenance to be handled outside of the Docker container/named storage volume.

@taniwallach

Copy link
Copy Markdown
Member Author

Warning:

  • The upgrade from MariaDB 10.1 to 10.4 seems to make some trouble for existing databases.
    • The problems were related to the main mysql tables for the SQL database themselves, which need changes.
    • Below are some of the messages issued when I tried to docker-compose up when there was an existing database on the named storage volume.
    • If someone wants to figure out how to overcome this - that would be a big help.
  • For testing purposes, I recommend that you archive all courses on your test machine (probably except admin), and the delete the Docker named storage volume which has the SQL tables:
    • It should be something like: docker volume rm webwork2_mysql which needs to be run when the container is not running.
  • I am starting to work on how to have the OPL database tables saved/loaded from mysqldump files, as generating them is very slow.
    • It may be related to the use of InnoDB, and possible to a need to better handle when "commits" occur, and maybe grouping things into transactions, which the existing WW database code does not handle at all.

Error messages seen:

  • The long line seems to have been trunctated.
db_1   | 2019-08-02 14:54:00 0 [Note] mysqld (mysqld 10.4.7-MariaDB-1:10.4.7+maria~bionic) starting as process 1 ...

...

db_1   | 2019-08-02 14:54:00 0 [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position 14 to have type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALUE_ON_ZERO','NO_BACKSLASH_ESCAPES','STRICT_TRANS_TABLES','STRICT_ALL_TABLES','NO_ZERO_IN_DATE','NO_ZERO_DATE','INVALID_DATES','ERROR_FOR_DIVISION_BY_ZERO','TRADITIONAL','NO_AUTO_CREATE_USER','HIGH_NOT_PRECEDENCE','NO_ENGINE_SUBSTITUTION','PAD_CHAR_TO_FULL_LENGTH','EMPTY_STRING_IS_NULL','SIMULTANEOUS_ASSIGNMENT'), found type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALU

...

db_1   | 2019-08-02 14:54:00 0 [ERROR] mysqld: Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler.

...

db_1   | 2019-08-02 14:54:00 6 [Warning] InnoDB: Table mysql/innodb_table_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
db_1   | 2019-08-02 14:54:00 6 [Warning] InnoDB: Table mysql/innodb_index_stats has length mismatch in the column name table_name.  Please run mysql_upgrade

@nmoller

nmoller commented Aug 4, 2019

Copy link
Copy Markdown
Member

@taniwallach : running in the db container

mysql_upgrade -u root -p
randomepassword

corrects the message. The integration of mysql 5.7+ surely adds that... it is usually worst when going to mysql8.0.

@taniwallach

Copy link
Copy Markdown
Member Author

I have implemented some of the ideas about the OPL databases raised in:

In particular:

  • I added 2 new scripts to dump/restore the OPL tables to/from a file, which can save on the need to run OPL-update when a dump file is ready.
    • They are ready for use outside the context of Docker use of webwork.
  • They are used in the revised docker-config/docker-entrypoint.sh added to this PR:
    • so that a dump file is created after OPL-update is run, and
    • should the code suspect based on a missing tagging-taxonomy.json file that there is a need to run OPL-update it will first look for the dump file and load that if it exists.
  • Additional changes to docker-config/docker-entrypoint.sh will also attempt to save/restore the JSON files generated when OPL-update is run using a custom location, as these files are not included in the base image, and at container startup time should be "restored" to the files generated by OPL-update.

@nmoller @xcompass - Your testing and feedback would be appreciated.

If the local webwork2 tree has the changes in this PR before it is merged into the WeBWorK-2.15 branch, the image built will contain the updated docker-entrypoint.sh but will not contain the new scripts. To overcome this for testing, I manually mounted them by adding 2 more lines to the volumes section of docker-compose.yml:

       - "/path_to_ww2/webwork2/bin/restore-OPL-tables:/opt/webwork/webwork2/bin/restore-OPL-tables"
       - "/path_to_ww2/webwork2/bin/dump-OPL-tables:/opt/webwork/webwork2/bin/dump-OPL-tables"

where the value of /path_to_ww2/ needs to be set appropriately.


@mgage @heiderich @nmoller @jwj61 : We need to consider how to best distribute pre-made versions of these data files in the future:

  • The OPL-tables.sql file comes out at about 5.3MB (at present), and gzipped is under 1MB.
  • The 4 JSON files together are currently about 1.4MB together, but gzipped come out at about 144kB.

Given that these are large files which change quickly as modifications are made to the OPL, it does not seem wise to include then (or gzipped versions) inside the OPL itself.

Maybe a pointer to the URL of the current version of a TGZ file of the complete set could be included in the OPL, together with a SHA256 sum or something else to help verify the integrity of the downloaded file.

@heiderich

Copy link
Copy Markdown
Member

@mgage @heiderich @nmoller @jwj61 : We need to consider how to best distribute pre-made versions of these data files in the future:

  • The OPL-tables.sql file comes out at about 5.3MB (at present), and gzipped is under 1MB.

  • The 4 JSON files together are currently about 1.4MB together, but gzipped come out at about 144kB.

Given that these are large files which change quickly as modifications are made to the OPL, it does not seem wise to include then (or gzipped versions) inside the OPL itself.

Maybe a pointer to the URL of the current version of a TGZ file of the complete set could be included in the OPL, together with a SHA256 sum or something else to help verify the integrity of the downloaded file.

Maybe we could use releases of GitHub to store those files. And yes, I also think that we could include a link to the latest "release" of the OPL together with a checksum. I guess at least the latter must be hard coded somewhere. Maybe we could do periodic releases of the OPL. I guess the whole process (creating the tgz, making a release on GitHub and updating the reference in the docker related files) could be automatized.

@taniwallach

Copy link
Copy Markdown
Member Author

Maybe we could use releases of GitHub to store those files. And yes, I also think that we could include a link to the latest "release" of the OPL together with a checksum. I guess at least the latter must be hard coded somewhere. Maybe we could do periodic releases of the OPL. I guess the whole process (creating the tgz, making a release on GitHub and updating the reference in the docker related files) could be automatized.

I think given the size of the files, at the very least it should be in a separate Git repository, possibly defined as a sub-module of the OPL, so that the old files do not accumulate in the main OPL Git repository. Most people could use a "shallow clone" of that special repository (to only get the current version). If it were a separate Git repository, we could possibly reset it as an empty repository, or change repositories every so often.

I tend to think that huge non-human generated files like this are probably not best handled in Git (but there is Git LFS). I think it might be better to host them as static files somewhere else.

But, I have not given this very much thought yet.

@heiderich

Copy link
Copy Markdown
Member

I did not mean to commit the tgz files. I proposed to add them as additional binary files to the releases (see step 7 in https://help.github.com/en/articles/creating-releases).

@taniwallach

Copy link
Copy Markdown
Member Author

I did not mean to commit the tgz files. I proposed to add them as additional binary files to the releases (see step 7 in https://help.github.com/en/articles/creating-releases).

This sounds like a very feasible approach.

It certainly provides a means of hosting the tgz file on GitHub, and they don't restrict anything except that each file must be under 2GB. We would need to determine how to make it easy to include the links to those files in the main OPL repository, but that can be handled by small text files.

It might still be easier to handle it as a different repository, just to keep the OPL "clean" and not clogged up with pseudo-releases used to distribute the tgz files.

@heiderich

Copy link
Copy Markdown
Member

It might still be easier to handle it as a different repository, just to keep the OPL "clean" and not clogged up with pseudo-releases used to distribute the tgz files.

As far as I understand, the git repository itself would not be affected by the releases and would thus not be "clogged" by the binary files. I guess this might have been a motivation to introduce the release files on GitHub.

@nmoller

nmoller commented Aug 13, 2019

Copy link
Copy Markdown
Member

I think Florian's idea seems great. We have to put a note to modify the docker-entrypoint script for doing the "plummbing" needed to use those distribution files.

@heiderich

Copy link
Copy Markdown
Member

The dump and release could maybe be automated as @mgage suggested in #972 (comment):

We could create a github hook that creates a new OPL database when changes are made (or once a day if changes are made).

I am however not sure whether we really want a new release after every commit or once a day if changes were made. This may also depend on how these releases are used in different contexts.

@heiderich heiderich left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the local webwork2 tree has the changes in this PR before it is merged into the WeBWorK-2.15 branch, the image built will contain the updated docker-entrypoint.sh but will not contain the new scripts. To overcome this for testing, I manually mounted them by adding 2 more lines to the volumes section of docker-compose.yml:

This makes testing a bit cumbersome. As I see it, these two scripts are independent from the docker changes, even though they might be particularly useful.

@taniwallach How about opening a new PR in which you only add these two scripts. Then we could test it, merge it and afterwards come back to this PR. Ideally include some instructions on how to test the scripts. Maybe you could even add some a unit test for the scripts (I guess roughly dumping followed by dropping the tables and finally restoring them should not change anything).

I would strongly advise to add such tests whenever possible. I know that this can be a lot of additional work, but I think in the long run we will profit from such tests.

@heiderich heiderich left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the changes look good to me.

Comment thread docker-config/docker-entrypoint.sh Outdated
Comment thread bin/dump-OPL-tables Outdated
Comment thread bin/restore-OPL-tables Outdated
Comment thread docker-compose.yml Outdated
Comment thread docker-compose.yml
Comment thread Dockerfile
Comment thread docker-config/ssl/default-ssl.conf
Comment thread docker-config/ssl/default-ssl.conf
Comment thread docker-config/ssl/local/PutYourCertificate_and_key_here.txt
Comment thread Dockerfile
@pstaabp

pstaabp commented Aug 15, 2019

Copy link
Copy Markdown
Member

I'm trying to test this and being new to docker, I get the following error a number of times:

app_1  | httpd (pid 8) already running

I had it running, with docker-compose up but it wasn't successful, and retrying I get this. Any help getting past this without completely restarting would be helpful.

@heiderich

Copy link
Copy Markdown
Member

As an additional general comment to my last review:

My impression was that certain values are used in several places. In general I prefer avoiding hardcoding values that appear at least twice.

The idea of some of the comments in this review is the following: Use variables in the docker-compose.yml file and provide (external to the docker-compose.yml file) an environment with sensible default values for those variables. This way I hope that a user will be able to get a quick overview of the most relevant variables. This would of course be most valuable if no other changes will be necessary to those files. Currently I do not oversee whether this will be the case.

Comment thread bin/dump-OPL-tables Outdated
Comment thread bin/restore-OPL-tables Outdated
@heiderich

Copy link
Copy Markdown
Member

When is update-OPL-statistics supposed to run? I once ended up in an environment without the statistics tables (starting in a not completely clean environment). I guess we should call update-OPL-statistics after restore-OPL-tables to be safe. See taniwallach#7.

@taniwallach taniwallach removed the request for review from xcompass August 16, 2019 12:21
taniwallach and others added 3 commits August 19, 2019 15:10
…rk2_and_pg

introduce build time arguments for the branches of webwork2 and pg
and use it instead of the hard-coded string "../ww-docker-data/courses" in
docker-compose.yml
Comment thread docker-compose.yml Outdated
heiderich and others added 4 commits August 20, 2019 12:55
define the environment variable COURSES_DIRECTORY_ON_HOST in .env
Set them all in docker-compose.yml, where some depend on the .env file.
@taniwallach

Copy link
Copy Markdown
Member Author

Based on @heiderich's changes, and an off-line discussion, all the WEBWORK_DB_* environment variables were removed from Dockerfile, as they are not need to build the webwork image.

They are now defined in docker-compose.yml and the username/password are pulled in from the .env file.

Comment thread docker-compose.yml
@taniwallach

Copy link
Copy Markdown
Member Author

I just tested this PR again after all the recent changed using a clean run of

docker-compose build

but then manually copied the updated script files from #985 into webwork2/bin/ before running

docker-compose up

and everything I tested seems fine to me (login, library browser, dump+restore of OPL tables). However, I kept my data volumes (SQL and courses) unchanged.

@heiderich @nmoller - Can you also run a test.

If someone can let a totally clean machine without any data volumes start up and run an OPL-update, we can:

  1. test that a totally clean build works properly
  2. Get an updated OPL table dump and set of JSON files, to be posted for download.

The only thing I would like to consider adding to this PR is a download of the OPL table dump and JSON data... so that a Docker system can be brought up without any need to run OPL-update.

@taniwallach taniwallach requested a review from heiderich August 20, 2019 14:25
@heiderich

Copy link
Copy Markdown
Member

I tested it and modulo the missing binaries it seems to work, also with my changes from taniwallach#11.

I approve this PR being merged once #985 is merged.

@taniwallach

Copy link
Copy Markdown
Member Author

I just merged #985 as we seem to have settled on the current level of security for the DB password there (using the MYSQL_PWD environment variable).

As getting that merged in first was the main thing we we waiting for, I'm merging this in also. Further testing can be done in the WeBWorK-2.15 branch, and any additional improvements can be made later.

@taniwallach taniwallach merged commit 17ce1ff into openwebwork:WeBWorK-2.15 Aug 22, 2019
@taniwallach taniwallach deleted the docker_to_ubuntu_18lts branch August 22, 2019 18:17
@taniwallach taniwallach restored the docker_to_ubuntu_18lts branch August 22, 2019 18:18
@taniwallach taniwallach deleted the docker_to_ubuntu_18lts branch February 17, 2021 13:53
@taniwallach taniwallach restored the docker_to_ubuntu_18lts branch February 17, 2021 13:54
@taniwallach taniwallach deleted the docker_to_ubuntu_18lts branch February 17, 2021 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

NeedsTesting Tentatively fixed bug or implemented feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants