Skip to content

new OPL scripts: dump/restore OPL tables, load global statistics without other local table work#985

Merged
taniwallach merged 4 commits into
openwebwork:WeBWorK-2.15from
taniwallach:OPL-tables-dump-restore
Aug 22, 2019
Merged

new OPL scripts: dump/restore OPL tables, load global statistics without other local table work#985
taniwallach merged 4 commits into
openwebwork:WeBWorK-2.15from
taniwallach:OPL-tables-dump-restore

Conversation

@taniwallach

Copy link
Copy Markdown
Member

New scripts to dump the OPL tables to a dump-file and to restore from such a file, as loading from a dump-file is much faster than running OPL-update.

Also add a script to import that OPL global statistics without running the full update-OPL-statistics script which also works on local data.

such a file, as loading from a dump-file is much faster than running
OPL-update. Modify the docker-entrypoint.sh to do such a dump after
running OPL-update, to attempt to restore the OPL tables from a
dump-file before trying to run OPL-update, and also to save/restore the
JSON files generated by OPL-update. Also add a script to import that
OPL global statistics without running the full update-OPL-statistics
script which also works on local data.

@heiderich heiderich left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave the dump and restore files a test in their version in #980. I think this is a great idea!

The file load-OPL-global-statistics seems to do part of what update-OPL-statistics is doing. This means code duplication, which is usually not a good idea. There are different ways around this:

  1. Remove that part from update-OPL-statistics and call both files when update-OPL-statistics was called before. This would probably require updating the documentation.
  2. Remove that part from update-OPL-statistics and call load-OPL-global-statistics from update-OPL-statistics instead.
  3. Move the code that does the actual work into a third file that is used by both of them.

Comment thread bin/dump-OPL-tables Outdated
Comment thread bin/load-OPL-global-statistics Outdated
Comment thread bin/restore-OPL-tables Outdated
Comment thread bin/dump-OPL-tables
global statistics data from the OPL after running update-OPL-statistics.
Remove the reloading of the global statistics data from
update-OPL-statistics but print a warning that it no longer does that.
Also fix the warning in lib/WeBWorK/Utils/LibraryStats.pm to inform
users to use load-OPL-global-statistics instead of update-OPL-statistics
when the global statistics data is missing.
@taniwallach

taniwallach commented Aug 16, 2019

Copy link
Copy Markdown
Member Author

I gave the dump and restore files a test in their version in #980. I think this is a great idea!

The file load-OPL-global-statistics seems to do part of what update-OPL-statistics is doing. This means code duplication, which is usually not a good idea. There are different ways around this:

  1. Remove that part from update-OPL-statistics and call both files when update-OPL-statistics was called before. This would probably require updating the documentation.
  2. Remove that part from update-OPL-statistics and call load-OPL-global-statistics from update-OPL-statistics instead.
  3. Move the code that does the actual work into a third file that is used by both of them.

Addressed by commit f63ed9f

  1. I modified OPL-update to call load-OPL-global-statistics after it calls update-OPL-statistics.
  2. update-OPL-statistics will no longer have the code to load the global statistics data, and will print a message that it no longer does so.
  3. I modified lib/WeBWorK/Utils/LibraryStats.pm to recommend running load-OPL-global-statistics (instead of update-OPL-statistics) after the OPL is downloaded (to obtain the SQL dump file with the global statistics data).

The documentation at
http://webwork.maa.org/wiki/OPL_Problem_Statistics#Generating_Global_OPL_Problem_statistics
was edited to reflect the fact that in WW-2.15 a different script will be used for loading the global statistics data.

@taniwallach

Copy link
Copy Markdown
Member Author

Since I do not intend to address the inline comment about the passwords at present, I marked the tread above as resolved, but the issue should be obvious - so here is a summary:

@heiderich wrote:

It is insecure to pass a password as a command line argument and mysqldump complains about that. Maybe consider storing it in an option file:

https://dev.mysql.com/doc/refman/5.5/en/option-files.html

I responded:

I agree this is an issue, but the same method was used by the preexisting:

  • bin/upload-OPL-statistics
    and the code was borrowed from there.

Given that the password is in an environment variable in the running container and in the docker-compose.yml file on a Docker setup, I don't see why using the password in this manner is really significantly more insecure than the alternatives.

The file lib/WeBWorK/DB/Schema/NewSQL/Std.pm has a subroutine called _get_db_info which created a my.cnf file with the password, and then uses the options --defaults-extra-file=" . shell_quote($my_cnf->filename) to load the settings including the password from the file in both dump_table and restore_table. If you have time, you are welcome to modify both the new scripts and bin/upload-OPL-statistics to use such an approach.

Note also: lib/WeBWorK/Utils/CourseManagement/sql_single.pm has code listed as

# TOTALLY STOLEN FROM NewSQL::Std.

which us used by unarchiveCourseHelper to also avoid a password on the command line.

A contribution improving the manner in which the database settings are handled and which secures the password would be very welcome.

@taniwallach

Copy link
Copy Markdown
Member Author

Testing (requires comfort working in the mysql database directly):

  1. Use the bin/dump-OPL-tables script to dump the OPL database table data to a file.
  2. Look at the file to see it was created.
  3. Manually delete one or more of the OPL database tables in mysql.
  4. Use the bin/restore-OPL-tables script to reload the data.
  5. Check that the deleted data was restored.
  6. Manually delete the table OPL_global_statistics table in mysql.
  7. Run bin/load-OPL-global-statistics.
  8. Check that the OPL_global_statistics table exists again.

@mgage mgage added the NeedsTesting Tentatively fixed bug or implemented feature label Aug 18, 2019
@heiderich

Copy link
Copy Markdown
Member

I opened #987 to document the security issue. Even though I think that this practice is a a bad idea, I do not want to stay in the way of this PR.

@heiderich

Copy link
Copy Markdown
Member

Commit f63ed9f resolves my concerns about code duplication. Thanks @taniwallach.

@mgage

mgage commented Aug 19, 2019

Copy link
Copy Markdown
Member

Thanks David. This looks like it might be the best quick fix:


Set MYSQL_PWD in the environment (export MYSQL_PWD=muhpassword) and execute your command without the -p. See MySQL Program Environment Variables. In spite of the manual's dire warnings, this is rather safe. Unless you start weird warez in the same shell later. So we run: MYSQL_PWD=$(cat foo.php etc) mysql -u foouser -h barhost – David Tonhofer Mar 1 '18 at 18:02


Note that there is the MYSQL_PWD environment variable which is read by mysql if you don't specify -p. And under Linux this is a secure method because the environment of a user cannot be read by other unprivilleged users - in contrast to the argument vector. – maxschlepzig Jul 14 at 14:55

@heiderich

Copy link
Copy Markdown
Member

Thanks David. This looks like it might be the best quick fix:

Set MYSQL_PWD in the environment (export MYSQL_PWD=muhpassword) and execute your command without the -p. See MySQL Program Environment Variables. In spite of the manual's dire warnings, this is rather safe. Unless you start weird warez in the same shell later. So we run: MYSQL_PWD=$(cat foo.php etc) mysql -u foouser -h barhost – David Tonhofer Mar 1 '18 at 18:02

Note that there is the MYSQL_PWD environment variable which is read by mysql if you don't specify -p. And under Linux this is a secure method because the environment of a user cannot be read by other unprivilleged users - in contrast to the argument vector. – maxschlepzig Jul 14 at 14:55

@mgage Did you possibly mean to post this comment in #987? If so, could you post it again there? It would be easier to keep track of the discussion if we keep topics little separated when possible. (This was one of the reasons I opened that issue.)

with the use of the MYSQL_PWD environment variable.
See: openwebwork#987
@taniwallach

Copy link
Copy Markdown
Member Author

Thanks David. This looks like it might be the best quick fix:

Set MYSQL_PWD in the environment (export MYSQL_PWD=muhpassword) and execute your command without the -p. See MySQL Program Environment Variables. In spite of the manual's dire warnings, this is rather safe. Unless you start weird warez in the same shell later. So we run: MYSQL_PWD=$(cat foo.php etc) mysql -u foouser -h barhost – David Tonhofer Mar 1 '18 at 18:02

Note that there is the MYSQL_PWD environment variable which is read by mysql if you don't specify -p. And under Linux this is a secure method because the environment of a user cannot be read by other unprivilleged users - in contrast to the argument vector. – maxschlepzig Jul 14 at 14:55

implemented in commit 00d4f30

@taniwallach taniwallach requested a review from heiderich August 20, 2019 14:09
@taniwallach

Copy link
Copy Markdown
Member Author

I just tested the current version of PR #980 and the current version of these scripts. See: #980 (comment)

@heiderich - after the recent changes - are you ready to approve?

@mgage @pstaabp @jwj61 @glarose @nmoller @sean-fitzpatrick @Alex-Jordan - Do any of you plan to test this (before it gets merged)?

@jwj61

jwj61 commented Aug 20, 2019

Copy link
Copy Markdown
Member

No, I won't have a chance to test it. I am just glad people are able to build on what I wrote many years ago.

@heiderich heiderich left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand it, the approach to save the password in an options file still seems to be safer than the environment variable approach (if done correctly). But 00d4f30 is an improvement over the previous implementation. As the problem is documented in #987, I think this should be merged.

@heiderich

heiderich commented Aug 21, 2019

Copy link
Copy Markdown
Member

One more minor comment:

Some of the Perl scripts in bin have a .pl ending, other do not have it.
Some of the schell scripts in bin have a .sh ending, other do not have it.

I am in favor of having the endings in the filenames, as it indicates already what to expect.

So I wonder whether we should add those endings to the newly added filenames.

@taniwallach taniwallach force-pushed the OPL-tables-dump-restore branch from caa03af to 70f2e2e Compare August 21, 2019 18:54
@taniwallach

Copy link
Copy Markdown
Member Author

One more minor comment:

Some of the Perl scripts in bin have a .pl ending, other do not have it.
Some of the schell scripts in bin have a .sh ending, other do not have it.

I am in favor of having the endings in the filenames, as it indicates already what to expect.

So I wonder whether we should add those endings to the newly added filenames.

I renamed the "new" scripts and some of the old ones modified by this PR.
However, I left OPL-update without a .pl as that is much more commonly used, and too much existing documentations refers to it in the current name, so I am afraid to rename it at present.

@heiderich

Copy link
Copy Markdown
Member

Yet another minor comment:

In view of #876 I wonder whether we should possibly avoid "OPL" in the name of the (new) scripts and rather use something generic like "library"?

I agree that renaming OPL-update would be more disruptive and we could keep that first.

@taniwallach

Copy link
Copy Markdown
Member Author

For now, we want to solve these "problems" for the OPL, and until the support for multiple libraries is completed and tested, it is too soon to predict exactly how to modify these scripts to support other libraries. For now, I think this is good enough.

When additional library support is added, we would want to create generic scripts which could run for a given library based on a parameter sent as a command line argument, and the OPL specific scripts could then be wrappers which call the generic scripts with the parameter set to "OPL".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

NeedsTesting Tentatively fixed bug or implemented feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants