Addresses several issues during the re-start of the core container #158

Open
cgbautista wants to merge 2 commits into development from fix/core_does_not_start_upon_reboot

@cgbautista
Contributor

  • The first `d2-docker start ... -k` changes the LOAD_FROM_DATA env variable, destroying the core and data containers in the process but keeping the volumes. As a result, the INIT_DONE_FILE in the core container is no longer present, so the core container will always try to copy apps, documents and datavalues, and to run SQL files and pre-scripts. Because the data container runs first and sees LOAD_FROM_DATA=no, it removes ALL contents under /data, so no apps, documents, datavalues or SQL files should be present for the core container. The mapped volume /data/db/post still exists, but it is not reachable (there is no /data/db, so `find /data/db/post` fails). Two safeguards are added here: the data container creates /data/db after deleting /data/*, and the core container checks whether a directory exists for base_db_path.
  • Second: the core container launches `catalina.sh run` in the background, waits for it to be ready, runs the post-scripts, and then blocks in a `wait` until the container is stopped. When the container is stopped, the TERM signal does not reach Tomcat, so after 10 seconds the container is killed, which yields RC=137. Because docker-compose.yml has restart:no, this prevents the core image from starting automatically after a reboot (a manual `d2-docker start` still works, as it does not depend on the RC of the previous stop). The fix captures the TERM signal, launches `catalina.sh stop` and waits briefly for it to finish. In any case, the RC from both the wait and the cleanup is forced to 0 to avoid errors.
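The two safeguards from the first bullet could look roughly like the sketch below. It is not the code in this PR: the helper names `reset_data_dir` and `list_post_scripts` are hypothetical, and the paths are passed as arguments so the functions can be tried outside the containers.

```shell
#!/bin/sh
# Sketch only: reset_data_dir and list_post_scripts are hypothetical helpers,
# not functions taken from the d2-docker scripts.

# Data container side: delete everything under the data directory, then
# recreate db/ so that a volume mapped at db/post has a reachable parent.
reset_data_dir() {
    data_dir=$1
    # ${data_dir:?} aborts if the variable is empty, so this never becomes "/*".
    rm -rf "${data_dir:?}"/*
    mkdir -p "$data_dir/db"
}

# Core container side: only run find when the directory actually exists,
# so a missing base_db_path yields an empty list instead of a find error.
list_post_scripts() {
    base_db_path=$1
    if [ -d "$base_db_path/post" ]; then
        find "$base_db_path/post" -type f | sort
    fi
}
```

In the containers these would presumably be called with `/data` and `/data/db`; here they accept any directory, which keeps the guard easy to check in isolation.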

This may introduce some uncertainty in the core container: if the container receives a TERM or INT signal, it returns RC=0 regardless of the real RC of the Tomcat stop command. It should not affect errors from previous steps (such as SQL or pre-scripts), so past functionality should still work.
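A minimal sketch of the signal handling described above, under stated assumptions: `run_with_graceful_stop` is a hypothetical helper, and the start and stop commands are passed as strings standing in for `catalina.sh run` and `catalina.sh stop`. The actual entrypoint in this PR may be structured differently.

```shell
#!/bin/sh
# Sketch only: run_with_graceful_stop is a hypothetical helper. In the core
# container the two commands would be "catalina.sh run" and "catalina.sh stop".

run_with_graceful_stop() {
    start_cmd=$1
    stop_cmd=$2

    # Launch the server in the background and remember its PID.
    sh -c "$start_cmd" &
    server_pid=$!

    # On TERM or INT: ask the server to stop, wait for it to finish, and
    # force the exit status to 0 whatever the stop command returned.
    trap 'sh -c "$stop_cmd" || true
          wait "$server_pid" 2>/dev/null || true
          exit 0' TERM INT

    # A trapped signal interrupts wait, so the handler runs promptly instead
    # of the container being killed after the stop timeout.
    wait "$server_pid" || true
    exit 0
}
```

Because the function ends with `exit 0`, it is meant to be the last step of an entrypoint script (or to run in a subshell). Any checks on SQL or pre-script errors would have to happen before it is called, which matches the note that earlier failures are not masked.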

@cgbautista cgbautista changed the base branch from master to development March 17, 2026 18:03
@cgbautista cgbautista requested a review from idelcano March 17, 2026 18:04
…for the core container to not be restarted upon exit, to prevent mandatory SQL from being bypassed and exposing data that should have been altered/deleted.

This had the side effect of not starting the core container upon server restart.
As the core container is needed for the service, the logic has been changed: when a mandatory SQL script fails, instead of just stopping, the container removes the application and creates a flag in a named volume so that this behavior persists across core restarts. On each restart it deletes the application and shows a generic error, but keeps the container up and running.
This achieves a better result in both scenarios:
- In case of an SQL error, the application won't work until the error is
  addressed. The db container keeps working, so SQL can be run against
  it to fix the issue and the SQL script adjusted accordingly.
- In case all works as intended, even after a server restart, the
  container will behave just like the other two db/gateway containers.
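The flag-based behavior might be sketched as follows. Everything named here is an assumption made for illustration: the helper names, the flag path, and serving the generic error as a static `ROOT/index.html` page are not taken from this PR.

```shell
#!/bin/sh
# Sketch only: helper names, flag location and the static error page are
# hypothetical; the PR may implement the generic error differently.

# Called when a mandatory SQL script fails: persist the failure in a file
# that lives on a named volume, so it survives container re-creation.
mark_sql_failure() {
    flag_file=$1
    mkdir -p "$(dirname "$flag_file")"
    : > "$flag_file"
}

# Called on every core start, before the server is launched.
# Returns 0 if the application was disabled, 1 if no flag was found.
disable_app_if_flagged() {
    flag_file=$1
    webapp_dir=$2
    if [ -f "$flag_file" ]; then
        # Remove the deployed application and leave only a generic page.
        rm -rf "${webapp_dir:?}"/*
        mkdir -p "$webapp_dir/ROOT"
        echo "Service unavailable" > "$webapp_dir/ROOT/index.html"
        return 0
    fi
    return 1
}
```

With this shape the server is still started in both cases, so the container stays up. Clearing the condition would then mean fixing the SQL and removing the flag file, though how the flag is cleared is not stated in the PR.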