Currently the start and stop commands bypass the manager service and interact directly with the remote agents running on the cluster nodes. This is a legacy implementation that no longer fits the current architecture, so both commands should be re-evaluated to determine whether they are still useful.
Arguably the "start" command is no longer useful, because the manager's default action when running is to start all resources. This makes the start command somewhat redundant. The main thing that the start command offers that the manager currently doesn't handle is that it will start the MGS resource before any other Lustre targets. Perhaps it would make sense to make the manager aware of a "special" resource that must be started first?
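To make the "special resource" idea concrete, here is a minimal sketch of how a start ordering could be computed. It is not based on the manager's actual code: the `Resource` type, the `start_first` flag, and the `start_order` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record; not the manager's real data model."""
    name: str
    start_first: bool = False  # assumed flag marking the "special" resource (e.g. the MGS)


def start_order(resources):
    """Return resources with any start_first resources ahead of the rest.

    sorted() is stable, so resources keep their original relative order
    within each group.
    """
    return sorted(resources, key=lambda r: not r.start_first)


resources = [Resource("ost0"), Resource("mgs", start_first=True), Resource("mdt0")]
ordered_names = [r.name for r in start_order(resources)]
```

With a flag like this the manager would not need to know anything Lustre-specific; it would only need to honor the ordering before starting the remaining targets.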
The stop command is also not entirely useful because the manager will restart any resources it notices are stopped. In order to make "stop" useful, it would have to also "unmanage" the resources. Rather than implementing "stop" as a separate command, it may be simpler to add a --stop flag to the unmanage command that combines both actions. This would purely be an admin convenience, as the admin currently could unmanage the resource and then manually stop it.
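The interaction between stopping and unmanaging can be illustrated with a toy model. This is only a sketch under assumed behavior: the `Manager` class, its `reconcile` method, and the `stop` keyword are hypothetical stand-ins for the real manager service and the proposed flag.

```python
class Manager:
    """Toy stand-in for the manager service; all names are hypothetical."""

    def __init__(self, resources):
        self.managed = set(resources)
        self.running = set(resources)

    def stop(self, name):
        # Stopping alone does not change what the manager is responsible for.
        self.running.discard(name)

    def unmanage(self, name, stop=False):
        # Unmanage first so the manager will not restart the resource,
        # then optionally stop it in the same call.
        self.managed.discard(name)
        if stop:
            self.running.discard(name)

    def reconcile(self):
        # The manager restarts any managed resource that it finds stopped.
        self.running |= self.managed


m = Manager(["mgs", "ost0"])

m.stop("mgs")
m.reconcile()
mgs_running_after_plain_stop = "mgs" in m.running

m.unmanage("ost0", stop=True)
m.reconcile()
ost0_running_after_unmanage_stop = "ost0" in m.running
```

In this model a plain stop is undone by the next reconcile pass, while the combined unmanage-and-stop leaves the resource down, which is the behavior the proposed flag is meant to provide.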