Documenting solutions to common issues faced while restarting glusterd2 service #885
rishubhjain wants to merge 3 commits into gluster:master
@atinmu I have documented solutions to some of the issues faced while restarting the glusterd2 service. Is this a suitable way of documenting solutions to common issues?
doc/quick-start-user-guide.md
Outdated
| The path to default directory used by glusterd2 is ["/var/lib/glusterd/"](https://github.com/gluster/glusterd2/blob/master/glusterd2.toml.example#L1), if using custom config file then please provide working directory path instead of "/var/lib/glusterd/" | ||
|
|
||
| ```sh | ||
| # rm /var/lib/glusterd/* |
Do we need to delete the complete glusterd2 directory? That will also delete the old ETCD data stored in the ETCD folder. The ETCD config is stored in store.toml (I think clearing this should be enough). If store.toml is empty, glusterd2 should regenerate it based on the configuration.
This comment is still not addressed?
@Madhu-1 I will add a warning, and will send a new PR after testing whether it's sufficient to delete store.toml.
This is definitely a good start, and we should make it a continuous process to capture all of our troubleshooting experience. What I am additionally looking for in the document, however, is more concrete direction along the lines of "Do these x, y, z things before setting up the environment".
@atinmu Is this PR good to go, or would you like the documentation reformatted?
I don't see any reason why this PR cannot get in. We need to get into the good habit of refreshing this document periodically, or whenever we stumble upon issues that a user can face frequently.
doc/quick-start-user-guide.md
Outdated
| ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]" | ||
| ``` | ||
| To solve this you will be required to clean up glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2). | ||
| The path to default directory used by glusterd2 is ["/var/lib/glusterd/"](https://github.com/gluster/glusterd2/blob/master/glusterd2.toml.example#L1), if using custom config file then please provide working directory path instead of "/var/lib/glusterd/" |
default is /var/lib/glusterd2
|
|
||
| Sample Output: | ||
| ```log | ||
| FATA[2018-06-04 06:07:22.605017] Failed to create pid file error="Process is already running" source="[main.go:87:main.main]" |
This should not occur if the pid file exists and the process is not running. Please recheck.
| To delete the pid file for the given output: | ||
|
|
||
| ```sh | ||
| # rm /usr/local/var/run/glusterd2/glusterd2.pid |
If another instance of glusterd2 is running, removing the pid file is not sufficient.
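One way to act on this, as a minimal sketch: check whether the pid recorded in the file still belongs to a live process before deleting the file. The function name `remove_stale_pidfile` is hypothetical, and the default path is simply the one from the sample command above. It assumes the pid file holds a single numeric pid and that the script runs with enough privileges (for example as root) for `kill -0` to probe the process.

```sh
#!/bin/sh
# Hypothetical helper: remove a pid file only if the process it names is gone.
# The default path is taken from the sample command in the diff; adjust it
# for your installation.
remove_stale_pidfile() {
    pidfile="${1:-/usr/local/var/run/glusterd2/glusterd2.pid}"
    if [ ! -f "$pidfile" ]; then
        echo "no pid file at $pidfile"
        return 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it only reports whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; not removing $pidfile" >&2
        return 1
    fi
    rm -f "$pidfile"
    echo "removed stale pid file $pidfile"
}
```

Calling `remove_stale_pidfile` with no argument uses the default path. Note that `kill -0` only shows that some process owns that pid, not that it is glusterd2, so a non-zero return is a prompt to investigate rather than proof of a second instance.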
| ```log | ||
| FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]" | ||
| ``` | ||
| This issue occurs because the socket address is already in use by old glusterd service. To resolve this issue you will have to delete the socket file to free the socket for the new glusterd service. |
Also add a note to check whether any other instance of glusterd2 is running before removing the socket file.
| ```log | ||
| ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]" | ||
| ``` | ||
| To solve this you will be required to clean up glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2). |
Deleting the workdir will erase all of the cluster's data. Suggest this workaround only if the issue is faced during re-setup.
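To make the workaround less destructive, the directory could be moved aside instead of emptied, so the old data can be restored if it turns out to be needed. The sketch below is only an illustration: `set_aside_dir` is a hypothetical name, the default path is the `/var/lib/glusterd2` default mentioned in this review, and it assumes glusterd2 can start from an empty directory, which the `rm` command in the diff relies on as well.

```sh
#!/bin/sh
# Hypothetical helper: move a directory aside under a timestamped name and
# leave an empty directory in its place, instead of deleting its contents.
set_aside_dir() {
    dir="${1:-/var/lib/glusterd2}"
    if [ ! -d "$dir" ]; then
        echo "no directory at $dir" >&2
        return 1
    fi
    # Strip a trailing slash so the backup lands next to the original.
    backup="${dir%/}.bak.$(date +%Y%m%d%H%M%S)"
    # Print the backup location only if both steps succeed.
    mv "$dir" "$backup" && mkdir -p "$dir" && echo "$backup"
}
```

The function prints the backup location, so `backup=$(set_aside_dir)` keeps track of where the old data went.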
|
@aravindavk Please review, I have made the changes |
doc/quick-start-user-guide.md
Outdated
| The path to default directory used by glusterd2 is "/var/lib/glusterd2/", if using custom config file then please provide working directory path instead of "/var/lib/glusterd2/" | ||
|
|
||
| ```sh | ||
| # rm /var/lib/glusterd/2* |
Is this path correct? Above you write "/var/lib/glusterd2/" but here the last slash and the 2 are transposed.
prashanthpai left a comment
Some of the issues aren't major and may seem obvious from symptoms and error messages. We'll have to focus on documenting cluster restart and quorum loss.
| ```log | ||
| FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]" | ||
| ``` | ||
| This issue occurs because the socket address is already in use by old glusterd service. To resolve this issue you will have to delete the socket file to free the socket for the new glusterd service. |
The socket file isn't cleaned up on ungraceful or abrupt shutdown such as SIGKILL or a crash.
|
|
||
| ### Solutions to common issues | ||
|
|
||
| > Note: These are hacks or temporary solutions to the problems. Only use these solutions only if the issues are faced during re-setup. |
> These are hacks or temporary solutions to the problems.

That doesn't instil confidence in glusterd2 from a user's perspective ;)
|
|
||
| > Note: These are hacks or temporary solutions to the problems. Only use these solutions only if the issues are faced during re-setup. | ||
|
|
||
| * If glusterd service fails with error: "failed to start embedded store" |
You don't have to start every sentence with `If glusterd service fails with error`. Just the symptom should suffice.
| ```log | ||
| ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]" | ||
| ``` | ||
| To solve this you will be required to clean up glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2). |
It's localstatedir, and not working directory.
| # rm /usr/local/var/run/glusterd2/glusterd2.pid | ||
| ``` | ||
|
|
||
| * If glusterd service fails with error: "failed to listen" |
"failed to listen" can also mean another glusterd2 process is running. You have to be a little more specific about the socket file and what caused the error.
| ```log | ||
| FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]" | ||
| ``` | ||
| This issue occurs because the socket address is already in use by old glusterd service. To resolve this issue you will have to delete the socket file to free the socket for the new glusterd service. |
> because the socket address is already in use by old glusterd service.

It's not.
Either:
- It was in use but wasn't cleaned up on shutdown
- There's another glusterd2 instance running
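The two cases can be told apart before touching the socket file. A rough sketch, assuming Linux (it reads process names from `/proc`) and assuming the daemon's process name is `glusterd2`; `is_running` and `remove_stale_socket` are hypothetical helper names, and the default socket path is the one from the log line in the diff.

```sh
#!/bin/sh
# Hypothetical helpers, Linux only: they read process names from /proc.
# Without /proc, is_running always reports "not running", so do not rely
# on this sketch on other systems.

# Return 0 if any process has exactly this command name.
is_running() {
    name="$1"
    for comm in /proc/[0-9]*/comm; do
        # A process may exit while we iterate, so ignore read errors.
        if [ "$(cat "$comm" 2>/dev/null)" = "$name" ]; then
            return 0
        fi
    done
    return 1
}

# Remove the socket file only when no process with that name is running.
remove_stale_socket() {
    sock="${1:-/usr/local/var/run/glusterd2/glusterd2.socket}"
    proc_name="${2:-glusterd2}"
    if is_running "$proc_name"; then
        echo "$proc_name is running; stop it instead of removing $sock" >&2
        return 1
    fi
    if [ -S "$sock" ]; then
        rm -f "$sock"
        echo "removed stale socket $sock"
    fi
}
```

If `remove_stale_socket` returns non-zero, the second case applies and the running instance should be stopped rather than the file deleted. If it removes the file, the socket was left behind by an unclean shutdown.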
Issue: #883