Fixed missing OK HARD state after OK SOFT. #472

Open
ccztux wants to merge 4 commits into naemon:master from ccztux:issue-368

Contributor

ccztux commented Aug 6, 2024

This pull request fixes the missing OK HARD state after an OK SOFT state as described in #368:

image

Contributor

sni commented Aug 26, 2024

Thanks for your pull request. I'd say what's wrong in the image is the soft recovery at 11:39:29, which is actually a hard recovery. The soft recovery at 11:39:41 is right.
Also, I don't think it is necessary to introduce a new flag for this case.

Contributor Author

ccztux commented Jul 29, 2025

Apologies for the delayed response. The aim of this pull request was to resolve the issue of missing log entries for OK HARD states that are expected after a soft recovery. I didn’t notice the host’s soft recovery at all.
According to the documentation, the host entered a SOFT UP state at 11:39:41, which is considered OK. However, similar to the behavior observed with services, the corresponding HARD UP state for the host appears to be missing from the log file.
The screenshot illustrates that, as a result of the changes in this pull request, the HARD OK state at 11:44:41 is now properly logged following the SOFT OK state at 11:39:41, which reflects the expected behavior.

Contributor Author

ccztux commented Jul 29, 2025

Thanks for the feedback! I'm currently working on a solution that avoids introducing a new flag.
While working on these changes, I ran into a specific situation where I'm unsure how Naemon should interpret the result:

image

Contributor

sni commented Jul 29, 2025

Entry 14 starts at 1 again because it is a new problem and there was a recovery (OK) before it. After any OK, it starts at 1 again.
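
The reset rule can be modelled in a few lines of Bash. This is only a simplified sketch, assuming that "starts at 1" refers to the check attempt number. It is not Naemon's implementation: the function name `problem_attempts` is hypothetical, all non-OK return codes are treated alike, the cap at a maximum attempt count is an assumption, and the attempt number reported for an OK result itself is deliberately left out.

```shell
#!/usr/bin/env bash

# Hypothetical helper, not part of Naemon: prints one line per check result.
# $1: assumed maximum number of check attempts
# remaining arguments: plugin return codes in submission order (0 = OK)
problem_attempts()
{
        local max_attempts="${1}"
        shift
        local attempt="0"
        local state

        for state in "$@"
        do
                if [[ "${state}" == "0" ]]
                then
                        # Any OK result resets the counter, so the next problem starts at 1.
                        attempt="0"
                        printf 'state=%s attempt=-\n' "${state}"
                else
                        # Count up on consecutive problems, but never past the maximum.
                        if (( attempt < max_attempts ))
                        then
                                attempt="$(( attempt + 1 ))"
                        fi
                        printf 'state=%s attempt=%s\n' "${state}" "${attempt}"
                fi
        done
}

# CRITICAL, CRITICAL, OK, CRITICAL with an assumed maximum of 3 attempts.
problem_attempts 3 2 2 0 2
```

In this model the last CRITICAL result is reported with attempt 1, because an OK result precedes it.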

Contributor Author

ccztux commented Jul 29, 2025

Thank you for your prompt reply.

Contributor Author

ccztux commented Jul 30, 2025

This is how I tested it:

cat /root/naemontest.sh

#!/usr/bin/env bash


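# Write one external command to the Naemon command file, then pause.
# $1: external command string; $2: seconds to sleep afterwards (default: 1).
submitExtCmd()
{
        local ext_cmd="${1}"
        local sleep_duration="${2:-1}"
        local timestamp
        timestamp="$(date +%s)"

        # Advance the matching counter so the next plugin output shows a new check count.
        if [[ "${ext_cmd}" =~ ^(.*)(PROCESS_HOST_CHECK_RESULT)(.*)$ ]]
        then
                host_counter="$(( host_counter + 1 ))"
        elif [[ "${ext_cmd}" =~ ^(.*)(PROCESS_SERVICE_CHECK_RESULT)(.*)$ ]]
        then
                service_counter="$(( service_counter + 1 ))"
        fi

        printf '[%lu] %s\n' "${timestamp}" "${ext_cmd}" > /var/lib/naemon/naemon.cmd
        sleep "${sleep_duration}"
}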



host_counter="1"
service_counter="1"

up_status_code="0"
down_status_code="1"
up_plugin_output="Host is UP"
down_plugin_output="Host is DOWN"
ok_status_code="0"
warn_status_code="1"
critical_status_code="2"
unkn_status_code="3"
ok_plugin_output="Service is OK"
warn_plugin_output="Service is WARNING"
critical_plugin_output="Service is CRITICAL"
unkn_plugin_output="Service is UNKNOWN"

host_name="zet"
service_description="Dummy"
start_time="$(date +%s)"
end_time="$(( start_time + 5 ))"
fixed="1"
trigger_id="0"
duration="3600"
author="cli"
comment="test"



submitExtCmd "PROCESS_HOST_CHECK_RESULT;${host_name};${up_status_code};${up_plugin_output} (Hostcheck count: ${host_counter})"
submitExtCmd "SCHEDULE_SVC_DOWNTIME;${host_name};${service_description};${start_time};${end_time};${fixed};${trigger_id};${duration};${author};${comment}"
submitExtCmd "PROCESS_HOST_CHECK_RESULT;${host_name};${down_status_code};${down_plugin_output} (Hostcheck count: ${host_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${critical_status_code};${critical_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_HOST_CHECK_RESULT;${host_name};${up_status_code};${up_plugin_output} (Hostcheck count: ${host_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${critical_status_code};${critical_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${ok_status_code};${ok_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${ok_status_code};${ok_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${critical_status_code};${critical_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${critical_status_code};${critical_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${ok_status_code};${ok_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${ok_status_code};${ok_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${ok_status_code};${ok_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${critical_status_code};${critical_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${critical_status_code};${critical_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${ok_status_code};${ok_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${critical_status_code};${critical_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${warn_status_code};${warn_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${unkn_status_code};${unkn_plugin_output} (Servicecheck count: ${service_counter})"
submitExtCmd "PROCESS_SERVICE_CHECK_RESULT;${host_name};${service_description};${ok_status_code};${ok_plugin_output} (Servicecheck count: ${service_counter})"

systemctl stop naemon
truncate -s 0 /var/lib/naemon/naemon.debug /var/log/naemon/naemon.log
systemctl restart naemon.service
/root/naemontest.sh
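
To compare the logged state types after such a run without a screenshot, the alert lines can be reduced to the fields that matter here. The following is a minimal sketch, assuming the usual `SERVICE ALERT: host;service;STATE;TYPE;attempt;output` and `HOST ALERT: host;STATE;TYPE;attempt;output` log layout. The function name `list_alerts` is hypothetical, and the sample lines in the demo are made up for illustration, not taken from the test run.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads Naemon log lines on stdin and prints
# "<epoch> <HOST|SERVICE> <state> <state type> <attempt>" for every alert line.
# All other log lines are skipped.
list_alerts()
{
        local line
        local svc_re='^\[([0-9]+)\] SERVICE ALERT: [^;]+;[^;]+;([A-Z]+);(SOFT|HARD);([0-9]+);'
        local host_re='^\[([0-9]+)\] HOST ALERT: [^;]+;([A-Z]+);(SOFT|HARD);([0-9]+);'

        while IFS= read -r line
        do
                if [[ "${line}" =~ ${svc_re} ]]
                then
                        printf '%s SERVICE %s %s %s\n' \
                                "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
                                "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
                elif [[ "${line}" =~ ${host_re} ]]
                then
                        printf '%s HOST %s %s %s\n' \
                                "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
                                "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
                fi
        done
}

# Demo with made-up sample lines (not real output from the test run).
list_alerts <<'EOF'
[1700000000] SERVICE ALERT: zet;Dummy;CRITICAL;SOFT;1;Service is CRITICAL
[1700000001] SERVICE ALERT: zet;Dummy;OK;SOFT;2;Service is OK
[1700000002] HOST ALERT: zet;DOWN;SOFT;1;Host is DOWN
EOF
```

Against the real log from the test above, it would be called as `list_alerts < /var/log/naemon/naemon.log`.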

Result:

image

Host configuration:

image

Service configuration:

image
