Skip to content

Comments

Add ability to log timings of some events...#248

Draft
VGoshev wants to merge 5 commits intomasterfrom
add-log-timing-ability
Draft

Add ability to log timings of some events...#248
VGoshev wants to merge 5 commits intomasterfrom
add-log-timing-ability

Conversation

@VGoshev
Copy link
Collaborator

@VGoshev VGoshev commented Feb 11, 2026

Add configurable support for logging different timings via calling of external tool.
Currently are supported following events:

  • master_unavailable
  • node_offline
  • switchover_failed
  • failover_failed
  • switchover_complete
  • failover_complete

Copy link
Contributor

@secwall secwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that we need to add tests on this.

go r.send(eventType, duration)
}

func (r *timingReporter) send(eventType string, duration time.Duration) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach leads to unlimited number of simultaneously running commands. Shouldn't we try a different approach? Like running a goroutine that reads a buffered channel of events and runs a single command to send them one by one?

@VGoshev VGoshev force-pushed the add-log-timing-ability branch from 8810c04 to f257da9 Compare February 16, 2026 19:50
command: conf.EventTimingNotifyCommand,
argsFmt: conf.EventTimingNotifyArgs,
logger: logger,
events: make(chan timingEvent, 64), // buffered channel to prevent blocking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64 is reused later in test. Why isn't it a constant?

// Give worker time to process the event
time.Sleep(100 * time.Millisecond)

// If we reach here without panic or deadlock, the test passes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this test is meaningful? We don't even check that the command was executed.

func TestReportTimingChannelFull(t *testing.T) {
conf := &config.Config{
EventTimingNotifyCommand: "sleep",
EventTimingNotifyArgs: []string{"10"}, // slow command to keep worker busy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sleep in linux could sleep for infinity.

// Report master unavailability duration when failover is approved
if failTime, ok := app.nodeFailTime[master]; ok {
dur := time.Since(failTime)
app.timings.reportTiming("master_unavailable", dur)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should move all such strings into constants to ease the maintenance (imagine that we need to rename the string).

maintenance_file: /var/run/rdsync.maintenance
daemon_lock_file: /var/run/rdsync.lock
event_timing_notify_command: "sh"
event_timing_notify_args: ["-c", "echo {event} {duration_ms} >> /var/log/rdsync_events.log"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this is a command that just logs into a logfile makes me think that we could just use a separate logger instead of all this machinery. Why don't we?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants