Skip to content

Sometimes failed cluster causes sql driver panic #41

@axllent

Description

@axllent

Hi @otoolep! I've been doing some cluster testing in my local network using three computers running rqlited, and a single instance of Mailpit. While ingesting a constant stream of incoming emails via Mailpit (thus writing as fast as rqlite can handle) I have been randomly dropping out one of the three rqlite nodes to test fault tolerance / recovery etc. rqlited seems to handle this very elegantly, but every now and then I get a panic from the gorqlite sql driver panic: runtime error: index out of range [0] with length 0:

ERRO[2024/04/14 16:04:21] [db] error storing message: tried all peers unsuccessfully. here are the results:
   peer #0: http://192.168.7.2:4001/db/execute?timings&level=weak&transaction failed due to Post "http://192.168.7.2:4001/db/execute?timings&level=weak&transaction": dial tcp 192.168.7.2:4001: connect: connection refused
   peer #1: http://192.168.7.50:4001/db/execute?timings&level=weak&transaction failed, got: 503 Service Unavailable, message: leader not found

   peer #2: http://192.168.7.1:4001/db/execute?timings&level=weak&transaction failed, got: 503 Service Unavailable, message: leader not found
 
panic: runtime error: index out of range [0] with length 0

goroutine 376180 [running]:
github.com/rqlite/gorqlite.(*Connection).WriteOneParameterizedContext(...)
        /home/ralph/.cache/go/pkg/mod/github.com/rqlite/gorqlite@v0.0.0-20240227123050-397b03f02418/write.go:79
github.com/rqlite/gorqlite/stdlib.(*Stmt).ExecContext(0xc008ce1410, {0x108d1b8, 0x1745480}, {0xc000e79860, 0xa, 0x1083460?})
        /home/ralph/.cache/go/pkg/mod/github.com/rqlite/gorqlite@v0.0.0-20240227123050-397b03f02418/stdlib/sql.go:96 +0x3f8
database/sql.ctxDriverStmtExec({0x108d1b8, 0x1745480}, {0x108d730, 0xc008ce1410}, {0xc000e79860, 0xa, 0x108cb80?})
        /usr/lib/go-1.22/src/database/sql/ctxutil.go:65 +0xa3
database/sql.resultFromStatement({0x108d1b8, 0x1745480}, {0x108cb80, 0xc00011c518}, 0xc000cf5438, {0xc000cf58b0, 0xa, 0xa})
        /usr/lib/go-1.22/src/database/sql/sql.go:2670 +0x13a
database/sql.(*DB).execDC(0x0?, {0x108d1b8, 0x1745480}, 0xc0001cc750, 0x10?, {0xc0043c1ae0, 0x99}, {0xc000cf58b0, 0xa, 0xa})
        /usr/lib/go-1.22/src/database/sql/sql.go:1722 +0x42c
database/sql.(*Tx).ExecContext(0xc007d56a80, {0x108d1b8, 0x1745480}, {0xc0043c1ae0, 0x99}, {0xc000cf58b0, 0xa, 0xa})
        /usr/lib/go-1.22/src/database/sql/sql.go:2506 +0xad
database/sql.(*Tx).Exec(...)
        /usr/lib/go-1.22/src/database/sql/sql.go:2515
...

exit status 2

I've tried to find the cause of the panic and I can clearly see where it is happening, just not why it is happening. WriteOne(), WriteOneContext(), WriteOneParameterized() and WriteOneParameterizedContext() are all potentially affected by this too as they all return wra[0], err (eg: here). If wra[0] does not exist (ie: empty slice) you will get a panic. I just can't work out what is causing wra to be empty as your error handling appears to append the error as a result. The only thing I can think of is if the final:

 else {
	return results, nil
}

... if results has no results... Hopefully this means more to you than me?

Ensuring the wra[0] exists before trying to return it is obviously the safest solution (though maybe not so elegant), and I do not know what the consequences of that are "down the food chain" (you're way more familiar with the code than I am).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions