20 changes: 20 additions & 0 deletions README.md
@@ -71,6 +71,7 @@ user, err := resile.Do(ctx, func(ctx context.Context) (*User, error) {
- [Native Chaos Engineering (Fault & Latency Injection)](#22-native-chaos-engineering-fault--latency-injection)
- [Distributed Deadline Propagation](#23-distributed-deadline-propagation)
- [Reliable File Downloads (HTTP Resumption)](#24-reliable-file-downloads-http-resumption)
- [SQL Resilience](#25-sql-resilience)
- [Built on Hyperscaler Research](#built-on-hyperscaler-research)
- [Configuration Reference](#configuration-reference)
- [Architecture & Design](#architecture--design)
@@ -117,6 +118,7 @@ Want to learn more about the philosophy behind Resile and advanced resilience pa
* [Native Chaos Engineering: Testing Resilience with Fault & Latency Injection](docs/articles/chaos-engineering.md)
* [Stopping the Zombie Requests: Distributed Deadline Propagation in Go](docs/articles/distributed-deadline-propagation.md)
* [Reliable File Downloads with HTTP Range Resumption](docs/articles/streaming-http-resumption.md)
* [Building Bulletproof Database Clients in Go: SQL Resilience with Resile](docs/articles/sql-resilience.md)

Also, check out our [Dev.to space](https://dev.to/onurcinar) for more articles and discussions.

@@ -139,6 +141,7 @@ The [examples/](examples/) directory contains standalone programs showing how to
- **[State Machine](examples/statemachine/main.go)**: Building resilient state machines inspired by Erlang's `gen_statem`.
- **[Chaos Injection](examples/chaos/main.go)**: Simulating faults and latency to test your policies.
- **[HTTP Resumption](examples/http_resume_stream/main.go)**: Resuming large file downloads using HTTP Range.
- **[SQL Resilience](examples/sql/main.go)**: Using Resile with standard `database/sql`.

---

@@ -544,6 +547,23 @@ err := resile.DoErr(ctx, func(ctx context.Context) error {

[Read more: Reliable File Downloads with HTTP Range Resumption](docs/articles/streaming-http-resumption.md)

### 25. SQL Resilience
**The Problem**: Databases are critical yet vulnerable to transient network errors and failovers.

**The Recipe**:
Wrap standard `database/sql` calls with retries and a circuit breaker to protect against both blips and systemic outages.

```go
_, err := resile.Do(ctx, func(ctx context.Context) (sql.Result, error) {
	return db.ExecContext(ctx, "UPDATE users SET active = ? WHERE id = ?", true, 42)
},
	resile.WithRetry(3),
	resile.WithCircuitBreaker(breaker),
)
```

[Read more: Building Bulletproof Database Clients in Go: SQL Resilience with Resile](docs/articles/sql-resilience.md)

---

## Built on Hyperscaler Research
5 changes: 5 additions & 0 deletions docs/articles/preventing-meltdowns.md
@@ -67,6 +67,11 @@ err := cb.Execute(ctx, func() error {
})
```

### Case Study: SQL Resilience
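Applying a circuit breaker to database operations is one of the most effective ways to prevent systemic meltdowns: once failures cross the threshold, the breaker fails calls fast instead of piling more queries onto a database that is already struggling.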

[Read more: Building Bulletproof Database Clients in Go: SQL Resilience with Resile](sql-resilience.md)

---

## 3. Adaptive Concurrency (The Buffer)
8 changes: 8 additions & 0 deletions docs/articles/sliding-window-circuit-breakers.md
@@ -93,6 +93,14 @@ cb.Reset()

---

## Real-world Example: SQL Databases

Circuit breakers are often used to protect databases from thundering herds during recovery. By wrapping your SQL calls with a circuit breaker and retries, you can ensure your application remains responsive even when the database is struggling.

[Read more: Building Bulletproof Database Clients in Go: SQL Resilience with Resile](sql-resilience.md)

---

## Testing Your Breaker: Chaos Engineering

How do you know your failure thresholds are tuned correctly? Instead of waiting for a real outage, you can use Resile's native **Chaos Engineering** features to synthetically trip your breaker.
102 changes: 102 additions & 0 deletions docs/articles/sql-resilience.md
@@ -0,0 +1,102 @@
# Building Bulletproof Database Clients in Go: SQL Resilience with Resile

You’ve seen it before: a brief network blip causes a database connection timeout, and suddenly your logs are flooded with errors. Even worse, if the database is struggling under heavy load, your application's aggressive retry loops might just be the "final straw" that causes a complete database meltdown.

Database operations are among the most critical, and often the most fragile, parts of a backend system. In this article, we'll explore how to build a resilient SQL client in Go using [Resile](https://github.com/cinar/resile) that handles transient failures gracefully while protecting your database from overload.

---

## The "Retry-Breaker" Pattern: The Gold Standard

When dealing with databases, you typically want two layers of protection:

1. **Retries**: To handle "transient" errors. These are short-lived blips like a temporary network hiccup or a row lock that clears in milliseconds.
2. **Circuit Breaker**: To handle "systemic" failures. If the database is down or undergoing a failover, retrying immediately and indefinitely will only waste resources and potentially prevent the database from recovering.

By combining them, you get the best of both worlds: you retry the small stuff, but you "stop the bleeding" when things go seriously wrong.

---

## The Implementation

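Resile makes it easy to wrap standard `database/sql` calls. Because `resile.Do` takes a function that receives a `context.Context` and is generic over the result type, it works with any driver whose `*Context` methods honor cancellation, and it hands back the query's typed result (here, `sql.Result`) without type assertions.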

Here is how you can wrap a standard `ExecContext` call:

```go
import (
	"context"
	"database/sql"
	"time"

	"github.com/cinar/resile"
	"github.com/cinar/resile/circuit"
)

// Define the circuit breaker once at the service level and share it.
// A breaker created inside UpdateUserStatus would start empty on every
// call and could never accumulate enough failures to trip.
var breaker = circuit.New(circuit.Config{
	WindowType:           circuit.WindowCountBased,
	WindowSize:           10,
	MinimumCalls:         3,
	FailureRateThreshold: 50.0,
	ResetTimeout:         time.Second,
})

func UpdateUserStatus(ctx context.Context, db *sql.DB, userID int, active bool) error {
	// Wrap the SQL call with Resile.
	// Placeholder syntax is driver-specific (e.g. $1, $2 for PostgreSQL).
	_, err := resile.Do(ctx, func(ctx context.Context) (sql.Result, error) {
		return db.ExecContext(ctx, "UPDATE users SET active = ? WHERE id = ?", active, userID)
	},
		resile.WithRetry(3),                        // Retry up to 3 times
		resile.WithBaseDelay(100*time.Millisecond), // Wait between retries
		resile.WithCircuitBreaker(breaker),         // Trip the circuit if failures persist
	)

	return err
}
```
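The `circuit.Config` fields above describe a count-based sliding window. As a rough mental model (a hypothetical sketch, not Resile's implementation), the breaker remembers the outcomes of the last `WindowSize` calls, ignores the failure rate until at least `MinimumCalls` have been recorded, and opens once the failure percentage reaches `FailureRateThreshold`. `ResetTimeout`, not modeled here, presumably governs how long the circuit stays open before letting a trial call through. The type and method names below are illustrative only.

```go
package main

import "fmt"

// countWindow is a hypothetical, simplified count-based failure window.
type countWindow struct {
	outcomes     []bool  // ring buffer of recent calls; true means failed
	next         int     // slot to overwrite next
	filled       int     // number of slots holding a recorded outcome
	minimumCalls int     // calls required before the rate is evaluated
	threshold    float64 // failure rate (percent) that opens the circuit
}

func newCountWindow(size, minimumCalls int, threshold float64) *countWindow {
	return &countWindow{outcomes: make([]bool, size), minimumCalls: minimumCalls, threshold: threshold}
}

// record stores one call outcome, overwriting the oldest once the window is full.
func (w *countWindow) record(failed bool) {
	w.outcomes[w.next] = failed
	w.next = (w.next + 1) % len(w.outcomes)
	if w.filled < len(w.outcomes) {
		w.filled++
	}
}

// shouldOpen reports whether the recent failure rate warrants tripping.
func (w *countWindow) shouldOpen() bool {
	if w.filled < w.minimumCalls {
		return false // too few samples to judge
	}
	failures := 0
	for i := 0; i < w.filled; i++ {
		if w.outcomes[i] {
			failures++
		}
	}
	rate := 100 * float64(failures) / float64(w.filled)
	return rate >= w.threshold
}

func main() {
	w := newCountWindow(10, 3, 50.0)
	w.record(true)
	w.record(true)
	fmt.Println(w.shouldOpen()) // false: only 2 calls, below MinimumCalls
	w.record(false)
	fmt.Println(w.shouldOpen()) // true: 2 of 3 failed (66.7% >= 50%)
}
```

`MinimumCalls` exists so that a single early failure (a 100% failure rate over one call) does not trip the breaker.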

### Why This Works
* **Context Awareness**: If the user cancels the request or the deadline is reached, Resile stops retrying immediately and honors the context.
* **Exponential Backoff**: By default, Resile spaces retries out with exponentially growing delays (starting from the base delay) instead of retrying in a tight loop, which helps prevent "retry storms."
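* **Shared Intelligence**: If multiple SQL calls share the same `breaker` instance, their failures count toward the same failure rate, so once the breaker opens, every caller stops sending queries to the struggling database instead of each one discovering the outage on its own.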

---

## The Critical Caveat: Idempotency and SQL Writes

While retries are powerful, they come with a major risk for **Write** operations (`INSERT`, `UPDATE`, `DELETE`).

Imagine this scenario:
1. Your app sends an `UPDATE` command to the database.
2. The database processes it successfully.
3. The network fails *after* the update but *before* the database can send the "OK" back to your app.
4. Resile sees a network error and **retries** the operation.

If your operation isn't **idempotent** (safe to apply more than once with the same result), you might end up with duplicate data or corrupted state.
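A quick way to tell whether a write is safe to retry is to ask what happens if it runs twice. The in-memory sketch below (hypothetical types, no real database) replays the lost-acknowledgment scenario for two kinds of updates: a relative update such as `balance = balance + 10` is not idempotent, while an absolute update such as the `SET active = ?` query used earlier in this article is.

```go
package main

import "fmt"

// account is a hypothetical in-memory stand-in for a database row.
type account struct {
	balance int
	active  bool
}

// applyWithLostAck simulates the scenario above: the first attempt is
// applied but its acknowledgment is lost, so the client retries once.
func applyWithLostAck(row *account, op func(*account)) {
	op(row) // attempt 1: applied by the database, "OK" never arrives
	op(row) // attempt 2: the retry runs the same write again
}

func main() {
	row := &account{balance: 100}

	// Not idempotent: like UPDATE accounts SET balance = balance + 10.
	applyWithLostAck(row, func(a *account) { a.balance += 10 })

	// Idempotent: like UPDATE users SET active = true.
	applyWithLostAck(row, func(a *account) { a.active = true })

	fmt.Println(row.balance, row.active) // 120 true (the deposit was applied twice)
}
```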

### How to Stay Safe
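* **Use Idempotency Keys**: For `INSERT` operations, generate a unique request ID (such as a `UUID`) once per logical request, before the first attempt, and store it in a column with a `UNIQUE` constraint. A retried insert then fails the constraint instead of creating a duplicate row.
* **Atomic Updates**: Use `WHERE` clauses that check for the previous state (e.g., `UPDATE orders SET status='shipped' WHERE id=123 AND status='pending'`). If the first attempt already applied the change, the retry matches zero rows and has no effect.
* **Transactions**: Wrap complex multi-step operations in a single SQL transaction to ensure "all or nothing" execution, and retry the whole transaction rather than individual statements. A transaction alone does not prevent duplicates if the `COMMIT` succeeds but its acknowledgment is lost, so combine it with one of the techniques above.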

---

## Beyond Simple Retries

Database resilience isn't just about trying again. In complex systems, you might want to combine SQL resilience with other Resile features:

* **[Sliding Window Circuit Breakers](sliding-window-circuit-breakers.md)**: For more accurate failure detection over time.
* **[Bulkhead Isolation](bulkhead-isolation.md)**: To ensure that a slow "Reports" query doesn't consume all database connections, starving your "User Login" flow.
* **[Chaos Engineering](chaos-engineering.md)**: To test how your application reacts when the database suddenly starts returning errors or responding with high latency.

---

## Conclusion

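The standard library `database/sql` package is excellent, but apart from transparently retrying on a fresh connection when a driver reports `driver.ErrBadConn`, it leaves resilience as an "exercise for the reader." With Resile, a few lines of declarative options add retries with backoff, circuit breaking, and context-aware cancellation, turning a basic database client into a production-grade, self-healing one.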

**Ready to make your Go services more resilient?**
Check out the full project and more examples on GitHub: [github.com/cinar/resile](https://github.com/cinar/resile)

#golang #sql #database #resilience #sre #microservices #distributed-systems
5 changes: 4 additions & 1 deletion docs/articles/stop-writing-manual-loops.md
@@ -133,7 +133,10 @@ Resile isn't just a retry loop; it's a resilience toolkit. Out of the box, you g
- **Distributed Deadline Propagation**: Abort zombie requests early and inject timeout headers.
- **Stateful Resumption**: Automatically handle partial failures by resuming from the last successful byte (e.g., in large downloads).

[Read more: Reliable File Downloads with HTTP Range Resumption](streaming-http-resumption.md)

### Case Study: SQL Resilience
Retrying database operations is a classic use case, but it requires careful handling of circuit breakers and idempotency.

[Read more: Building Bulletproof Database Clients in Go: SQL Resilience with Resile](sql-resilience.md)

---

38 changes: 38 additions & 0 deletions examples/sql/README.md
@@ -0,0 +1,38 @@
# SQL Example

This example demonstrates how to use Resile with Go's standard `database/sql` package.

It wraps a `db.ExecContext` call with retries and a circuit breaker. The example uses a mock SQL driver instead of a real database, so it runs anywhere without extra setup.

The mock driver fails the first two attempts with a temporary error and succeeds on the third attempt. This shows Resile retrying the operation.

## Run
```bash
go run ./examples/sql
```

Expected output:
```
query succeeded after 3 attempts; rows affected: 1
```

## How It Works
```go
result, err := resile.Do(ctx, func(ctx context.Context) (sql.Result, error) {
	return db.ExecContext(ctx, "UPDATE users SET active = ? WHERE id = ?",
		true,
		42,
	)
},
	resile.WithRetry(3),
	resile.WithBaseDelay(100*time.Millisecond),
	resile.WithCircuitBreaker(breaker),
)
```

The SQL call is wrapped with `resile.Do`.
The circuit breaker is included to show how you can stop calling a database when repeated failures suggest it is unhealthy.

## Note about SQL Writes
Be careful when retrying write queries like `UPDATE`, `INSERT`, or `DELETE`.
If the first attempt reached the database but the client never received a response, a retry runs the same write a second time. In production applications, protect retried writes with transactions, idempotency keys, unique constraints, or other safeguards.
90 changes: 90 additions & 0 deletions examples/sql/main.go
@@ -0,0 +1,90 @@
// Copyright (c) 2026 Onur Cinar.
// The source code is provided under MIT License.
// https://github.com/cinar/resile

package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync/atomic"
	"time"

	"github.com/cinar/resile"
	"github.com/cinar/resile/circuit"
)

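// attempts counts how many times the mock driver's ExecContext has been called.
var attempts atomic.Int32

// transientFailureDriver is a mock SQL driver whose connections fail the
// first two ExecContext calls and succeed from the third call on.
type transientFailureDriver struct{}

// transientFailureConnection is the driver.Conn returned by the mock driver.
type transientFailureConnection struct{}

// Compile-time check: database/sql calls ExecContext directly (without
// Prepare) when the connection implements driver.ExecerContext.
var _ driver.ExecerContext = transientFailureConnection{}

func (transientFailureDriver) Open(name string) (driver.Conn, error) {
	return transientFailureConnection{}, nil
}

func (transientFailureConnection) Prepare(query string) (driver.Stmt, error) {
	return nil, errors.New("Prepare is not implemented")
}

func (transientFailureConnection) Begin() (driver.Tx, error) {
	return nil, errors.New("transactions are not implemented")
}

func (transientFailureConnection) Close() error {
	return nil
}

// ExecContext simulates a transient failure on the first two calls.
func (transientFailureConnection) ExecContext(ctx context.Context, query string, args []driver.NamedValue) (driver.Result, error) {
	current := attempts.Add(1)
	if current < 3 {
		return nil, errors.New("temporary database error")
	}
	return driver.RowsAffected(1), nil
}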

func main() {
	sql.Register("transient-sql", transientFailureDriver{})

	db, err := sql.Open("transient-sql", "")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	ctx := context.Background()

	breaker := circuit.New(circuit.Config{
		WindowType:           circuit.WindowCountBased,
		WindowSize:           10,
		MinimumCalls:         3,
		FailureRateThreshold: 50,
		ResetTimeout:         time.Second,
	})

	// SQL call wrapped with Resile so transient db errors can be retried
	result, err := resile.Do(ctx, func(ctx context.Context) (sql.Result, error) {
		return db.ExecContext(ctx, "UPDATE users SET active = ? WHERE id = ?", true, 42)
	},
		resile.WithRetry(3),
		resile.WithBaseDelay(100*time.Millisecond),
		resile.WithCircuitBreaker(breaker),
	)

	if err != nil {
		fmt.Printf("query failed: %v\n", err)
		return
	}

	rowsAffected, err := result.RowsAffected()
	if err != nil {
		fmt.Printf("failed to get rows affected: %v\n", err)
		return
	}

	fmt.Printf("query succeeded after %d attempts; rows affected: %d\n",
		attempts.Load(), rowsAffected)
}