fix(rpc): derive /raft/node leadership from raft leader ID by auricom · Pull Request #3266 · evstack/ev-node

auricom · 2026-04-20T09:28:28Z

Tested this PR on a 3 members raft cluster, with this fix, the leader detection displayed on /raft/node endpoint is instantaneous instead of taking ~ 30 sedonds.

Overview

Summary by CodeRabbit

New Features
- Added public API for safe leader ID retrieval from raft nodes with null-safety checks.
- Enhanced leader status detection with fallback mechanism for improved reliability.
Tests
- Added comprehensive HTTP endpoint tests covering leader status, empty leader scenarios, and HTTP method validation.

github-actions · 2026-04-20T09:28:43Z

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Apr 20, 2026, 10:04 AM

claude · 2026-04-20T09:28:45Z

Claude finished @auricom's task in 1m 45s —— View job

Code Review

Gather context and read changed files
Review pkg/raft/node.go
Review pkg/rpc/server/http.go
Review pkg/rpc/server/server.go
Review pkg/rpc/server/http_test.go
Post final review

Summary

This PR correctly fixes the stale leadership state bug by deriving is_leader from the authoritative LeaderID() comparison (leaderID == nodeID) rather than trusting the possibly-lagging IsLeader() state machine field. The implementation is clean and the latest commits have already addressed the initial CodeRabbit feedback (doc comment on LeaderID, consolidating to a table-driven test, adding the critical "leader differs" case).

What's Good

Correct root cause fix: Comparing LeaderID() to NodeID() is the right approach — raft.State() can lag after a leadership transition, while LeaderWithID() reflects the current consensus view. (http.go:144-146)
Nil safety: LeaderID() guards against both nil receiver and nil inner raft field, matching the pattern of IsLeader() and HasQuorum(). (node.go:265-270)
Good doc comment: The stale-value caveat and HasQuorum cross-check guidance on LeaderID() is exactly the right kind of documentation for a concurrent state accessor.
Complete test matrix: The table-driven test includes the exact failure scenario this PR targets (leaderID="node-b", nodeID="node-a" → is_leader=false) plus the empty-fallback and method-not-allowed cases.

Issues

Suggestion: expose `leader_id` in the `/raft/node` response

The response currently only returns is_leader and node_id, but operators troubleshooting a split-brain or stale-leader scenario need to know which node is the leader — not just whether the queried node is it. Adding leader_id to the response body is a low-cost change that would eliminate the need to poll every node.

// pkg/rpc/server/http.go
rsp := struct {
    IsLeader bool   `json:"is_leader"`
    NodeID   string `json:"node_id"`
    LeaderID string `json:"leader_id"`
}{
    IsLeader: isLeader,
    NodeID:   raftNode.NodeID(),
    LeaderID: leaderID,
}

Fix this →

Minor: JSON encode error handling fires after headers are sent

In http.go:155-157:

if err := json.NewEncoder(w).Encode(rsp); err != nil {
    http.Error(w, "Internal Server Error", http.StatusInternalServerError)
}

http.Error is called after Encode has already begun writing to w, so the 500 status can't be sent — this will produce a superfluous response.WriteHeader log warning. The correct pattern is to encode into a bytes.Buffer first, then write atomically. That said, encoding a struct of bool + string should never fail in practice, so the real-world impact is zero. This is also a pre-existing pattern in the file, not introduced by this PR.

Minor: nil `store` in raft-node tests could mask panics

http_test.go:122 passes nil as the store argument. Since nil is also passed for mux.HandleFunc("/health/live", ...) which calls s.Height(...), an accidental request to that endpoint would panic. The tests are focused enough that this isn't a real risk, but using a mocks.NewMockStore(t) (even without expectations) would be safer.

Minor: `LeaderID()` nil-guard paths not covered by tests

Codecov flags 4 lines in node.go as uncovered — these are the n == nil and n.raft == nil guards in LeaderID(). The existing IsLeader() and HasQuorum() guards have similar gaps. A unit test like:

func TestLeaderID_nilNode(t *testing.T) {
    var n *Node
    assert.Equal(t, "", n.LeaderID())
}

would cover the nil-receiver path cheaply. Fix this →

Verdict

The core fix is correct and well-tested. The suggestions above are quality-of-life improvements rather than blockers. Good to merge once the PR description is filled in (as flagged by CodeRabbit's pre-merge checks).

coderabbitai · 2026-04-20T09:28:47Z

📝 Walkthrough

Walkthrough

A new public LeaderID() method is added to the Node type, the RaftNodeSource interface is updated to include this method, the /raft/node HTTP endpoint now derives leader status by comparing LeaderID to NodeID, and comprehensive tests are added for the endpoint.

Changes

Cohort / File(s)	Summary
Core API Addition `pkg/raft/node.go`	Added public `LeaderID() string` method that safely returns the current raft leader ID, or empty string if receiver or raft is nil.
Interface Update `pkg/rpc/server/server.go`	Extended `RaftNodeSource` interface with new `LeaderID() string` method requirement.
HTTP Endpoint Update `pkg/rpc/server/http.go`	Modified `/raft/node` GET handler to derive `is_leader` by comparing `LeaderID()` to `NodeID()`, falling back to `IsLeader()` when LeaderID is empty.
Test Coverage `pkg/rpc/server/http_test.go`	Added test helper type `testRaftNodeSource` and three new tests verifying `/raft/node` endpoint behavior for leader status derivation, empty leader ID fallback, and HTTP method validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A leader's beacon, once shrouded and veiled,
Now hops forth in code, where queries prevail! 🐰
LeaderID shines bright through the interface's call,
With tests standing guard to certify it all! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The PR description includes a link to the related issue and mentions testing, but lacks detailed explanation of the problem, solution, and technical rationale.	Expand the description to explain why deriving leadership from raft leader ID is more reliable, what the previous approach was, and provide more context about the 30-second delay issue.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main change: introducing leader ID-based leadership derivation in the /raft/node endpoint.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/3265-raft-node-leadership-state

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

pkg/raft/node.go (1)

211-216: Add a doc comment for exported LeaderID.

Other exported methods on Node (e.g., IsLeader, HasQuorum) are documented; LeaderID should be as well.

📝 Proposed doc

+// LeaderID returns the raft leader's server ID, or "" if unknown or the node is not initialized.
 func (n *Node) LeaderID() string {
 	if n == nil || n.raft == nil {
 		return ""
 	}
 	return n.leaderID()
 }

As per coding guidelines: "Document exported types and functions in Go code".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/raft/node.go` around lines 211 - 216, Add a Go doc comment for the
exported function LeaderID on type Node: place a comment immediately above func
(n *Node) LeaderID() string that describes what LeaderID returns (the current
cluster leader's ID as a string), its behavior when the receiver is nil or
uninitialized (returns empty string), and any guarantees or semantics (e.g., may
be stale until raft leadership changes or only valid when raft != nil). Use the
style of the existing comments for Node.IsLeader and Node.HasQuorum for
consistency.

pkg/rpc/server/http_test.go (2)

71-75: Test intent would be clearer by removing the misleading isLeader: false.

Setting isLeader: false while asserting body.IsLeader == true is correct (the handler derives from LeaderID), but makes the test read ambiguously — a reader might think the assertion is wrong. Either drop the field (zero value is false anyway, which actually reinforces the point) or add a comment noting that IsLeader() is intentionally stale here to verify LeaderID() takes precedence.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/rpc/server/http_test.go` around lines 71 - 75, The test uses
testRaftNodeSource with isLeader:false while asserting body.IsLeader == true,
which is confusing; update the test by either removing the isLeader field (let
it use the zero value) or add a short clarifying comment next to the
testRaftNodeSource instantiation stating that IsLeader() is intentionally stale
and the handler should derive leadership from LeaderID() via LeaderID();
reference testRaftNodeSource, IsLeader, LeaderID, and body.IsLeader in the
comment or change so the intent is explicit.

67-147: Consider consolidating the three /raft/node tests into a table-driven test.

The three new tests are largely duplicated setup/teardown with only input and expected output differing. A single table-driven test would be more idiomatic and easier to extend (e.g., leader mismatch case: leaderID="node-b", nodeID="node-a" → is_leader=false, which is currently not covered and is arguably the most important new behavior).

♻️ Suggested structure

func TestRegisterCustomHTTPEndpoints_RaftNodeStatus(t *testing.T) {
    specs := map[string]struct {
        src            testRaftNodeSource
        method         string
        expStatus      int
        expIsLeader    bool
        expNodeID      string
    }{
        "leader id matches node id":        {src: testRaftNodeSource{leaderID: "node-a", nodeID: "node-a"}, method: http.MethodGet, expStatus: http.StatusOK, expIsLeader: true, expNodeID: "node-a"},
        "leader id differs from node id":   {src: testRaftNodeSource{leaderID: "node-b", nodeID: "node-a"}, method: http.MethodGet, expStatus: http.StatusOK, expIsLeader: false, expNodeID: "node-a"},
        "empty leader id falls back":       {src: testRaftNodeSource{isLeader: false, leaderID: "", nodeID: "node-a"}, method: http.MethodGet, expStatus: http.StatusOK, expIsLeader: false, expNodeID: "node-a"},
        "non-GET is rejected":              {src: testRaftNodeSource{}, method: http.MethodPost, expStatus: http.StatusMethodNotAllowed},
    }
    for name, tc := range specs {
        t.Run(name, func(t *testing.T) { /* ... */ })
    }
}

Adding the "leader id differs from node id" case is especially worthwhile — it's the exact scenario the fix targets (node that stepped down but previously reported itself as leader). As per coding guidelines: "Use table-driven tests where appropriate".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/rpc/server/http_test.go` around lines 67 - 147, Consolidate the three
near-duplicate tests TestRegisterCustomHTTPEndpoints_RaftNodeStatus,
TestRegisterCustomHTTPEndpoints_RaftNodeStatusFallsBackWithoutLeaderID, and
TestRegisterCustomHTTPEndpoints_RaftNodeStatusMethodNotAllowed into a single
table-driven test that iterates over cases (using testRaftNodeSource entries)
and calls RegisterCustomHTTPEndpoints for each case; include cases for
leader-matches (leaderID == nodeID → is_leader true), leader-differs (leaderID
!= nodeID → is_leader false), empty leaderID fallback, and non-GET method
expecting StatusMethodNotAllowed, and assert expected HTTP status and decoded
response fields (IsLeader, NodeID) per case.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/raft/node.go`:
- Around line 211-216: Add a Go doc comment for the exported function LeaderID
on type Node: place a comment immediately above func (n *Node) LeaderID() string
that describes what LeaderID returns (the current cluster leader's ID as a
string), its behavior when the receiver is nil or uninitialized (returns empty
string), and any guarantees or semantics (e.g., may be stale until raft
leadership changes or only valid when raft != nil). Use the style of the
existing comments for Node.IsLeader and Node.HasQuorum for consistency.

In `@pkg/rpc/server/http_test.go`:
- Around line 71-75: The test uses testRaftNodeSource with isLeader:false while
asserting body.IsLeader == true, which is confusing; update the test by either
removing the isLeader field (let it use the zero value) or add a short
clarifying comment next to the testRaftNodeSource instantiation stating that
IsLeader() is intentionally stale and the handler should derive leadership from
LeaderID() via LeaderID(); reference testRaftNodeSource, IsLeader, LeaderID, and
body.IsLeader in the comment or change so the intent is explicit.
- Around line 67-147: Consolidate the three near-duplicate tests
TestRegisterCustomHTTPEndpoints_RaftNodeStatus,
TestRegisterCustomHTTPEndpoints_RaftNodeStatusFallsBackWithoutLeaderID, and
TestRegisterCustomHTTPEndpoints_RaftNodeStatusMethodNotAllowed into a single
table-driven test that iterates over cases (using testRaftNodeSource entries)
and calls RegisterCustomHTTPEndpoints for each case; include cases for
leader-matches (leaderID == nodeID → is_leader true), leader-differs (leaderID
!= nodeID → is_leader false), empty leaderID fallback, and non-GET method
expecting StatusMethodNotAllowed, and assert expected HTTP status and decoded
response fields (IsLeader, NodeID) per case.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 02c32518-6288-421f-9aab-a1309d349144

📥 Commits

Reviewing files that changed from the base of the PR and between 4c7323f and 2b3e5f9.

📒 Files selected for processing (4)

pkg/raft/node.go
pkg/rpc/server/http.go
pkg/rpc/server/http_test.go
pkg/rpc/server/server.go

codecov · 2026-04-20T09:32:39Z

Codecov Report

❌ Patch coverage is 55.55556% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.54%. Comparing base (d9046e6) to head (ed2f57d).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/raft/node.go	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3266      +/-   ##
==========================================
+ Coverage   62.52%   62.54%   +0.02%     
==========================================
  Files         122      122              
  Lines       13012    13020       +8     
==========================================
+ Hits         8136     8144       +8     
+ Misses       3992     3990       -2     
- Partials      884      886       +2

Flag	Coverage Δ
combined	`62.54% <55.55%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add Go doc comment to Node.LeaderID() describing return value, nil-safety, and staleness semantics, consistent with IsLeader/HasQuorum style. - Consolidate three near-duplicate TestRegisterCustomHTTPEndpoints_RaftNodeStatus* tests into a single table-driven test covering: leaderID==nodeID (is_leader true), leaderID!=nodeID (is_leader false), empty leaderID fallback, and non-GET method (405). Clarifies that is_leader is derived from LeaderID(), not the IsLeader() field on testRaftNodeSource. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Separate third-party imports from evstack/ev-node-prefixed imports per project gci custom-order config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(rpc): derive /raft/node leadership from raft leader ID

2b3e5f9

github-actions bot assigned auricom Apr 20, 2026

coderabbitai bot reviewed Apr 20, 2026

View reviewed changes

auricom and others added 3 commits April 20, 2026 11:54

Merge branch 'main' into fix/3265-raft-node-leadership-state

4677b5b

fix(rpc): fix gci import grouping in http_test.go

ed2f57d

Separate third-party imports from evstack/ev-node-prefixed imports per project gci custom-order config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

auricom enabled auto-merge April 20, 2026 12:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rpc): derive /raft/node leadership from raft leader ID#3266

fix(rpc): derive /raft/node leadership from raft leader ID#3266
auricom wants to merge 4 commits intomainfrom
fix/3265-raft-node-leadership-state

auricom commented Apr 20, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

claude bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Apr 20, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Uh oh!

codecov bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

auricom commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Summary by CodeRabbit

Uh oh!

github-actions bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Summary

What's Good

Issues

Suggestion: expose leader_id in the /raft/node response

Minor: JSON encode error handling fires after headers are sent

Minor: nil store in raft-node tests could mask panics

Minor: LeaderID() nil-guard paths not covered by tests

Verdict

Uh oh!

coderabbitai bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

auricom commented Apr 20, 2026 •

edited

Loading

github-actions bot commented Apr 20, 2026 •

edited

Loading

claude bot commented Apr 20, 2026 •

edited

Loading

Suggestion: expose `leader_id` in the `/raft/node` response

Minor: nil `store` in raft-node tests could mask panics

Minor: `LeaderID()` nil-guard paths not covered by tests

coderabbitai bot commented Apr 20, 2026 •

edited

Loading

codecov bot commented Apr 20, 2026 •

edited

Loading