Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvserver: TestLeaseQueueSwitchesLeaseType failed #132797

Open
cockroach-teamcity opened this issue Oct 16, 2024 · 2 comments · May be fixed by #132798
Open

kv/kvserver: TestLeaseQueueSwitchesLeaseType failed #132797

cockroach-teamcity opened this issue Oct 16, 2024 · 2 comments · May be fixed by #132798
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Oct 16, 2024

kv/kvserver.TestLeaseQueueSwitchesLeaseType failed with artifacts on master @ 88faab92fdf2566d3ccb710374aa9f285f077906:

*   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1515
*   | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processTick
*   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:711
*   | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftSchedulerShard).worker
*   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410
*   | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start.func2
*   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319
*   | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
*   | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498
*   | runtime.goexit
*   | 	src/runtime/asm_arm64.s:1222
* Wraps: (2) panic: can only defortify at current term if told by the leader or if fortification has expired
* Error types: (1) *withstack.withStack (2) *errutil.leafError
*
panic: can only defortify at current term if told by the leader or if fortification has expired [recovered]
	panic: can only defortify at current term if told by the leader or if fortification has expired

goroutine 77414 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0x400c1e9218?, {0x8476ed0, 0x40073907b0})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:226 +0x68
panic({0x5cd5b20?, 0x400c02a590?})
	GOROOT/src/runtime/panic.go:770 +0x124
github.com/cockroachdb/cockroach/pkg/raft.assertTrue(...)
	github.com/cockroachdb/cockroach/pkg/raft/util.go:326
github.com/cockroachdb/cockroach/pkg/raft.(*raft).deFortify(0x2?, 0x400574dd80?, 0x17ff0ef5c34e3075?)
	github.com/cockroachdb/cockroach/pkg/raft/raft.go:2310 +0xc8
github.com/cockroachdb/cockroach/pkg/raft.(*raft).sendDeFortify(0x4002ecd6b0?, 0x400d273200?)
	github.com/cockroachdb/cockroach/pkg/raft/raft.go:838 +0x38
github.com/cockroachdb/cockroach/pkg/raft.(*raft).bcastDeFortify.func1(0x60dc5a0?, 0x4002ecd6b0?)
	github.com/cockroachdb/cockroach/pkg/raft/raft.go:887 +0x28
github.com/cockroachdb/cockroach/pkg/raft/tracker.(*ProgressTracker).Visit(0x40092da958, 0x400c1e9518)
	github.com/cockroachdb/cockroach/pkg/raft/tracker/progresstracker.go:106 +0x180
github.com/cockroachdb/cockroach/pkg/raft.(*raft).bcastDeFortify(0x40092da900)
	github.com/cockroachdb/cockroach/pkg/raft/raft.go:886 +0x5c
github.com/cockroachdb/cockroach/pkg/raft.(*raft).tickElection(0x40092da900)
	github.com/cockroachdb/cockroach/pkg/raft/raft.go:1078 +0x58
github.com/cockroachdb/cockroach/pkg/raft.(*RawNode).Tick(...)
	github.com/cockroachdb/cockroach/pkg/raft/rawnode.go:68
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).tick(0x400d64a388, {0x8476ed0, 0x4005964690}, 0x401123e720, 0x401a117b48)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1515 +0x7c8
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processTick(0x400b941c08, {0x4008576e10?, 0x1?}, 0x1?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:711 +0x140
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftSchedulerShard).worker(0x400af2c770, {0x8476ed0, 0x40073907b0}, {0x849cf20, 0x400b941c08}, 0x4006329208)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 +0x3b0
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start.func2({0x8476ed0?, 0x40073907b0?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 +0x4c
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2({0x6ac3442?, 0x4007b73a40?})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 +0x198
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx in goroutine 77337
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:488 +0x384
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-43282

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels Oct 16, 2024
@arulajmani arulajmani self-assigned this Oct 16, 2024
@arulajmani

This comment was marked as outdated.

@arulajmani
Copy link
Collaborator

arulajmani commented Oct 17, 2024

What I said above is correct, but I don't think that's why we're seeing the assertion fire. The assertion is firing because:

  • Leader steps down. Starts broadcasting MsgDeFortifyLeader
  • New leader gets elected. The old leader learns about this and advances its term.
  • The old leader continues to de-fortify itself (we haven't landed raft: stop broadcasting MsgDeFortifyLeader once it is safe  #132616 yet).
  • We shouldn't call into deFortify in this case, as the term has advanced. We hit the panic, because we're trying to de-fortify a newer term when we shouldn't.

Turns out, I actually fixed this in #132616 -- I'll just pull that out into a separate patch.

arulajmani added a commit to arulajmani/cockroach that referenced this issue Oct 17, 2024
Typically, we perform term comparisons up top in Step. If a message
affects a term that's older than the term known to a raft peer it's
discarded. When a raft peer de-fortifies itself, it doesn't send itself
a self-addressed message. As a result, we need to roll out these term
checks ourselves -- otherwise, as we saw in the assertion failure in the
linked issue, we could end up trying to incorrectly de-fortify a newer
fortified leader.

Fixes cockroachdb#132797

Release note: None
@arulajmani arulajmani linked a pull request Oct 17, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants