Can't recover from disk full error #758

abh · 2024-10-23T16:42:19Z

Report

Disk (PVC) was full; I made the PVCs bigger and let mysql restart.

The group replication never came back. I set instance 0 to "bootstrap" and it got replication working.

The two other instances never finished "recovering" though, and are just crash looping now. The logs from one of them are attached.

More about the problem

mysql-1.txt

The controller doesn't have any information that is useful to me; it seems to think everything is fine-ish. I'm not sure what role the controller has here, though (I'm migrating from the bitpoke operator, which worked a little differently, with the orchestrator exposed).

2024-10-23T16:21:26.676Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "pod": "ntpdb-mysql-0", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-10"}
2024-10-23T16:22:27.762Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "pod": "ntpdb-mysql-1", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-1060a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"}
2024-10-23T16:23:40.357Z	INFO	Crash recovery	Cluster was successfully rebooted	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65"}
2024-10-23T16:23:47.288Z	INFO	groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb	Member is not ONLINE	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "state": "RECOVERING"}
2024-10-23T16:30:19.004Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "pod": "ntpdb-mysql-0", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-13"}
2024-10-23T16:31:20.054Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "pod": "ntpdb-mysql-1", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-1360a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"}
2024-10-23T16:31:55.660Z	INFO	Crash recovery	Cluster was successfully rebooted	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940"}
2024-10-23T16:32:02.594Z	INFO	groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb	Member is not ONLINE	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "state": "OFFLINE"}
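
To make the `gtidExecuted` values in the log lines above easier to compare, here is a minimal sketch that counts the transactions in a GTID set. The helper names (`parse_gtid_set`, `count_transactions`) are hypothetical and not part of the operator, and the sketch assumes plain, untagged GTID sets of the form `uuid:start-end`. The two inputs are the values logged for `ntpdb-mysql-0` at 16:21:26 and 16:30:19.

```python
def parse_gtid_set(gtid_set):
    """Parse 'uuid:1-4:7,uuid2:1-10' into {uuid: [(start, end), ...]}.

    Assumes untagged GTIDs; tagged GTIDs (uuid:tag:interval) are not handled.
    """
    result = {}
    for part in gtid_set.split(","):
        part = part.strip()
        if not part:
            continue
        # The UUID contains hyphens but no colons, so ':' separates
        # the UUID from its intervals.
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid, [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single transaction is written as 'N' rather than 'N-M'.
            ranges.append((int(start), int(end or start)))
    return result


def count_transactions(gtid_set):
    """Return the total number of transactions covered by a GTID set."""
    return sum(
        end - start + 1
        for ranges in parse_gtid_set(gtid_set).values()
        for start, end in ranges
    )


# gtidExecuted logged for ntpdb-mysql-0 at 16:21:26 and at 16:30:19.
at_16_21 = (
    "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,"
    "6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,"
    "6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-10"
)
at_16_30 = (
    "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,"
    "6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,"
    "6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-13"
)

# Number of transactions executed between the two log lines.
delta = count_transactions(at_16_30) - count_transactions(at_16_21)
print(delta)
```

If the logged values reflect that pod's `gtid_executed`, a non-zero difference means `ntpdb-mysql-0` kept executing transactions between the two reconcile runs.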

Steps to reproduce

Let the disk run full; for example, by using the default configuration, which doesn't limit how many binlog files are kept.
Watch the cluster go down.
Watch the cluster not recover after disk space has been added.
Force mysql-0 to start group replication.
Watch the replicas never recover.

Versions

Kubernetes - v1.28.12
Operator - 0.8.0
Database - the default 8.x version from 0.8.0

Anything else?

No response

abh · 2024-10-23T16:59:02Z

Oh, confusingly, a bit later the two replicas recovered!

The cluster didn't seem to recover on its own from the disk space issue after more space was added though.
