Can't recover from disk full error #758

Open
abh opened this issue Oct 23, 2024 · 1 comment

abh commented Oct 23, 2024

Report

The disk (PVC) was full; I made the PVCs bigger and let MySQL restart.

Group replication never came back. I set instance 0 to "bootstrap", which got replication working.

The two other instances never finished "recovering", though, and are just crash-looping now. The logs from one of them are attached.

More about the problem

mysql-1.txt

The controller logs don't contain any information that is useful to me; the controller seems to think everything is fine-ish. I'm not sure what role the controller has here, though (I'm migrating from the bitpoke operator, which worked a little differently, with the orchestrator exposed).

2024-10-23T16:21:26.676Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "pod": "ntpdb-mysql-0", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-10"}
2024-10-23T16:22:27.762Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "pod": "ntpdb-mysql-1", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-1060a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"}
2024-10-23T16:23:40.357Z	INFO	Crash recovery	Cluster was successfully rebooted	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65"}
2024-10-23T16:23:47.288Z	INFO	groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb	Member is not ONLINE	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "state": "RECOVERING"}
2024-10-23T16:30:19.004Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "pod": "ntpdb-mysql-0", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-13"}
2024-10-23T16:31:20.054Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "pod": "ntpdb-mysql-1", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-1360a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"}
2024-10-23T16:31:55.660Z	INFO	Crash recovery	Cluster was successfully rebooted	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940"}
2024-10-23T16:32:02.594Z	INFO	groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb	Member is not ONLINE	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "state": "OFFLINE"}
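
The `gtidExecuted` values in the log above give a rough idea of how far `ntpdb-mysql-1` had to catch up. Below is a minimal Python sketch, assuming that the value logged for `ntpdb-mysql-1` at 16:22:27 is two GTID sets run together and that the second one (ending in `:1-16262363` and `:1-5`) is that pod's own set. That reading is an assumption; the log does not confirm it. The helper names `parse_gtid_set` and `missing_count` are hypothetical. On a live server, MySQL's built-in `GTID_SUBTRACT()` function performs the same comparison.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-4:7,uuid2:1-10' into {uuid: [(start, end), ...]}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as "7" is a single-transaction interval.
            ranges.append((int(start), int(end or start)))
    return result


def missing_count(source, target):
    """Count transactions that are in `source` but not in `target`.

    Assumes the intervals for each UUID in `source` do not overlap,
    which is how MySQL prints GTID sets.
    """
    count = 0
    for uuid, src_ranges in source.items():
        dst_ranges = sorted(target.get(uuid, []))
        for start, end in src_ranges:
            cur = start  # first transaction not yet accounted for
            for d_start, d_end in dst_ranges:
                if d_end < cur:
                    continue
                if d_start > end:
                    break
                if d_start > cur:
                    count += d_start - cur  # gap before this target interval
                cur = max(cur, d_end + 1)
                if cur > end:
                    break
            if cur <= end:
                count += end - cur + 1  # tail not covered by the target
    return count


# Values copied from the 16:21:26 and 16:22:27 log lines (see assumption above).
pod0 = parse_gtid_set(
    "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,"
    "6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,"
    "6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-10"
)
pod1 = parse_gtid_set(
    "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,"
    "6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,"
    "6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"
)
print(missing_count(pod0, pod1))  # prints 1503836
```

Under that assumption, `ntpdb-mysql-1` was roughly 1.5 million transactions behind `ntpdb-mysql-0`, which might explain a long `RECOVERING` phase, though the log alone does not show the cause of the crash loop.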

Steps to reproduce

  1. let the disk run full; for example, use the default configuration, which doesn't limit how many binlog files are kept.
  2. watch cluster go down
  3. watch cluster not recover after disk has been added
  4. force mysql-0 to start group replication
  5. watch the replicas never recovering

Versions

  1. Kubernetes - v1.28.12
  2. Operator - 0.8.0
  3. Database - the default 8.x version from 0.8.0

Anything else?

No response

abh commented Oct 23, 2024

Oh, confusingly, a bit later the two replicas recovered!

The cluster didn't seem to recover on its own from the disk space issue after more space was added, though.
