[DOC] Create documentation for Remote Cluster State #5053

Closed
1 of 4 tasks
soosinha opened this issue Sep 20, 2023 · 8 comments · Fixed by #5726
Labels
2 - In progress Issue/PR: The issue or PR is in progress. Sev2 High-medium priority. Upcoming release or incorrect information. v2.11.0

Comments

@soosinha
Member

soosinha commented Sep 20, 2023

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request. Provide a summary of the request and all versions that are affected.

Remote Cluster State

[New sub page under https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store]
Remote Cluster State protects against loss of cluster state metadata caused by quorum loss (permanently losing a majority of cluster manager nodes) in the cluster.

Cluster state is an internal structure that contains the metadata of the cluster along with other information. The metadata includes index-level details such as settings, mappings, and active copies of the shards, as well as cluster-level settings, aliases, templates, and data streams. This metadata is managed by the elected cluster manager node and is essential for the proper functioning of the cluster. When the cluster permanently loses a majority of its cluster manager nodes, for example 2 out of 3, it can experience data loss because there is no guarantee that the latest cluster state metadata is present on the surviving nodes. Today, cluster durability is a function of the storage of the cluster manager nodes. Persisting the state to a remote store provides better durability guarantees.
When the remote cluster state feature is enabled, the cluster metadata is published to a remote repository configured in the cluster. Note that in OpenSearch 2.10, only index metadata is persisted to the remote store.

Whenever new cluster manager nodes are launched after disaster recovery, they automatically bootstrap using the index metadata stored in the remote repository. Consequently, the data of the indices is also restored when remote store is enabled.

How to use?

Add the enabled flag and the repository settings specified below to the yml file and start the cluster.

  1. Setting: cluster.remote_store.state.enabled
    Data type: Boolean
    Properties: Final, NodeScope
    This setting controls whether remote cluster state is enabled.

  2. Setting: node.attr.remote_store.state.repository
    Data type: String
    Properties: Node attribute
    This setting specifies the repository to be used for remote cluster state storage. The actual repository settings are specified with the prefix node.attr.remote_store.repository.<repository_name>.*

Both of the above settings must be present for remote cluster state to work.
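
As a rough illustration of the two settings above, the fragment below sketches what the yml file might contain. The repository name `my-remote-state-repo` is hypothetical, and the `type`, `settings.bucket`, and `settings.region` keys are assumptions that follow the `node.attr.remote_store.repository.<repository_name>.*` prefix for an S3-backed repository. The exact keys depend on the repository type in use, and the bucket and region values are placeholders to replace.

```yaml
# Enable remote cluster state (Final, NodeScope: set before the node starts)
cluster.remote_store.state.enabled: true

# Name of the repository used for remote cluster state storage
# ("my-remote-state-repo" is a hypothetical name)
node.attr.remote_store.state.repository: my-remote-state-repo

# Repository settings under the node.attr.remote_store.repository.<repository_name>.* prefix
# (assumed keys for an S3 repository; values are placeholders)
node.attr.remote_store.repository.my-remote-state-repo.type: s3
node.attr.remote_store.repository.my-remote-state-repo.settings.bucket: <bucket-name>
node.attr.remote_store.repository.my-remote-state-repo.settings.region: <bucket-region>
```

The repository name given in node.attr.remote_store.state.repository has to match the `<repository_name>` segment used in the repository settings keys.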

Limitations

  • Currently, only index metadata is supported for upload and restore from remote store.
  • The unsafe bootstrap script cannot be run when remote cluster state is enabled. If a majority of cluster-manager nodes are lost and the cluster goes down, the user needs to replace any remaining cluster-manager nodes and seed the nodes again to bootstrap a new cluster.

What other resources are available? Provide links to related issues, POCs, steps for testing, etc.

@Naarcha-AWS Naarcha-AWS added 1 - Backlog Issue: The issue is unassigned or assigned but not started v2.10.0 and removed untriaged labels Sep 20, 2023
@Naarcha-AWS Naarcha-AWS self-assigned this Sep 20, 2023
@Naarcha-AWS Naarcha-AWS added this to the v2.10 milestone Sep 20, 2023
@sachinpkale
Member

@soosinha

[New sub page under https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/]

Shouldn't this be under https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store/index/?

@soosinha
Member Author

soosinha commented Sep 21, 2023

@sachinpkale It could be under https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store/index/ but the content at the main index page is very specific to segment and translog storage. We could create a section for remote cluster state in the index page.
@shwetathareja Any thoughts ?

@sachinpkale
Member

@sachinpkale It could be under https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store/index/ but the content at the main index page is very specific to segment and translog storage. We could create a section for remote cluster state in the index page. @shwetathareja Any thoughts ?

@soosinha Sorry for the confusion. I was suggesting to have sub-page under remote-store instead of availability-and-recovery.

@shwetathareja
Member

I was suggesting to have sub-page under remote-store instead of availability-and-recovery.

@sachinpkale Yes, I agree we should have a sub-page under remote-store, as remote cluster state would contain metadata beyond index metadata as well.

@ashking94
Member

@Naarcha-AWS Is there a PR corresponding to this issue? Wondering if we have mistakenly closed this issue.

@Naarcha-AWS
Collaborator

@ashking94: I'll link the PR in here once it's ready.

@hdhalter
Contributor

@Naarcha-AWS - is this targeting the 2.12 release?

@Naarcha-AWS
Collaborator

@hdhalter: This is something we missed in 2.11. Adding a PR for it shortly.

@Naarcha-AWS Naarcha-AWS added v2.11.0 Sev2 High-medium priority. Upcoming release or incorrect information. and removed v2.10.0 labels Nov 14, 2023
@Naarcha-AWS Naarcha-AWS modified the milestones: v2.10, Sprint 11.13.23 Nov 14, 2023
@Naarcha-AWS Naarcha-AWS added 2 - In progress Issue/PR: The issue or PR is in progress. and removed 1 - Backlog Issue: The issue is unassigned or assigned but not started labels Nov 14, 2023
6 participants