Updated RFS to filter out system indices by default #763

Merged (3 commits), Jun 25, 2024

@chelma (Member) commented Jun 25, 2024

Description

  • Updated the RFS index creation and document migration code to filter out system indices (those whose names start with ".") by default. The user can still bypass this default by setting an explicit allowlist, in which case exactly the listed indices are migrated.
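A minimal sketch of the default-filter rule described above, written in Java to match the RFS codebase. It is not the PR's code: the class and method names are hypothetical, and it works on plain index names rather than the snapshot index metadata and logger that FilterScheme.filterIndicesByAllowList takes.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SystemIndexFilterSketch {
    // Hypothetical helper, not the PR's FilterScheme API: operates on index names
    // and does no logging.
    // Empty allowlist     -> accept every name that does not start with "." (skip system indices).
    // Non-empty allowlist -> accept exactly the listed names, dot-prefixed or not.
    static Predicate<String> acceptIndex(List<String> allowlist) {
        if (allowlist.isEmpty()) {
            return name -> !name.startsWith(".");
        }
        return allowlist::contains;
    }

    public static void main(String[] args) {
        List<String> snapshotIndices =
            List.of("geonames", ".test_should_not_migrate", "nyc_taxis");

        // No allowlist: the dot-prefixed index is filtered out by default.
        List<String> byDefault = snapshotIndices.stream()
            .filter(acceptIndex(List.of()))
            .collect(Collectors.toList());
        System.out.println(byDefault); // [geonames, nyc_taxis]

        // Explicit allowlist: bypasses the default, so a system index can be migrated on purpose.
        List<String> explicit = snapshotIndices.stream()
            .filter(acceptIndex(List.of(".test_should_not_migrate")))
            .collect(Collectors.toList());
        System.out.println(explicit); // [.test_should_not_migrate]
    }
}
```

The design choice here mirrors the description: the dot-prefix check only applies when no allowlist is given, so an explicit allowlist is the single escape hatch.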

Issues Resolved

Bug fix

Testing

Tested manually on my laptop: I spun up the Docker Compose setup for RFS and manually added a new index called .test_should_not_migrate:

chelma@80a9970a4d02 opensearch-migrations % curl -X GET "http://localhost:19200/_cat/indices?v"
health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logs-221998              g2fDhAowSeO2L7bK0IFkOg   5   0       1000            0    165.7kb        165.7kb
green  open   logs-211998              GFJ4chDGT4mv9RnYCw0bVg   5   0       1000            0    162.3kb        162.3kb
green  open   geonames                 CISRqtNmQnqWDtwYuijppA   5   0       1000            0    402.9kb        402.9kb
green  open   reindexed-logs           fWxInXixRV6Fz1DKJum0vQ   5   0          0            0        1kb            1kb
green  open   logs-231998              Aa9lh7FbRd-0MpMyNGLvDA   5   0       1000            0    168.6kb        168.6kb
green  open   nyc_taxis                YYcnszK-QqCjXpYMoNajzA   1   0       1000            0    171.4kb        171.4kb
green  open   logs-241998              _ZVD86l3RWWRTs8oRZf40w   5   0       1000            0    169.8kb        169.8kb
green  open   sonested                 tJapkWRjQjOiCk4iOkgBYQ   1   0       2977            0    460.2kb        460.2kb
green  open   .test_should_not_migrate 3QyGw5UwRkGR6w-MY6eDgA   1   0          0            0       208b           208b
green  open   logs-181998              Vor2q1PXTbOsR4G0jN9VHw   5   0       1000            0    166.2kb        166.2kb
green  open   logs-201998              -GiCtDk2QqKJGFJCLMCCww   5   0       1000            0    162.4kb        162.4kb
green  open   logs-191998              VP9IYK_ZSi6ldlX6FgwJ5A   5   0       1000            0    164.2kb        164.2kb

I then took a snapshot and ran the metadata migration code, which skipped that index:

chelma@80a9970a4d02 opensearch-migrations % ./gradlew MetadataMigration:run --args='--snapshot-name reindex-from-snapshot --s3-local-dir /tmp/s3_files --s3-repo-uri s3://chelma-iad-rfs-local-testing --s3-region us-east-1 --target-host http://localhost:29200'

> Task :MetadataMigration:run
09:01:21.986 INFO  Running RfsWorker
09:01:22.518 INFO  Migrating the Templates...
09:01:23.198 INFO  Downloading file from S3: s3://chelma-iad-rfs-local-testing/index-0 to /tmp/s3_files/index-0
09:01:23.414 INFO  Downloading file from S3: s3://chelma-iad-rfs-local-testing/meta-kwYbF_W8Q9CxljUswfLDOQ.dat to /tmp/s3_files/meta-kwYbF_W8Q9CxljUswfLDOQ.dat
09:01:23.582 INFO  Setting Global Metadata
09:01:23.582 INFO  Setting Legacy Templates...
09:01:23.582 INFO  No Legacy Templates in specified allowlist
09:01:23.582 INFO  Setting Component Templates...
09:01:23.582 INFO  No Component Templates in Snapshot
09:01:23.582 INFO  Setting Index Templates...
09:01:23.582 INFO  No Index Templates in Snapshot
09:01:23.582 INFO  Templates migration complete
09:01:23.587 INFO  Downloading file from S3: s3://chelma-iad-rfs-local-testing/indices/uRQQooLMS9SZpVtIJL1nKA/meta-e4ezT5ABNFsALwOUKeS3.dat to /tmp/s3_files/indices/uRQQooLMS9SZpVtIJL1nKA/meta-e4ezT5ABNFsALwOUKeS3.dat
09:01:23.853 ERROR Unable to load io.netty.resolver.dns.macos.MacOSDnsServerAddressStreamProvider, fallback to system defaults. This may result in incorrect DNS resolutions on MacOS. Check whether you have a dependency on 'io.netty:netty-resolver-dns-native-macos'. Use DEBUG level to see the full stack: java.lang.UnsatisfiedLinkError: failed to load the required native library
09:01:24.004 INFO  Index nyc_taxis created successfully
09:01:24.004 INFO  Index .test_should_not_migrate rejected by allowlist  <---------- SEE HERE
09:01:24.004 INFO  Downloading file from S3: s3://chelma-iad-rfs-local-testing/indices/H90080DJSKuFpVRa_zs9Qg/meta-gYezT5ABNFsALwOUKuTX.dat to /tmp/s3_files/indices/H90080DJSKuFpVRa_zs9Qg/meta-gYezT5ABNFsALwOUKuTX.dat
09:01:24.206 INFO  Index sonested created successfully
.
.
.

And ran the document migration code, which also skipped it:

chelma@80a9970a4d02 opensearch-migrations % ./gradlew DocumentsFromSnapshotMigration:run --args='--snapshot-name reindex-from-snapshot --s3-local-dir /tmp/s3_files --s3-repo-uri s3://chelma-iad-rfs-local-testing --s3-region us-east-1 --lucene-dir /tmp/lucene_files --target-host http://localhost:29200'

> Task :DocumentsFromSnapshotMigration:run
09:01:56.885 INFO  Running RfsWorker
09:01:57.251 INFO  Creating .migrations_working_state because it's HEAD check returned 404
09:01:57.305 WARN  Retrying setup-.migrations_working_state because the predicate failed for: (com.rfs.cms.ApacheHttpClient$1@6232ffdb,[ statusCode: 200, payload: {"acknowledged":true,"shards_acknowledged":true,"index":".migrations_working_state"}])
09:01:57.562 INFO  Setting up the Documents Work Items...
09:01:58.200 INFO  Index nyc_taxis has 1 shards
09:01:58.201 INFO  Creating Documents Work Item for index: nyc_taxis, shard: 0
09:01:58.214 INFO  Index .test_should_not_migrate rejected by allowlist <----- SEE HERE
09:01:58.215 INFO  Index sonested has 1 shards
09:01:58.215 INFO  Creating Documents Work Item for index: sonested, shard: 0
09:01:58.222 INFO  Index logs-241998 has 5 shards
09:01:58.222 INFO  Creating Documents Work Item for index: logs-241998, shard: 0
09:01:58.227 INFO  Creating Documents Work Item for index: logs-241998, shard: 1
09:01:58.231 INFO  Creating Documents Work Item for index: logs-241998, shard: 2
09:01:58.233 INFO  Creating Documents Work Item for index: logs-241998, shard: 3
09:01:58.236 INFO  Creating Documents Work Item for index: logs-241998, shard: 4
.
.
.

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chris Helma <chelma+github@amazon.com>

codecov bot commented Jun 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 45 lines in your changes missing coverage. Please review.

Project coverage is 68.41%. Comparing base (73f2092) to head (63fb90b).
Report is 2 commits behind head on main.

Files                                                  Patch %   Lines
...rc/main/java/com/rfs/worker/ShardWorkPreparer.java  0.00%     19 Missing ⚠️
RFS/src/main/java/com/rfs/worker/IndexRunner.java      0.00%     15 Missing ⚠️
RFS/src/main/java/com/rfs/common/FilterScheme.java     0.00%     7 Missing ⚠️
RFS/src/main/java/com/rfs/RunRfsWorker.java            0.00%     4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #763      +/-   ##
============================================
- Coverage     68.50%   68.41%   -0.09%     
+ Complexity     1583     1579       -4     
============================================
  Files           270      273       +3     
  Lines         11175    11361     +186     
  Branches        736      734       -2     
============================================
+ Hits           7655     7773     +118     
- Misses         3118     3187      +69     
+ Partials        402      401       -1     
Flag          Coverage Δ
gradle-test   61.31% <0.00%> (-0.36%) ⬇️
python-test   88.53% <ø> (-0.35%) ⬇️

Flags with carried forward coverage won't be shown.

if (indexAllowlist.isEmpty()) {
accepted = !index.getName().startsWith(".");
} else {
accepted = indexAllowlist.contains(index.getName());
Member

Let's make a backlog item for this to support regex

Collaborator

+1
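A sketch of what the regex backlog item could look like, assuming each allowlist entry becomes a java.util.regex pattern that must match the whole index name. This is a hypothetical extension that this PR does not implement; the class and method names are illustrative only.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexAllowlistSketch {
    // Hypothetical extension, not implemented in this PR: each allowlist entry is a
    // regex that must match the entire index name (Matcher.matches, not find).
    static Predicate<String> acceptIndex(List<String> allowlistPatterns) {
        if (allowlistPatterns.isEmpty()) {
            // Keep the PR's default: skip system (dot-prefixed) indices.
            return name -> !name.startsWith(".");
        }
        // Compile each pattern once up front instead of once per index name.
        List<Pattern> patterns = allowlistPatterns.stream()
            .map(Pattern::compile)
            .collect(Collectors.toList());
        return name -> patterns.stream().anyMatch(p -> p.matcher(name).matches());
    }

    public static void main(String[] args) {
        List<String> names =
            List.of("logs-181998", "logs-241998", "geonames", ".test_should_not_migrate");
        List<String> accepted = names.stream()
            .filter(acceptIndex(List.of("logs-.*")))
            .collect(Collectors.toList());
        System.out.println(accepted); // [logs-181998, logs-241998]
    }
}
```

One consequence worth weighing: entries meant as literal names can contain regex metacharacters. An unquoted "." matches any character, so .test_should_not_migrate would also accept xtest_should_not_migrate; wrapping literal entries with Pattern.quote(...) or keeping regex matching behind a separate option would avoid that.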

Member Author

};
repoDataProvider.getIndicesInSnapshot(snapshotName).stream()
.filter(FilterScheme.filterIndicesByAllowList(indexAllowlist, logger))
.peek(index -> {
Member

nit: we should be able to do a .forEach instead of the peek and count

};
repoDataProvider.getIndicesInSnapshot(snapshotName).stream()
.filter(FilterScheme.filterIndicesByAllowList(indexAllowlist, logger))
.peek(index -> {
Member

nit: same as above with foreach

if (indexAllowlist.isEmpty()) {
accepted = !index.getName().startsWith(".");
} else {
accepted = indexAllowlist.contains(index.getName());
Collaborator

+1

() -> log.info("Index " + index.getName() + " already existed; no work required")
);
})
.count(); // Force the stream to execute
Collaborator

forEach() will force the stream to run. As written, I think this will generate a linting error, since the result of count() is discarded.
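A sketch of the forEach suggestion, assuming the work previously done inside .peek(...) can move into the terminal operation unchanged. Beyond the lint concern, the Javadoc for Stream.count() notes (since Java 9) that an implementation may skip executing the pipeline, including intermediate operations such as peek, when it can compute the count directly from the source. In practice the filter(...) in this pipeline means the count is not known up front, but forEach does not rely on that. migrateIndex and migrateAll are hypothetical stand-ins, not RFS methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ForEachInsteadOfPeekCount {
    // Hypothetical stand-in for the per-index work done in the original .peek(...) lambda
    // (e.g. creating the index on the target); here it only records the name.
    static void migrateIndex(String name, List<String> migrated) {
        migrated.add(name);
    }

    static List<String> migrateAll(List<String> snapshotIndices, Predicate<String> filter) {
        List<String> migrated = new ArrayList<>();
        // forEach is a terminal operation used purely for its side effect, so the
        // pipeline always runs; no peek(...) followed by a discarded count() is needed.
        snapshotIndices.stream()
            .filter(filter)
            .forEach(name -> migrateIndex(name, migrated));
        return migrated;
    }

    public static void main(String[] args) {
        List<String> indices = List.of("nyc_taxis", ".test_should_not_migrate", "sonested");
        System.out.println(migrateAll(indices, name -> !name.startsWith(".")));
        // prints [nyc_taxis, sonested]
    }
}
```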

Signed-off-by: Chris Helma <25470211+chelma@users.noreply.github.com>
Member

@peternied left a comment

Let's add test coverage for this filtering logic.
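One way to enumerate the cases such coverage could check, sketched as a dependency-free harness around a copy of a hypothetical name-based predicate. In the repository these would more naturally be unit tests against FilterScheme itself (which takes index metadata and a logger), so the case table, not the harness, is the useful part.

```java
import java.util.List;
import java.util.function.Predicate;

public class AllowlistFilterCases {
    // Copy of a hypothetical name-based predicate (not FilterScheme's real signature).
    static Predicate<String> acceptIndex(List<String> allowlist) {
        if (allowlist.isEmpty()) {
            return name -> !name.startsWith(".");
        }
        return allowlist::contains;
    }

    // One row per behavior described in the PR.
    static final class Case {
        final String description;
        final List<String> allowlist;
        final String index;
        final boolean expected;

        Case(String description, List<String> allowlist, String index, boolean expected) {
            this.description = description;
            this.allowlist = allowlist;
            this.index = index;
            this.expected = expected;
        }
    }

    static final List<Case> CASES = List.of(
        new Case("default accepts a regular index", List.of(), "nyc_taxis", true),
        new Case("default rejects a system index", List.of(), ".test_should_not_migrate", false),
        new Case("allowlist accepts a listed index", List.of("nyc_taxis"), "nyc_taxis", true),
        new Case("allowlist rejects an unlisted index", List.of("nyc_taxis"), "sonested", false),
        new Case("allowlist can opt in a system index",
            List.of(".test_should_not_migrate"), ".test_should_not_migrate", true));

    // Returns how many cases produced a result different from the expected one.
    static int countFailures() {
        int failures = 0;
        for (Case c : CASES) {
            boolean actual = acceptIndex(c.allowlist).test(c.index);
            if (actual != c.expected) {
                System.out.println("FAIL: " + c.description);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = countFailures();
        System.out.println(failures == 0 ? "all cases passed" : failures + " case(s) failed");
    }
}
```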

@chelma merged commit 9b3c190 into opensearch-project:main Jun 25, 2024
10 of 13 checks passed
@chelma deleted the index-filtering branch June 25, 2024 15:48

4 participants