
RFS now uses reactor-netty for bulk indexing #607

Merged: 5 commits into opensearch-project:main from MIGRATIONS-1600-2, Apr 24, 2024

Conversation

@chelma (Member) commented Apr 23, 2024

Description

  • Updated RFS to use the reactor-netty library to perform asynchronous HTTP operations
  • Updated RFS to perform indexing against the target cluster with bulk operations (a minimal sketch of the general shape follows this list)
  • Added logging to capture when there's a wacky Lucene document we can't parse correctly (such as one missing an _id)
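
For reference, here's a minimal sketch of what the first two bullets amount to with reactor-netty: an asynchronous POST of a newline-delimited body to the _bulk endpoint. The class and method names are illustrative only, not the PR's actual code.

import reactor.core.publisher.Mono;
import reactor.netty.ByteBufFlux;
import reactor.netty.http.client.HttpClient;

public class BulkPostSketch {
    // Hypothetical helper: POSTs an already-assembled, newline-delimited
    // _bulk body and resolves with the response body as a String.
    public static Mono<String> postBulk(String baseUrl, String indexName, String bulkBody) {
        HttpClient client = HttpClient.create().baseUrl(baseUrl);
        return client
                .headers(h -> h.add("Content-Type", "application/json"))
                .post()
                .uri("/" + indexName + "/_bulk")
                .send(ByteBufFlux.fromString(Mono.just(bulkBody)))
                .responseContent() // stream the response bytes
                .aggregate()       // gather them into a single buffer
                .asString();       // decode as a String
    }
}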

Issues Resolved

Testing

  • Added unit tests for the changes to the ConnectionDetails class
  • Manually tested the changes to the reindexing behavior. Example output below:
16:31:59.167 INFO  Blob files unpacked successfully
16:31:59.167 INFO  ==================================================================
16:31:59.167 INFO  Reindexing the documents...
16:31:59.171 INFO  === Index Id: logs-241998, Shard ID: 0 ===
16:31:59.586 INFO  199 documents found in the current Lucene index
16:31:59.627 INFO  199 documents in current bulk request
16:31:59.999 INFO  Shard reindexing completed
16:31:59.999 INFO  === Index Id: logs-241998, Shard ID: 1 ===
16:32:00.008 INFO  203 documents found in the current Lucene index
16:32:00.028 INFO  203 documents in current bulk request
16:32:00.030 INFO  Shard reindexing completed
16:32:00.030 INFO  === Index Id: logs-241998, Shard ID: 2 ===
16:32:00.037 INFO  201 documents found in the current Lucene index
16:32:00.048 INFO  201 documents in current bulk request
16:32:00.049 INFO  Shard reindexing completed
16:32:00.049 INFO  === Index Id: logs-241998, Shard ID: 3 ===
16:32:00.057 INFO  201 documents found in the current Lucene index
16:32:00.062 INFO  201 documents in current bulk request
16:32:00.063 INFO  Shard reindexing completed
16:32:00.063 INFO  === Index Id: logs-241998, Shard ID: 4 ===
16:32:00.070 INFO  196 documents found in the current Lucene index
16:32:00.076 INFO  196 documents in current bulk request
16:32:00.078 INFO  Shard reindexing completed
16:32:00.079 INFO  === Index Id: logs-191998, Shard ID: 0 ===
16:32:00.082 INFO  206 documents found in the current Lucene index
16:32:00.090 INFO  206 documents in current bulk request
16:32:00.091 INFO  Shard reindexing completed
16:32:00.092 INFO  === Index Id: logs-191998, Shard ID: 1 ===
16:32:00.101 INFO  204 documents found in the current Lucene index
16:32:00.117 INFO  204 documents in current bulk request
16:32:00.118 INFO  Shard reindexing completed
16:32:00.118 INFO  === Index Id: logs-191998, Shard ID: 2 ===
16:32:00.124 INFO  192 documents found in the current Lucene index
16:32:00.129 INFO  192 documents in current bulk request
16:32:00.131 INFO  Shard reindexing completed
16:32:00.131 INFO  === Index Id: logs-191998, Shard ID: 3 ===
16:32:00.135 INFO  196 documents found in the current Lucene index
16:32:00.141 INFO  196 documents in current bulk request
16:32:00.142 INFO  Shard reindexing completed
16:32:00.143 INFO  === Index Id: logs-191998, Shard ID: 4 ===
16:32:00.148 INFO  202 documents found in the current Lucene index
16:32:00.156 INFO  202 documents in current bulk request
16:32:00.157 INFO  Shard reindexing completed
16:32:00.158 INFO  === Index Id: logs-221998, Shard ID: 0 ===
16:32:00.177 INFO  190 documents found in the current Lucene index
16:32:00.184 INFO  190 documents in current bulk request
16:32:00.185 INFO  Shard reindexing completed
16:32:00.185 INFO  === Index Id: logs-221998, Shard ID: 1 ===
16:32:00.218 INFO  207 documents found in the current Lucene index
16:32:00.235 INFO  207 documents in current bulk request
16:32:00.236 INFO  Shard reindexing completed
16:32:00.237 INFO  === Index Id: logs-221998, Shard ID: 2 ===
16:32:00.255 INFO  188 documents found in the current Lucene index
16:32:00.281 INFO  188 documents in current bulk request
16:32:00.283 INFO  Shard reindexing completed
16:32:00.283 INFO  === Index Id: logs-221998, Shard ID: 3 ===
16:32:00.373 INFO  215 documents found in the current Lucene index
16:32:00.483 INFO  215 documents in current bulk request
16:32:00.484 INFO  Shard reindexing completed
16:32:00.484 INFO  === Index Id: logs-221998, Shard ID: 4 ===
16:32:00.535 INFO  200 documents found in the current Lucene index
16:32:00.553 INFO  200 documents in current bulk request
16:32:00.554 INFO  Shard reindexing completed
16:32:00.554 INFO  === Index Id: logs-231998, Shard ID: 0 ===
16:32:00.587 INFO  208 documents found in the current Lucene index
16:32:00.594 INFO  208 documents in current bulk request
16:32:00.595 INFO  Shard reindexing completed
16:32:00.595 INFO  === Index Id: logs-231998, Shard ID: 1 ===
16:32:00.603 INFO  192 documents found in the current Lucene index
16:32:00.610 INFO  192 documents in current bulk request
16:32:00.611 INFO  Shard reindexing completed
16:32:00.611 INFO  === Index Id: logs-231998, Shard ID: 2 ===
16:32:00.614 INFO  190 documents found in the current Lucene index
16:32:00.629 INFO  190 documents in current bulk request
16:32:00.630 INFO  Shard reindexing completed
16:32:00.630 INFO  === Index Id: logs-231998, Shard ID: 3 ===
16:32:00.637 INFO  224 documents found in the current Lucene index
16:32:00.655 INFO  224 documents in current bulk request
16:32:00.656 INFO  Shard reindexing completed
16:32:00.656 INFO  === Index Id: logs-231998, Shard ID: 4 ===
16:32:00.662 INFO  186 documents found in the current Lucene index
16:32:00.665 INFO  186 documents in current bulk request
16:32:00.666 INFO  Shard reindexing completed
16:32:00.666 INFO  === Index Id: reindexed-logs, Shard ID: 0 ===
16:32:00.669 INFO  0 documents found in the current Lucene index
16:32:00.672 INFO  Shard reindexing completed
16:32:00.672 INFO  === Index Id: reindexed-logs, Shard ID: 1 ===
16:32:00.676 INFO  0 documents found in the current Lucene index
16:32:00.676 INFO  Shard reindexing completed
16:32:00.676 INFO  === Index Id: reindexed-logs, Shard ID: 2 ===
16:32:00.679 INFO  0 documents found in the current Lucene index
16:32:00.680 INFO  Shard reindexing completed
16:32:00.680 INFO  === Index Id: reindexed-logs, Shard ID: 3 ===
16:32:00.682 INFO  0 documents found in the current Lucene index
16:32:00.682 INFO  Shard reindexing completed
16:32:00.682 INFO  === Index Id: reindexed-logs, Shard ID: 4 ===
16:32:00.685 INFO  0 documents found in the current Lucene index
16:32:00.685 INFO  Shard reindexing completed
16:32:00.685 INFO  === Index Id: logs-201998, Shard ID: 0 ===
16:32:00.715 INFO  222 documents found in the current Lucene index
16:32:00.723 INFO  222 documents in current bulk request
16:32:00.723 INFO  Shard reindexing completed
16:32:00.724 INFO  === Index Id: logs-201998, Shard ID: 1 ===
16:32:00.760 INFO  193 documents found in the current Lucene index
16:32:00.765 INFO  193 documents in current bulk request
16:32:00.765 INFO  Shard reindexing completed
16:32:00.766 INFO  === Index Id: logs-201998, Shard ID: 2 ===
16:32:00.783 INFO  188 documents found in the current Lucene index
16:32:00.786 INFO  188 documents in current bulk request
16:32:00.787 INFO  Shard reindexing completed
16:32:00.787 INFO  === Index Id: logs-201998, Shard ID: 3 ===
16:32:00.800 INFO  191 documents found in the current Lucene index
16:32:00.811 INFO  191 documents in current bulk request
16:32:00.812 INFO  Shard reindexing completed
16:32:00.812 INFO  === Index Id: logs-201998, Shard ID: 4 ===
16:32:00.831 INFO  206 documents found in the current Lucene index
16:32:00.841 INFO  206 documents in current bulk request
16:32:00.843 INFO  Shard reindexing completed
16:32:00.843 INFO  === Index Id: sonested, Shard ID: 0 ===
16:32:00.854 INFO  2977 documents found in the current Lucene index
16:32:00.856 ERROR Unable to parse Document id from Document.  The Document's Fields: 
16:32:00.877 INFO  Shard reindexing completed
16:32:00.877 INFO  === Index Id: nyc_taxis, Shard ID: 0 ===
16:32:00.882 INFO  1000 documents found in the current Lucene index
16:32:01.179 INFO  1000 documents in current bulk request
16:32:01.181 INFO  Shard reindexing completed
16:32:01.181 INFO  === Index Id: logs-211998, Shard ID: 0 ===
16:32:01.187 INFO  206 documents found in the current Lucene index
16:32:01.190 INFO  206 documents in current bulk request
16:32:01.191 INFO  Shard reindexing completed
16:32:01.191 INFO  === Index Id: logs-211998, Shard ID: 1 ===
16:32:01.199 INFO  189 documents found in the current Lucene index
16:32:01.201 INFO  189 documents in current bulk request
16:32:01.202 INFO  Shard reindexing completed
16:32:01.202 INFO  === Index Id: logs-211998, Shard ID: 2 ===
16:32:01.211 INFO  190 documents found in the current Lucene index
16:32:01.213 INFO  190 documents in current bulk request
16:32:01.214 INFO  Shard reindexing completed
16:32:01.214 INFO  === Index Id: logs-211998, Shard ID: 3 ===
16:32:01.230 INFO  223 documents found in the current Lucene index
16:32:01.233 INFO  223 documents in current bulk request
16:32:01.233 INFO  Shard reindexing completed
16:32:01.234 INFO  === Index Id: logs-211998, Shard ID: 4 ===
16:32:01.271 INFO  192 documents found in the current Lucene index
16:32:01.279 INFO  192 documents in current bulk request
16:32:01.280 INFO  Shard reindexing completed
16:32:01.280 INFO  === Index Id: logs-181998, Shard ID: 0 ===
16:32:01.313 INFO  214 documents found in the current Lucene index
16:32:01.317 INFO  214 documents in current bulk request
16:32:01.318 INFO  Shard reindexing completed
16:32:01.318 INFO  === Index Id: logs-181998, Shard ID: 1 ===
16:32:01.331 INFO  192 documents found in the current Lucene index
16:32:01.338 INFO  192 documents in current bulk request
16:32:01.339 INFO  Shard reindexing completed
16:32:01.340 INFO  === Index Id: logs-181998, Shard ID: 2 ===
16:32:01.364 INFO  183 documents found in the current Lucene index
16:32:01.369 INFO  183 documents in current bulk request
16:32:01.370 INFO  Shard reindexing completed
16:32:01.370 INFO  === Index Id: logs-181998, Shard ID: 3 ===
16:32:01.384 INFO  193 documents found in the current Lucene index
16:32:01.388 INFO  193 documents in current bulk request
16:32:01.388 INFO  Shard reindexing completed
16:32:01.388 INFO  === Index Id: logs-181998, Shard ID: 4 ===
16:32:01.405 INFO  218 documents found in the current Lucene index
16:32:01.410 INFO  218 documents in current bulk request
16:32:01.411 INFO  Shard reindexing completed
16:32:01.411 INFO  === Index Id: geonames, Shard ID: 0 ===
16:32:01.438 INFO  206 documents found in the current Lucene index
16:32:01.453 INFO  206 documents in current bulk request
16:32:01.464 INFO  Shard reindexing completed
16:32:01.464 INFO  === Index Id: geonames, Shard ID: 1 ===
16:32:01.473 INFO  210 documents found in the current Lucene index
16:32:01.479 INFO  210 documents in current bulk request
16:32:01.495 INFO  Shard reindexing completed
16:32:01.495 INFO  === Index Id: geonames, Shard ID: 2 ===
16:32:01.506 INFO  201 documents found in the current Lucene index
16:32:01.515 INFO  201 documents in current bulk request
16:32:01.521 INFO  Shard reindexing completed
16:32:01.521 INFO  === Index Id: geonames, Shard ID: 3 ===
16:32:01.531 INFO  188 documents found in the current Lucene index
16:32:01.535 INFO  188 documents in current bulk request
16:32:01.543 INFO  Shard reindexing completed
16:32:01.543 INFO  === Index Id: geonames, Shard ID: 4 ===
16:32:01.550 INFO  195 documents found in the current Lucene index
16:32:01.555 INFO  195 documents in current bulk request
16:32:01.564 INFO  Shard reindexing completed

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

            protocol = null;
        } else {
            // Parse the URL to get the protocol, host name, and port
            String[] urlParts = url.split("://");
Member:

can we use java.net.URI for this:

        if (url == null) {
            hostName = null;
            port = -1;
            protocol = null;
        } else {
            try {
                URI uri = new URI(url);
                hostName = uri.getHost();
                port = uri.getPort();
                protocol = uri.getScheme();
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Invalid URL format", e);
            }
        }

Member Author:

Oh, interesting - this makes sense. I was thinking about adding regex checking for the user inputs at the beginning, too, but defense-in-depth is a good approach.
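
For what it's worth, a quick illustration of how java.net.URI behaves for typical inputs (standard-library behavior, not code from this PR). Note that getPort() returns -1 when the URL omits the port, which matches the null branch above:

import java.net.URI;
import java.net.URISyntaxException;

public class UriParsingExample {
    public static void main(String[] args) throws URISyntaxException {
        URI withPort = new URI("http://localhost:9200");
        System.out.println(withPort.getScheme()); // http
        System.out.println(withPort.getHost());   // localhost
        System.out.println(withPort.getPort());   // 9200

        // No explicit port: getPort() returns -1
        System.out.println(new URI("https://example.com").getPort()); // -1
    }
}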


    public static void reindex(String indexName, Flux<Document> documentStream, ConnectionDetails targetConnection) throws Exception {
        String targetUrl = "/" + indexName + "/_bulk";
        HttpClient client = HttpClient.create()
Member:

Can we split the HTTP client into a separate class that can be reused by different operations?

Member Author (@chelma, Apr 24, 2024):

I'd prefer not to at this point, actually. While I would historically agree with you, I'm trying out a new approach on this project and have been really happy with how it has worked out. Specifically - avoiding being too speculative about abstractions and letting the needs of the project shape what gets created. In this case, we only have one thing that needs this reactor-netty client, and I honestly don't know what interface I would provide if I were to carve it out, because I don't know how another potential part of the code might use it. Avoiding speculative abstractions has been one of the key things in this project's history that has enabled me to make so much progress so fast.

Collaborator:

I don't know if you need a separate HttpClient interface yet, but I do think that it might help; in general, you'll want to look to the future and not think about the past.
From my view, you've got some leaky abstractions, with a couple of other needless Flux contaminations within your codebase.
Once those leak in, it will become harder to test your code too (and tests help us write application code faster, too). If you want to write test code fast as well, keep it as generic as you can, with the cleanest interfaces that you can strive for. Simpler pieces -> smoother integrations -> faster delivery of quality solutions.

        // Assemble the request details
        String path = indexName + "/_doc/" + id;
        String body = source;
    private static String convertToBulkJson(List<String> bulkSections) {
Member:

nit: the function name ...BulkJson was a bit confusing since it's just delimited JSONs; maybe `convertToBulkBody` or `convertToDelimitedJsons`

Member Author:

Sure, will do.
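
For context, a minimal sketch of what the renamed helper could look like (the reviewer's suggested name convertToBulkBody is used here; the merged name may differ). The _bulk API expects newline-delimited JSON, and the body must end with a newline:

// Assumes java.util.List; joins the action/source lines of a bulk request.
// The _bulk endpoint requires the body to end with a newline.
private static String convertToBulkBody(List<String> bulkSections) {
    return String.join("\n", bulkSections) + "\n";
}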

            return null; // Skip documents with missing id
        }
        if (source_bytes == null || source_bytes.bytes.length == 0) {
            logger.warn("Document " + id + " is deleted or doesn't have the _source field enabled");
Member:

Would this be better suited for info, if this is expected for deleted documents?

Member Author:

I felt (and I guess still feel) that warn is probably the right level. It's something that we should highlight the occurrence of without it being an error, per se.

Collaborator:

Which one is it - can you tell the difference? Could the reader of the log tell the difference? Is there something in the beginning of the log that would give the user a clue?

If _source wasn't enabled, could this flood the logs?

Collaborator:

Is there any chance that docId could have PII in it? The docId could be customer generated, right? Or are they only internal ids that are separately mapped to the customer-given ones?

If they're customer driven, I'd push this to debug to promote the policy that no PII could be shown for INFO and above logs. This feels like it isn't a great spot to be in. I'm hoping that there's a way to show an identifier without risking divulging a customer value.

Member Author:

> Which one is it - can you tell the difference?

I am not currently aware of how to tell the difference. We have a task to look into this more (see: https://opensearch.atlassian.net/browse/MIGRATIONS-1629)

> Is there any chance that docId could have PII in it?

The docId is an integer value used by Lucene to tell which Lucene Document in the Lucene Index is being referred to. The _id field of the Lucene Document is a user-set alphanumeric string, and so can contain whatever the user wants it to.

Regarding PII - that's a larger discussion for the team to have. I'll book a timeslot to discuss as a reminder.

codecov bot commented Apr 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.93%. Comparing base (384401e) to head (060fdbe).
Report is 21 commits behind head on main.

❗ Current head 060fdbe differs from the pull request's most recent head bd87ce5. Consider uploading reports for the commit bd87ce5 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #607      +/-   ##
============================================
+ Coverage     75.91%   75.93%   +0.02%     
- Complexity     1491     1496       +5     
============================================
  Files           162      165       +3     
  Lines          6348     6362      +14     
  Branches        572      573       +1     
============================================
+ Hits           4819     4831      +12     
+ Misses         1152     1149       -3     
- Partials        377      382       +5     
Flag        Coverage Δ
unittests   75.93% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.


@chelma merged commit 98ee1fd into opensearch-project:main on Apr 24, 2024
5 checks passed
@chelma deleted the MIGRATIONS-1600-2 branch on April 24, 2024 at 16:43
        return Flux.range(0, reader.maxDoc()) // Extract all the Documents in the IndexReader
            .handle((i, sink) -> {
                Document doc = getDocument(reader, i);
                if (doc != null) { // Skip malformed docs
Collaborator:

You should at least log when doc == null (or whatever malformed documents you might be skipping).

Member Author:

We log that in getDocument()

                continue;
            reader.close();
        } catch (IOException e) {
            logger.error("Failed to close IndexReader", e);
Collaborator:

This seems like it's probably a really bad exception. Why should the program keep running?
This seems like a spot where throw Lombok.sneakyThrow(e) would be a better option.

Member Author:

Good question; it probably does make sense to kill the process at this point. I realized just now that Reactor was unhappy that a checked exception was being thrown, but I totally could have thrown an unchecked exception here or something.
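
A sketch of what the reviewer's suggestion might look like (assumes Lombok on the classpath; closeOrDie is a hypothetical name):

import java.io.IOException;
import lombok.Lombok;
import org.apache.lucene.index.IndexReader;

public class ReaderCloseSketch {
    // Rethrows the checked IOException unchecked so the failure propagates
    // (and halts the process) instead of being logged and swallowed.
    static void closeOrDie(IndexReader reader) {
        try {
            reader.close();
        } catch (IOException e) {
            throw Lombok.sneakyThrow(e); // never returns; rethrows e as-is
        }
    }
}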


        StringBuilder errorMessage = new StringBuilder();
        errorMessage.append("Unable to parse Document id from Document. The Document's Fields: ");
        document.getFields().forEach(f -> errorMessage.append(f.name()).append(", "));
        logger.error(errorMessage.toString());
Collaborator:

logger.atError().setCause(e).setMessage(()->...).log() will do two more things for you: 1) get the exception and its backtrace into the logs, and 2) use the fluent style, where everything within '...' is only evaluated when you're logging at that level. It can make your log statements tighter (all one statement rather than four, as they are here) and much more efficient, since work can often be elided. Even if you stay at warn/error, I'd like to routinely filter the repo for usages of immediate logging, because its performance hit can be the single greatest impact on a program.
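
For illustration, the suggestion applied to the snippet above, assuming an SLF4J 2.x logger (the fluent setCause/setMessage calls are SLF4J API; Log4j2's fluent builder spells them differently):

import java.util.stream.Collectors;
import org.apache.lucene.document.Document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FluentLoggingSketch {
    private static final Logger logger = LoggerFactory.getLogger(FluentLoggingSketch.class);

    // One statement instead of four; the message supplier only runs
    // when ERROR is actually enabled, and the cause's backtrace is logged.
    static void logUnparseableDocument(Document document, Exception e) {
        logger.atError()
              .setCause(e)
              .setMessage(() -> "Unable to parse Document id from Document. The Document's Fields: "
                      + document.getFields().stream()
                                .map(f -> f.name())
                                .collect(Collectors.joining(", ")))
              .log();
    }
}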

Collaborator:

Consider PII for ERROR. I think that it's fair, but you should call it out... maybe PII-possible loggers should have their own logger-name convention so that operators could easily mask them out if necessary.

Member Author:

> logger.atError().setCause(e).setMessage(()->...).log()

Cool - will look into that for the future.

> Consider PII for ERROR

I think we need to have a larger discussion around stuff like PII concerns, because I suspect they will impact many aspects of the implementation if we're designing to address them up front.


public class LuceneDocumentsReader {
    private static final Logger logger = LogManager.getLogger(LuceneDocumentsReader.class);

-    public static List<Document> readDocuments(Path luceneFilesBasePath, String indexName, int shardId) throws Exception {
+    public Flux<Document> readDocuments(Path luceneFilesBasePath, String indexName, int shardId) {
Collaborator:

Why does your LuceneDocumentsReader now take a hard dependency on your HTTP client library?
It might be better to make this a collection or stream and then adapt later, so that you can switch client implementations out.

Member Author:

Maybe, but it seems like this is how the Reactor framework wants to be used. I can see both the LuceneDocumentsReader and DocumentReindexer classes being implementation specific. So far it's paid off for me in this project not to speculate on stuff like this until there's a specific need.



public class DocumentReindexer {
    private static final Logger logger = LogManager.getLogger(DocumentReindexer.class);
    private static final int MAX_BATCH_SIZE = 1000; // Arbitrarily chosen

    public static void reindex(String indexName, Flux<Document> documentStream, ConnectionDetails targetConnection) throws Exception {
Collaborator:

Question from above - why should this take a Flux in? What would be lost/what would the impact be if you took in a stream and adapted it within this method?
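
For concreteness, a hypothetical signature illustrating the question (not the merged code; ConnectionDetails is the PR's own class): accept a plain java.util.stream.Stream at the boundary and adapt it to a Flux internally, keeping Reactor out of the public interface.

import java.util.stream.Stream;
import org.apache.lucene.document.Document;
import reactor.core.publisher.Flux;

public static void reindex(String indexName, Stream<Document> documentStream, ConnectionDetails targetConnection) {
    Flux<Document> documents = Flux.fromStream(documentStream); // adapt at the edge
    // ... batch into bulk requests and send them, as the current implementation does ...
}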


@@ -34,7 +34,7 @@ public RestClient(ConnectionDetails connectionDetails) {
     }

     public Response get(String path, boolean quietLogging) throws Exception {
-        String urlString = connectionDetails.host + "/" + path;
+        String urlString = connectionDetails.url + "/" + path;

         URL url = new URL(urlString);
         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
Collaborator:

If the plan is to deprecate this class, use the @Deprecated annotation for it (before class RestClient) so that we know the plan is to rally all of the code around one HTTP client solution. As it is, it's pretty confusing with 2 different clients within one codebase/PR.

Member Author:

I'm not sure whether we want to deprecate this class or not in the long run. I would assume so, but honestly the only place we really need to use the greater abilities of the reactor-netty client is for reindexing; this is fine elsewhere. For that reason, I left this in place for the time being.

        doc5.add(new StringField("_id", new BytesRef(encodeUtf8Id("id5")), Field.Store.YES));

        // Set up our mock reader
        IndexReader mockReader = mock(IndexReader.class);
Collaborator:

Let's sync up on Mockito. I wonder if this could have been clearer and tighter without Mockito.
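
One Mockito-free shape this test could take (a sketch assuming Lucene's ByteBuffersDirectory is available; class and values are illustrative): build a small real index in memory and read it back through the same APIs the production code uses.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;

public class InMemoryIndexSketch {
    public static void main(String[] args) throws Exception {
        try (ByteBuffersDirectory dir = new ByteBuffersDirectory()) {
            // Write one real document rather than stubbing IndexReader behavior
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new StringField("_id", "id5", Field.Store.YES));
                writer.addDocument(doc);
            }
            // Read it back with the same reader APIs the production code exercises
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                System.out.println(reader.maxDoc());               // 1
                System.out.println(reader.document(0).get("_id")); // id5
            }
        }
    }
}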

AndreKurait pushed a commit to AndreKurait/opensearch-migrations that referenced this pull request Apr 25, 2024
* Checkpoint: improved ConnectionDetails; unit tested it

Signed-off-by: Chris Helma <chelma+github@amazon.com>

* RFS now uses reactor-netty and bulk indexing

Signed-off-by: Chris Helma <chelma+github@amazon.com>

* Fixes per PR; unit tested LuceneDocumentsReader

Signed-off-by: Chris Helma <chelma+github@amazon.com>

* Updated a unit test name

Signed-off-by: Chris Helma <chelma+github@amazon.com>

* Updated a method name per PR feedback

Signed-off-by: Chris Helma <chelma+github@amazon.com>

---------

Signed-off-by: Chris Helma <chelma+github@amazon.com>