-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Console] Backfill -- Add interface and YAML specification #728
[Console] Backfill -- Add interface and YAML specification #728
Conversation
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #728 +/- ##
============================================
+ Coverage 63.10% 64.78% +1.67%
- Complexity 1585 1586 +1
============================================
Files 226 238 +12
Lines 9219 9879 +660
Branches 771 771
============================================
+ Hits 5818 6400 +582
- Misses 2991 3069 +78
Partials 410 410
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
if not v.validate({"backfill": self.config}): | ||
raise ValueError("Invalid config file for backfill", v.errors) | ||
|
||
def create(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, and obviously this will all evolve, but it would probably be helpful to have a high level description of what each of these commands are intended to do embedded somewhere in here. The process of writing those explanations/comments will force understanding them well enough to express to others what possible implementations could look like, which will then feed back into the interface spec. Maybe you have that elsewhere already. That said - there is not much information to be gleaned from the interface spec in terms of what would be an acceptable implementation and why it would be acceptable. As a result, users of objects that implement Backfill will necessarily need to take on more responsibility because the boundary of the black box around the interface is necessarily smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point. Right now these descriptions pretty much just exist in my head. I'll start adding them as I go through.
def get_status(self): | ||
raise NotImplementedError | ||
|
||
def scale(self, units: int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm on board with this expression of capacity; an integer is inherently flexible here.
def __init__(self, config: Dict, target_cluster: Cluster) -> None: | ||
super().__init__(config) | ||
self.target_cluster = target_cluster | ||
self.docker_config = self.config["reindex_from_snapshot"]["docker"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though we can use cerberus to enforce that the model only has a Docker or ECS section, it feels like we're better off having a factory or something to return the correct Backfill implementation based on the input model. That would hypothetically enforce cleaner abstractions because the caller requesting an RFSBackfill
would not know which implementing class it was receiving.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now we're actually doing that elsewhere.
"snapshot_repo": {"type": "string", "required": False}, | ||
"scale": {"type": "integer", "required": False, "min": 1} | ||
}, | ||
"check_with": contains_one_of({'docker', 'ecs'}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is handy
} | ||
|
||
|
||
class Backfill: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we should be using ABCs to enforce contract compliance if we're using inheritance in this way. Is there a reason you have in mind not to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you're right. No good reason, mostly just hadn't gotten to it yet.
with requests_mock.Mocker() as rm: | ||
rm.get(f"{env.source_cluster.endpoint}/_cat/indices", | ||
status_code=200, | ||
text=source_cat_indices) | ||
rm.get(f"{env.target_cluster.endpoint}/_cat/indices", | ||
status_code=200, | ||
text=target_cat_indices) | ||
result = runner.invoke(cli, ['--config-file', str(VALID_SERVICES_YAML), 'clusters', 'cat-indices'], | ||
catch_exceptions=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learned something new here - thanks!
def get_backfill(config: Dict, source_cluster: Optional[Cluster], target_cluster: Optional[Cluster]) -> Backfill: | ||
if BackfillType.opensearch_ingestion.name in config: | ||
if source_cluster is None: | ||
raise ValueError("source_cluster must be provided for OpenSearch Ingestion backfill") | ||
if target_cluster is None: | ||
raise ValueError("target_cluster must be provided for OpenSearch Ingestion backfill") | ||
logger.debug("Creating OpenSearch Ingestion backfill instance") | ||
return OpenSearchIngestionBackfill(config=config, | ||
source_cluster=source_cluster, | ||
target_cluster=target_cluster) | ||
elif BackfillType.reindex_from_snapshot.name in config: | ||
if target_cluster is None: | ||
raise ValueError("target_cluster must be provided for RFS backfill") | ||
|
||
if 'docker' in config[BackfillType.reindex_from_snapshot.name]: | ||
logger.debug("Creating Docker RFS backfill instance") | ||
return DockerRFSBackfill(config=config, | ||
target_cluster=target_cluster) | ||
elif 'ecs' in config[BackfillType.reindex_from_snapshot.name]: | ||
logger.debug("Creating ECS RFS backfill instance") | ||
return ECSRFSBackfill(config=config, | ||
target_cluster=target_cluster) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I see this, I'm even more inclined to use ABCs to enforce the interface contract we're currently implying, but not guaranteeing, exists.
if as_json: | ||
return json.dumps(response) | ||
return yaml.safe_dump(response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, just curious - why do we need both, and when should I use either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in preparation for the web API, which will likely want jsons for almost everything. So the idea is to eventually have a json version and optionally a non-json version for every call.
raise UnsupportedBackfillTypeError(next(iter(config.keys()))) | ||
|
||
|
||
def describe(backfill: Backfill, as_json=False) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about this paradigm - why wrap the Backfill in this way? Having the additional layer here could be surprising to folks and I'm interested to hear what, if anything, would go wrong if someone came in and used the object directly?
As an alternative, why not have the return-type modularity exist in something like an OutputType enum or an Outputter object that the Backfill object contains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, why the logic layer?
The idea is two-fold:
- The CLI and API are intended to be very thin clients, potentially generated automatically from a spec. This logic/middleware layer will be the layer that those clients interact with, so all they have to know is what functions are available from the library. Potentially, these files can even be used as the spec to generate the clients.
- The CLI and API should never have to know exactly what underlying backend adapters they're dealing with. But in some cases (e.g. CLI params that only apply to one type) someone in between the CLI and the class itself needs to know what's happening under the hood. That's what can happen in this layer.
Someone who's building on top of the library directly is welcome to use the underlying classes, but then they're taking on responsibility for that.
Very open to feedback on a better way to do it.
For the return type specifically, I like the idea of standardizing it more--I'm not sure I quite understand what you mean by "an Outputter object that the Backfill object contains"
|
||
|
||
def create(backfill: Backfill, *args, **kwargs) -> str: | ||
if isinstance(backfill, RFSBackfill): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we're doing runtime type checking, which is sometimes an indication that encapsulation is wonky. Why not have this behavior owned by the RFSBackfill object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, i'm not crazy about this piece. I think you're right that this is a good case to have the class itself handle it.
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
We discussed in depth, and had the following takeaways:
My personal suspicion is that longer term we'll end up ditching the ABCs and leaning into the divergent behavior of the different implementation types directly, but that's definitely not for certain and I'm curious to see how things evolve. |
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, handful of outstanding comments
TrafficCapture/dockerSolution/src/main/docker/migrationConsole/lib/console_link/README.md
Show resolved
Hide resolved
TrafficCapture/dockerSolution/src/main/docker/migrationConsole/lib/console_link/README.md
Show resolved
Hide resolved
...Capture/dockerSolution/src/main/docker/migrationConsole/lib/console_link/console_link/cli.py
Show resolved
Hide resolved
...dockerSolution/src/main/docker/migrationConsole/lib/console_link/console_link/environment.py
Show resolved
Hide resolved
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some non-blocking comments, but overall looks good. Thanks Mikayla!
self.env = Environment(config_file) | ||
try: | ||
self.env = Environment(config_file) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, but - should we be more specific here? E.g. it isn't necessarily a user input issue if they don't have access to the file, for example.
def stop_backfill_cmd(ctx, pipeline_name): | ||
exitcode, message = logic_backfill.stop(ctx.env.backfill, pipeline_name=pipeline_name) | ||
if exitcode != 0: | ||
raise click.ClickException(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, just curious - why raise ClickException rather than anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It exits with a non-zero exit code and also displays the error "nicely" --not with a full traceback, etc. I think subclassing that exception to create custom ones could be a good way to go in the future to get more specific errors while keeping the convenience features.
Docs are here: https://click.palletsprojects.com/en/8.1.x/exceptions/#exception-handling
raise ValueError("Invalid config file for backfill", v.errors) | ||
|
||
@abstractmethod | ||
def create(self, *args, **kwargs) -> CommandResult: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not gonna block on it, but if we have a create, we should have a teardown too.
Description
This sets the groundwork for implementing backfill backend adapters for RFS by establishing the base class/interface and logic layer.
The OSI Backfill implementation already exists, but this PR does some refactoring around it. It creates--but does not implement--the RFS Backfill backends (one for docker and one for ECS).
I also laid out a starting point for the backfill section of the services.yaml (this is very much not stable and is highly subject to change as we figure out what we need) and enforce it with Cerberus schemas.
Issues Resolved
MIGRATIONS-1776 (fully completed) & MIGRATIONS-1777 (partially completed -- the remaining component is to populate the backfill info from CDK for a cloud-deployed migration console)
Testing
Unit tests added, manually tested in docker setup.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.