Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract transformation functions into migration-specific classes #1155

Conversation

eecavanna
Copy link
Collaborator

@eecavanna eecavanna commented Oct 3, 2023

Summary of changes

  1. I implemented the MigratorBase class, which is a migration-agnostic base class from which migration-specific child classes will inherit things
    • It contains an empty dictionary, accompanied by lots of documentation about that dictionary
    • That dictionary will be populated by child classes
    • That dictionary maps Mongo collection names to transformation functions
    • The class also contains a helper function that can be used to get things from that dictionary without knowing it's a dictionary
    • Finally, the class also contains the various helper functions that already existed in this module
  2. I implemented some production-ready migration-specific child classes
    • One for the v7.7.2 to v7.8.0 migration
    • One for v7.8.0 to v8.0.0 migration
  3. I implemented a tutorial/example migration-specific child class, named Migrator_from_A_B_C_to_X_Y_Z
    • Migration authors can use this as a reference
  4. I updated the click handler function to "follow the agenda" (i.e. invoke the specified transformation functions) of whatever migration-specific child class the function instantiates
    • It currently instantiates the Migrator_from_7_8_0_to_8_0_0 child class
    • We could make it choose a child class based on CLI arguments, but I have not done that here (maybe later)

@eecavanna
Copy link
Collaborator Author

I don't plan to update the click CLI part of the file until after @turbomam and I have discussed the existing changes.

@eecavanna
Copy link
Collaborator Author

Now that we have discussed the changes, I will update the click portion of the file to work with the newly-structured classes.

@eecavanna
Copy link
Collaborator Author

I tried setting up a local nmdc-schema development environment in order to test this PR locally, but failed to. I don't know what the procedure is for setting up such an environment. In a Debian container that has Python, pandoc, yq (mikefarah version), Apache Jena, and poetry installed; I ran $ make setup and that failed (details in this issue).

@eecavanna eecavanna added the help wanted Extra attention is needed label Oct 4, 2023
@eecavanna
Copy link
Collaborator Author

I added the "help wanted" tag because I want help with testing this branch: either (a) setting up a test environment on my local machine or (b) someone else doing the testing.

@eecavanna eecavanna removed the help wanted Extra attention is needed label Oct 6, 2023
# logger.info(f"Starting {tdk}-specific migrations")
# for current_study in tdv:
# migrator.migrate_uc_gold_study_identifiers_7_8_0_to_8_0_0(current_study)
end_dict[tdk] = migrator.apply_changes_recursively_by_key(tdv, set(migrateable_slots))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply_changes_recursively_by_key looks at a root element and recurses through the object and coerces CURIEs so that they all have a prefix (the salvage prefix).

@eecavanna
Copy link
Collaborator Author

eecavanna commented Oct 17, 2023

Acceptance criteria (at least):

  • make-rdf is a make target that must pass (the LinkML validate subtarget must complete)
    • Build schema, download data, run migrations, validate output of migration
  • Other team members can easily contribute transformations

@eecavanna eecavanna marked this pull request as ready for review October 20, 2023 02:23
@eecavanna
Copy link
Collaborator Author

The example/tutorial "migration" class is named: Migrator_from_A_B_C_to_X_Y_Z

@eecavanna
Copy link
Collaborator Author

eecavanna commented Oct 20, 2023

Hi @turbomam, I am done making changes to this branch and am ready for it to be reviewed and, if appropriate, merged into main. When we looked at this branch together on Tuesday, it was at commit 602eab5. Since then, I did the following things:

  1. Merged the main branch into this one
  2. Simplified and documented the example Migrator_from_... class we wrote together on Tuesday
  3. Simplified the name of the "collections-to-transformers" dictionary (now named agenda)
  4. Defined some type aliases in an attempt to make a couple lines of code more concise

Here's a link to a diff that shows all of those changes (except the merge of main into this branch)
https://github.com/microbiomedata/nmdc-schema/compare/e669a93..514dbc7

@eecavanna
Copy link
Collaborator Author

In bullet point 4.2 of the description, I wrote:

We could make it choose a child class based on CLI arguments, but I have not done that here (maybe later)

Here's roughly what I have in mind for that (this would be in the main function, in place of the hard-coded, migration-specific migrator = Migrator_from_7_8_0_to_8_0_0() line):

##- Instead of this hard-coded, migration-specific line...
##- migrator = Migrator_from_7_8_0_to_8_0_0()
##- We could do this...

    initial_version = "<comes from CLI options>"
    target_version = "<comes from CLI options>"

    # Select a migrator class based upon the specified initial and target schema versions.
    migrator_registry = [
        {"class": Migrator_from_7_7_2_to_7_8_0, "from": "7.7.2", "to": "7.8.0"},
        {"class": Migrator_from_7_8_0_to_8_0_0, "from": "7.8.0", "to": "8.0.0"},
    ]
    migrator = None  # assume no compatible migrator class exists
    for item in migrator_registry:  # look for a compatible migrator class
        if item["from"] == initial_version and item["to"] == target_version:
            migrator = item["class"]()  # instantiate the compatible migrator class
            break

    if migrator is None:
        raise NotImplementedError("No migrator has been implemented for that version jump.")

I don't plan to include that in this branch because I don't want to complicate the branch (it would involve updating at least one command in a Makefile and adding two @click.options).

@eecavanna eecavanna added the enhancement New feature or request label Oct 20, 2023
Copy link
Member

@turbomam turbomam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@eecavanna
Copy link
Collaborator Author

@turbomam , I just realized from typing import Dict, List, TypeAlias doesn't work in Python versions older than 3.10 (the TypeAlias wasn't introduced until Python 3.10). Is there a minimum Python version required for nmdc-schema? If it's 3.10, no problemo. If it's 3.9 (or undefined), I'll remove the type aliases with an additional commit.

Docs: https://docs.python.org/3/library/typing.html#typing.TypeAlias

@eecavanna
Copy link
Collaborator Author

I went ahead and removed the type aliases. I'm OK with this being merged in.

@turbomam turbomam merged commit 91d6e05 into main Oct 20, 2023
2 checks passed
@eecavanna eecavanna deleted the 1148-add-a-data-structure-that-specifies-a-migration-collections-to-be-migrated-source-schema-and-destination-schema branch October 20, 2023 19:14
@Shalsh23 Shalsh23 mentioned this pull request Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants