Bulk import JSON FHIR resources #187
Conversation
Since we have referential integrity enabled, there is a high chance of getting a lot of failures with the data the way it currently is. Some possible solutions include:
- Option 1: Require the data to be separated based on resourceType
- Option 2: Allow creation of placeholder resources
- Option 3: Sort the resources in the dataset
Is option 3 the same as building a dependency graph (I think this could be many graphs) and importing based on the order (from leaves to roots) of that graph? @ndegwamartin do you have a link to the Android FHIR SDK's in-progress code that orders resources by dependency graph? If that's straightforward, maybe we could lift and reimplement it.
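For concreteness, here is a minimal sketch of option 3 as a topological sort, assuming resources are already in memory as Python dicts with `resourceType` and `id` fields (names and structure are illustrative, not the tool's actual code):

```python
# Sketch of option 3: order resources so that referenced resources
# ("leaves") are imported before the resources that point to them.
from graphlib import TopologicalSorter  # Python 3.9+


def collect_references(node):
    """Recursively yield 'Type/id' strings from any 'reference' fields."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from collect_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_references(item)


def import_order(resources):
    """Return resources sorted so that dependencies come first."""
    by_key = {f"{r['resourceType']}/{r['id']}": r for r in resources}
    graph = {
        key: {ref for ref in collect_references(res) if ref in by_key}
        for key, res in by_key.items()
    }
    # static_order() emits each node after all of its predecessors, i.e.
    # leaves before the resources that reference them. Circular references
    # in the data would raise graphlib.CycleError and need breaking first.
    return [by_key[key] for key in TopologicalSorter(graph).static_order()]
```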
@ndegwamartin please confirm if this is what @pld is talking about.
Looked a bit at that PR, and it looks like it is building a mapping that is then used to pull the resources from the database? I think the worst part about this is probably the fact that we might need to load the whole file in order to get the references/info we need to build the mapping. Or maybe iterate over the file twice: the first time getting the info and the second time actually building the chunks.

Considering the size estimate (8k Patient resources, each having 5 or more related resources), we definitely have to think more about how this can best be optimized. Looking more into the implementation to see how much we can borrow.
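The two-pass idea could keep memory bounded by storing only byte offsets in the first pass. A minimal sketch, assuming the input is NDJSON with one resource per line (the tool's actual file layout may differ):

```python
import json
from collections import defaultdict


def first_pass(path):
    """Pass 1: record byte offsets per resourceType, not full resources."""
    offsets = defaultdict(list)
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            # Cheap peek at the resourceType; the parsed dict is discarded.
            offsets[json.loads(line)["resourceType"]].append(offset)
    return offsets


def second_pass(path, offsets, resource_type, chunk_size):
    """Pass 2: re-read only the lines for one resourceType, in chunks."""
    with open(path, "rb") as f:
        chunk = []
        for offset in offsets[resource_type]:
            f.seek(offset)
            chunk.append(json.loads(f.readline()))
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk
```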
Concurred on both of the above, i.e. the logic behind the SDK's implementation could be (partially) lifted or borrowed from, but also the performance for our use case may be more negatively impacted than in the SDK, since we are dealing with raw files while the SDK works with an (indexed) DB.

We could also look at this from a different perspective. Migration is meant to be a one-off (or sporadic) process, done especially at the beginning of any server deployment's timeline. For the issue of referential integrity, we could disable the server validation for the migration process, then re-enable it after. If the tool guarantees processing of all resources, any references that were present before the migration will still be present after the migration. Server validation should mainly be in place for real-time data creation, e.g. by users. This approach guarantees the mirroring of the data's state across systems, which IMO is the objective of a migration.
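If the target server is the HAPI FHIR JPA starter (an assumption; the server is not named in this thread), referential integrity enforcement is a configuration flag that could be switched off for the migration window and restored afterwards:

```yaml
# application.yaml of hapi-fhir-jpaserver-starter (assumed deployment).
# Set these back to true once the migration has completed.
hapi:
  fhir:
    enforce_referential_integrity_on_write: false
    enforce_referential_integrity_on_delete: false
```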
After a chat with @pld yesterday, we agreed on a first pass that simply sorts the resources in the file based on resourceType and posts those payloads one after the other. To test the current implementation, run:
- The "chunk_size" is used when reading from the file.
- If you pass "--sync sort", this will sort your resources into the different resourceTypes.
- The default sync is "direct", which does not sort and just tries to push the data as is. It is the faster sync, useful if you don't have referential integrity turned on.

Will pull most of these comments into the docs.
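As an illustration of the "sort" behaviour described above (a sketch, not the tool's actual code), grouping by resourceType and pushing chunk by chunk could look like this:

```python
from collections import defaultdict
from itertools import islice


def sort_by_resource_type(resources):
    """Group resources into one list per resourceType."""
    groups = defaultdict(list)
    for resource in resources:
        groups[resource["resourceType"]].append(resource)
    return groups


def chunked(items, chunk_size):
    """Yield successive chunks of at most chunk_size items."""
    it = iter(items)
    while chunk := list(islice(it, chunk_size)):
        yield chunk


def push_sorted(groups, post_payload, chunk_size):
    """post_payload is a stand-in for whatever function POSTs one payload."""
    for resource_type, resources in groups.items():
        for chunk in chunked(resources, chunk_size):
            post_payload(resource_type, chunk)
```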
@Rkareko if you are able to test this with your dataset and provide feedback, that would be awesome.
@Wambere is it possible to display the progress of processed resources?
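One lightweight way to do this (a sketch, assuming the tqdm package and a hypothetical chunk iterator, neither of which is in the tool today):

```python
from tqdm import tqdm


def push_with_progress(chunks, total_resources, post_chunk):
    """Update a progress bar as each chunk of resources is pushed."""
    with tqdm(total=total_resources, unit="resource") as progress:
        for chunk in chunks:
            post_chunk(chunk)
            progress.update(len(chunk))
```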
IMPORTANT: Where possible, all PRs must be linked to a GitHub issue.
Fixes #184
Engineer Checklist
- [ ] I have run `./gradlew spotlessApply` to check my code follows the project's style guide