Port to PostgreSQL #1085
The Content Store database is being migrated from MongoDB to PostgreSQL. See alphagov/content-store#1085. Nightly backups of the postgres database are now available in S3. By copying them to Google Cloud Platform, we make it possible to adapt the GOV.UK Knowledge Graph to use them.
The content store is being migrated from MongoDB to Postgres. See alphagov/content-store#1085. This is a first attempt to adapt to using the Postgres version (a sketch of steps 2 and 3 follows the lists below):

1. Restore the backup of the Postgres database.
2. Export the `content_items` table as lines of JSON.
3. Import the JSON into MongoDB.
4. Query as before.

Pros:

- Easy to develop, similar to existing steps in the data pipeline
- Avoids translating the MongoDB queries into Postgres ones

Cons:

- Not in the spirit of GOV.UK's policy to stop using MongoDB
- Extends the data pipeline in both time and complexity
- Misses the opportunity to improve the whole pipeline, such as by using the Publishing API database for everything, instead of using the Content Store for some things.
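A minimal sketch of steps 2 and 3, assuming the `pg` and `mongo` gems and a local MongoDB. Only the `content_items` table name comes from the steps above; the database names and connection details are assumptions:

```ruby
require "pg"
require "mongo"
require "json"

pg = PG.connect(dbname: "content_store") # assumed database name
mongo = Mongo::Client.new(["localhost:27017"], database: "content_store")

# row_to_json serialises each row (including jsonb columns) as one JSON document
pg.send_query("SELECT row_to_json(t) FROM content_items t")
pg.set_single_row_mode # stream rows instead of buffering the whole table

batch = []
pg.get_result.stream_each_row do |(json)|
  batch << JSON.parse(json)
  if batch.size >= 1_000
    mongo[:content_items].insert_many(batch)
    batch.clear
  end
end
mongo[:content_items].insert_many(batch) unless batch.empty?
```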
This will allow us to cross-reference PostgreSQL records to MongoDB records post-migration if needed.

- Support import of doubly-nested mongo date fields
- Add field mappings for ScheduledPublishingLogEntry and PublishIntent
- Add mongo_id field to user & scheduled_publishing_log_entry
- Add `rails_timestamp` method to remove conflicts with ActiveRecord behaviour when doing `.insert` with some-but-not-all values given
- Add support for batch_size in JsonImporter
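The `mongo_id` addition might look something like the following migration (a sketch: the class name is invented and the Rails version is assumed; the table names follow the list above):

```ruby
class AddMongoIdForCrossReferencing < ActiveRecord::Migration[7.0]
  def change
    # Keep the original MongoDB _id so records can be cross-referenced
    # between the two databases after migration.
    add_column :users, :mongo_id, :string
    add_column :scheduled_publishing_log_entries, :mongo_id, :string
  end
end
```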
This will allow us to perform side-by-side performance comparisons of the Mongo and Postgres content-stores on the same hardware (e.g. local dev laptop) and prove that the PostgreSQL content-store is at least as performant as the Mongo version.
This reduces response times to around 30% of their previous values.
These will no longer be needed after migration to PostgreSQL.
Some records in MongoDB have nil values in `created_at` or `updated_at`. ActiveRecord's `timestamps` migration method creates these columns as NOT NULL by default, so we must explicitly allow nil values after the fact.
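A follow-up migration along these lines would relax that constraint (a sketch: the class name is invented, the Rails version is assumed, and `content_items` is assumed to be an affected table):

```ruby
class AllowNullTimestamps < ActiveRecord::Migration[7.0]
  def change
    # t.timestamps created these columns as NOT NULL by default
    change_column_null :content_items, :created_at, true
    change_column_null :content_items, :updated_at, true
  end
end
```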
Some records in the old MongoDB have `description` as a simple value, and some have it as a Hash. We need to support both, and make sure that we only wrap the given value in a Hash if it isn't one already.
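A minimal sketch of that normalisation, assuming the Hash form keys the plain value under "value" (the key name and method name are assumptions):

```ruby
# Wrap scalar descriptions; leave values that are already a Hash untouched.
def normalise_description(value)
  value.is_a?(Hash) ? value : { "value" => value }
end

normalise_description("A page")                # => { "value" => "A page" }
normalise_description({ "value" => "A page" }) # => { "value" => "A page" }
```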
This fixes a bug where unpublished redirects in Short URL Manager aren't removed from the content store (so they continue to work on the website).

Users of the content-store API (i.e. publishing-api) might make API calls with values they want to reset provided as `nil`. For example, if you wanted to clear some redirects on a content item, you might do something like:

```
PUT /content/some-made-up-url
{
  ...
  "redirects": nil
  ...
}
```

The intent of the user of the API is clear here: they want no redirects. However, ContentItem has a default value for redirects:

```
field :redirects, type: Array, default: []
```

And the rest of the content-store expects this value to be an Array, not nil. By passing potentially nil values into assign_attributes, we allow a situation where fields that content-store expects not to be nil (because they have defaults) can be nil. This tends to result in NoMethodErrors, such as this one:

```
NoMethodError
undefined method `map' for nil:NilClass

  redirects = item.redirects.map(&:to_h).map(&:deep_symbolize_keys)
                             ^^^^
/app/app/models/route_set.rb:30:in `from_content_item'
/app/app/models/content_item.rb:215:in `route_set'
/app/app/models/content_item.rb:225:in `should_register_routes?'
/app/app/models/content_item.rb:193:in `register_routes'
/app/app/models/content_item.rb:33:in `create_or_replace'
/app/app/controllers/content_items_controller.rb:32:in `block in update'
```

I can't think of any valid reason for overriding default attributes with nils, so it feels like calling .compact is the right thing to do here.
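That fix, sketched (the variable names are illustrative, not the actual code):

```ruby
# Drop nil values before assignment so they cannot override
# model defaults such as redirects: []
item.assign_attributes(attributes.compact)
```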
The `.any?` method, when called on a Relation, seems to instantiate the objects in the result set, which is very slow on Kubernetes. It worked fine in Mongoid, but not in ActiveRecord. If we replace this with `.count.positive?`, it's much faster.
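Illustratively (the actual scope isn't shown in the discussion, so the `where` condition here is invented):

```ruby
# Before: slow in this setup, as the relation's objects were instantiated
ContentItem.where(schema_name: "redirect").any?

# After: a COUNT performed in the database, no instantiation
ContentItem.where(schema_name: "redirect").count.positive?
```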
Whitehall doesn't do much validation of a given scheduled publishing time. As a result, it can sometimes send us really extreme values for `scheduled_publishing_delay_seconds` (e.g. 400 years into the future). This can cause problems in the importer when Mongo has accepted a value that PostgreSQL's standard 4-byte integer column can't hold. Changing the field type to `bigint` fixes the issue.
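The change might be a migration like this (a sketch: the class name is invented, the Rails version is assumed, and the table holding the field is an assumption):

```ruby
class WidenScheduledPublishingDelaySeconds < ActiveRecord::Migration[7.0]
  def change
    # ~400 years in seconds exceeds 2^31, overflowing a 4-byte integer
    change_column :scheduled_publishing_log_entries,
                  :scheduled_publishing_delay_seconds, :bigint
  end
end
```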
...it will fail due to the read-only filesystem in prod. Also, some whitespace-only corrections to the migration.
It turns out that you can't call `LogStasher.add_custom_fields` twice. Because Content Store's custom fields config was being run after the govuk_app_config gem's own custom fields config, the gem's config was being overwritten. So, fields like `govuk_request_id` and `varnish_id` weren't appearing in Content Store's controller request logs (but `govuk_dependency_resolution_source_content_id` was).

The gem (version 9.7.0) now provides a mechanism for setting custom fields that doesn't overwrite the gem's own settings (https://docs.publishing.service.gov.uk/repos/govuk_app_config.html#logger-configuration):

```ruby
GovukJsonLogging.configure do
  add_custom_fields do |fields|
    fields[:govuk_custom_field] = request.headers["GOVUK-Custom-Header"]
  end
end
```
An earlier commit on main (#d5422b46 in PR #1136) fixed a subtle issue when overriding default values with nil, by explicitly setting `.created_at` and other attributes from the existing item when it was being replaced. This caused issues on this PostgreSQL branch after rebasing, as ActiveRecord behaves more as expected with respect to `created_at` and therefore the line creating a local `created_at` variable had been removed. This commit reintroduces that variable, and tests now pass again.
For the record, I have just force-pushed this branch. Just in case something were to go wrong with the merge into main, I have tagged the previous head commit of this branch.
Port content-store to run on RDS PostgreSQL, rather than MongoDB.
As of Monday 18th December 2023, all content-store and draft-content-store applications in all environments are running this branch (not main), via the content-store container.

As of Tuesday 2nd Jan at 13:45, the commit history in this branch is now cleaned up, rebased, and ready to be merged into main (force-pushed PR #1199 onto this branch).
This application is owned by the publishing platform team. Please let us know in #govuk-publishing-platform when you raise any PRs.