Metadata rds #201
Conversation
Force-pushed from c0f871e to fc6ad50
new env variables used to distinguish which db each set of connection params is for.
.github/workflows/ci_python.yaml
two dbs are needed by the performance manager tests.
two dbs in local docker compose.
[metadata_prod]
sqlalchemy.url = driver://user:pass@localhost/dbname
script_location = src/lamp_py/migrations
version_locations = src/lamp_py/migrations/versions/metadata_prod
add metadata db migrations.
@@ -12,7 +12,6 @@ authors = [
 ingestion = 'lamp_py.ingestion.pipeline:start'
 performance_manager = 'lamp_py.performance_manager.pipeline:start'
 seed_metadata = 'lamp_py.postgres.seed_metadata:run'
-snapshot = 'lamp_py.postgres.snapshot:run'
we don't use this so i removed it.
-ignored-classes = ['sqlalchemy.orm.session.sessionmaker','pyarrow.compute']
+ignored-classes = ['sqlalchemy.orm.session.sessionmaker', 'pyarrow.compute']
yaml linting
alembic_upgrade_to_head(db_name="metadata_prod")
migrations for the metadata db are owned by ingestion. this should make our deployment experience better.
This does complicate local development: since we don't have (and don't really want) a local ingestion image, standing up the local metadata rds would need a different configuration.
One option would be to allow the `seed_metadata` image to do this, but it would require some re-working of that logic (probably some different env vars)
good note. let's discuss in person, but i think adding to seed_metadata makes sense.
# run metadata rds migrations
alembic_upgrade_to_head(db_name="metadata_prod")

move_metadata()
temp function to move all the metadata over from rail performance manager. we should only have to do this once.
I think we would want to do this in the migration file for the new metadata RDS, which essentially operates as a "temp" function.
my instinct was that it would be wrong to have a connection to the RPM database in the migration file as that migration could run when there is no longer a metadata table in that database.
initial migration to establish the metadata table in the database.
@@ -425,7 +425,10 @@ def update_events_from_temp(db_manager: DatabaseManager) -> None:
     process_logger.log_complete()


-def process_gtfs_rt_files(db_manager: DatabaseManager) -> None:
+def process_gtfs_rt_files(
we need both dbs now. one to figure out what files we have to extract and one to do the compute for the transformations.
def process_static_tables(
    rpm_db_manager: DatabaseManager,
    md_db_manager: DatabaseManager,
) -> None:
we need the md db manager to figure out what the new feed info files are and the rpm db manager as the target to load the new data into.
def legacy_metadata_exists(rpm_db_manager: DatabaseManager) -> bool:
    """
    are there any records in the legacy metadata table
    """
    metadata = rpm_db_manager.select_as_dataframe(
        sa.select(LegacyMetadataLog.path)
    )

    return not metadata.empty
check to see if there is legacy metadata that hasn't been copied over yet.
if legacy_metadata_exists(rpm_db_manager):
    raise EnvironmentError("Legacy Metadata Detected")
raise an exception if there is legacy metadata. this will prevent us from running any of the performance manager logic, waiting for the next loop to give it a shot.
@@ -3,10 +3,10 @@
 import sqlalchemy as sa
 from sqlalchemy.ext.declarative import declarative_base

-SqlBase: Any = declarative_base()
+RpmSqlBase: Any = declarative_base()
new base, one for each database.
@@ -177,7 +177,7 @@ class TempEventCompare(SqlBase):  # pylint: disable=too-few-public-methods
     )


-class MetadataLog(SqlBase):  # pylint: disable=too-few-public-methods
+class LegacyMetadataLog(RpmSqlBase):  # pylint: disable=too-few-public-methods
rename this to avoid two tables with the same name.
we weren't using this.
no functional changes. just creating the db managers and passing them in correctly and resetting the correct tables.
op.drop_index(
    "ix_metadata_log_not_processed",
    table_name="metadata_log",
    postgresql_where=sa.text("rail_pm_processed = false"),
I don't think this should be required when calling to drop the index; just the index name and table name should be sufficient.
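For illustration, a minimal sketch of the same point using plain SQLAlchemy against in-memory SQLite (the project's code uses Alembic's op.drop_index against Postgres; this just exercises the underlying Index.drop): the index name and its table are enough to drop it, and the partial-index WHERE clause matters only at creation time.

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
log = sa.Table("metadata_log", meta, sa.Column("path", sa.String))
ix = sa.Index("ix_metadata_log_not_processed", log.c.path)

with engine.begin() as conn:
    meta.create_all(conn)  # creates the table and its index

    # dropping needs only the index identity; no WHERE clause is required
    ix.drop(conn)

    remaining = conn.execute(
        sa.text(
            "SELECT count(*) FROM sqlite_master "
            "WHERE type = 'index' AND name = 'ix_metadata_log_not_processed'"
        )
    ).scalar()
```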
-class MetadataLog(SqlBase):  # pylint: disable=too-few-public-methods
+class LegacyMetadataLog(RpmSqlBase):  # pylint: disable=too-few-public-methods
Should we just be dropping this table?
i think that depends on the deployment order.
we'll discuss in a call.
def legacy_metadata_exists(rpm_db_manager: DatabaseManager) -> bool:
    """
    are there any records in the legacy metadata table
    """
    metadata = rpm_db_manager.select_as_dataframe(
        sa.select(LegacyMetadataLog.path)
    )

    return not metadata.empty
I don't think this is really required. If everything in the performance manager pipeline is now configured to read from the new MetadataLog table, whether or not the LegacyMetadataLog exists shouldn't matter.
i was worried about some deployment error on our part and wanted to protect against that. i had planned on taking this out on a subsequent PR where everything is complete.
I think we need to discuss the migration process in a little more detail along with the plan for local development with the MetadataLog table residing in a different DB instance.
Before deployment, we should also change the configuration of the metadata-rds terraform module. The current metadata log table on the dev instance is 34 MB in size, so something like an allocated_storage of 2 gibibytes and max_allocated_storage of 5 should be sufficient. The m5.large instance class is also probably overkill for a DB with one table; should drop that down to something like t3.small
docker-compose.yml
metadata_rds:
  container_name: metadata_rds
  image: postgres:14.4
The terraform module has postgres 15 set for metadata-rds; need to sync this between the two.
Force-pushed from 70a7932 to 3d7718f
@@ -88,8 +85,6 @@ def start() -> None:
     # run metadata rds migrations
     alembic_upgrade_to_head(db_name="metadata_prod")

-    move_metadata()
do this in migration now.
@@ -40,6 +45,52 @@ def upgrade() -> None:
         unique=False,
         postgresql_where=sa.text("rail_pm_processed = false"),
     )

+    # pull metadata from the rail performance manager database into the
i think this works how we would like. it uses a text-based query to try and select metadata from the rail performance manager and eats an exception iff it's caused by an "undefined table" error, i.e. the metadata log table does not exist.
@@ -0,0 +1,67 @@
+"""remove_metadata
new migration to remove metadata table from performance manager db.
@@ -9,11 +9,8 @@
 import signal
 from typing import List

 import sqlalchemy as sa
reset this file back to how it was.
-    METADATA = auto
+    METADATA = auto()
     RAIL_PERFORMANCE_MANAGER = auto()

-    def get_env_prefix(self) -> str:
+    @property
+    def env_prefix(self) -> str:
         """
         in the environment, all keys for this database have this prefix
         """
-        if self == self.RAIL_PERFORMANCE_MANAGER:
+        if self is DatabaseIndex.RAIL_PERFORMANCE_MANAGER:
             return "RPM"
-        if self == self.METADATA:
+        if self is DatabaseIndex.METADATA:
             return "MD"
         raise NotImplementedError("No environment prefix for index {self.name}")

     def get_args_from_env(self) -> PsqlArgs:
         """
         generate a sql argument instance for this ind
         """
-        prefix = self.get_env_prefix()
-        return PsqlArgs(prefix)
+        return PsqlArgs(self.env_prefix)
more correct way of doing this. (and fix the stupid auto typo i left)
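The pattern in the diff, sketched standalone (PsqlArgs omitted; a missing f-prefix on the error string in the diff is also fixed here, flagged as an assumption about intent):

```python
from enum import Enum, auto


class DatabaseIndex(Enum):
    """index of the databases this project talks to"""

    METADATA = auto()  # auto() with parens; bare `auto` was the typo
    RAIL_PERFORMANCE_MANAGER = auto()

    @property
    def env_prefix(self) -> str:
        """in the environment, all keys for this database have this prefix"""
        if self is DatabaseIndex.RAIL_PERFORMANCE_MANAGER:
            return "RPM"
        if self is DatabaseIndex.METADATA:
            return "MD"
        # f-string added here; the diff's message would not interpolate
        raise NotImplementedError(f"No environment prefix for index {self.name}")
```

Usage is then `DatabaseIndex.METADATA.env_prefix`, which reads like an attribute rather than a getter call.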
def add_metadata_paths(self, paths: List[str]) -> None:
    """
    add metadata filepaths to metadata table for testing
    """
    print(paths)
    with self.session.begin() as session:
        session.execute(
            sa.insert(MetadataLog.__table__),
            [{"path": p} for p in paths],
        )
moved this out of db manager class since it only makes sense for one of the databases.
@@ -177,27 +177,6 @@ class TempEventCompare(RpmSqlBase):  # pylint: disable=too-few-public-methods
     )


-class LegacyMetadataLog(RpmSqlBase):  # pylint: disable=too-few-public-methods
👋
Force-pushed from 3d7718f to 47dad0e
Most changes related to getting the local environment up and running correctly.
Probably need one of these for metadata_staging also?
with rpm_db_manager.session.begin() as session:
    result = session.execute(
        "SELECT path, processed, process_fail FROM metadata_log"
    )
    for row in result:
        (path, processed, process_fail) = row
        insert_data.append(
            {
                "path": path,
                "rail_pm_processed": processed,
                "rail_pm_process_fail": process_fail,
            }
        )
nitpick: the select_as_list method of DatabaseManager does the same thing as this block.
Ah, nevermind, I see you're renaming the dictionary values here.
    psql_insert(MetadataLog.__table__)
    .values(insert_data)
    .on_conflict_do_nothing()
)
Do we want to do nothing on a conflict during this operation? It seems like a conflict would indicate some process failure.
i'm a little concerned about a case where this migration gets run out of sync with the migration on the performance manager database and we find ourselves in a situation where we can't do anything to unblock prod since we can't manually adjust the databases. ingestion should only ever have flags marked as false, so i don't think we would be at risk of missing processing of incoming data.
happy to revert this though if you think it's risky.
op.create_table(
    "static_service_id_lookup",
    sa.Column("dummy_pk", sa.Integer(), nullable=False),
    sa.Column("route_id", sa.String(length=60), nullable=True),
    sa.Column("service_id", sa.String(length=60), nullable=True),
    sa.Column("service_date", sa.Integer(), nullable=True),
    sa.Column("static_version_key", sa.Integer(), nullable=True),
    sa.PrimaryKeyConstraint("dummy_pk"),
)
any migration operations for the static_service_id_lookup VIEW should be removed.
def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
The logic you previously had, to check if records exist in the new metadata log table before starting the event loop, could be added into these migration files for performance manager.
But that shouldn't really be required, since the PM event loop will just query an empty table, until it's not empty.
Actually, we need to check if records exist in the new metadata log table before executing the drop table command, otherwise the table could be dropped before the records are transferred to the new metadata_log table.
@@ -32,7 +45,7 @@ services:
     build:
       context: ./python_src
     depends_on:
-      - local_rds
+      - metadata_rds
seed_metadata container will need to depend on both dbs since it invokes creation of both DatabaseManager objects
Suggested change:
       - metadata_rds
+      - rail_pm_rds
 ports:
-  - "5432:5432"
+  - "${RPM_DB_PORT}:5432"
These alternate ports cause issues with local development; additional logic is required in postgres_utils.py when setting port variables (running inside/outside of docker/aws)
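One way that port logic could look, as a sketch only: the env var names (RUNNING_IN_DOCKER, RPM_DB_HOST, RPM_DB_PORT, etc.) and the in-docker detection flag are assumptions for illustration, not the project's actual postgres_utils.py.

```python
import os


def resolve_db_endpoint(prefix: str) -> tuple:
    """
    Resolve (host, port) for the database with the given env prefix
    ("RPM" or "MD"). Inside the compose network, services reach postgres
    by service name on the container port 5432; from the host command
    line, localhost plus the mapped port from .env must be used instead.
    """
    # hypothetical flag set only inside the docker-compose environment
    if os.getenv("RUNNING_IN_DOCKER"):
        return (os.environ[f"{prefix}_DB_HOST"], 5432)
    return ("127.0.0.1", int(os.environ[f"{prefix}_DB_PORT"]))
```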
# run metadata rds migrations
alembic_upgrade_to_head(db_name="metadata_prod")
db_name should be pulled from env var here, and then the env var added to the validate_environment function
@@ -102,7 +107,7 @@ def start() -> None:
         validate_db=True,
     )

-    # run rds migrations
+    # run rail performance manager rds migrations
     alembic_upgrade_to_head(db_name=os.getenv("ALEMBIC_DB_NAME"))
update to ALEMBIC_RPM_DB_NAME along with the same input to validate_environment. The validate_environment function also requires updates because of the assumed validate_db environment variables.
md_db_manager = DatabaseManager(
    db_index=DatabaseIndex.RAIL_PERFORMANCE_MANAGER, verbose=args.verbose
)
Suggested change:
-md_db_manager = DatabaseManager(
-    db_index=DatabaseIndex.RAIL_PERFORMANCE_MANAGER, verbose=args.verbose
-)
+md_db_manager = DatabaseManager(
+    db_index=DatabaseIndex.METADATA, verbose=args.verbose
+)
We're going to have two RDS schemas to maintain, one for the rail performance manager and one for the metadata. Move the rail performance manager into its own file that's better named. Remove the snapshot postgres feature since we don't use it.
* create the schema with a different sql base class.
* create the migrations and update alembic env and ini.
* use it for inserting metadata in the ingestion app.
* use it for getting unprocessed files in the rail performance manager application.
* update tests to use both database managers
We need existing metadata to be copied from the rail performance manager rds to the metadata rds. Do so in the ingestion application before starting the main method. In the performance manager, don't enter the biz logic of an event loop until the legacy metadata table has been copied over.
Squash this with the docker-compose commit before merging. yamllint docker-compose as well.
* add proper migrations to seed metadata script
* fix mypy error in DataBaseIndex enum
* update metadata postgres version in yaml files
* insert metadata from rail performance manager rds in migrations script.
* remove legacy metadata table from rail performance manager schema
* Better handling of hostnames and ports so that they work when the local performance manager is running in a docker image and directly on the command line.
* Create a metadata_staging migrations directory.
* In the performance manager staging migration, ensure that all metadata has been copied over to the metadata database before dropping the metadata table.
* Get alembic migration names from the environment.
* Improve environment validation to handle multiple databases.
Force-pushed from 47dad0e to 61323f8
@@ -8,7 +8,7 @@ def validate_environment(
     required_variables: List[str],
     private_variables: Optional[List[str]] = None,
     optional_variables: Optional[List[str]] = None,
-    validate_db: bool = False,
+    db_prefixes: Optional[List[str]] = None,
nitpick:

Suggested change:
-    db_prefixes: Optional[List[str]] = None,
+    db_prefixes: Iterable[str] = (),

I know this is following the convention of private_variables and optional_variables because pylint complains of a default parameter value of [], but Iterable[str] = () could be used for all of those parameters and we could drop the if checks before our for loops and the private_variables = [] assignment at the start of the function.
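The nitpick in runnable form (the function name and signature here are illustrative, not the project's validate_environment): an immutable () default avoids pylint's mutable-default warning and makes the None guards unnecessary, since an empty tuple iterates as empty.

```python
from typing import Iterable, List


def missing_variables(
    environ: dict,
    required_variables: Iterable[str],
    private_variables: Iterable[str] = (),  # () instead of Optional[List] = None
    optional_variables: Iterable[str] = (),
) -> List[str]:
    """
    return names that must be present in environ but are not.
    no `if private_variables is None` guard is needed: iterating ()
    simply does nothing.
    """
    missing = [name for name in required_variables if name not in environ]
    missing += [name for name in private_variables if name not in environ]
    # optional variables never contribute to the missing list
    return missing
```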
prefix file names with 001_ to help with sorting future migrations, same for the metadata_staging migration file
Almost there!
The last significant issue is the insert operation on the new metadata table. Everything else appears to be running correctly for me, locally.
@@ -22,6 +22,7 @@ def get_alembic_config(db_name: str) -> Config:
     "performance_manager_dev",
     "performance_manager_staging",
     "performance_manager_prod",
+    "metadata_prod",
Add metadata_staging
try:
    rpm_db_manager = DatabaseManager(
        db_index=DatabaseIndex.RAIL_PERFORMANCE_MANAGER
    )
    md_db_manager = DatabaseManager(db_index=DatabaseIndex.METADATA)

    insert_data = []
    # pull metadata from the rail performance manager database via direct
    # sql query. the metadata_log table may or may not exist.
    with rpm_db_manager.session.begin() as session:
        result = session.execute(
            "SELECT path, processed, process_fail FROM metadata_log"
        )
        for row in result:
            (path, processed, process_fail) = row
            insert_data.append(
                {
                    "path": path,
                    "rail_pm_processed": processed,
                    "rail_pm_process_fail": process_fail,
                }
            )

    # insert data into the metadata database
    with md_db_manager.session.begin() as session:
        # do nothing on conflicts in file paths
        result = session.execute(
            sa.insert(MetadataLog.__table__).values(insert_data)
        )

except ProgrammingError as error:
    # Error 42P01 is an 'Undefined Table' error. This occurs when there is
    # no metadata_log table in the rail performance manager database
    #
    # Raise all other sql errors
    if error.orig.pgcode == "42P01":
        logging.info("No Metadata Table in Rail Performance Manager")
    else:
        raise
Suggested change:

try:
    rpm_db_manager = DatabaseManager(
        db_index=DatabaseIndex.RAIL_PERFORMANCE_MANAGER
    )

    insert_data = []
    # pull metadata from the rail performance manager database via direct
    # sql query. the metadata_log table may or may not exist.
    with rpm_db_manager.session.begin() as session:
        result = session.execute(
            "SELECT path, processed, process_fail FROM metadata_log"
        )
        for row in result:
            (path, processed, process_fail) = row
            insert_data.append(
                {
                    "path": path,
                    "rail_pm_processed": processed,
                    "rail_pm_process_fail": process_fail,
                }
            )

except ProgrammingError as error:
    # Error 42P01 is an 'Undefined Table' error. This occurs when there is
    # no metadata_log table in the rail performance manager database
    #
    # Raise all other sql errors
    insert_data = []
    if error.orig.pgcode == "42P01":
        logging.info("No Metadata Table in Rail Performance Manager")
    else:
        raise

if insert_data:
    op.bulk_insert(MetadataLog.__table__, insert_data)
This insert operation needs to use op.bulk_insert or something similar to insert anything into the Metadata table that was created during this migration operation. An outside session will not be able to see the Metadata table until the upgrade has completed and all operations have been committed.
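The visibility point can be seen with plain SQLAlchemy (a SQLite stand-in, not the project's code): rows have to go in over the same connection whose open transaction created the table, which is what op.bulk_insert does by reusing the migration's connection.

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
log = sa.Table(
    "metadata_log",
    meta,
    sa.Column("path", sa.String, primary_key=True),
)

insert_data = [{"path": "s3://a"}, {"path": "s3://b"}]

with engine.begin() as conn:
    # the table exists only inside this still-open transaction so far
    meta.create_all(conn)
    # same connection, so the new table is visible to the insert
    conn.execute(sa.insert(log), insert_data)
    count = conn.execute(
        sa.select(sa.func.count()).select_from(log)
    ).scalar()
```

A separate session opened before the transaction commits would not see metadata_log at all, which is why the sa.insert through md_db_manager could not work inside the migration.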
missing_paths = legacy_paths - modern_paths
if len(missing_paths) != 0:
    raise Exception(
        f"Detected {len(missing_paths)} in Legacy Metadata Table not found in Metadata Database."
    )
non-blocking:
This failure mechanism will result in the ECS dying and continually restarting if performance manager is trying to run before the ingestion migration completes. An alternative would be this check running in a while loop and breaking once there are no outstanding legacy_paths, compared to modern_paths.
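A sketch of that alternative (the function and the injected check callable are illustrative names, not the project's code): poll for outstanding legacy paths instead of raising and letting ECS restart the task.

```python
import time
from typing import Callable, Set


def wait_for_metadata_copy(
    get_missing_paths: Callable[[], Set[str]],
    poll_seconds: float = 0.0,
    max_attempts: int = 60,
) -> bool:
    """
    poll until every legacy path shows up in the new metadata table.
    returns True once nothing is outstanding, False if we give up
    after max_attempts, leaving the caller to decide whether to raise.
    """
    for _ in range(max_attempts):
        if not get_missing_paths():
            return True
        time.sleep(poll_seconds)
    return False
```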
* rename migration files to use standard format
* enter while loop to detect legacy metadata that's not in the metadata database. this will keep the service running until the copy is complete rather than stopping the service and restarting it.
* Use alembic.op to bulk insert metadata in the Metadata migration rather than sqlalchemy insert.
One more change..
# insert data into the metadata database
if insert_data:
    op.bulk_insert(MetadataLog.__table, insert_data)
Suggested change:
-    op.bulk_insert(MetadataLog.__table, insert_data)
+    op.bulk_insert(MetadataLog.__table__, insert_data)
i'm mad that mypy didn't catch this one. maybe that's why we need sqlalchemy 2.0
# insert data into the metadata database
if insert_data:
    op.bulk_insert(MetadataLog.__table, insert_data)
Suggested change:
-    op.bulk_insert(MetadataLog.__table, insert_data)
+    op.bulk_insert(MetadataLog.__table__, insert_data)
rpm_db_manager = DatabaseManager(
    db_index=DatabaseIndex.RAIL_PERFORMANCE_MANAGER
)
md_db_manager = DatabaseManager(db_index=DatabaseIndex.METADATA)
non-blocking:
md_db_manager = DatabaseManager(db_index=DatabaseIndex.METADATA)
same with this, but pylint.
rpm_db_manager = DatabaseManager(
    db_index=DatabaseIndex.RAIL_PERFORMANCE_MANAGER
)
md_db_manager = DatabaseManager(db_index=DatabaseIndex.METADATA)
non-blocking:
md_db_manager = DatabaseManager(db_index=DatabaseIndex.METADATA)
Force-pushed from f76f9bd to a9e0338
Force-pushed from a9e0338 to 6f2f519
replaced by #224
Create a Metadata RDS for the project that can be used by multiple applications to query the status of files stored on S3
Asana Task: https://app.asana.com/0/1189492770004753/1205798784639666/f