Don't Install Tableau API on arm64 #218
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
use asdf | ||
|
||
dotenv | ||
dotenv |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
poetry 1.4.2 | ||
poetry 1.7.1 | ||
python 3.10.13 | ||
direnv 2.32.2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,7 @@ | |
from lamp_py.runtime_utils.env_validation import validate_environment | ||
from lamp_py.runtime_utils.process_logger import ProcessLogger | ||
|
||
from lamp_py.tableau.pipeline import start_parquet_updates | ||
from lamp_py.tableau import start_parquet_updates | ||
Review comment: this is the only part i don't like. if we import directly from the file we're going to get the import error, and it felt like catching that error should be left to the subdir. |
||
|
||
from .flat_file import write_flat_files | ||
from .l0_gtfs_rt_events import process_gtfs_rt_files | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Tableau Publisher | ||
|
||
The Tableau Publisher is an application that takes data created by the Rail Performance Manager application as parquet files and publishes them to the ITD Managed Tableau Instance as hyper files. | ||
|
||
## Application Operation | ||
|
||
The application itself is run via a CloudWatch event that is set to trigger on a cron-like schedule. | ||
|
||
On each run, it iterates through a list of jobs that generate hyper files and uploads them to the ITD Tableau server, where they can be used to generate dashboards and reports for external users. To generate a hyper file, each job reads a parquet file that has been created by upstream LAMP applications and converts it using the [Tableau Hyper API](https://www.tableau.com/developer/tools/hyper-api). The file is generated on local storage, and then uploaded to the ITD Managed Tableau server using the [Tableau Server Client](https://tableau.github.io/server-client-python/), a python library wrapping the [Tableau REST API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm). | ||
|
||
### Upstream Applications | ||
|
||
To simplify the conversion from parquet to hyper, the schemas for both are defined within this module. We also store the hardcoded S3 filepaths. Because of this, components of this library are used by other applications when writing the parquet files. | ||
|
||
## Developer Note | ||
|
||
The Tableau Hyper API is not currently supported on Apple Silicon. This means that local execution on macOS with arm64 processors will not work without emulation. In light of that, imports from this directory will trigger `ModuleNotFoundError` exceptions if running on the wrong system. To avoid that, the `__init__.py` file includes a wrapper around components that are consumed by other applications. These functions will log an error when run without the required dependencies. | ||
|
||
### Installation without Tableau dependencies | ||
|
||
In `pyproject.toml`, there is an additional dependency group that contains the tableau dependencies. It is not marked optional, so these modules will be installed with `poetry install`. If you are on an arm64 architecture, you can avoid installing the tableau dependencies with `poetry install --without tableau`. This behavior is encoded in the `.envrc`, `docker-compose.yml`, and `Dockerfile` files in this repository, so you should get the desired behavior without additional arguments. |
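
For illustration, the architecture check described in the README could be sketched in Python as below. This is a hypothetical helper, not the logic in this repository's `.envrc` or `Dockerfile`. The function name `poetry_install_command` and the set of machine strings treated as arm64 are assumptions. Only the `poetry install --without tableau` invocation comes from the README text.

```python
import platform
from typing import List, Optional

# Assumption: these machine strings identify arm64 hosts
# ("arm64" on Apple Silicon macOS, "aarch64" on most arm64 Linux systems).
ARM64_MACHINES = {"arm64", "aarch64"}


def poetry_install_command(machine: Optional[str] = None) -> List[str]:
    """Build a `poetry install` command, skipping the tableau group on arm64."""
    if machine is None:
        # Fall back to the architecture reported for the current host.
        machine = platform.machine()

    command = ["poetry", "install"]
    if machine.lower() in ARM64_MACHINES:
        # The tableau group is not optional, so it must be excluded explicitly.
        command += ["--without", "tableau"]
    return command


if __name__ == "__main__":
    print(" ".join(poetry_install_command()))
```

Passing `machine` explicitly keeps the helper easy to check on any host. Called with no arguments, it inspects the current machine.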
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,32 @@ | ||
"""Utilites for Interacting with Tableau and Hyper files""" | ||
"""Utilities for Interacting with Tableau and Hyper files""" | ||
import logging | ||
Review comment: update the tableau init file to catch exceptions when the tableau modules aren't found. |
||
from types import ModuleType | ||
from typing import Optional | ||
|
||
from lamp_py.postgres.postgres_utils import DatabaseManager | ||
|
||
# pylint: disable=C0103 (invalid-name) | ||
# pylint wants pipeline to conform to an UPPER_CASE constant naming style. its | ||
# a module though, so disabling to allow it to use normal import rules. | ||
pipeline: Optional[ModuleType] | ||
|
||
try: | ||
Review comment: suggestion (non-blocking):
try:
    from . import pipeline
except ModuleNotFoundError:
    pipeline = None

def start_parquet_updates(db_manager: DatabaseManager) -> None:
    if pipeline is None:
        logging.exception(...)
    else:
        pipeline.start_parquet_updates(db_manager)
Reply: oh. i like this much better. |
||
from . import pipeline | ||
except ModuleNotFoundError: | ||
pipeline = None | ||
|
||
# pylint: enable=C0103 (invalid-name) | ||
|
||
|
||
def start_parquet_updates(db_manager: DatabaseManager) -> None: | ||
""" | ||
wrapper around pipeline.start_parquet_updates function. if a module not | ||
found error occurs (which happens when using osx arm64 dependencies), log | ||
an error and do nothing. else, run the function. | ||
""" | ||
if pipeline is None: | ||
logging.error( | ||
"Unable to run parquet files on this machine due to Module Not Found error" | ||
) | ||
else: | ||
pipeline.start_parquet_updates(db_manager=db_manager) |
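
The guarded import used in this `__init__.py` can be tried outside the package with a minimal, self-contained sketch like the one below. It is an illustration of the pattern, not the code in this PR. The module name `hypothetical_tableau_pipeline`, the helper `optional_import`, and the `start_updates` wrapper are all assumptions made for the example. Unlike the wrapper in the diff, this one returns a boolean so a caller can tell whether anything ran.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None


# Hypothetical module name standing in for the Tableau-dependent pipeline.
pipeline = optional_import("hypothetical_tableau_pipeline")


def start_updates() -> bool:
    """Run the pipeline if available; log an error and return False if not."""
    if pipeline is None:
        logging.error("pipeline unavailable: optional dependency not installed")
        return False
    # Only reached when the optional module imported successfully.
    pipeline.start_updates()
    return True
```

Catching `ModuleNotFoundError` rather than the broader `ImportError` means other import-time failures inside an installed module still surface instead of being silently treated as "not installed".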
Review comment: add dependency group for tableau dependencies. it's not optional, meaning a user will have to change behavior to build without it.