Add renv lockfile for development dependencies #65

Merged 8 commits on Dec 4, 2023
14 changes: 13 additions & 1 deletion DESCRIPTION
@@ -1 +1,13 @@
Config/renv/profiles/reporting/dependencies: quarto, leaflet, plotly, sf
Config/renv/profiles/dev/dependencies:
DBI,
igraph,
lubridate,
openxlsx,
readr,
rmarkdown,
RJDBC
Comment on lines +1 to +8
Member Author

These are the additional dependencies needed for 00-ingest.R, 07-export.R, 08-api.R, and building README.Rmd.
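
As a rough illustration of how these fields can be read back, here is a minimal sketch in base R. It assumes renv picks up profile-specific dependencies from `Config/renv/profiles/<profile>/dependencies` fields in `DESCRIPTION`; the helper `profile_dependencies()` and the temporary file are hypothetical and not part of this PR.

```r
# Write a throwaway DESCRIPTION containing the dev field from this diff.
desc_path <- tempfile("DESCRIPTION")
writeLines(
  c(
    "Config/renv/profiles/dev/dependencies:",
    "    DBI,",
    "    igraph,",
    "    lubridate,",
    "    openxlsx,",
    "    readr,",
    "    rmarkdown,",
    "    RJDBC"
  ),
  desc_path
)

# Hypothetical helper: return the packages declared for one renv profile.
profile_dependencies <- function(path, profile) {
  field <- sprintf("Config/renv/profiles/%s/dependencies", profile)
  dcf <- read.dcf(path, fields = field)
  # A missing field comes back as NA; treat that as "no extra dependencies".
  if (nrow(dcf) == 0 || is.na(dcf[1, field])) {
    return(character())
  }
  # Split the comma-separated value and drop surrounding whitespace/newlines.
  deps <- trimws(strsplit(dcf[1, field], ",", fixed = TRUE)[[1]])
  deps[nzchar(deps)]
}

dev_deps <- profile_dependencies(desc_path, "dev")
print(dev_deps)
```

Splitting on commas is enough for a plain package list like this one; version constraints or remotes would need more careful parsing.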

Config/renv/profiles/reporting/dependencies:
leaflet,
plotly,
quarto,
sf
21 changes: 17 additions & 4 deletions README.Rmd
@@ -759,14 +759,27 @@ Both [Tidymodels](https://tune.tidymodels.org/articles/extras/optimizations.html
* The number of threads is set via the [num_threads](https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_threads) parameter, which is passed to the model using the `set_args()` function from `parsnip`. By default, `num_threads` is equal to the full number of physical cores available. More (or faster) cores will decrease total training time.
* This repository uses the CPU version of LightGBM included with the [LightGBM R package](https://lightgbm.readthedocs.io/en/latest/R/index.html). If you'd like to use the GPU version you'll need to [build it yourself](https://lightgbm.readthedocs.io/en/latest/R/index.html#installing-a-gpu-enabled-build) or wait for the [upcoming CUDA release](https://github.com/microsoft/LightGBM/issues/5153).

## Updating R dependencies
## Managing R dependencies

We use multiple renv lockfiles in order to manage R dependencies:
### Profiles and Lockfiles

We use multiple renv lockfiles to manage R dependencies:

1. **`renv.lock`** is the canonical list of dependencies that are used by the **core model pipeline**. Any dependencies that are required to run the model itself should be defined in this lockfile.
2. **`renv/profiles/reporting/renv.lock`** is the canonical list of dependencies that are used to **generate a model performance report** in the `finalize` step of the pipeline. Any dependencies that are required to generate that report or others like it should be defined in this lockfile.
2. **`renv/profiles/reporting/renv.lock`** is the canonical list of dependencies that are used to **generate model reports** in the `finalize` step of the pipeline. Any dependencies that are required to generate reports should be defined in this lockfile.
3. **`renv/profiles/dev/renv.lock`** is the canonical list of dependencies that are used **for local development**, running the `ingest`, `export`, and `api` steps of the pipeline, and building the README. These dependencies are required only by CCAO staff and are not required to run the model itself.

Our goal in maintaining multiple lockfiles is to keep the list of dependencies required to run the model as short as possible. This choice adds overhead to the process of updating R dependencies, but incurs the benefit of a more maintainable model over the long term.

### Using Lockfiles for Local Development

When working on the model locally, you'll typically want to install non-core dependencies _on top of_ the core dependencies. To do this, simply run `renv::restore("<path_to_lockfile>")` to install all dependencies from the lockfile.

For example, if you're working on the `ingest` stage and want to install all its dependencies, start with the main profile (run `renv::activate()`), then install the `dev` profile dependencies on top of it (run `renv::restore("renv/profiles/dev/renv.lock")`).

> :warning: WARNING: Installing dependencies from a dev lockfile will **overwrite** any existing version installed by the core one. For example, if `ggplot2@3.3.0` is installed by the core lockfile, and `ggplot2@3.2.1` is installed by the dev lockfile, renv will **overwrite** `ggplot2@3.3.0` with `ggplot2@3.2.1`.
Comment on lines +774 to +780
Member Author

This is basically exactly what is happening inside the Docker container. It's a little non-standard but works fine since this is only for our use.
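
For reference, the layered install described in this section might be sketched as follows. The helper `restore_profile()` and its `dry_run` flag are hypothetical; the only renv call used is `renv::restore()` with its `lockfile` and `prompt` arguments. One caveat, stated as my understanding of the renv API rather than something this PR confirms: the first positional argument of `renv::restore()` is `project`, so passing the lockfile path by name (`lockfile = ...`) is the safer spelling.

```r
# Hypothetical helper: resolve the lockfile for a profile and optionally
# restore from it. `profile = NULL` means the core `renv.lock`.
restore_profile <- function(profile = NULL, dry_run = TRUE) {
  lockfile <- if (is.null(profile)) {
    "renv.lock"
  } else {
    file.path("renv", "profiles", profile, "renv.lock")
  }

  if (dry_run) {
    # Report what would be restored without touching the library.
    message("Would restore from: ", lockfile)
    return(invisible(lockfile))
  }

  # Name the argument explicitly so the path is not taken as `project`.
  renv::restore(lockfile = lockfile, prompt = FALSE)
  invisible(lockfile)
}

# Core dependencies first, then the dev profile on top.
core_lockfile <- restore_profile()
dev_lockfile <- restore_profile("dev")
```

With `dry_run = FALSE`, the two calls run in the same order as the prose: core first, then `dev`, so any package pinned in both ends up at the `dev` version.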


Our goal in maintaining multiple lockfiles is to keep the list of dependencies that are required to run the model as short as possibile. This choice adds overhead to the process of updating R dependencies, but incurs the benefit of a more maintainable model over the long term.
### Updating Lockfiles

The process for **updating core model pipeline dependencies** is straightforward: Running `renv::install("<dependency_name>")` and `renv::snapshot()` will ensure that the dependency gets added or updated in `renv.lock`, as long as it is imported somewhere in the model pipeline via a `library(<dependency_name>)` call.

52 changes: 41 additions & 11 deletions README.md
@@ -28,7 +28,11 @@ Table of Contents
- [Output](#output)
- [Getting Data](#getting-data)
- [System Requirements](#system-requirements)
- [Updating R dependencies](#updating-r-dependencies)
- [Managing R dependencies](#managing-r-dependencies)
- [Profiles and Lockfiles](#profiles-and-lockfiles)
- [Using Lockfiles for Local
Development](#using-lockfiles-for-local-development)
- [Updating Lockfiles](#updating-lockfiles)
- [Troubleshooting](#troubleshooting)
- [License](#license)
- [Contributing](#contributing)
@@ -340,7 +344,7 @@ districts](https://gitlab.com/ccao-data-science---modeling/models/ccao_res_avm/-
and many others. The features in the table below are the ones that made
the cut. They’re the right combination of easy to understand and impute,
powerfully predictive, and well-behaved. Most of them are in use in the
model as of 2023-12-02.
model as of 2023-12-04.

| Feature Name | Category | Type | Possible Values | Notes |
|:------------------------------------------------------------------------|:---------------|:------------|:-----------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------|
@@ -1264,23 +1268,49 @@ sped up using the parallel processing built-in to LightGBM. Note that:
or wait for the [upcoming CUDA
release](https://github.com/microsoft/LightGBM/issues/5153).

## Updating R dependencies
## Managing R dependencies

We use multiple renv lockfiles in order to manage R dependencies:
### Profiles and Lockfiles

We use multiple renv lockfiles to manage R dependencies:

1. **`renv.lock`** is the canonical list of dependencies that are used
by the **core model pipeline**. Any dependencies that are required
to run the model itself should be defined in this lockfile.
2. **`renv/profiles/reporting/renv.lock`** is the canonical list of
dependencies that are used to **generate a model performance
report** in the `finalize` step of the pipeline. Any dependencies
that are required to generate that report or others like it should
be defined in this lockfile.
dependencies that are used to **generate model reports** in the
`finalize` step of the pipeline. Any dependencies that are required
to generate reports should be defined in this lockfile.
3. **`renv/profiles/dev/renv.lock`** is the canonical list of
dependencies that are used **for local development**, running the
`ingest`, `export`, and `api` steps of the pipeline, and building
the README. These dependencies are required only by CCAO staff and
are not required to run the model itself.

Our goal in maintaining multiple lockfiles is to keep the list of
dependencies that are required to run the model as short as possibile.
This choice adds overhead to the process of updating R dependencies, but
incurs the benefit of a more maintainable model over the long term.
dependencies required to run the model as short as possible. This choice
adds overhead to the process of updating R dependencies, but incurs the
benefit of a more maintainable model over the long term.

### Using Lockfiles for Local Development

When working on the model locally, you’ll typically want to install
non-core dependencies *on top of* the core dependencies. To do this,
simply run `renv::restore("<path_to_lockfile>")` to install all
dependencies from the lockfile.

For example, if you’re working on the `ingest` stage and want to install
all its dependencies, start with the main profile (run
`renv::activate()`), then install the `dev` profile dependencies on top
of it (run `renv::restore("renv/profiles/dev/renv.lock")`).

> :warning: WARNING: Installing dependencies from a dev lockfile will
> **overwrite** any existing version installed by the core one. For
> example, if `ggplot2@3.3.0` is installed by the core lockfile, and
> `ggplot2@3.2.1` is installed by the dev lockfile, renv will
> **overwrite** `ggplot2@3.3.0` with `ggplot2@3.2.1`.
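
One way to see which packages would be overwritten before layering a profile is to compare the pinned versions in the two lockfiles. The sketch below is only an illustration: `lockfile_version_conflicts()` is a hypothetical helper, and the in-memory lists stand in for parsed lockfiles (reading the real files with something like `jsonlite::fromJSON("renv.lock")` should give the same `Packages` shape, but that is an assumption here). The `ggplot2` versions mirror the warning above; the `rlang` and `DBI` versions are made up.

```r
# Hypothetical helper: list packages whose pinned version differs between
# two parsed lockfiles (each a list with a `Packages` element).
lockfile_version_conflicts <- function(core, dev) {
  shared <- intersect(names(core$Packages), names(dev$Packages))
  pinned <- function(lock) {
    vapply(
      shared,
      function(pkg) lock$Packages[[pkg]]$Version,
      character(1),
      USE.NAMES = FALSE
    )
  }
  core_versions <- pinned(core)
  dev_versions <- pinned(dev)
  differs <- core_versions != dev_versions
  data.frame(
    package = shared[differs],
    core = core_versions[differs],
    dev = dev_versions[differs],
    stringsAsFactors = FALSE
  )
}

# Stand-ins for parsed lockfiles (illustrative versions only).
core_lock <- list(Packages = list(
  ggplot2 = list(Package = "ggplot2", Version = "3.3.0"),
  rlang = list(Package = "rlang", Version = "1.1.2")
))
dev_lock <- list(Packages = list(
  ggplot2 = list(Package = "ggplot2", Version = "3.2.1"),
  DBI = list(Package = "DBI", Version = "1.1.3")
))

conflicts <- lockfile_version_conflicts(core_lock, dev_lock)
print(conflicts)
```

Packages that appear in only one lockfile are ignored, since nothing is overwritten in that case.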

### Updating Lockfiles

The process for **updating core model pipeline dependencies** is
straightforward: Running `renv::install("<dependency_name>")` and
28 changes: 15 additions & 13 deletions renv.lock
@@ -1,6 +1,6 @@
{
"R": {
"Version": "4.2.2",
"Version": "4.3.2",
Member Author

Version bump on R to match the version in the Docker container.

"Repositories": [
{
"Name": "CRAN",
@@ -116,7 +116,7 @@
},
"arrow": {
"Package": "arrow",
"Version": "14.0.0.1",
"Version": "14.0.0.2",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
@@ -134,7 +134,7 @@
"utils",
"vctrs"
],
"Hash": "75782a533f9cddf70709455abcc52d5d"
"Hash": "042f2ee2286a91abe5a3d66c9be92380"
},
"askpass": {
"Package": "askpass",
@@ -415,13 +415,13 @@
},
"cpp11": {
"Package": "cpp11",
"Version": "0.4.6",
"Version": "0.4.7",
"Source": "Repository",
"Repository": "RSPM",
"Requirements": [
"R"
],
"Hash": "707fae4bbf73697ec8d85f9d7076c061"
"Hash": "5a295d7d963cc5035284dcdbaf334f4e"
},
"crayon": {
"Package": "crayon",
@@ -856,7 +856,7 @@
"Package": "jsonlite",
"Version": "1.8.7",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Requirements": [
"methods"
],
@@ -1413,10 +1413,12 @@
"renv": {
"Package": "renv",
"Version": "1.0.3",
"OS_type": null,
"NeedsCompilation": "no",
"Source": "Repository",
"Repository": "CRAN",
"Source": "Repository"
"Requirements": [
"utils"
],
"Hash": "41b847654f567341725473431dd0d5ab"
},
"rlang": {
"Package": "rlang",
@@ -1764,17 +1766,17 @@
},
"vctrs": {
"Package": "vctrs",
"Version": "0.6.4",
"Version": "0.6.5",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Requirements": [
"R",
"cli",
"glue",
"lifecycle",
"rlang"
],
"Hash": "266c1ca411266ba8f365fcc726444b87"
"Hash": "c03fa420630029418f7e6da3667aac4a"
},
"viridisLite": {
"Package": "viridisLite",
@@ -1900,7 +1902,7 @@
"Package": "xml2",
"Version": "1.3.5",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Requirements": [
"R",
"methods"