Release slendr 0.5.0 · bodkan/slendr

Minor breaking change! Python environments of slendr are no longer automatically activated upon calling library(slendr)! Using the coalescent msprime back end and slendr's tree-sequence functions now requires making an explicit call to a new function init_env() after library(slendr) is executed. (PR #102)

Motivation for the change: A small proportion of users have been experiencing issues with broken conda environments and with Python virtual environments in general. It's hard to guess how frequent this has been, but experience from workshops and courses suggests that perhaps 1 in 20 users run into Python issues which hinder their ability to use slendr. (Fun fact: the first user-submitted GitHub issue after the first release of the slendr R package was... a Python virtual environment issue.)

Explanation: Activating Python environments automatically upon calling library(slendr) has been a popular feature because it hid away most of the complexities of the R-Python interface that powers slendr's tree-sequence functionality. This was convenient for many slendr users, particularly those who have no experience with Python at all.

Unfortunately, in cases where a Python virtual environment with tskit/msprime/pyslim on a user's system ended up corrupted (or if anything else at the Python level got broken), the automatic Python environment activation performed by the library(slendr) call failed and slendr was not even loaded. Sadly, this completely pulled the rug from under slendr and there was nothing that could be done about it from its perspective (the issue happened at the low-level embedded-Python layer, before slendr could have been loaded into R). Solving these issues was not difficult for experienced users, but many slendr users have no experience with Python at all: they have never used conda, and they don't understand the concept of "Python virtual environments" or how the R-Python interface works. Nor should they! After all, slendr is an R package.

Splitting the Python virtual environment activation step into its own init_env() function means that library(slendr) now always succeeds (regardless of potential underlying Python issues on a user's system), making it much easier to diagnose and fix Python problems from R once the package is loaded.

So, to recap: library(slendr) no longer activates slendr's isolated Python virtual environment. In order to simulate tree sequences and analyse them using its interface to tskit, it is necessary to call init_env(). This function performs the same Python-activation steps that library(slendr) used to call automagically in earlier slendr versions. No other change to your scripts is necessary.
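
As a minimal sketch of the new workflow, a script would now start roughly as shown below. It assumes slendr 0.5.0 or later and assumes that init_env() signals an ordinary R error when activation fails. The tryCatch() wrapper and the env_ok variable are illustrative additions, not something slendr requires.

```r
# Assumption: slendr >= 0.5.0; the block is skipped if the package is not installed
env_ok <- FALSE

if (requireNamespace("slendr", quietly = TRUE)) {
  library(slendr)  # loads the R package only; no Python is touched at this point

  # Activate the Python environment explicitly. Wrapping the call in tryCatch()
  # is optional: it only keeps the session usable if activation fails.
  env_ok <- tryCatch({
    init_env()
    TRUE
  }, error = function(e) {
    message("Python environment not activated: ", conditionMessage(e))
    message("Running setup_env() creates the environment if it is missing.")
    FALSE
  })
}

# msprime() and the ts_*() functions should only be called when env_ok is TRUE
```

In a plain script, the two lines library(slendr) and init_env() are all that is needed. The error handler only demonstrates that Python problems can now be caught and inspected from within R.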

Related to the previous point: slendr now requires Python 3.11, msprime 1.2.0, tskit 0.5.4, and pyslim 1.0.1, to keep up with recent releases of its Python dependencies. Again, this presents no hassle to the user, and the only thing required is re-running setup_env(). (PR #112).
When a named list is provided as the sample_sets = argument to a one-way statistic function, the names are used in the set column of the resulting data frame even if only single samples were used. (#2a6781)
It is now possible to have non-spatial populations in an otherwise spatial model. Of course, when plotting such models on a map, only the spatial components of the model will be plotted and slendr will give a warning. To be absolutely sure that the user intends to do this, slendr will also give a warning when running compile_model() on models like this. Please consider this option experimental for the time being, as it is hard to predict which edge cases might break because of it (all unit tests and documentation tests are passing, though). Feedback is more than welcome. (PR #112).
It is now possible to label groups of samples in slendr's tskit interface functions, which should make data frames with statistics results more readable. As an example, running ts_f3(ts, A = c("p1_1", "p1_2", "p1_3"), B = c("p2_1", "p2_3"), C = c("p3_1", "p3_2", "p3_")) resulted in the following data-frame output:

> ts_f3(ts, A = c("p1_1", "p1_2", "p1_3", "p1_4", "p1_5"),
            B = c("p2_1", "p2_2", "p2_3"),
            C = c("p3_1", "p3_2", "p3_3", "p3_4"))

# A tibble: 1 × 4
  A                        B              C                         f3
  <chr>                    <chr>          <chr>                  <dbl>
1 p1_1+p1_2+p1_3+p1_4+p1_5 p2_1+p2_2+p2_3 p3_1+p3_2+p3_3+p3_4 0.000130

This gets unwieldy rather quickly, especially when dozens or hundreds of samples are grouped together as populations. The new syntax allows the following shortcut via customised group names leveraging the standard named list functionality in R:

> ts_f3(ts, A = list(group_one = c("p1_1", "p1_2", "p1_3", "p1_4", "p1_5")),
            B = list(group_two = c("p2_1", "p2_2", "p2_3")),
            C = list(group_three = c("p3_1", "p3_2", "p3_3", "p3_4")))
# A tibble: 1 × 4
  A         B         C                 f3
  <chr>     <chr>     <chr>          <dbl>
1 group_one group_two group_three 0.000130

This is more readable and in line with some other tskit-interface functions of slendr which used this functionality via their sample_sets = argument (ts_divergence(), ts_diversity(), etc.). (#ac5e484)
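
For larger sample sets, typing out each group by hand is impractical. The sketch below builds such named lists with base R's split(). The samples data frame is hypothetical stand-in data with name and pop columns (in a real analysis, a table like this would presumably come from the tree sequence itself, e.g. via ts_samples()), and the ts_f3() call is left commented out because it needs an actual tree sequence ts.

```r
# Hypothetical sample table: one row per sampled individual
samples <- data.frame(
  name = c("p1_1", "p1_2", "p1_3", "p2_1", "p2_2", "p3_1", "p3_2"),
  pop  = c("p1",   "p1",   "p1",   "p2",   "p2",   "p3",   "p3"),
  stringsAsFactors = FALSE
)

# split() returns a named list: one character vector of sample names per population
groups <- split(samples$name, samples$pop)

# Single-bracket indexing keeps the list wrapper and its name, which gives
# the same shape as list(p1 = c("p1_1", "p1_2", "p1_3"))
A <- groups["p1"]
B <- groups["p2"]
C <- groups["p3"]

str(A)

# With a tree sequence `ts` loaded, the groups could then be passed on directly:
# ts_f3(ts, A = A, B = B, C = C)
```

Note the single brackets: groups[["p1"]] would return the bare character vector and drop the group name, which would bring back the long concatenated labels.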

The default value of the parent = argument of population() is now NULL instead of "ancestor". This prevents surprising clashes in situations where some population's name really is "ancestor". The only change internally is that for populations which are ancestral, the splits data frame element of a slendr model object which includes this population carries a formal "ancestral parent population" as "__pop_is_ancestor" instead of just "ancestor". Note that this is an internal implementation detail and not something users normally need to care about. Still, if you have been somehow using slendr's internal data structures, keep this in mind. (#f8a39a2)
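
A short sketch of what the new default permits, assuming slendr 0.5.0 or later is installed. The population names, times, and sizes are made up for illustration.

```r
# Assumption: slendr >= 0.5.0; the block is skipped if the package is not installed
if (requireNamespace("slendr", quietly = TRUE)) {
  library(slendr)

  # No parent is given, so parent = NULL and this is an ancestral population,
  # even though its name happens to be "ancestor"
  anc <- population("ancestor", time = 1000, N = 500)

  # A daughter population splitting from it; parent takes the population object
  p1 <- population("p1", time = 500, N = 200, parent = anc)
}
```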
