"NA" groups are omitted from plotly plots #888

Open
adkinsrs opened this issue Sep 17, 2024 · 0 comments
Labels: bug, Low Priority


adkinsrs commented Sep 17, 2024

https://nemoanalytics.org/dataset_curator.html?dataset_id=6b3f05b3-0885-4765-8d6b-c449753823b6

Observed while fixing #875

If you add a data series with groups like "NA" or another of Pandas' designated missing-value strings, Pandas converts those entries into a "nan" missing value after the AnnData object is read in. I believe this behavior may have changed between Pandas 1.x and 2.x, but I cannot confirm that. This has led to various issues, most of which I have resolved, but one remaining issue is that the NA groups are omitted from the plot. I believe this is related to the Plotly package, as I have not seen it in tSNE static plots. It has been observed using "dev_state" or "cell_type" as the x-axis with no color.
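As a rough illustration of how a "nan" group can silently disappear, here is a minimal pandas-only sketch. It does not use Plotly or AnnData; the small `obs` frame is made-up stand-in data, and whether Plotly Express groups its data this way internally is an assumption that I have not verified.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for adata.obs: "dev_state" contains missing values
# where the original data had an "NA" group.
obs = pd.DataFrame({
    "dev_state": ["early", "late", np.nan, "early", np.nan],
    "expression": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# By default, pandas groupby drops rows whose group key is missing.
default_sizes = obs.groupby("dev_state").size()

# With dropna=False (pandas >= 1.1) the missing-value group is kept.
kept_sizes = obs.groupby("dev_state", dropna=False).size()

print(default_sizes)
print(kept_sizes)
```

If the plotting code groups on the column in a similar way, the rows with a missing "dev_state" would never reach a trace, which would match the omitted groups described above.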

In addition, passing "nan" to one of the Plotly Express functions throws the error KeyError: (nan, '', '', '', ''), where the KeyError originates from the plotting args. In this particular example (link above), I used "dev_state" as both the x-axis and the color param for a scatter plot, and "dev_state" (like "cell_type") has a designated "NA" group. When I switch the x-axis to a series with no "NA", such as "stage_ord", and leave color as "dev_state", the error persists, which tells me it is related to the color mapping and name. This is not an issue with violin plots, for which I wrote a custom function a few years back.
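For context on why a "nan" inside a lookup key is fragile, here is a small pure-Python sketch. It is not a trace through Plotly's internals, and I have not confirmed that this is the mechanism behind the KeyError above; it only shows a general property of NaN (it never compares equal to another NaN) using a hypothetical `lookup` dict keyed by tuples shaped like the one in the error message.

```python
# Two distinct NaN objects, as you might get from two separate reads
# of the same missing value.
nan_a = float("nan")
nan_b = float("nan")

# Hypothetical lookup table keyed by a tuple shaped like the one in the error.
lookup = {(nan_a, "", "", "", ""): "trace settings"}

# The exact same NaN object is found, because containers check identity first.
same_object_found = (nan_a, "", "", "", "") in lookup

# A different NaN object is not found, because nan != nan.
different_object_found = (nan_b, "", "", "", "") in lookup

# A string label such as "NA" behaves like any ordinary key.
string_lookup = {("NA", "", "", "", ""): "trace settings"}
string_found = ("NA", "", "", "", "") in string_lookup

print(same_object_found, different_object_found, string_found)
```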

I think the solution is to find any "nan" values in the adata.obs object and replace them with the string "NA". It's not a foolproof solution, since the missing-value label originally used in the dataset may have been something else, but it at least makes the value a string, which is less prone to weirdness in downstream steps.
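A minimal sketch of that idea, assuming adata.obs behaves like an ordinary pandas DataFrame. The helper name `fill_missing_groups` and its `label` parameter are hypothetical, not existing functions in this codebase. Categorical columns need the new label registered as a category before filling, and numeric columns are deliberately left alone so their missing values stay numeric.

```python
import numpy as np
import pandas as pd


def fill_missing_groups(obs: pd.DataFrame, label: str = "NA") -> pd.DataFrame:
    """Return a copy of obs with missing values in categorical and
    string-like columns replaced by the string `label`.

    Hypothetical helper for illustration only.
    """
    filled = obs.copy()
    for col in filled.columns:
        series = filled[col]
        if isinstance(series.dtype, pd.CategoricalDtype):
            if series.isna().any():
                # fillna on a categorical fails unless the value is a category.
                if label not in series.cat.categories:
                    series = series.cat.add_categories([label])
                filled[col] = series.fillna(label)
        elif series.dtype == object or pd.api.types.is_string_dtype(series.dtype):
            filled[col] = series.fillna(label)
        # Numeric and other dtypes are intentionally left untouched.
    return filled


# Made-up example data standing in for adata.obs.
example_obs = pd.DataFrame({
    "dev_state": pd.Categorical(["early", np.nan, "late"]),
    "cell_type": pd.Series(["A", np.nan, "B"], dtype=object),
    "score": [1.0, np.nan, 3.0],
})
print(fill_missing_groups(example_obs))
```

With an AnnData object this would presumably be applied as `adata.obs = fill_missing_groups(adata.obs)` right after reading the file, before any plotting args are built.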

adkinsrs added the bug and Low Priority labels on Sep 17, 2024
adkinsrs self-assigned this on Sep 17, 2024