Confusion on constructing SpatialData object from the ground up #203

Open
MiTPenguin opened this issue Sep 6, 2024 · 2 comments

@MiTPenguin

Hi, I am posting this in the spatialdata-io repository, but let me know if it would be better placed in the main spatialdata repository.

I'm dealing with a set of mIF data that has been previously processed (similar to MCMICRO, but not exactly the same). The dataset consists of:

  • Multiple samples
  • Each sample containing multiple ROIs in "cyx" format
  • Segmentation and other masks (drawn by human input) for each ROI, in 2D format (binary masks, or integer-labeled masks for segmentation)
  • A table of metadata and extracted values.

With a lot of trial and error, I have previously been able to translate datasets in this format into Squidpy-compatible AnnData objects and do analysis and plotting on individual ROIs, samples, etc.

With the new SpatialData object, it's not clear to me how best to approach constructing one. Here are my questions:

  1. With squidpy, plotting an individual ROI from an AnnData containing multiple ROIs can be controlled by supplying the library_id and library_key parameters. Is there an equivalent concept in SpatialData? For example, if I'm rendering an Image (an ROI) and a set of cell Shapes, and want to color them by metadata from the table (AnnData), in a SpatialData object containing multiple ROIs, how would the function determine the corresponding data to pull from each of the different elements? Is it primarily through unique coordinate systems?
  2. I know there's an "instance_id" parameter (which is inserted when I use the legacy conversion function), but does the instance_id have to be unique across the entire dataset? What about cell segmentation masks, where the ID is necessarily an integer?
  3. How should we set up coordinate systems in a dataset like this? Should there be a coordinate system for each sample? For each ROI? And is there a faster way to set this up than looping through set_transformation many times?
  4. Graphing connectivity maps: is there a built-in function that would plot connectivity maps, or do we have to layer them using the sq.pl module?

Hopefully this is clear. I've been looking through the different example datasets, but I haven't found one that resembles this dataset format.

@LucaMarconato
Member

LucaMarconato commented Oct 1, 2024

Hi, thanks for reaching out!

> With squidpy, plotting an individual ROI from an AnnData containing multiple ROIs can be controlled by supplying the library_id and library_key parameters. Is there an equivalent concept in SpatialData? For example, if I'm rendering an Image (an ROI) and a set of cell Shapes, and want to color them by metadata from the table (AnnData), in a SpatialData object containing multiple ROIs, how would the function determine the corresponding data to pull from each of the different elements? Is it primarily through unique coordinate systems?

Yes, you can plot a specific ROI even if the table contains multiple ROIs. The spatialdata-plot and napari-spatialdata libraries take care of matching the table to the ROIs. This link is given by the region, region_key and instance_key metadata of the table, explained in the docs for [TableModel.parse()](https://spatialdata.scverse.org/en/latest/generated/spatialdata.models.TableModel.html) and shown in this example notebook.
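To make the table-to-ROI linkage concrete, here is a minimal, library-free sketch of the idea; it is not how spatialdata-plot or napari-spatialdata implement it. In spatialdata itself this metadata is set with `TableModel.parse()`, which takes `region`, `region_key` and `instance_key` arguments. The sketch assumes a hypothetical `region` column (playing the role of region_key) that names the element each row annotates, a hypothetical `instance_id` column (playing the role of instance_key), and made-up element names such as `sample1_roi1_labels`.

```python
from dataclasses import dataclass

# Hypothetical column names; in a real table these are whatever
# region_key and instance_key point to.
REGION_KEY = "region"
INSTANCE_KEY = "instance_id"


@dataclass
class Table:
    """A toy stand-in for an annotating table (one dict per row)."""
    region: list[str]  # names of the elements this table annotates
    rows: list[dict]


def rows_for_element(table: Table, element_name: str) -> dict[int, dict]:
    """Return the rows annotating one element, keyed by instance ID."""
    if element_name not in table.region:
        raise KeyError(f"table does not annotate {element_name!r}")
    return {
        row[INSTANCE_KEY]: row
        for row in table.rows
        if row[REGION_KEY] == element_name
    }


table = Table(
    region=["sample1_roi1_labels", "sample1_roi2_labels"],
    rows=[
        {"region": "sample1_roi1_labels", "instance_id": 1, "cell_type": "T cell"},
        {"region": "sample1_roi1_labels", "instance_id": 2, "cell_type": "B cell"},
        {"region": "sample1_roi2_labels", "instance_id": 1, "cell_type": "tumor"},
    ],
)

# Coloring ROI 2's segmentation by cell_type only needs ROI 2's rows;
# instance 1 there is a different cell from instance 1 in ROI 1.
roi2 = rows_for_element(table, "sample1_roi2_labels")
print(roi2[1]["cell_type"])  # tumor
```

The key point is that the element name, not a coordinate system, is what selects the matching table rows; the instance ID then maps each row to a specific label or shape inside that element.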

For very large objects, manually subsetting before plotting may be faster (if performance is an issue, please report it so we can optimize the automatic subsetting). An example of subsetting the data for a dataset similar to yours (3 ROIs, 1 table) can be found here. Minor note: in the next release, pp.get_elements() will be replaced by .subset(), just so you know.
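As a rough illustration of what subsetting amounts to, the sketch below keeps a chosen set of elements plus only the table rows that annotate them. It uses plain dicts and lists instead of a SpatialData object, and the element names, the `subset` helper and the `region` column are all hypothetical; with the real library you would use the subsetting method mentioned above rather than code like this.

```python
def subset(elements: dict, table_rows: list[dict], keep: list[str],
           region_key: str = "region") -> tuple[dict, list[dict]]:
    """Keep only the named elements and the table rows that annotate them."""
    missing = [name for name in keep if name not in elements]
    if missing:
        raise KeyError(f"unknown elements: {missing}")
    kept_elements = {name: elements[name] for name in keep}
    # Drop rows that annotate elements we are not keeping.
    kept_rows = [row for row in table_rows if row[region_key] in kept_elements]
    return kept_elements, kept_rows


# Toy data: three ROIs, each with an image and a labels element,
# annotated by a single table (strings stand in for the arrays).
elements = {
    f"roi{i}_{kind}": f"<{kind} array for ROI {i}>"
    for i in (1, 2, 3)
    for kind in ("image", "labels")
}
table_rows = [
    {"region": f"roi{i}_labels", "instance_id": cell}
    for i in (1, 2, 3)
    for cell in (1, 2)
]

small_elements, small_rows = subset(elements, table_rows, ["roi2_image", "roi2_labels"])
print(sorted(small_elements))  # ['roi2_image', 'roi2_labels']
print(len(small_rows))         # 2
```

Filtering the table together with the elements is what keeps the result consistent: a table row whose region is no longer present would have nothing to annotate.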

> I know there's an "instance_id" parameter (which is inserted when I use the legacy conversion function), but does the instance_id have to be unique across the entire dataset? What about cell segmentation masks, where the ID is necessarily an integer?

region_key and instance_key are the names of two columns that must be present in a table annotating one or more samples. Each pair of values (i.e., each row of these two columns) must be unique. Uniqueness of the instance_key values alone is not required, as it would be too restrictive.
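This rule matters for integer segmentation masks, where label IDs typically restart at 1 in every ROI. A small stdlib check of the pair-uniqueness constraint might look like the sketch below; the `duplicate_pairs` helper and the `region`/`instance_id` column names are hypothetical, and spatialdata's own validation is not shown here.

```python
from collections import Counter


def duplicate_pairs(rows, region_key="region", instance_key="instance_id"):
    """Return (region, instance) pairs that occur more than once."""
    counts = Counter((row[region_key], row[instance_key]) for row in rows)
    return [pair for pair, n in counts.items() if n > 1]


rows = [
    {"region": "roi1_labels", "instance_id": 1},
    {"region": "roi1_labels", "instance_id": 2},
    # Reusing label 1 in another ROI is fine: the pair differs.
    {"region": "roi2_labels", "instance_id": 1},
]
print(duplicate_pairs(rows))  # []

# A second row for the same cell in the same ROI breaks the rule.
rows.append({"region": "roi1_labels", "instance_id": 2})
print(duplicate_pairs(rows))  # [('roi1_labels', 2)]
```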

> How should we set up coordinate systems in a dataset like this? Should there be a coordinate system for each sample? For each ROI? And is there a faster way to set this up than looping through set_transformation many times?

I suggest having one coordinate system per sample and, in addition, one coordinate system per ROI. Currently the only way to proceed is to loop over the elements; for instance, this is what we do in the function postpone_transformation() in this notebook. We have prepared a new design that will remove the need for loops, but it will take quite some time before we finish implementing it.
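The loop can at least be driven by a single plan computed up front. The stdlib sketch below builds, for every element, a mapping from coordinate system name to transformation: an identity into the ROI's own system and a translation into the sample's shared system. The ROI offsets, element naming scheme and tuple encoding of transformations are all hypothetical. With spatialdata, each entry would then be applied by converting the tuples into transformation objects (for example `Identity()` and `Translation([dx, dy], axes=("x", "y"))` from `spatialdata.transformations`) and passing them to `set_transformation(element, transformation, to_coordinate_system=...)`; please check the exact signatures against the transformations documentation.

```python
# Hypothetical layout: ROI origins (x, y) in each sample's slide space.
roi_offsets = {
    "sample1": {"roi1": (0.0, 0.0), "roi2": (5000.0, 0.0)},
    "sample2": {"roi1": (0.0, 0.0)},
}
element_kinds = ("image", "labels", "shapes")


def build_transformations(roi_offsets, element_kinds):
    """Map each element name to {coordinate system: transformation}.

    Transformations are plain tuples here:
      ("identity",)           -> same pixel grid as the ROI
      ("translation", dx, dy) -> shift the ROI into the sample's space
    """
    plan = {}
    for sample, rois in roi_offsets.items():
        for roi, (dx, dy) in rois.items():
            for kind in element_kinds:
                name = f"{sample}_{roi}_{kind}"
                plan[name] = {
                    # One coordinate system per ROI...
                    f"{sample}_{roi}": ("identity",),
                    # ...and one shared by all ROIs of the sample.
                    sample: ("translation", dx, dy),
                }
    return plan


plan = build_transformations(roi_offsets, element_kinds)
print(len(plan))  # 9
print(plan["sample1_roi2_image"])
# {'sample1_roi2': ('identity',), 'sample1': ('translation', 5000.0, 0.0)}
```

Keeping the plan separate from the calls that apply it makes it easy to inspect or change the layout (for example, tiling ROIs side by side for display) without touching the elements themselves.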

> Graphing connectivity maps: is there a built-in function that would plot connectivity maps, or do we have to layer them using the sq.pl module?

We don't have such a function; please indeed use squidpy for that.

@LucaMarconato
Member

LucaMarconato commented Oct 1, 2024

Another comment: in your case you may benefit from what we discussed in scverse/spatialdata#398. What do you think about it?
