Improve performances by having one assay by level #70
Comments
@bedroesb and @proccaserra, what is your opinion on this?
@cpommier Could you send the changes you propose in
@cpommier As far as I remember (it was a long time ago), we discussed which 'observation level' to support when going from BRAPI to ISA. So we need to go back to the mapping between BRAPI and ISA entities.
Thanks @bedroesb and @proccaserra! A quick call might indeed be better; I'll send you both an email and we will see what we can do. Currently, the code works, but it is too slow.
I hope I am not oversimplifying things.
Does that make sense?
@cpommier, I ran a quick test in a notebook with the isa-api. If an isa.Source is declared but not used in any protocol to create other materials, the serialisation to ISA-Tab does not write that source, even though the isa.Source is associated with a Study. The ISA-JSON serialisation is fine and shows the unused Source. So it seems we need to look at the ISA-Tab serialisation code. @djcomlab @Zigur
Thanks a lot @proccaserra
So this is somewhat expected behaviour and down to design decisions in the ISA API. The ISA-Tab table serialization focuses on describing the experimental workflow: if a source is declared but not used in a workflow, the assumption has been that it was not used in the experiment, and therefore it is not rendered in a table. ISA-JSON, however, has no such constraint, as its data model has the structures to cleanly hold sources independently of the experimental workflow.
Thanks for the clarification @djcomlab. I understand the logic.
No, I don't think this would work as things are at the moment. The problem is that in ISA-Tab there is no separate "container" for material nodes (Source, Sample, other stuff) to exist independently of the experimental workflows. It is of course possible to just add sources to a study table file with no linked processes, derived materials, data, etc., but in the ISA API the design assumption has been that this would not be correct, and I suspect it raises errors at validation time (disclaimer: I have not tried it!).
OK, but would it be possible to attach all the sources to a single assay?
In https://github.com/elixir-europe/plant-brapi-to-isa/blob/master/brapi_to_isa.py, lines 103 and following, we create one sample attached to one source, which gets attached to the study at line 154.
Then it gets attached to a dedicated assay again at line 168. If we keep only `isa_study.samples.append(this_isa_sample)` and create a single assay, would that work? @erikkimmel, what do you think?
For the sources and samples to appear in the tables, they must be associated with a process. For example, at the study level, source -> process -> sample must be described for the source to appear in the study table. Just as sources held only in the model's sources container (and in ISA-JSON) do not come out in the ISA-Tab study tables, samples must also be part of an experimental workflow (i.e. part of a process sequence) to appear in the assay table files. @proccaserra can probably elaborate on this.
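To make the rule above concrete, here is a minimal sketch using hypothetical, simplified stand-ins for ISA material and process nodes. These classes are not the isatools classes and the function names are invented for illustration; they only mimic the behaviour described in this thread: a table-style view writes only materials linked by a process, while a JSON-style view keeps every declared source.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for ISA nodes (NOT the isatools API).
@dataclass
class Material:
    name: str

@dataclass
class Process:
    inputs: list
    outputs: list

@dataclass
class Study:
    sources: list = field(default_factory=list)
    samples: list = field(default_factory=list)
    process_sequence: list = field(default_factory=list)

def study_table_rows(study):
    """Mimic an ISA-Tab study table: only (source, sample) pairs that are
    linked by a process in the process sequence produce a row."""
    rows = []
    for process in study.process_sequence:
        for src in process.inputs:
            for smp in process.outputs:
                rows.append((src.name, smp.name))
    return rows

def json_like_sources(study):
    """Mimic ISA-JSON: every declared source is kept, used or not."""
    return [s.name for s in study.sources]

plant_1, plant_2 = Material("plant_1"), Material("plant_2")
sample_1 = Material("sample_1")
study = Study(sources=[plant_1, plant_2], samples=[sample_1])
# Only plant_1 is part of a source -> process -> sample chain.
study.process_sequence.append(Process(inputs=[plant_1], outputs=[sample_1]))

print(study_table_rows(study))   # → [('plant_1', 'sample_1')]
print(json_like_sources(study))  # → ['plant_1', 'plant_2']
```

In this toy model `plant_2` is declared on the study but never appears in the table view, which matches the behaviour reported for unused sources.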
Thanks, that clarifies the choices that were made.
@cpommier so I ran a few more tests this morning, writing up variations of ISA objects, and I can now confirm that:
I'll document all that in a Jupyter notebook.
@cpommier a suggestion: if the brapi2isa code can identify 'unused Sources', then a possible solution would be to create "dummy" Samples so that the ISA-Tab serialisation dumps the ISA.Source information while clearly marking the ISA.Sample objects as dummies. This could be done using a special string (e.g. <NOT_USED> as the value for the Sample Name), complemented by an ISA.Comment element qualifying said ISA.Sample object to make this clear to a human reader.
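The dummy-sample suggestion could look roughly like the sketch below. It again uses hypothetical stand-in classes rather than isatools, and `add_dummy_samples` is an invented helper. One assumption goes beyond the suggestion: the sketch appends the source name to the <NOT_USED> marker, on the assumption that identical Sample Name values in ISA-Tab would be read as one shared node (i.e. as pooling).

```python
from dataclasses import dataclass, field

NOT_USED = "<NOT_USED>"  # marker string suggested in the thread

# Hypothetical stand-ins (NOT the isatools API); `comments` plays the
# role of ISA Comment elements attached to a material.
@dataclass
class Material:
    name: str
    comments: dict = field(default_factory=dict)

@dataclass
class Process:
    inputs: list
    outputs: list

def add_dummy_samples(sources, process_sequence):
    """For every source not consumed by any process, append a process that
    turns it into a clearly marked dummy sample, so that a table-based
    writer would emit a row for that source."""
    used = {id(m) for p in process_sequence for m in p.inputs}
    dummies = []
    for src in sources:
        if id(src) not in used:
            # Suffix keeps dummy names distinct (assumption, see lead-in).
            dummy = Material(
                f"{NOT_USED}_{src.name}",
                comments={"dummy sample": f"placeholder for unused source {src.name}"},
            )
            process_sequence.append(Process(inputs=[src], outputs=[dummy]))
            dummies.append(dummy)
    return dummies

sources = [Material("plant_1"), Material("plant_2")]
sample_1 = Material("sample_1")
processes = [Process(inputs=[sources[0]], outputs=[sample_1])]
dummies = add_dummy_samples(sources, processes)
print([d.name for d in dummies])
```

Identity (`id`) is used to detect used sources because these dataclasses are mutable and therefore unhashable.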
A question: Are the
From what I can see in the code, it looks like there is no branching or pooling. The ISA API's ability to deal with complex graphs is what slows it down.
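Whether a converted study needs the general graph machinery can be checked cheaply. The sketch below is a hypothetical helper, not part of brapi_to_isa or isatools. It represents each process as a pair of input and output name lists and reports branching (one material feeding several processes, or one process with several outputs) or pooling (several inputs to one process, or one material produced by several processes). If it returns `False`, every workflow is a set of simple one-in, one-out chains.

```python
from collections import Counter

def has_branching_or_pooling(processes):
    """processes: list of (input_names, output_names) tuples.

    Branching: a process with several outputs, or a material consumed by
    several processes. Pooling: a process with several inputs, or a
    material produced by several processes."""
    if any(len(inputs) > 1 or len(outputs) > 1 for inputs, outputs in processes):
        return True
    consumed = Counter(name for inputs, _ in processes for name in inputs)
    produced = Counter(name for _, outputs in processes for name in outputs)
    return any(n > 1 for n in consumed.values()) or any(n > 1 for n in produced.values())

linear = [(["plant_1"], ["sample_1"]), (["sample_1"], ["extract_1"])]
pooled = [(["plant_1", "plant_2"], ["pool_1"])]
print(has_branching_or_pooling(linear), has_branching_or_pooling(pooled))
```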
@djcomlab, based on the set of converted brapi studies, the graph complexity is very low. The amount of information in the ISA assay table is actually very low.
@proccaserra That's what I figured. I could probably put together a fast custom ISA-Tab writer just for this, until we address the overarching ISA-API performance issues. The idea is that the ISA API is like a Swiss Army knife: we can use it to do many things, but not necessarily all of them optimally. When we want to do one specific kind of task at a larger scale, we need to craft something optimised for that task.
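A task-specific writer of the kind proposed here can skip graph construction entirely and stream rows straight to a tab-separated file. The sketch below is a hypothetical `write_study_table` helper, not the writer that was eventually built. It assumes linear source -> protocol -> sample chains, one row per sample, and the input dict keys are invented for illustration. The column headers follow ISA-Tab study-file naming, but Term Source REF / Term Accession Number columns and the investigation file are omitted.

```python
import csv
import io

def write_study_table(rows, characteristic_names, protocol_name, out):
    """Write a minimal ISA-Tab-style study table for linear
    source -> protocol -> sample chains, one row per sample.

    rows: iterable of dicts with 'source', 'sample' and 'characteristics'
    keys (hypothetical input format)."""
    writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_ALL, lineterminator="\n")
    header = ["Source Name"]
    header += [f"Characteristics[{c}]" for c in characteristic_names]
    header += ["Protocol REF", "Sample Name"]
    writer.writerow(header)
    for row in rows:
        values = [row["source"]]
        values += [row["characteristics"].get(c, "") for c in characteristic_names]
        values += [protocol_name, row["sample"]]
        writer.writerow(values)

buf = io.StringIO()
write_study_table(
    [{"source": "plant_1", "sample": "sample_1", "characteristics": {"Organism": "Zea mays"}}],
    ["Organism"],
    "sampling",
    buf,
)
print(buf.getvalue())
```

Because each row is written as soon as it is known, memory and time grow linearly with the number of samples, which is the trade-off the Swiss Army knife remark points at: less generality, more speed for one narrow job.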
@proccaserra @djcomlab, thanks! @erikkimmel and I had the same idea and even tried an implementation, but it proved to be too much work for us since we don't know the isatools library as well as you do. Anyway, thanks a lot for proposing this optimised writer, it would be highly useful!
Good morning all,
We just reviewed the code, and here is the current logic and to-do list:
Hi,
Regards
@erikkimmel which isa-tools version did you use? |
With isa-tools 12, I hope to finally solve #62 and to have improved ISA-JSON dumping, let's see! After testing on the VIB dataset, do we merge the test/single_assay_by_level branch into master, or do you want to work on the other points first? @erikkimmel
The changes proposed in test/single_assay_by_level could improve performance but may have unwanted side effects. I think we should first check the impact of isatools 12, then assess the impact of this branch on the metadata. It is still a work in progress, and we can cope with the side effects as long as there is a significant speed improvement between master and test/single_assay_by_level. If that's not the case, the fast dumper suggested by @djcomlab might be more advisable.
@bedroesb, @cpommier here are the results of my tests using the 0.12.0 version of
Cheers
Thanks Erik. isatools 0.12 is definitely an improvement. The single-assay-by-level approach can take advantage of it, but 143m is still rather long. This is only a 3-year, 12-location, 300-sample dataset. Extracting it in a few minutes, without validation, would be a really great thing.
I ran the tests on the
This is such great news! Thanks for testing.
The one-assay-per-level approach seems to be a bad one. Performance improvement is still important, but probably through another approach, mainly using the latest isatools.
The current bottleneck is the number of assays. There is currently one assay per observation unit.
MIAPPE allows having one assay per level.
By handling only one observation unit per level in brapi_to_isa (line 95, `def create_study_sample_and_assay`), performance is dramatically improved. However, the germplasm list is no longer present in the study file.
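The one-assay-per-level idea amounts to grouping observation units by level before building assays. The sketch below is a hypothetical helper, not the code in the test/single_assay_by_level branch, and the dict keys (`observationUnitDbId`, `observationLevel`, `germplasmName`) are assumed BrAPI-style field names used for illustration. Three units at two levels yield two groups, so two assays instead of three.

```python
from collections import defaultdict

def group_units_by_level(observation_units):
    """Group observation units by level so that one assay per level can be
    built instead of one assay per observation unit. Every unit is kept
    inside its level group; none is dropped."""
    by_level = defaultdict(list)
    for unit in observation_units:
        by_level[unit.get("observationLevel", "unknown")].append(unit)
    return dict(by_level)

units = [
    {"observationUnitDbId": "1", "observationLevel": "plot", "germplasmName": "A"},
    {"observationUnitDbId": "2", "observationLevel": "plot", "germplasmName": "B"},
    {"observationUnitDbId": "3", "observationLevel": "plant", "germplasmName": "A"},
]
groups = group_units_by_level(units)
for level, members in groups.items():
    print(level, [u["germplasmName"] for u in members])
```

Unlike handling a single representative unit per level, this sketch keeps every unit in its group, so per-unit fields such as the assumed `germplasmName` key remain available when the study file is written.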
What we don't understand is that the germplasm should be handled by the following piece of code, line 534:
What are we supposed to do to ensure that the germplasm ends up in the right study characteristics?