updated bioavailability_ma_et_al #366

bethanyconnolly
Collaborator

Updated the templates for this task to improve the grammar and to make clear that this data only covers bioavailability via the oral pathway.
Note: Oral bioavailability is not really a binary property; it should be a fraction. The paper labelled a compound's oral bioavailability as "positive" if it is ≥ 20% and "negative" otherwise, because they needed a binary label for their SVM classification model.
This means that we might need to think more about this task. Some options:

  • Leave it as it is right now
  • Use the raw data instead of the binarized SVM data, which would turn this into a continuous task: http://modem.ucsd.edu/adme/databases/databases_bioavailability.htm
  • Change the templates to be a bit more vague but stick with the existing data, e.g. instead of saying this SMILES 'is orally bioavailable', we could say it 'has high oral bioavailability' or 'is likely to show oral bioavailability'.

Happy to discuss what we think is best :)
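
For reference, the binarization described in the note can be sketched as below. This is only an illustration of the ≥ 20% rule, not the paper's code; the function names and the "high"/"low" wording are hypothetical and chosen to mirror the third option.

```python
def binarize_bioavailability(percent: float, threshold: float = 20.0) -> int:
    """Return 1 ("positive") if oral bioavailability is >= threshold percent, else 0.

    The 20% default follows the cut-off quoted from the paper; everything
    else here (names, signature) is an assumption made for illustration.
    """
    return int(percent >= threshold)


def describe_bioavailability(percent: float) -> str:
    """Map a raw percentage onto the vaguer wording of the third option."""
    label = binarize_bioavailability(percent)
    return "high oral bioavailability" if label == 1 else "low oral bioavailability"
```

Keeping the raw percentage alongside the derived label would let the same record serve both the regression view and the "high / low" text view.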

@kjappelbaum
Collaborator

Wow! That's a detailed analysis 🕵🏽‍♀️

Personally, I'd go with the raw data, as we could then also do the regression tasks (as you said).

@jackapbutler
Collaborator

jackapbutler commented Jul 20, 2023

Depending on how much work it is to duplicate the template, we could create a new one for option 2 and change the current one to be more like option 3, so that we have both numerical and "high / low" language views of the data.

@MicPie
Contributor

MicPie commented Jul 26, 2023

Yeah, I think Jack's idea is the best approach: keep working with the data from the current PR and add the regression dataset as a new one.
In the templates we should only need to change
is {bioavailable#not &NULL}{bioavailable__names__adjective}
to
has a {bioavailable#low &high }{bioavailable__names__noun}.
I'll check that. :-)
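
A rough sketch of how the two variants would read for each label, assuming the `{name#a&b}` syntax picks `a` for label 0 and `b` for label 1 and that `NULL` stands for an empty string. That reading is inferred from the two templates above, and `render_choices` is a hypothetical helper, not the project's actual template code. Placeholders such as `{bioavailable__names__noun}` are left untouched.

```python
import re
from typing import Mapping

# Matches {name#text_for_0&text_for_1}; placeholders without '#' do not match.
_CHOICE = re.compile(r"\{(\w+)#([^{}&]*)&([^{}&]*)\}")


def render_choices(template: str, labels: Mapping[str, int]) -> str:
    """Resolve binary-choice placeholders (assumed semantics, see lead-in)."""

    def pick(match: "re.Match[str]") -> str:
        name, text_for_0, text_for_1 = match.groups()
        text = text_for_1 if labels[name] == 1 else text_for_0
        # NULL is assumed to mean "insert nothing".
        return "" if text == "NULL" else text

    return _CHOICE.sub(pick, template)


old = "is {bioavailable#not &NULL}{bioavailable__names__adjective}"
new = "has a {bioavailable#low &high }{bioavailable__names__noun}."

for label in (0, 1):
    print(render_choices(old, {"bioavailable": label}))
    print(render_choices(new, {"bioavailable": label}))
```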

@MicPie
Contributor

MicPie commented Aug 23, 2023

PR #388 adds the proposed changes from this PR.
Issue #389 covers the addition of the raw data in a new tabular dataset.

4 participants