
Commit

adding bucket table import from GNPS
mwang87 committed Dec 19, 2018
1 parent 960a22c commit 33d78ef
Showing 8 changed files with 441 additions and 16 deletions.
21 changes: 14 additions & 7 deletions README.md
@@ -70,6 +70,13 @@ This function will take as input an existing GNPS Molecular Networking task and
qiime metabolomics import-gnpsnetworkingclusteringtask
```

+#### MS2 GNPS Clustering Bucket Table Import Command
+This function will take as input an existing Bucket Table from GNPS Molecular Networking Clustering, together with a manifest file, to produce a biom qza file.
+
+```
+qiime metabolomics import-gnpsnetworkingclusteringbuckettable
+```
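For orientation, the core step of the new command is a column rename: `_create_table_from_buckettable` parses the bucket table with `biom.Table.from_tsv` and then calls `update_ids` with a mapping from file identifier to `sample_name`. The standard-library sketch below imitates only that renaming, assuming a tab-separated table whose header row holds a feature-ID cell followed by one column per file identifier. `rename_sample_columns` is a hypothetical helper, the table is made-up data, and raising on unmapped columns is meant to resemble `update_ids` in its default strict mode, not to reproduce biom exactly.

```python
import csv
import io


def rename_sample_columns(tsv_text, sid_map):
    """Rename the sample columns of a TSV bucket table (hypothetical helper).

    Assumes the first row is a header: a feature-ID cell followed by one
    column per file identifier. Columns missing from sid_map raise KeyError.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    missing = [col for col in header[1:] if col not in sid_map]
    if missing:
        raise KeyError(f"no sample_name for file identifiers: {missing}")
    # Keep the feature-ID header cell; map every sample column to its sample_name.
    new_header = [header[0]] + [sid_map[col] for col in header[1:]]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(new_header)
    writer.writerows(body)
    return out.getvalue()


table = "#OTU ID\tfileA\tfileB\n1\t10\t0\n2\t3\t7\n"
print(rename_sample_columns(table, {"fileA": "sample1", "fileB": "sample2"}))
```

Only the header row changes; the counts pass through untouched, which is also what renaming IDs on the sample axis does to a BIOM table.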

#### MZmine2 Feature Import Command
This function will take as input a feature quantification file from MZmine2 and a manifest file and produce a biom qza file.

@@ -159,11 +166,11 @@ Select Export->CSV File

### Manifest File Format

-The manifest file specifies the location of the files that will be processed by the metabolomics plugin. It is a .CSV (comma separated value) formatted table that contains two columns. The first column indicates the ‘sample-sample’ for each file, while the second column indicates its corresponding relative file path (relative to where qiime commands are called). The gnps-clustering and the mzmine2-clustering tools are using both the same manifest file.
+The manifest file specifies the location of the files that will be processed by the metabolomics plugin. It is a .CSV (comma-separated values) formatted table that contains two columns. The first column indicates the ‘sample_name’ for each file, while the second column indicates its corresponding relative file path (relative to where qiime commands are called). The gnps-clustering and the mzmine2-clustering tools both use the same manifest file.

-View of the manifest file (.CSV format). The first column indicates the ‘sample-same for each file, while the second column indicates its corresponding relative file path. The example file can be [downloaded here](https://github.com/mwang87/q2_metabolomics/raw/master/q2_metabolomics/tests/data/manifest.tsv).
+View of the manifest file (.CSV format). The first column indicates the sample_name for each file, while the second column indicates its corresponding relative file path. The example file can be [downloaded here](https://github.com/mwang87/q2_metabolomics/raw/master/q2_metabolomics/tests/data/manifest.tsv).

-| sample-name | filepaths |
+| sample_name | filepaths |
| ------------- |:-------------:|
| sample1 | data/121114_nanoDESI_polar_ISP2_control_DD_MS2.mzXML |
| sample2 | data/121119_VM37_FT-IT.mzXML |
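The import methods in this commit read the manifest with `csv.DictReader`, take `sample_name` and `filepath` from each row, and key the mapping on the file name with its directory and extension stripped. Note that the committed code reads a column named `filepath`, matching the plugin's parameter descriptions, while the example table above is headed `filepaths`. A standalone sketch of that mapping step follows; `load_manifest` is a hypothetical helper, and its header check and duplicate check are extra safeguards the plugin code does not perform.

```python
import csv
import io
import os

# Column names the committed import code reads from each manifest row.
REQUIRED_COLUMNS = {"sample_name", "filepath"}


def load_manifest(csv_text):
    """Map file identifier -> sample_name (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"manifest is missing columns: {sorted(missing)}")
    sid_map = {}
    for row in reader:
        # Same derivation as the plugin: drop the extension, then the directory.
        fileidentifier = os.path.basename(os.path.splitext(row["filepath"])[0])
        if fileidentifier in sid_map:
            raise ValueError(f"duplicate file identifier: {fileidentifier}")
        sid_map[fileidentifier] = row["sample_name"]
    return sid_map


manifest = (
    "sample_name,filepath\n"
    "sample1,data/121114_nanoDESI_polar_ISP2_control_DD_MS2.mzXML\n"
    "sample2,data/121119_VM37_FT-IT.mzXML\n"
)
print(load_manifest(manifest))
```

Because only the base name without extension is kept, two files with the same name in different directories would collide in the mapping; the duplicate check makes that visible instead of silently keeping the last row.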
@@ -267,7 +274,7 @@ qiime diversity pcoa \
--output-dir pcoa_canberra_qiime2
```

-To create an interactive ordination plot of the above created PCoA with integrated sample metadata use the qiime emperor plot function. Make sure that the ‘sample-id’s provided in the metadata file correspond to the sample-ids in the canberra distance_matrix.qza file:
+To create an interactive ordination plot of the PCoA created above, with integrated sample metadata, use the qiime emperor plot function. Make sure that the ‘sample_name’s provided in the metadata file correspond to the sample names in the canberra distance_matrix.qza file:
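The emperor plot relies on the IDs in the metadata file matching those in the distance matrix, so a quick check before plotting can save a failed run. A minimal sketch, assuming a tab-separated metadata file whose first column holds the sample IDs (as in `examplefiles/metadata_long.txt`, with no QIIME 2 comment or `#q2:types` lines) and that the distance matrix's sample IDs are already available as a Python list. `check_metadata_ids` is hypothetical, and the metadata rows are trimmed from the example file.

```python
import csv
import io


def check_metadata_ids(metadata_tsv, matrix_ids):
    """Return distance-matrix sample IDs with no metadata row (hypothetical helper)."""
    reader = csv.reader(io.StringIO(metadata_tsv), delimiter="\t")
    next(reader)  # skip the header row (first cell is the ID column, e.g. sample_name)
    metadata_ids = {row[0] for row in reader if row}
    return sorted(set(matrix_ids) - metadata_ids)


metadata = "sample_name\tfermented\nG95952\tyes\nG95982\tyes\n"
print(check_metadata_ids(metadata, ["G95952", "G95982", "G96107"]))
```

An empty list means every sample in the matrix has metadata; any IDs returned point at rows to add or at a manifest `sample_name` that differs from the metadata.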

```
qiime emperor plot \
@@ -372,7 +379,7 @@ qiime diversity pcoa \
--output-dir pcoa_canberra_qiime2
```

-To create an interactive ordination plot of the above created PCoA with integrated sample metadata use the qiime emperor plot function. Make sure that the ‘sample-id’s provided in the metadata file correspond to the sample-ids in the canberra distance_matrix.qza file:
+To create an interactive ordination plot of the PCoA created above, with integrated sample metadata, use the qiime emperor plot function. Make sure that the ‘sample_name’s provided in the metadata file correspond to the sample names in the canberra distance_matrix.qza file:

```
qiime emperor plot \
@@ -541,7 +548,7 @@ qiime diversity pcoa \
--output-dir pcoa_canberra_qiime2
```

-To create an interactive ordination plot of the above created PCoA with integrated sample metadata use the qiime emperor plot function. Make sure that the ‘sample-id’s provided in the metadata file correspond to the sample-ids in the canberra distance_matrix.qza file:
+To create an interactive ordination plot of the PCoA created above, with integrated sample metadata, use the qiime emperor plot function. Make sure that the ‘sample_name’s provided in the metadata file correspond to the sample names in the canberra distance_matrix.qza file:

```
qiime emperor plot \
@@ -642,7 +649,7 @@ qiime diversity pcoa \
--output-dir pcoa_canberra_qiime2
```

-To create an interactive ordination plot of the above created PCoA with integrated sample metadata use the qiime emperor plot function. Make sure that the ‘sample-id’s provided in the metadata file correspond to the sample-ids in the canberra distance_matrix.qza file:
+To create an interactive ordination plot of the PCoA created above, with integrated sample metadata, use the qiime emperor plot function. Make sure that the ‘sample_name’s provided in the metadata file correspond to the sample names in the canberra distance_matrix.qza file:

```
qiime emperor plot \
2 changes: 1 addition & 1 deletion examplefiles/metadata_long.txt
@@ -1,4 +1,4 @@
-sample-id filepath description age age_units sample_set fermented empo_1 empo_2 empo_3
+sample_name filepath description age age_units sample_set fermented empo_1 empo_2 empo_3
G95952 G95952_BA6_01_31659.mzXML milk with yogurt culture 0 hrs A yes host associated animal animal secretion
G95982 G95982_repeat_RB8_01_31543.mzXML milk with yogurt culture 11 hrs A yes host associated animal animal secretion
G96107 G96107_repeat_RD4_01_31616.mzXML milk with yogurt culture 24 hrs A yes host associated animal animal secretion
2 changes: 1 addition & 1 deletion meta.yaml
@@ -1,6 +1,6 @@
package:
  name: q2-metabolomics
-  version: "0.0.3"
+  version: "0.0.5"

source:
  path: .
1 change: 1 addition & 0 deletions q2_metabolomics/__init__.py
@@ -1,6 +1,7 @@

from ._method import import_gnpsnetworkingclustering
from ._method import import_gnpsnetworkingclusteringtask
+from ._method import import_gnpsnetworkingclusteringbuckettable
from ._method import import_mzmine2

__version__ = "0.0.1"
25 changes: 21 additions & 4 deletions q2_metabolomics/_method.py
@@ -171,6 +171,19 @@ def import_gnpsnetworkingclusteringtask(manifest: str, taskid: str) -> biom.Tabl
    return _create_table_from_task(taskid, sid_map)


+def import_gnpsnetworkingclusteringbuckettable(manifest: str, buckettable: str) -> biom.Table:
+    sid_map = {}
+    with open(manifest) as csvfile:
+        reader = csv.DictReader(csvfile)
+        for row in reader:
+            sid = row["sample_name"]
+            filepath = row["filepath"]
+            fileidentifier = os.path.basename(os.path.splitext(filepath)[0])
+            sid_map[fileidentifier] = sid
+
+    return _create_table_from_buckettable(buckettable, sid_map)
+
+
def _create_table_from_task(task_id, sid_map):
    """Pulling down BioM"""
    url_to_biom = "http://gnps.ucsd.edu/ProteoSAFe/DownloadResultFile?task=%s&block=main&file=cluster_buckets/" % (
@@ -181,16 +194,20 @@ def _create_table_from_task(task_id, sid_map):
    local_file.write(requests.get(url_to_biom).text)
    local_file.close()

-    with open(f.name) as fh:
-        table = biom.Table.from_tsv(fh, None, None, None)
-
-    table.update_ids(sid_map, axis='sample', inplace=True)
+    table = _create_table_from_buckettable(f.name, sid_map)

    # Cleanup Tempfile
    os.unlink(f.name)

    return table

+def _create_table_from_buckettable(buckettable_path, sid_map):
+    with open(buckettable_path) as fh:
+        table = biom.Table.from_tsv(fh, None, None, None)
+
+    table.update_ids(sid_map, axis='sample', inplace=True)
+
+    return table

def import_mzmine2(manifest: str, quantificationtable: str) -> biom.Table:
    """Loading Manifest Mapping"""
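After this change both GNPS import paths end in `_create_table_from_buckettable`: the task-based import writes the downloaded bucket table to a temporary file, passes that file's path on, and then removes it with `os.unlink`. The sketch below shows the same write-then-parse-by-path shape using only the standard library. `with_temp_text` and `count_data_rows` are hypothetical names, `count_data_rows` merely stands in for the BIOM parsing step, and the `try`/`finally` is a choice made here (the committed code does not use one) so the temporary file is removed even when parsing raises.

```python
import os
import tempfile


def with_temp_text(text, parse):
    """Write text to a temp file, call parse(path), always delete the file (hypothetical)."""
    # delete=False keeps the file after the with-block closes it, so it can be reopened by path.
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as local_file:
        local_file.write(text)
        path = local_file.name
    try:
        return parse(path)
    finally:
        os.unlink(path)


def count_data_rows(path):
    # Stand-in for the BIOM parser: count lines that are not header/comment lines.
    with open(path) as fh:
        return sum(1 for line in fh if not line.startswith("#"))


print(with_temp_text("#OTU ID\tsample1\n1\t10\n2\t3\n", count_data_rows))
```

Keeping the parser path-based is what lets the local bucket-table import and the downloaded one share a single function.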
22 changes: 19 additions & 3 deletions q2_metabolomics/plugin_setup.py
@@ -20,7 +20,7 @@
    input_descriptions={},
    outputs=[('feature_table', FeatureTable[Frequency])],
    parameter_descriptions={
-        'manifest': 'Manifest file for describing information about each file. Headers of sample-id and filepath',
+        'manifest': 'Manifest file for describing information about each file. Headers of sample_name and filepath',
        'credentials': 'GNPS login credentials json'
    },
    output_descriptions={'feature_table': 'Resulting feature table'},
@@ -36,7 +36,7 @@
    input_descriptions={},
    outputs=[('feature_table', FeatureTable[Frequency])],
    parameter_descriptions={
-        'manifest': 'Manifest file for describing information about each file. Headers of sample-id and filepath',
+        'manifest': 'Manifest file for describing information about each file. Headers of sample_name and filepath',
        'taskid': 'GNPS Task ID'
    },
    output_descriptions={'feature_table': 'Resulting feature table'},
@@ -45,6 +45,22 @@
    citations=[]
)

+plugin.methods.register_function(
+    function=q2_metabolomics.import_gnpsnetworkingclusteringbuckettable,
+    inputs={},
+    parameters={'manifest': qiime2.plugin.Str, 'buckettable': qiime2.plugin.Str},
+    input_descriptions={},
+    outputs=[('feature_table', FeatureTable[Frequency])],
+    parameter_descriptions={
+        'manifest': 'Manifest file for describing information about each file. Headers of sample_name and filepath',
+        'buckettable': 'Path to Bucket Table from GNPS Molecular Networking Clustering'
+    },
+    output_descriptions={'feature_table': 'Resulting feature table'},
+    name='GNPS Metabolomics MS/MS Spectral Counts - Import Existing GNPS Bucket Table',
+    description=("Computes feature BioM for metabolomics by importing GNPS Molecular Networking Bucket Table"),
+    citations=[]
+)
+

plugin.methods.register_function(
    function=q2_metabolomics.import_mzmine2,
@@ -53,7 +69,7 @@
    input_descriptions={},
    outputs=[('feature_table', FeatureTable[Frequency])],
    parameter_descriptions={
-        'manifest': 'Manifest file for describing information about each file. Headers of sample-id and filepath',
+        'manifest': 'Manifest file for describing information about each file. Headers of sample_name and filepath',
        'quantificationtable': 'Quantification Table output from MZMine2'
    },
    output_descriptions={'feature_table': 'Resulting feature table'},
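The README shows the new command without its options. Under q2cli's usual naming, the command is the registered function name with underscores turned into hyphens (which matches the README's `import-gnpsnetworkingclusteringbuckettable`), parameters become `--p-` options and outputs become `--o-` options. Assuming that convention holds here, the sketch below derives an invocation from the registration above. `cli_invocation` is a hypothetical helper and `manifest.csv`, `buckettable.tsv` and `feature_table.qza` are placeholder file names.

```python
def cli_invocation(plugin, function_name, parameters, outputs):
    """Build a q2cli-style command line (hypothetical helper; assumes q2cli naming)."""
    lines = [f"qiime {plugin} {function_name.replace('_', '-')}"]
    for name, value in parameters.items():
        # Parameters registered as Str appear as --p-<name> options.
        lines.append(f"  --p-{name.replace('_', '-')} {value}")
    for name, value in outputs.items():
        # Named outputs appear as --o-<name> options.
        lines.append(f"  --o-{name.replace('_', '-')} {value}")
    return " \\\n".join(lines)


print(cli_invocation(
    "metabolomics",
    "import_gnpsnetworkingclusteringbuckettable",
    {"manifest": "manifest.csv", "buckettable": "buckettable.tsv"},
    {"feature_table": "feature_table.qza"},
))
```

Running it prints a four-line shell command that passes the manifest and bucket table paths and names the output `feature_table.qza`; `qiime metabolomics import-gnpsnetworkingclusteringbuckettable --help` remains the authoritative list of options.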