Skip to content

Code of feature selection and machine learning modeling of a serum proteomics study

License

Notifications You must be signed in to change notification settings

PHOENIXcenter/SerumStudyOfLiverDiseases

Repository files navigation

Serum Study of Liver Diseases

Link to repository: github.com/PHOENIXcenter/SerumStudyOfLiverDiseases

  • Summary of scripts used for feature selection and machine learning in the project
  • MS-based proteome data are from two independent cohort:
Cohort HC (Healthy Control) CHB (Chronic Hepatitis B) LC (Liver Cirrhosis) HCC (Hepatocellular Carcinoma)
Discovery cohort (n=125) 21 29 29 46
Validation cohort (n=75) 15 15 15 30

Contents

Script Description
Feature_Importance.py Contains feature selection part. we used the LASSO model to executing feature selection, randomly select 80% of the samples each time to build a total of 500 sub-models, and chose proteins selected in at least 10% of the sub-models as candidate protein features.
Classifier_Model.py Contains machine learning modeling and performance evaluation part. We separately evaluated the performance of 6 classical machine learning models (SVM, Ridge, KNN, Naïve Bayes, Decision Tree and Random Forest) in each pairwise comparisons using corresponding candidate markers in both 5-fold discovery cohort and independent validation cohort. The main evaluation index are F1-score and AUROC.
Model_Visualization.py Contains visualization of machine learning models, including confusion matrix and ROC curve of both 5-fold discovery cohort and validation cohort.

Data Availability

The complete serum proteomics data and clinical data are available from the authors upon reasonable request due to the need to maintain patient confidentiality.

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the iProX partner repository with the dataset identifier PXD034201.

Workflow of Machine Learning

Workflow of machine learning modeling

About

Code of feature selection and machine learning modeling of a serum proteomics study

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Languages