Polus is an open source collaborative project to create an automated, reproducible, secure, and quantitative data analysis platform for researchers, designed from the ground up for ease of use by scientists, developers, and administrators. The platform will cover every step of data processing, from ingestion/upload of data, through analysis/exploration, to generation of graphs and methods for publications. It is built on modern web technologies and is scalable to meet the needs of modern research in both hardware (support for cutting-edge hardware acceleration) and software (the latest in artificial intelligence, machine learning, and deep learning). The Polus platform aims to be for data analysis what Adobe Creative Cloud is for graphic design. Each application in Polus shares authentication and authorization, roles, data, metadata, storage, and computational infrastructure with every other application, enabling full interoperability between all applications. However, each Polus application can also serve as a stand-alone application in isolation, much as Adobe Photoshop can stand alone from Adobe Lightroom or Illustrator, or all three can be chained together in a workflow.
The Polus team will accomplish this by focusing on four key areas of development:
- Standards creation and usage
- Interactive analysis and data exploration
- Production analysis pipeline execution at scale
- Traceability/reproducibility of data and methodologies
By focusing on these four key areas and using only open source code, the Polus team enables researchers to truly understand their analysis and tailor it to whatever their computational problem might be, including image analysis, molecular modeling and simulation, cheminformatics and synthesis, informatics/omics, data modeling and statistics, deep learning, visualization, or data quality control. The Polus platform will be the delivery mechanism by which generalizable computational tools can be developed and, once developed, used by researchers to analyze their data and eventually shared with the wider scientific community.
Modern automated microscopes can acquire hundreds of thousands of images a day across thousands of samples. Additionally, imaging modalities are diverse, covering everything from two-dimensional single-channel acquisitions, to three-dimensional acquisitions tracked across time in dozens of wavelengths, to three-dimensional molecular/spectral maps in which every pixel of the image contains an entire spectrum of information. Further, relevant resolutions and sample sizes vary greatly. They can span anything from relatively large regions of interest, in which many fields of view need to be stitched together to view a single gigapixel (or larger) image that encompasses the entire sample, to thousands of different samples in a single array, in which each sample needs only 1-2 images.
Polus was the Greek Titan of Knowledge and Insight. As such, Polus represented rational intelligence, an inquisitive mind, resolve and foresight. Due to his inquisitive mind and desire to learn, he was also thought to have gained knowledge and understanding that enabled him to see beyond the obvious. He was also often thought of as the Titan of "Oracles" because he was able to understand things as they stood so well that he could make very accurate predictions of the future.
Our platform is designed to "see beyond what is obvious" in scientific data and from that to gain insight and understanding, enabling us to predict disease, therapy outcomes, or possible drugs to cure diseases.
Also, Polus was often depicted with an owl symbol and, as it happens, an owl was used as the logo for a past iteration of the imaging pipeline. Thus, referring to the project as Polus provides a historical link to that earlier version.
Regardless of the approach, automated acquisitions can provide researchers with unprecedented levels of visual information about their samples. This information could be used to deconvolve relationships, discover underlying mechanisms of action, identify and/or classify structures/states, or understand fundamental properties of the samples that could not be understood by any other means. However, imaging scientists need computational solutions that convert these raw images to calibrated images that are quickly and easily viewable. Once the images are viewed, interactive and traceable measurements are needed to analyze them, and then sophisticated modeling and statistical tools can be used to view the relationships and differences between samples from these measurements. Finally, moving science forward requires a platform in which the output of these models and statistical tools can be quickly and easily graphed/viewed for use in publications, and in which all graphs and tables have a reproducible and traceable link back to the original source images. In other words, there is a demonstrated need for a single platform that goes from terabytes of raw images to traceable published graphs/tables, and Polus is being developed to address this need.
Provide an automated end-to-end data analysis solution that enables scientists to analyze large volumes of data reproducibly, accurately, and quickly.
- Standards creation and usage
- Interactive analysis and data exploration
- Production analysis pipeline execution at scale
- Traceability/reproducibility of data and methodologies
- Enable Scientists to:
  - Form and assess hypotheses about samples from raw data and data/metadata derived from raw data.
  - Simply and easily model information from raw data to create novel insights that lead to development of new therapies for diseases
  - Automate
  - Securely work on, share, and publish their data
Key capabilities of the desired end-to-end solution include (but are not limited to):
- Automation of data analysis
- Advanced visualization
- Simple and reproducible quantification
- Secure and robust data analysis, sharing, and publishing
- For researchers who are collecting complex data (genomic, transcriptomic, metabolomic, imaging, molecular, etc.)
- Who want to understand and find cures for disease
- Who need a scalable data analysis pipeline to process and visualize that data
- Who require a reproducible, traceable, automated, and scalable data analysis
See also Potential Client Types of Polus
- Visual User: mostly biologists and clinicians
- Custom User: Engineers, computer scientists, data scientists, statisticians, etc. (anyone who can program)
- Macro User: users from either the Visual User or Custom User group above, or of an intermediate skill level between the two.
- Developer Compute: A software developer working on novel computational solutions to data analysis
- Developer Framework: A software developer working on extending the base functionality/interface/capabilities of the Polus platform
- Network Administrator: Person deploying and maintaining data processing pipelines for their facility
- Scientific Administrator: Person overseeing the lab/scientists/microscopes using the data processing pipeline
- Nathan Hotaling - Senior Vice President of Data Science
- Sunny Yu - Director of Data Platforms and Software Integration
- Hythem Sidky - Senior Director of Data Solutions
- Nick Schaub - Associate Director of Artificial Intelligence
For questions, comments, and discussion, you can reach out on image.sc under the polusai tag.
For bug reports, please open an issue on the relevant repository.