Find attributes about genes and proteins for machine learning
The BD2K-LINCS DCIC created the Harmonizome, a resource that contains processed datasets ready for machine learning, which can be used to learn new knowledge about genes and proteins. The Harmonizome datasets are organized as large feature tables, where genes are the rows and attributes are the columns. Every attribute (column label) is associated with a gene set (the genes, i.e. rows, in that column). For example, a machine learning expert can select any gene set from any dataset to serve as classification labels, and then train a classifier on the remaining datasets to predict those gene labels.
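The workflow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the table, gene names, attribute names, and values are all made up, the helper functions are hypothetical, and a simple leave-one-out nearest-neighbour rule stands in for whatever classifier one would actually choose. It does not use any Harmonizome file format or API.

```python
# Toy gene-by-attribute table: rows are genes, columns are attributes.
# All names and values are invented for illustration (not Harmonizome data).
table = {
    "GENE_A": {"attr_1": 1, "attr_2": 0, "attr_3": 1, "label_set": 1},
    "GENE_B": {"attr_1": 1, "attr_2": 0, "attr_3": 1, "label_set": 1},
    "GENE_C": {"attr_1": 0, "attr_2": 1, "attr_3": 0, "label_set": 0},
    "GENE_D": {"attr_1": 0, "attr_2": 1, "attr_3": 0, "label_set": 0},
    "GENE_E": {"attr_1": 1, "attr_2": 0, "attr_3": 0, "label_set": 1},
    "GENE_F": {"attr_1": 0, "attr_2": 1, "attr_3": 1, "label_set": 0},
}


def split_features_and_labels(table, label_attribute):
    """Use one attribute (gene set) as labels and the rest as features."""
    genes = sorted(table)
    feature_names = sorted(a for a in table[genes[0]] if a != label_attribute)
    X = [[table[g][a] for a in feature_names] for g in genes]
    y = [table[g][label_attribute] for g in genes]
    return genes, feature_names, X, y


def hamming(u, v):
    """Number of attributes on which two genes disagree."""
    return sum(a != b for a, b in zip(u, v))


def predict_leave_one_out(X, y):
    """Predict each gene's label from its nearest neighbour among the others."""
    predictions = []
    for i, row in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: hamming(row, X[j]))
        predictions.append(y[nearest])
    return predictions


genes, feature_names, X, y = split_features_and_labels(table, "label_set")
predictions = predict_leave_one_out(X, y)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)

for gene, pred, truth in zip(genes, predictions, y):
    print(gene, "predicted:", pred, "actual:", truth)
print("leave-one-out accuracy:", accuracy)
```

With real data, the toy dictionary would be replaced by a downloaded feature table, and the nearest-neighbour rule by a classifier suited to the size and sparsity of that table.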
The Harmonizome has over one hundred preprocessed datasets ready for machine learning. These datasets are freely available for download.
For more information on the available datasets and how they were processed, you can watch the three lectures the DCIC prepared for their course on Coursera: