Most biological maps and catalogs allow comparisons between human and non-human model organisms and provide information about homologous genes or corresponding processes across different organisms. Model organisms such as yeast, fruit flies and mice are used in research as a proxy for understanding human biology because they have genes and biological processes in common with humans.
Genomics databases
- All of Us Research Program (NIH) aims to build a diverse health database to study the impact of biology, lifestyle and environment on health
- Genomics Health Futures Mission (Australia)
- Initiative on Rare and Undiagnosed Diseases (IRUD), a Japan-based program that aims to improve data sharing for clinical diagnostics based on genetic analysis
Phenomics is the systematic study of measurable phenotypes or characteristics on a genome-wide scale. Ontologies are used to catalog the biological roles of gene products in a common language. Characterizing the phenotypes of organisms with gene mutations or gene knockouts helps researchers understand gene function. Both genotype and environment determine phenotype. Since not all genetic differences between individuals affect phenotype, genotype-phenotype association projects aim to catalog the genotypes that are associated with observable or measurable phenotypes such as disease or altered cell function.
Phenome-wide association studies (PheWAS) search for phenotypes associated with specific single-nucleotide variants (SNVs) across thousands of human phenotypes (the phenome). PheWAS are analogous to genome-wide association studies (GWAS), but the research strategy is inverted. A GWAS begins with families or populations whose members are affected or unaffected by a disease or trait and searches for genetic variants associated with that disease or trait. A PheWAS begins with a genetic variant (or a set of variants) and searches across a set of human phenotypes for those associated with it.
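The inverted strategy of a PheWAS can be illustrated with a minimal sketch. It assumes a toy cohort in which each individual is either a carrier or a non-carrier of one variant, and each phenotype is recorded as a set of affected individuals. The choice of Fisher's exact test with a Bonferroni-corrected threshold is an assumption made here for simplicity; real studies typically use regression models with covariates. All names (`phewas`, `carriers`, `phenotype_A`, and so on) are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x individuals in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def phewas(carriers, cohort, phenotypes, alpha=0.05):
    """Start from one variant and scan all phenotypes for association."""
    threshold = alpha / len(phenotypes)  # Bonferroni correction (assumed here)
    results = []
    for name, affected in phenotypes.items():
        non_carriers = cohort - carriers
        a = len(carriers & affected)      # carrier, affected
        b = len(carriers - affected)      # carrier, unaffected
        c = len(non_carriers & affected)  # non-carrier, affected
        d = len(non_carriers - affected)  # non-carrier, unaffected
        p = fisher_exact_two_sided(a, b, c, d)
        results.append((name, p, p < threshold))
    return sorted(results, key=lambda r: r[1])  # most significant first


# Toy data: 40 individuals, 10 of whom carry the variant (all hypothetical).
cohort = set(range(40))
carriers = set(range(10))
phenotypes = {
    "phenotype_A": set(range(8)) | {20, 21},           # enriched in carriers
    "phenotype_B": {0, 1, 15, 16, 25, 30, 35, 38},     # not enriched
}
results = phewas(carriers, cohort, phenotypes)
for name, p, significant in results:
    print(f"{name}: p={p:.3g} significant={significant}")
```

A GWAS would run the loop the other way round: fix one phenotype and iterate over many variants.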
- Chemical Phenomics Initiative - chemotype-phenotype data in zebrafish for drug discovery
- Consortium for Neuropsychiatric Phenomics (CNP) is a large study funded by the NIH Roadmap Initiative. It is intended to facilitate discovery of the genetic and environmental bases of variation in psychological and neural system phenotypes, to clarify the links between the human genome and complex psychological syndromes, and to promote the development of novel treatments for neuropsychiatric disorders.
- The Gene Ontology (GO) resource and GO Consortium - Computational models of biological systems from genes to molecular pathways to organism for the understanding of gene function
- Human Phenotype Ontology (HPO)
- The Knockout Mouse Project (KOMP), coordinated by the International Mouse Phenotyping Consortium (IMPC), began in 2006 with the aim of developing knockout mutations for every protein-coding gene in the mouse genome
- The Knockout Mouse Phenotyping Program (KOMP2) began in 2011 as the second phase of the large-scale mutagenesis effort, with an expanded focus on CRISPR technology, funded by the NIH Common Fund and Trans-NIH program and continuing to work with the IMPC.
- International Mouse Phenotyping Consortium (IMPC)
- Mouse Phenome Database
- Rat Phenome at the National Bio Resource Project (NBRP) in Japan
- Navigome is a collection of 465 phenotypes from different GWAS that allows browsing of pathways, gene and tissue analysis and interactive visualizations.
- Phenoscape
- Functional Annotation of ANimal Genomes (FAANG) project genotype-to-phenotype (G2P) consortium works to decipher the G2P link in farmed animals. Other project goals are to link G2P in diverse animal populations and to model G2P at the cell, tissue and whole-animal scale.
Proteomics is the study of the full protein complement of a cell, tissue or organism. Proteomics and protein databases can be used to map differences in protein composition and shape to health and disease.
- AlphaFold
- Human Proteome Folding Project
- UniProt
- Human Infectious Diseases HPP initiative (HID-HPP) through the Human Proteome Organization (HUPO) is an initiative to promote collaboration between scientists working on proteomics studies related to infectious diseases caused by viruses, bacteria, fungi and parasites.
- The Human Protein Atlas
Non-coding RNAs (ncRNAs) regulate gene expression through RNA processing and translation, protect genomes from foreign nucleic acids, and function in DNA synthesis and genome rearrangement. Ribozymes are RNA-based enzymes.
- Animal-eRNAdb is a database of enhancer RNA (eRNA), a type of ncRNA transcribed from active enhancers and thought to play a role in gene regulation. This database of eRNA from various animals was established by the Gong Lab, College of Informatics, Huazhong Agricultural University (HZAU), China.
- RNAcentral is a database of ncRNA from a range of organisms, coordinated by the European Bioinformatics Institute (EMBL-EBI) and supported by the charitable foundation Wellcome.
- NONCODE is a non-coding RNA database (excluding tRNA and rRNA) by the China National Center for Bioinformation (CNCB)
- LncBook 2.0 integrates human long non-coding RNAs with multi-omics annotations (CNCB).
- LNCipedia is a public database for long non-coding RNA (lncRNA) sequence and annotation (Ghent University, Belgium).
- Long non-coding RNA Knowledgebase (lncRNAKB) is an integrated resource for exploring lncRNA biology in the context of tissue-specificity and disease association (NIH).
- GENCODE
- Comprehensive Human Expressed SequenceS (CHESS) includes protein-coding genes and lncRNA genes (Johns Hopkins University Center for Computational Biology)
- Broad Bioimage Benchmark Collection
- Broad Cancer Cell Line Encyclopedia
- Cell Atlas Initiative
- Cell Image Library
- Human Cell Atlas
- Human Cytome Project
- Mitotic Cell Atlas
- Single cell RNA sequencing map of embryogenesis in Xenopus
- Single cell RNA sequencing map of embryogenesis in zebrafish
- Worm Atlas (C. elegans)
- Blue Brain Project
- BrainBase is a curated knowledgebase of brain diseases, with a focus on glioma. It includes published articles, gene associations, drug-target associations and glioma multi-omics molecular profiles, and is part of the National Genomics Data Center, Chinese Academy of Sciences.
- BrainMaps
- The Animal Microbiome Database (AMDB) contains bacterial 16S rRNA gene profiles from various animals. It was established by the Laboratory of Evolutionary Bioinformatics, Seoul National University, to better understand the relationship between gut microbiota and animal hosts.
- The Microsetta Initiative (includes the American Gut Project and the British Gut Project)
- The Allele-Specific DNA Methylation Database (ASMdb) is a database and web tool, established by Huazhong Agricultural University, that displays DNA methylation levels and differential DNA methylation in diverse organisms from humans to plants.
- Functional Annotation of Animal Genomes project (FAANG) - genotype-to-phenotype
- AgBase - Resource for functional analysis of agricultural plant and animal gene products and Gene Ontology annotations
- 1000 Bull Genomes Project
- Pig Genome Database
- Chimpanzee Genome Project
- AgBase - Resource for functional analysis of agricultural plant and animal gene products and Gene Ontology annotations
- The Arabidopsis Information Resource (TAIR) - database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana
- Australian Plant Phenomics Facility
- The European Infrastructure for Multi-scale Plant Phenomics and Simulation (EMPHASIS)
- International Plant Phenotyping Network
- National Plant Phenomics Centre (IBERS Gogerddan, Wales, UK)
- PHENOME, the French plant phenomic Infrastructure
- Plant Ontology - structured vocabulary and database resource that links plant anatomy, morphology and growth and development to plant genomics data
- Mycocosm - the fungal genomic resource
- Candida Genome Database
- BacMap - zoomable and searchable chromosome maps for prokaryotic species (archaebacterial and eubacterial)
- ARTS (Antibiotic Resistant Target Seeker Version 2) is a genome mining tool for antibiotics with novel targets. The approach searches for antibiotic producers that are predicted to be resistant to their own products, known as target-directed resistance or self-resistance. For example, an antibiotic producer may carry a duplicated, antibiotic-resistant homologue of an essential housekeeping gene. ARTS detects possible resistant housekeeping genes based on three criteria: duplication, localization within a biosynthetic gene cluster and evidence of horizontal gene transfer. ARTS supports analysis of the entire kingdom of bacteria, analysis of metagenomic data and comparison of multiple genomes.
- Virus-Host DB
- Virus Pathogen Database
- Earth Microbiome Project
- Project Acari - microbiome of ticks
- The Barcode of Life Data System (BOLD) developed at the Centre for Biodiversity Genomics in Canada
- The Biogeographic Atlas of the Southern Ocean
- Earth BioGenome Project
- International Barcode of Life (iBOL), a research alliance that oversees BARCODE 500K, BIOSCAN and the Planetary Biodiversity Mission (PBM)
- Map of Life (MOL)
- Ocean Biogeographic Information System (OBIS)
- COVID19db
- GENCODE is updating the annotation of human protein-coding genes that are linked with COVID-19 and SARS-CoV-2 infection
- Esc (Immune escape variants in SARS-CoV-2) is a manually curated compendium of genetic variants associated with immune escape, supported by the Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India.
- EMBL-EBI and partners operate the COVID-19 Data Portal, which brings together relevant datasets submitted to EMBL-EBI and other major centres for biomedical data
- SCoV2-MD is a database that organizes atomistic molecular dynamics simulations of the SARS-CoV-2 proteome, intended for investigating structure-dynamics-function relationships of viral proteins
- SCovid is a collection of single-cell datasets of COVID-19 across 10 human tissues, paired with control datasets. It includes cell types, stably expressed genes, significantly differentially expressed genes (DEGs) and functional analysis of DEGs (Harbin Medical University, China)
- T-CoV (T-cell COVID-19 Atlas) contains data on the effects of SARS-CoV-2 mutations on CD8 and CD4 T-cell epitopes and is based at HSE University, Moscow
- VarEPS (SARS-CoV-2 Variations Evaluation and Prewarning System) contains known and theoretical SARS-CoV-2 variants in relation to metrics such as antibody binding (National Microbiology Data Center, China)
- Illumina
- Y Combinator
- Polaris Partners
- Google Ventures
- Arch Venture Partners
- Lightspeed Venture Partners
- Sequoia Capital
- IndieBio
- Longitude Capital