Determining the phylogenetic composition of a sample is key to metagenomic data analysis, and we have compared different methods that deduce it directly from metagenomic data. We assessed the reliability of current taxonomic annotations of reference genomes, found them to be insufficiently accurate at the species level, and developed a method that resolves this issue. Next, we asked what fraction of the species that compose our gut microbiomes is currently captured when relying on publicly available, reference-based methods. To answer this, we developed and recommend a method that facilitates the identification of species that are not represented in current reference resources. With this approach, the sampling bias present in genome repositories is minimized and a global definition of a microbial species is achieved. Based on our comparison of methods that derive phylogenetic compositions from metagenomic data, we suggest that reliable phylogenetic marker genes are needed, including for species that are currently not represented by reference genomes, and that such marker genes need to be quickly and reliably identifiable given the exponentially increasing number of reference genomes and metagenomic data sets. This SOP recommends methodologies for the phylogenetic assignment of metagenomic samples at the level of prokaryotic species. These assignments are possible even for species that lack a reference genome.

