function
Several strategies, shown in Figure 3, are used for profiling the gut microbiome:
(1) targeted metagenomic sequencing, i.e., 16S rDNA amplicon or full-length sequencing;
(2) shotgun (deep or shallow) metagenomic sequencing; (3) shotgun metatranscriptomic
sequencing; and (4) mass spectrometry-based metaproteomic analysis. Among these
high-throughput assays, targeted metagenomics, commonly referred to as 16S rDNA-based
amplicon sequencing, is a more affordable approach for gut microbiome analysis.

Amplicon sequence data have been generated for thousands of projects, but individual projects
are often designed with little consideration of which of the commonly used 16S rDNA variable
regions (e.g., V4, V1-V3, V3-V5 or V6-V9) to target. Sequencing strategies are often influenced
by large studies such as the Human Microbiome, Earth Microbiome and American Gut Projects,
since validations were conducted before the production phase of these studies. However, the
influence of amplicon length (after quality trimming), sequence orientation and 16S region on
taxonomic calls has not been benchmarked systematically. To confirm that observations on gut
microbiome features are supported by previously published data, a meta-analysis of gut
microbiome-associated studies can be performed via a closed-reference analysis strategy, which
allows data to be compared and concatenated (Wang et al., 2018). However, care needs to be
exercised when using such an approach: 1) quality filtering and trimming of amplicon sequences
should adopt expected error-based methods rather than average quality-based filtering
(Edgar & Flyvbjerg, 2015); 2) for the sake of accuracy, the default similarity threshold (97%)
for clustering amplicon sequences might be avoided, since there is no strong evidence to support
an even distribution of the 3% nucleotide dissimilarity across those of the nine variable regions
of the 16S rRNA gene that are sequenced (Mysara et al., 2017; Nguyen, Warnow, Pop, & White, 2016;
Yang, Wang, & Qian, 2016), so more stringent clustering or sequence denoising may generate more
accurate profiles for meta-analysis; 3) methods of taxonomic classification for the feature
sequences or representative sequences of OTUs should be benchmarked in detail in order to select
a confidence threshold that controls misclassification and overclassification (Edgar, 2018;
Murali, Bhargava, & Wright, 2018); and 4) an appropriate choice of reference database is
important for accurate taxonomy. For example, the Greengenes and SILVA databases are not
precisely curated, because their collections are usually derived from GenBank and the European
Nucleotide Archive, where users world-wide can deposit sequence data (Edgar, 2018). An
authoritative database such as the training set of the Ribosomal Database Project (RDP), the
NCBI 16S rRNA RefSeq Targeted Loci Project or the Genome Taxonomy Database (GTDB) is commonly
recommended, but users should be cautious about relying exclusively on these databases. Another
emerging strategy is to use exact amplicon sequence variants (ASVs), introduced by DADA2, for
the meta-analysis (Callahan et al., 2016). Moreover, multiple copies (ranging from 5 to 10) of
prokaryotic ribosomal RNA operons are commonly found in a single genome (Stoddard, Smith, Hein,
Roller, & Schmidt, 2015); ideally, individual microbiome OTU and ASV profiles should be
corrected for 16S copy number, with copy numbers matched to the ribosomal RNA database. A major
challenge in this type of bioinformatic correction is that short amplicon sequences of the 16S
rRNA gene do not offer confident calls at the species rank, because those amplicons may be
derived from either known species or uncultured bacteria that share the same sequence across
the sequenced region of the 16S rRNA gene. Even with these considerations, bias in microbiome
profiling may remain, and this merits future bioinformatic and statistical efforts (McLaren,
Willis, & Callahan, 2019). Importantly, microbiome community profiles built on relative
abundances do not reflect the biomass of the microbial consortia; thus total 16S copy numbers
should also be assessed, e.g., by real-time quantitative PCR with universal bacterial primers
on the same DNA extracts.
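
To illustrate why expected error-based filtering is preferred over average quality-based filtering, the sketch below computes both criteria from Phred scores. The expected number of errors in a read is the sum of the per-base error probabilities, 10^(-Q/10). The function names, the two example reads and the cutoff of one expected error are illustrative assumptions, not values taken from the cited studies.

```python
def expected_errors(phred_scores):
    """Expected number of base-call errors: sum of per-base error probabilities."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def mean_quality(phred_scores):
    """Arithmetic mean of the Phred scores (the average-quality criterion)."""
    return sum(phred_scores) / len(phred_scores)


def passes_expected_error_filter(phred_scores, max_ee=1.0):
    """Keep a read only if its expected error count is at most max_ee.

    max_ee=1.0 is an illustrative cutoff, not a recommendation from the text.
    """
    return expected_errors(phred_scores) <= max_ee


# Hypothetical 100-base reads.
read_a = [40] * 95 + [2] * 5   # mostly excellent, with five very poor bases
read_b = [30] * 100            # uniformly good

for name, quals in (("read_a", read_a), ("read_b", read_b)):
    print(name,
          round(mean_quality(quals), 1),
          round(expected_errors(quals), 2),
          passes_expected_error_filter(quals))
```

In this toy case read_a has the higher mean quality (38.1 versus 30.0) but about 3.2 expected errors, against 0.1 for read_b, so an average-quality threshold would retain the read that is more likely to contain errors.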
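
The copy-number correction and the qPCR-based scaling described above can be sketched as follows. This is a minimal illustration under simplifying assumptions: every taxon is assumed to have a known 16S copy number, and the taxon names, read counts, copy numbers and qPCR total are all hypothetical.

```python
def copy_number_corrected(read_counts, copies_per_genome):
    """Divide each taxon's reads by its 16S copy number, then renormalise to proportions."""
    adjusted = {t: read_counts[t] / copies_per_genome[t] for t in read_counts}
    total = sum(adjusted.values())
    return {t: value / total for t, value in adjusted.items()}


def estimated_genome_equivalents(read_counts, copies_per_genome, total_16s_copies):
    """Scale read proportions by a qPCR-derived total 16S copy count.

    Each taxon's share of total_16s_copies is divided by its copy number,
    giving a rough estimate of genome equivalents in the same DNA extract.
    """
    total_reads = sum(read_counts.values())
    return {
        t: (read_counts[t] / total_reads) * total_16s_copies / copies_per_genome[t]
        for t in read_counts
    }


# Hypothetical inputs for two taxa.
read_counts = {"taxon_A": 800, "taxon_B": 100}
copies_per_genome = {"taxon_A": 10, "taxon_B": 5}
total_16s_copies = 9e8  # hypothetical qPCR result for the extract

corrected = copy_number_corrected(read_counts, copies_per_genome)
genomes = estimated_genome_equivalents(read_counts, copies_per_genome, total_16s_copies)
print(corrected)
print(genomes)
```

With these numbers the uncorrected read proportions are roughly 0.89 and 0.11, while the copy-number-corrected proportions are 0.8 and 0.2. As noted above, such a correction is only as reliable as the taxonomic assignment used to look up each copy number.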
Since 16S rDNA amplicon sequencing does not offer high taxonomic resolution, especially at the
species rank, shotgun metagenomic sequencing is a useful strategy that can provide
whole-genome information for precise taxonomic profiling of the gut microbiome. In general, two
types of analysis are performed, as illustrated in Figure 3B: 1) read mapping-based analysis, in
which filtered and trimmed sequence reads are mapped to curated reference microbial genomes, or
to a set of marker genes per bacterial genome, by sequence alignment or k-mer spectra; this
approach is adopted in many popular analysis packages, including MetaPhlAn, DIAMOND, Kraken and
CLARK (McIntyre et al., 2017); and 2) de novo assembly-based analysis, in which sequencing reads
from individual samples, or reads concatenated across all samples, are assembled prior to
taxonomic assignment and functional annotation. Many packages support this type of analysis,
and the comprehensive Anvi’o platform is useful for downstream analysis (Eren et al., 2015). For
de novo assembly, misassemblies and misbinning of metagenomic contigs must be curated; depending
on the program, authoritative genome databases are sometimes required to guide those processes.
Large and comprehensive benchmarking studies of analytical strategies, program performance and
databases have been published in a timely manner (McIntyre et al., 2017; Sczyrba et al., 2017),
and these comparisons provide valuable information to users for selecting analysis strategies
for shotgun metagenomics.

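
The k-mer variant of the read mapping-based strategy can be illustrated with a toy classifier. The sketch below indexes the k-mers of each reference and assigns a read to the taxon contributing the most taxon-specific k-mers. It is a deliberately simplified illustration and does not reproduce the algorithms of the packages named above, which use much larger k, compact indexes and more careful handling of k-mers shared between taxa. The reference sequences, the reads and k = 5 are hypothetical.

```python
from collections import Counter


def kmers(sequence, k):
    """Return the set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference taxa that contain it."""
    index = {}
    for taxon, sequence in references.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k):
    """Assign a read to the taxon with the most taxon-specific k-mer hits, else None."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer)
        if taxa and len(taxa) == 1:  # ignore k-mers shared between references
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy references and reads.
references = {
    "taxon_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_B": "TTGACCGGTAAACCGGTTTACGGATC",
}
index = build_index(references, k=5)
print(classify("TAGCCGATAGC", index, k=5))
print(classify("GGGGGGGG", index, k=5))
```

The first read is a substring of the taxon_A reference and is assigned to it; the second shares no k-mer with either reference and is left unclassified, which mirrors how reads from organisms missing from the reference set go unassigned.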
Shotgun metagenomics can provide not only precise taxonomy but also the potential functional
dynamics of the gut microbiome. Although metatranscriptomics is a preferable approach for
better understanding functional potential (Abu-Ali et al., 2018), unbiased label-free
metaproteome analysis of fecal specimens or biopsies also offers a way to probe microbiome
functional capacity at the protein level (Zhang et al., 2016). A potential concern with this
latter technique is that protein detection may be biased towards abundant proteins, as these are
more easily profiled. Detection limits are associated with: 1) inadequately validated methods
for high-yield protein extraction; 2) simple or absent fractionation of tryptic peptides; and
3) the constraints of database-driven peptide identification, given that de novo peptide
sequencing from tandem mass spectra is still prone to errors (Muth & Renard, 2018). As such,
paired metagenome-assembled genomes, together with public protein databases including UniProt
and Unified Human Gastrointestinal Protein (UHGP) (Almeida et al., 2019), are commonly
recommended for the database search strategy during metaproteome analysis (Heyer et al., 2017).

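
The constraint of database-driven identification can be made concrete with a simplified in silico tryptic digest. In the sketch below, a peptide can only be assigned to a protein if that protein is in the search database. Real search engines match tandem mass spectra rather than peptide strings, so this is only an illustration of the principle; the cleavage rule (after K or R, not before P), the minimum length, the protein sequence and the observed peptides are assumptions or hypothetical values.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """Cleave after K or R unless the next residue is P; keep peptides of at least min_len.

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {piece for piece in pieces if len(piece) >= min_len}


def build_peptide_index(proteins, min_len=6):
    """Map each tryptic peptide to the set of database proteins that produce it."""
    index = {}
    for name, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, min_len):
            index.setdefault(peptide, set()).add(name)
    return index


def match_peptides(observed, index):
    """Split observed peptides into those found in the database and those that are not."""
    identified = {p: index[p] for p in observed if p in index}
    unmatched = [p for p in observed if p not in index]
    return identified, unmatched


# Hypothetical database and observed peptides.
database = {"protein_1": "MKTAYIAKQRPLVEGSAAGKWLTPEDR"}
index = build_peptide_index(database)
identified, unmatched = match_peptides(["TAYIAK", "WLTPEDR", "GGSLNVVEAYK"], index)
print(identified)
print(unmatched)
```

The third peptide is left unmatched because no database protein produces it, which is the situation that a sample-matched metagenomic database is meant to reduce.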
4. Potential strategies for overcoming gut dysbiosis caused by specific food ingredients