Analytical methods for profiling microbiome composition and inferring metabolic function

Several strategies, shown in Figure 3, are used for profiling the gut microbiome: (1) targeted metagenomic sequencing, i.e., 16S rDNA amplicon or full-length sequencing; (2) shotgun (deep or shallow) metagenomic sequencing; (3) shotgun metatranscriptomic sequencing; and (4) mass spectrometry-based metaproteomic analysis. Among these high-throughput assays, targeted metagenomics, commonly referred to as 16S rDNA-based amplicon sequencing, is the more affordable approach for gut microbiome analysis. Amplicon sequence data have been generated for thousands of projects, but projects are often designed with little consideration of which of the commonly used variable 16S rDNA regions (i.e., V4, V1-V3, V3-V5 or V6-V9) to select. Sequencing strategies are often influenced by large studies such as the Human Microbiome, Earth Microbiome and American Gut Projects, since validations were conducted before the production phase of these studies. However, the influence of amplicon length (after quality trimming), sequence orientation and 16S region on taxonomic calls has not been benchmarked systematically. To check that observations on gut microbiome features are consistent with previously published data, a meta-analysis of gut microbiome-associated studies may be performed via a closed-reference analysis strategy that allows comparison and concatenation of data (Wang et al., 2018). However, care needs to be exercised when using such an approach: 1) quality filtering and trimming of amplicon sequences should adopt expected error-based methods rather than average quality-based filtering (Edgar & Flyvbjerg, 2015); 2) for the sake of accuracy, the default similarity threshold (97%) for clustering amplicon sequences might be avoided, since there is no strong evidence to support an even distribution of the 3% nucleotide dissimilarity across the nine variable regions of the 16S rRNA gene that are sequenced (Mysara et al., 2017; Nguyen, Warnow, Pop, & White, 2016; Yang, Wang, & Qian, 2016), and thus more stringent clustering or sequence denoising may generate more accurate profiles for meta-analysis; 3) methods of taxonomic classification for the feature sequences or representative sequences of OTUs should be benchmarked in detail in order to select a confidence threshold that controls misclassification and overclassification (Edgar, 2018; Murali, Bhargava, & Wright, 2018); 4) an appropriate choice of reference database is important for accurate taxonomy; for example, the Greengenes and SILVA databases are not precisely curated, because those collections are usually derived from GenBank and the European Nucleotide Archive, where users worldwide can deposit sequence data (Edgar, 2018). An authoritative database from the training set of the Ribosomal Database Project (RDP), the NCBI 16S rRNA RefSeq Targeted Loci Project or the Genome Taxonomy Database (GTDB) is commonly recommended, but users should be cautious in relying exclusively on these databases. Another emerging strategy is to apply exact amplicon sequence variants (ASVs), introduced by DADA2, in the meta-analysis (Callahan et al., 2016). Moreover, multiple copies (ranging from 5 to 10) of prokaryotic ribosomal RNA operons are commonly found in a single genome (Stoddard, Smith, Hein, Roller, & Schmidt, 2015); ideally, individual microbiome OTU and ASV profiles should be corrected for 16S copy number so that this is matched to the ribosomal RNA database. A major challenge in conducting this type of bioinformatics correction is that short amplicon sequences of the 16S rRNA gene do not offer confident calls at the rank of species, because those amplicons may be derived from either known species or uncultured bacteria that share the same sequence in the sequenced region of the 16S rRNA gene. Even with these considerations, microbiome profiling bias might still remain, and this merits future bioinformatics and statistical efforts (McLaren, Willis, & Callahan, 2019). Importantly, microbiome community profiles built on relative abundances do not reflect the biomass of the microbial consortia; thus total 16S copy numbers should be assessed, i.e., by real-time quantitative PCR with bacterial universal primers on the same DNA extracts.
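
To make the distinction in point 1) concrete, the following minimal Python sketch contrasts an expected error-based filter with an average quality-based one. It assumes the standard Phred relation, in which a base with quality score Q has error probability 10^(-Q/10), so that the expected number of errors in a read is the sum of these probabilities over its bases. The function names, the Phred+33 offset and the `max_ee=1.0` cutoff are illustrative assumptions and are not taken from the cited work.

```python
def phred_from_ascii(qual_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]


def expected_errors(quals):
    """Expected number of errors in a read: sum of per-base error probabilities."""
    return sum(10 ** (-q / 10) for q in quals)


def mean_quality(quals):
    """Arithmetic mean of the Phred scores (the 'average quality' criterion)."""
    return sum(quals) / len(quals)


def passes_ee_filter(quals, max_ee=1.0):
    """Keep a read only if its expected errors do not exceed max_ee.

    max_ee=1.0 is an illustrative cutoff, not a recommendation.
    """
    return expected_errors(quals) <= max_ee


# Two hypothetical 100-base reads.
uniform_read = [30] * 100            # every base at Q30
mixed_read = [40] * 95 + [2] * 5     # mostly Q40, with five very poor bases

for name, quals in [("uniform", uniform_read), ("mixed", mixed_read)]:
    print(name, round(mean_quality(quals), 1),
          round(expected_errors(quals), 2), passes_ee_filter(quals))
```

In this toy case the mixed read has the higher mean quality (38.1 versus 30) but about 3.2 expected errors against 0.1, so an average-quality filter would favour it while an expected-error filter would discard it.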

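The copy-number and biomass considerations discussed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the taxon names, read counts, per-taxon 16S copy numbers and the qPCR total are all hypothetical, and the sketch ignores the extraction, amplification and classification biases mentioned in the text.

```python
def relative_abundance(counts):
    """Fraction of reads per taxon (proportional to 16S copies, not to cells)."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def copy_number_corrected(counts, copy_numbers):
    """Divide each count by the taxon's 16S copy number, then renormalise.

    Raises KeyError if a taxon has no copy-number estimate.
    """
    adjusted = {taxon: n / copy_numbers[taxon] for taxon, n in counts.items()}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


def estimated_cells(counts, copy_numbers, total_16s_copies):
    """Scale read fractions by a qPCR-derived total of 16S copies,
    then convert 16S copies to approximate cell numbers."""
    fractions = relative_abundance(counts)
    return {
        taxon: fractions[taxon] * total_16s_copies / copy_numbers[taxon]
        for taxon in counts
    }


# Hypothetical inputs, for illustration only.
counts = {"taxon_A": 700, "taxon_B": 300}     # amplicon read counts
copy_numbers = {"taxon_A": 7, "taxon_B": 3}   # assumed 16S copies per genome
total_16s_copies = 1e9                        # assumed qPCR result, copies per gram

print(relative_abundance(counts))
print(copy_number_corrected(counts, copy_numbers))
print(estimated_cells(counts, copy_numbers, total_16s_copies))
```

With these made-up numbers the reads split 70:30, yet after copy-number correction the two taxa are equally abundant, each at roughly 1e8 cells per gram once the qPCR total is applied.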

Since 16S rDNA amplicon sequencing does not offer high taxonomic resolution, especially at the species rank, shotgun metagenomic sequencing is a useful strategy to provide potential whole-genome information for precise taxonomic profiling of the gut microbiome. In general, two types of analysis are performed, as illustrated in Figure 3B: 1) read mapping-based analysis: filtered and trimmed sequence reads are mapped to curated reference microbial genomes, or to a set of marker genes per bacterial genome, by using sequence alignment or k-mer spectra; this approach is adopted in many popular analysis packages, such as MetaPhlAn, DIAMOND, Kraken and CLARK (McIntyre et al., 2017); 2) de novo assembly-based analysis: individual or concatenated sequencing reads from all samples are first assembled prior to taxonomic assignment and functional annotation. Many packages support this type of analysis, and the comprehensive Anvi’o platform is useful for downstream analysis (Eren et al., 2015). For de novo assembly, curation must be performed to address misassemblies and misbinning of metagenomic contigs; sometimes, authoritative genome databases are required to guide those processes, depending on the programs. Large and comprehensive benchmarking studies of analytical strategies, program performance and databases have been published (McIntyre et al., 2017; Sczyrba et al., 2017), and these comparisons provide valuable information to users selecting analysis strategies for shotgun metagenomics.

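The k-mer idea behind read mapping-based classification can be illustrated with a deliberately small Python sketch. It is a toy under stated assumptions and does not reproduce how MetaPhlAn, DIAMOND, Kraken or CLARK are implemented: the reference sequences, labels and `k=5` are hypothetical, reverse complements are ignored, and a read is assigned by a simple vote over k-mers that occur in exactly one reference.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k):
    """Assign a read to the reference with the most unique k-mer hits.

    Returns None when no k-mer matches or when the top two labels tie.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer)
        if labels and len(labels) == 1:   # count only discriminative k-mers
            votes[next(iter(labels))] += 1
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical reference sequences, far shorter than real genomes.
references = {
    "genome_A": "ACGTACGTTGCAAGCTTAGC",
    "genome_B": "TTGACCGGTATCGGATCCAA",
}
K = 5
index = build_index(references, K)

for read in ["CGTTGCAAGC", "GGTATCGGAT", "GGGGGGGGGG"]:
    print(read, classify(read, index, K))
```

The first two reads are substrings of `genome_A` and `genome_B` respectively and are assigned accordingly, while the third shares no k-mer with either reference and is left unclassified. Real tools differ substantially in index design, k-mer length and how ambiguous k-mers are resolved.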

Shotgun metagenomics is able to provide not only precise taxonomy but also the potential functional dynamics of the gut microbiome. Although metatranscriptomics is a preferable approach that provides a better understanding of functional potential (Abu-Ali et al., 2018), unbiased label-free metaproteome analysis of fecal specimens or biopsies also offers the potential to probe microbiome functional capacity at the protein level (Zhang et al., 2016). A potential concern with this latter technique is that protein detection may be biased toward abundant proteins, as these are more easily profiled; detection limits are associated with: 1) inadequately validated methods for high-yield protein extraction; 2) simple or absent fractionation of tryptic peptides; 3) constraints from database-driven peptide identification, while de novo peptide sequencing of tandem mass spectra is still prone to errors (Muth & Renard, 2018). As such, paired metagenome-assembled genomes together with public protein databases, including UniProt and Unified Human Gastrointestinal Protein (UHGP) (Almeida et al., 2019), are commonly recommended for the database search strategy during metaproteome analysis (Heyer et al., 2017).


4. Potential strategies for overcoming gut dysbiosis caused by specific food ingredients
