A complete list of the 41 phenotypes tested and their ICC values can be found in Additional file 1: Tables S4 and S11.
Twin modeling approaches are used to estimate the amount of variance attributable to additive genetics (A), common environment (C) or dominance (D), and unique environment (E). An ACE or ADE model was constructed for each of 946 traits, including alpha diversity, principal coordinates (PCos) of β-diversity of taxonomic groups, and individual OTUs. A complete list of the A, C/D, and E values for each of these phenotypes can be found in Additional file 1: Table S5. A power analysis shows that our sample is well powered to model continuous traits but is underpowered for categorical traits. Traits that were not categorized as continuous were treated as categorical traits. Therefore, while still of interest, the categorical traits should be viewed with lower confidence. In the twin models, C and D cannot be modeled at the same time since each captures the same variance, but the genetic contribution can be compared between phenotypes modeled with ACE or ADE models. Of the 946 traits, 55% were modeled as ACE and 44% as ADE. When heritability estimates were averaged across the traits within each phenotype category described above, a trend emerged in which the PCos had the highest mean heritability estimates, both in the full sample and in the subset of twin pairs that were cohabitating. The most heritable traits were OTU4483015, which corresponds to an unnamed species of Granulicatella, and PCo 2 for Bray-Curtis.
To better understand which taxa were driving this PCo, we performed a QIIME biplot analysis, which identified the genus Streptococcus as the most abundant taxon on the first 3 principal coordinates from Bray-Curtis. Repeating the ACE models after excluding twin pairs who reported that they had moved out after age 18 did not greatly alter the heritability estimates or other components of the model. The unique environment accounted for most of the variation of the traits tested in both the full and cohabitation samples. Little change in the common environment was observed between the full and cohabitation sample analyses. We compared phenotypes deemed to be heritable in our study with phenotypes reported as heritable in 5 studies of the gut and 1 of dental plaque. We found that 14 of the 44 traits were reported with heritability estimates of at least 1% in at least one of these studies, though none showed high statistical significance. This is consistent with the possibility that genes that drive heritability in the salivary microbiome may also have more general influences in other human niches.
It is assumed that host genes interacting with the oral microbiome are responsible for the observed heritability. The best way to identify them is by analyzing associations between genetic variation and traits. The power to detect such associations is a function of the number of individuals, the number of tests, and the number and types of SNPs available. Given a fixed sample size, the greatest power to uncover an association is obtained by analyzing a limited number of phenotypes chosen on prior information, rather than by repeatedly testing multiple hypotheses on the same data. To limit the hypotheses tested, we focused on the traits found most heritable in twin studies. Traits found to be most heritable are expected to produce the best results in a genome-wide association study.
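As a rough illustration of how twin data separate the A, C/D, and E components, the sketch below uses Falconer-style estimates computed from monozygotic (MZ) and dizygotic (DZ) twin correlations. This is a simplified stand-in for the structural equation models fitted in the study, not the method actually used here. The function names and the input correlations are hypothetical, and the rule of preferring ADE when the MZ correlation exceeds twice the DZ correlation is a common convention assumed for illustration.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate ACE variance shares from twin correlations.

    Assumes r_mz = A + C and r_dz = A/2 + C (classical twin design).
    """
    a = 2.0 * (r_mz - r_dz)
    c = 2.0 * r_dz - r_mz
    e = 1.0 - r_mz
    return a, c, e


def falconer_ade(r_mz, r_dz):
    """Approximate ADE variance shares from twin correlations.

    Assumes r_mz = A + D and r_dz = A/2 + D/4.
    """
    a = 4.0 * r_dz - r_mz
    d = 2.0 * r_mz - 4.0 * r_dz
    e = 1.0 - r_mz
    return a, d, e


def choose_model(r_mz, r_dz):
    """Pick ADE when the MZ correlation exceeds twice the DZ correlation."""
    return "ADE" if r_mz > 2.0 * r_dz else "ACE"


if __name__ == "__main__":
    # Hypothetical correlations, not values from the study.
    for r_mz, r_dz in [(0.6, 0.4), (0.5, 0.2)]:
        model = choose_model(r_mz, r_dz)
        fit = falconer_ace if model == "ACE" else falconer_ade
        print(model, [round(v, 3) for v in fit(r_mz, r_dz)])
```

Because C and D are both derived from the same two correlations, only one of them can be estimated for a given trait, which is why each trait is assigned either an ACE or an ADE model.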
DNA was previously prepared from saliva and blood of 1480 individuals unrelated to the twins and to each other. Human DNA from this sample was subjected to Affymetrix chip-based genotype analysis that resulted in 696,388 validated human SNP genotypes per individual. The age of subjects ranged from 11 to 33 years and 29% were female. Ancestry was assigned by weighting a subset of the genotyped SNPs against the 1000 Genomes dataset and assigning individuals to ancestry groups using principal coordinate analysis plots. The genotyped SNPs were then quality filtered and submitted to the Michigan Imputation Server for phasing and imputation. After quality filtering, this produced 6,862,363 European and 8,172,048 American Admixed imputed variants, respectively, that were used in all subsequent analyses. Imputed SNPs from two different randomly selected chromosomal areas in 68 individuals were resequenced with Sanger sequencing to validate imputation. We found that 65/68 imputed calls validated completely, with 3 apparently incorrectly imputed. We conclude that imputation provides significantly greater resolution to SNP-based maps at little cost to accuracy. The salivary microbiome of the 1480 individuals was characterized by 16S rRNA sequencing, identifying 2679 OTUs; as in the twin study, the most prevalent phyla were Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, and Fusobacteria. Filtering by prevalence and abundance as described above produced a total of 931 OTUs used for our studies. The SNP-based heritability of microbiome phenotypes in the unrelated population was assessed using Genome-wide Complex Trait Analysis (GCTA), which estimates the amount of phenotypic variance that can be explained by SNP-based composite genetic variance. To avoid false positives, the genetic relationship matrix was limited to subjects that were estimated to have IBD < 0.025.
The first 10 ancestry principal components from LD-pruned SNPs were included to control for population stratification.
Given the relatively small sample size, single-trait heritability estimates were not evaluated; rather, gross trends were observed across all continuous traits. A positive correlation was observed between the heritability estimates from the ACE/ADE twin models and the European GCTA analyses, with a disattenuated correlation of 0.831. However, these traits were not observed to be heritable in the twin models. The small sample size was not expected to yield significant GCTA P values; it has been noted that the meaning of such P values is limited, but that even in small samples observable trends can be meaningful. Nevertheless, it is striking that both twin studies and GCTA on separate samples show heritability across the same continuous traits. This is consistent with the expectation that genome sequence variation is a basis of observed heritability.
We ranked the continuous traits based on their heritability and performed a genome-wide association study (GWAS) of the top six with the Efficient and Parallelizable Association Container Toolbox (EPACTS). This would be expected to reduce the loss of power due to multiple testing of hundreds of phenotypes. The family Carnobacteriaceae was excluded from the GWAS analyses since it was highly correlated with the genus Granulicatella, and the latter has a more refined taxonomic resolution. It is well established that continuous traits afford greater power in both twin studies and GWAS. Therefore, although some categorical phenotypes showed high twin heritability, for GWAS we only studied continuous traits. The analyses were all controlled for age, sex, and sequencing run, among other covariates. Analysis was done independently for individuals from the two major ancestry groups of the unrelated sample, European (EUR) and American Admixed (ADM). Due to the limited size of the ADM sample, only the EUR sample is discussed, and the ADM sample was only considered for the meta-GWAS discussed below.
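The disattenuated correlation reported above between twin-based and GCTA-based heritability estimates adjusts an observed correlation for measurement noise in both sets of estimates. A minimal sketch of the standard correction for attenuation follows. The reliability values are hypothetical placeholders, since the text does not state how reliability was estimated, and the function name is introduced only for illustration.

```python
from math import sqrt


def disattenuate(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for unreliability in both measures.

    Uses r_true = r_observed / sqrt(reliability_x * reliability_y).
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / sqrt(reliability_x * reliability_y)


if __name__ == "__main__":
    # Hypothetical inputs, not values from the study.
    print(round(disattenuate(0.5, 0.64, 0.81), 4))
```

The corrected value is never smaller in magnitude than the observed one, so it should be read as an estimate of the correlation that would be seen with noise-free heritability estimates.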
To control for population stratification, a kinship matrix created from all the chromosomes and the first ten principal components from the LD-pruned SNPs were included as covariates. To control for the fact that 6 traits were tested, the genome-wide significance level was lowered to 8.33e-09. Using this threshold, we found that the genus Granulicatella was significantly associated with the SNP chr7:110,659,581 within an intron of the IMMP2L gene on chromosome 7. This gene is known to be involved in mitochondrial protein trafficking. The regional Manhattan plots in Fig. 4b show that the peak locus includes SNPs of decreasing r2 values around the peak SNP, lending greater confidence to the association. Without a replication sample this result is provisional but potentially interesting. An analysis with PLINK 1.9, which takes categorical imputed genotypes as input rather than the probabilistic dosage calls produced by imputation, produced results consistent with this association, showing that the association is independent of the underlying computational method.
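The adjusted threshold quoted above can be reproduced with a simple Bonferroni-style division. The sketch below assumes the conventional genome-wide significance level of 5e-08 as the starting point. That base value is not stated in the text, but it is consistent with the reported 8.33e-09 for 6 traits. The function names are hypothetical.

```python
# Conventional genome-wide significance level (assumed; not stated in the text).
BASE_ALPHA = 5e-8


def adjusted_threshold(n_traits, base_alpha=BASE_ALPHA):
    """Divide the genome-wide significance level by the number of traits tested."""
    if n_traits < 1:
        raise ValueError("n_traits must be at least 1")
    return base_alpha / n_traits


def is_significant(p_value, n_traits, base_alpha=BASE_ALPHA):
    """Return True when a SNP P value passes the trait-adjusted threshold."""
    return p_value < adjusted_threshold(n_traits, base_alpha)


if __name__ == "__main__":
    print(f"{adjusted_threshold(6):.2e}")
```

Testing more traits divides the threshold further, which is the loss of power that restricting the GWAS to the six most heritable traits is meant to limit.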
A comparison of the 100 SNPs with the lowest P values in each of the six phenotypes examined in the European sample revealed that 7 SNPs were held in common between at least two of the phenotypes. Bray-Curtis PCo2, Unweighted UniFrac PCo2, and Weighted UniFrac PCo2, all β-diversity measures, were most often shared. After the initial analyses of the 6 most heritable traits, a GWAS was completed in the remaining 64 continuous traits in the European sample. Each consistently identified the same SNP at chr7:110,659,581 as significantly associated with the trait, along with nearby associated SNPs in high LD. No additional significant SNPs were identified, consistent with the hypothesis that stratification methodology had little effect on identifying the top SNPs and that we were not "over filtering" with rigorous kinship controls. For completeness, we then carried out GWAS analyses for the remaining 64 continuous microbial phenotypes using the EPACTS kinship-only analyses, adjusting significance for the additional multiple testing, and found no SNPs to be significantly associated. This is perhaps not surprising given the relatively small sample size.
The size of the ADM sample made it unlikely to produce statistically significant results. To glean useful information from it, we combined it with the EUR data using a meta-analysis approach that can effectively deal with the population issues inherent in mixing samples from different populations. METAL is one such meta-analysis package; it takes as input individual SNP P values and the direction of their effects, weighted by sample size, to arrive at composite P values. The test statistics were also corrected for population stratification. The METAL analysis identified the same suggestively significant SNP on chromosome 7 that was associated with Granulicatella abundance in the EUR GWAS.
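The sample-size-weighted combination described above for METAL can be sketched as follows: each study's two-sided P value and effect direction are converted to a signed Z score, the Z scores are weighted by the square root of the sample size, and the weighted sum is rescaled to a composite Z and P value. This is an illustrative reimplementation of that weighting scheme, not METAL's own code, and the P values and sample sizes in the example are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def signed_z(p_value, direction):
    """Convert a two-sided P value and effect direction (+1 or -1) to a Z score."""
    z = -_STD_NORMAL.inv_cdf(p_value / 2.0)
    return z if direction > 0 else -z


def combine(studies):
    """Combine studies given as (p_value, direction, sample_size) tuples.

    Returns the composite Z score and its two-sided P value.
    """
    weights = [sqrt(n) for _, _, n in studies]
    z_scores = [signed_z(p, d) for p, d, _ in studies]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    z = numerator / sqrt(sum(w * w for w in weights))
    p = erfc(abs(z) / sqrt(2.0))  # two-sided tail probability
    return z, p


if __name__ == "__main__":
    # Hypothetical per-cohort results for one SNP, not values from the study.
    z, p = combine([(1e-6, +1, 800), (0.04, +1, 300)])
    print(f"Z = {z:.3f}, P = {p:.3e}")
```

Under this scheme a small cohort contributes little weight, and a SNP missing from one cohort is effectively evaluated on the remaining cohort alone.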
However, due to the small size of the ADM sample, this SNP did not survive quality filtering in the METAL analysis and so was not a factor in the METAL analysis outcome. Analyses of Unweighted UniFrac PCo3 yielded a SNP on chromosome 12 that reached genome-wide significance in the same direction for the combined sample, though it was not robust to multiple testing correction. Again, the regional Manhattan plots in Fig. 5c show that the peak locus includes SNPs of decreasing r2 around the peak SNP, consistent with the association. The minor allele, C, was consistent with lower z-scored PCo3 values. The most promising single SNP association occurred with the phenotype defined as the abundance of the genus Granulicatella. We reanalyzed the association data with the gene-based tool Knowledge-based mining system for Genome-Wide Genetic studies [...] involved in protein processing associated with mitochondrial import and a non-coding antisense RNA, INHBA-AS1. INHBA is thought to be important to tooth development, which could have potentially interesting implications for the oral microbiome. In the meta-GWAS results for PCo3 of Unweighted UniFrac, the most highly associated region was the gene LIN7A on chromosome 12.
Tobacco use correlates with changes in the oral microbiome and the abundance of specific taxa. It was possible that tobacco or other factors influenced our observation of genetic association. For example, Streptococcus abundance, a highly heritable phenotype, has also been shown to change in smokers. In addition, other substances could potentially change the oral microbiome. Among these are alcohol and marijuana, though their effects have yet to be determined. However, marijuana use is correlated with poor oral health, which is often indicative of changes in the oral microbiota. We had available the self-reported tobacco, alcohol, and marijuana use for the previous six months in 92% of our subjects.
We therefore repeated the analyses using the three substances as covariates. As seen in Additional file 2: Figures S15 and S16, controlling for tobacco/alcohol/marijuana use had negligible impact on the top hit on chromosome 7 for the genus Granulicatella. For the 6 highly heritable continuous traits that were analyzed, results appear to be consistent with and without substance use covariates.
We have shown that microbe abundance and some aspects of the microbial population structure are influenced by heritable traits in saliva. We have ranked the "most heritable" traits using ACE/ADE modeling and GCTA-based SNP heritability and carried out an unbiased GWAS on the 6 most heritable traits.