Genetic diagnosis of autoinflammatory disease patients using clinical exome sequencing.

Autoinflammatory diseases comprise a wide range of syndromes caused by dysregulation of the innate immune response. They are difficult to diagnose due to their phenotypic heterogeneity and variable expressivity. Thus, the genetic origin of the disease remains undetermined for an important proportion of patients. We aim to identify causal genetic variants in patients with suspected autoinflammatory disease and to test the advantages and limitations of the clinical exome gene panels for molecular diagnosis. Twenty-two unrelated patients with clinical features of autoinflammatory diseases were analyzed using clinical exome sequencing (∼4800 genes), followed by bioinformatic analyses to detect likely pathogenic variants. By integrating genetic and clinical information, we found a likely causative heterozygous genetic variant in NFKBIA (p.D31N) in a North-African patient with a clinical picture resembling the deficiency of interleukin-1 receptor antagonist, and a heterozygous variant in DNASE2 (p.G322D) in a Spanish patient with a suspected lupus-like monogenic disorder. We also found variants likely to increase the susceptibility to autoinflammatory diseases in three additional Spanish patients: one with an initial diagnosis of juvenile idiopathic arthritis who carries two heterozygous UNC13D variants (p.R727Q and p.A59T), and two with early-onset inflammatory bowel disease harbouring NOD2 variants (p.L221R and p.A728V respectively). Our results show a similar proportion of molecular diagnosis to other studies using whole-exome or targeted resequencing in primary immunodeficiencies. Thus, despite its main limitation of not including all candidate genes, clinical exome targeted sequencing can be an appropriate approach to detect likely causative variants in autoinflammatory diseases.


Introduction
Autoinflammatory diseases (AID) constitute a family of immune disorders characterized by recurrent episodes of sterile inflammation that may appear without a known trigger. AID are considered primary immunodeficiency diseases (PID), a wide classification that includes different subgroups, all of them caused by dysregulation of specific pathways of the immune system. Although reliable epidemiological data are only available for some of the subgroups (i.e. inflammasomopathies), AID are considered rare diseases (Wekell et al., 2017). Clinically, they are characterized by recurrent episodes of fever, systemic inflammation and other symptoms such as skin rashes, abdominal pain, chest pain, lymphadenopathy or arthritis (Wekell et al., 2016). The severity of the disease varies among patients, and this clinical heterogeneity makes a definitive diagnosis difficult, especially in early-onset patients without a positive family history (Almeida de Jesus and Goldbach-Mansky, 2013).
Over the past decades, several efforts were focused on better definition and classification of the wide variety of phenotypes and clinical manifestations included in the AID group (Ciccarelli et al., 2014; Wekell et al., 2016). Many molecular mechanisms underlying these diseases have been discovered, most of them affecting genes related to the inflammasome (TNFRSF1A, NLRP3 and MEFV among others (Hoffman and Broderick, 2016)). All these advances represented a significant improvement in diagnosis rates. However, there is still a significant proportion of patients with clinical features suggestive of AID who remain without a clear molecular diagnosis (Martorana et al., 2017; Notarangelo and Sorensen, 2008). Current massive sequencing technologies have raised the possibility of analysing the whole genome or whole exome of patients. Alternatively, in targeted sequencing studies the analysis is restricted to a set of previously defined candidate genes, thus reducing economic costs and data analysis efforts (Seleman et al., 2017). In recent years, targeted sequencing panels have become a useful tool to identify the molecular causes of autoinflammatory diseases (Karacan et al., 2019; Omoyinmi et al., 2017; Papa et al., 2019; Yao et al., 2016) and other primary immunodeficiencies (reviewed in Yska et al., 2019).
https://doi.org/10.1016/j.ejmg.2020.103920 Received 4 December 2019; Received in revised form 13 February 2020; Accepted 21 March 2020
In this project we assess the suitability and performance of a gene panel approach, the TruSight One Sequencing Panel (Illumina), to detect the genetic origin of presumably monogenic AID. We first analyzed the proportion of candidate genes for autoinflammatory disorders included and properly sequenced in the panel. We discuss its applicability and its benefits and limitations compared to other approaches, such as whole genome or whole exome sequencing. Second, we performed targeted sequencing in 22 patients and exhaustively analyzed the data to detect potentially causative variants, considering different genetic models for the disease and the available clinical information for each patient.

Study cohort
This study includes 22 unrelated patients (named A1-A22) with severe symptoms and clinical suspicion of AID. Written informed consent to participate in the study in accordance with the Declaration of Helsinki was obtained before the extraction of blood samples. The protocol was approved by the Comitè Ètic d'Investigació Clínica - Parc de Salut Mar (Barcelona). Genomic DNA was extracted from whole blood using the AllPrep DNA/RNA kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. DNA quality and concentrations were measured using a Nanodrop spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA), and Qubit (Thermo Fisher Scientific, Waltham, Massachusetts, USA) or PicoGreen (Invitrogen, Carlsbad, California, USA) fluorimetric quantification methods.

DNA library preparation and sequencing
Targeted sequencing capture was performed using the TruSight One Sequencing Panel (Illumina, San Diego, California, USA), which includes 4813 genes related to disease. After the enrichment, the pooled library was checked for quality with the Bioanalyzer System (Agilent, Santa Clara, California, USA) and quantified with a qPCR specific for library adaptors (Kapa Biosystems-Roche, Basel, Switzerland). Sequencing was performed on a NextSeq 500 System (Illumina, San Diego, California, USA) in a 2 × 75 paired-end cycles high-output run.

Data analysis
Bcl output files were converted to Fastq and demultiplexed with bcl2fastq (Illumina, San Diego, California, USA). Mapping and alignment against the human reference genome (GRCh38/hg38) was performed with BWA. Picard (http://www.picard.sourceforge.net) was used to reorder and sort SAM files before marking PCR duplicates and replacing read groups. Following the Genome Analysis Toolkit (GATK (McKenna et al., 2010)) best practices, we performed base quality score recalibration and local indel realignment before variant calling. The resulting VCF file was annotated using ANNOVAR (Wang et al., 2010).

Genetic variant filtering and prioritizing
A first quality filter discarded variants with DP < 5 and QUAL < 30 (depth and quality given by GATK). We also excluded variants present in more than ten patients to avoid possible sequencing artifacts. We removed indels located within 10 base pairs of another indel because of their high probability of being produced by mapping errors. Intronic variants not involved in splicing were also discarded. We used the bed file provided by the manufacturer to exclude off-target regions and a public dataset to detect genes which are known to be enriched in sequencing errors (Fuentes Fajardo et al., 2012).
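The quality filters described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names and the helper functions are hypothetical, a variant is assumed to be discarded when either its depth or its quality falls below the threshold, and the indel proximity rule is assumed to be applied after the other quality filters.

```python
from dataclasses import dataclass

MIN_DEPTH = 5        # DP threshold from the text
MIN_QUAL = 30.0      # QUAL threshold from the text
MAX_PATIENTS = 10    # variants seen in more than ten patients are dropped
INDEL_WINDOW = 10    # indels within 10 bp of another indel are dropped


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int       # DP reported by the caller
    qual: float      # QUAL reported by the caller
    n_patients: int  # number of cohort patients carrying the variant

    @property
    def is_indel(self) -> bool:
        return len(self.ref) != len(self.alt)


def passes_quality(v: Variant) -> bool:
    """Assumption: failing either the depth or the quality threshold discards the variant."""
    return v.depth >= MIN_DEPTH and v.qual >= MIN_QUAL and v.n_patients <= MAX_PATIENTS


def drop_clustered_indels(variants):
    """Remove every indel that lies within INDEL_WINDOW bp of another indel."""
    indels = [v for v in variants if v.is_indel]
    kept = []
    for v in variants:
        clustered = v.is_indel and any(
            o is not v and o.chrom == v.chrom and abs(o.pos - v.pos) <= INDEL_WINDOW
            for o in indels
        )
        if not clustered:
            kept.append(v)
    return kept


def quality_filter(variants):
    """Apply the per-variant thresholds first, then the indel proximity rule."""
    return drop_clustered_indels([v for v in variants if passes_quality(v)])
```

Off-target exclusion, the intronic filter and the error-prone gene list are not modelled here because they depend on external files.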
As shown in Fig. 1, we applied different filters depending on the type of variant and the properties of the gene. For loss of function variants, we applied a filter of MAF < 1% in the gnomAD database (Lek et al., 2016). For these variants, we also considered the constraint metrics of gnomAD, such as the precalculated probability of being loss of function intolerant (pLI) for each gene. We retrieved information about the function of the gene and its implication in human disease using the Gene Ontology (GO (Ashburner et al., 2000; The Gene Ontology Consortium, 2019)). Missense variants were filtered by allele frequency (MAF < 1%) and GO terms, as well as conservation and deleteriousness scores such as GERP (Chong et al., 2015; Cooper et al., 2005) and CADD (Kircher et al., 2014), retaining variants with values higher than 2 and 10, respectively. Genes with missense variants were additionally annotated with their haploinsufficiency scores (Huang et al., 2010; Steinberg et al., 2015), intolerance to genetic variation (RVIS (Petrovski et al., 2015)) and essentiality scores (Khurana et al., 2013) to detect single heterozygous variants that may cause a gain of function effect. This mechanism has been widely described in AID and it has been related to the variable clinical expression of the phenotype (Martorana et al., 2017). Finally, other in silico predictors of the protein effect were also used as a proxy of pathogenicity (PolyPhen2 (Adzhubei et al., 2010), Provean (Choi et al., 2012) and SIFT (Kumar et al., 2009)). Synonymous variants with MAF < 0.1% were considered as potential candidate variants. We annotated them using the TraP score (Gelfman et al., 2017) and filtered them according to the recommended threshold (benign if TraP < 0.446).
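The numeric thresholds in this paragraph can be summarized in a short sketch. It is a simplified illustration, not the authors' code: the function name `keep_variant` and the effect labels are hypothetical, variants absent from the frequency database are assumed to have a frequency of zero, and the gene-level criteria (pLI, GO terms, haploinsufficiency, RVIS, essentiality) are deliberately left out.

```python
from typing import Optional

MAF_RARE = 0.01            # MAF < 1% for loss-of-function and missense variants
MAF_VERY_RARE = 0.001      # MAF < 0.1% for synonymous variants
GERP_MIN = 2.0             # missense variants retained if GERP > 2
CADD_MIN = 10.0            # missense variants retained if CADD > 10
TRAP_BENIGN_BELOW = 0.446  # synonymous variants with TraP below this are treated as benign


def keep_variant(effect: str,
                 maf: Optional[float],
                 gerp: Optional[float] = None,
                 cadd: Optional[float] = None,
                 trap: Optional[float] = None) -> bool:
    """Return True if a variant survives the frequency and score thresholds.

    `effect` is one of the hypothetical labels "lof", "missense" or "synonymous".
    """
    # Assumption: a variant not described in the database has frequency 0.
    freq = 0.0 if maf is None else maf
    if effect == "lof":
        return freq < MAF_RARE
    if effect == "missense":
        return (freq < MAF_RARE
                and gerp is not None and gerp > GERP_MIN
                and cadd is not None and cadd > CADD_MIN)
    if effect == "synonymous":
        return (freq < MAF_VERY_RARE
                and trap is not None and trap >= TRAP_BENIGN_BELOW)
    return False
```

For instance, `keep_variant("synonymous", None, trap=0.389)` returns `False`, because the TraP score is below the benign threshold.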
Copy number variation was studied using XHMM (Fromer and Purcell, 2014) and filtered afterwards according to the best practices described by the developers. We also used the dataset provided by Audano et al. (2019) to filter out variants described at high frequencies in the population. Finally, possible somatic variants were called using VarScan2 (Koboldt et al., 2012) and filtered considering their variant allele frequency (VAF) and the overall quality of the call (p-value given by VarScan2).
As candidate genes for AID and PID, we used those in the list provided by the International Union of Immunological Societies (IUIS) (Picard et al., 2018). For each patient, we performed an extensive clinical review of the phenotype and family history. Finally, candidate variants were visually inspected in the respective alignment using the Integrative Genomics Viewer (IGV (Robinson et al., 2011)). We filtered out variants supported only by poor-quality reads or reads with low mapping quality.

Validation of somatic variants
We performed amplicon-based deep sequencing (ADS) (Becker et al., 1996) to validate the presence of somatic variants and to calculate their frequencies. We included two positive controls to validate the precision of the technique (CP1 and CP2, previously reported in Mensa-Vilaró et al., 2019).
Nine independent amplicons were obtained by PCR for each variant, using three different sets of primers. Sequencing of amplicons was performed on a MiSeq instrument (Illumina, San Diego, California, USA) in a 2 × 250 paired-end cycles Nano flow cell run. Fastq files were mapped to the human reference genome (GRCh38/hg38) using BWA and each of the resulting BAM files was inspected using the Pysam Python package (Li et al., 2009). Statistical analysis was performed with RStudio (R Studio Team, 2016) in order to distinguish sequencing errors from true somatic variation.
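As an illustration of how a variant allele frequency can be derived from the reads covering one position, the sketch below counts base calls and divides the alternative allele count by the total. It is an assumption-laden example rather than the study's code: the base calls are passed in as a plain string (in practice they would be collected from the BAM file, for example with Pysam), and the function names are hypothetical.

```python
from collections import Counter


def base_counts(bases: str) -> Counter:
    """Count A/C/G/T calls observed at one position; other symbols (e.g. N) are ignored."""
    return Counter(b for b in bases.upper() if b in "ACGT")


def allele_frequency(counts: Counter, allele: str) -> float:
    """Fraction of counted reads supporting `allele` (0.0 if the position has no reads)."""
    total = sum(counts.values())
    return counts[allele] / total if total else 0.0
```

With 95 reference reads and 5 alternative reads, `allele_frequency` gives 0.05 for the alternative allele, well below the roughly 0.5 expected for a heterozygous germline variant.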

AID-associated genes in the TruSight One Sequencing Panel
The TruSight One Sequencing Panel, also referred to as clinical exome gene panel, consists of 4813 genes associated with human disease. With the aim of testing its applicability and performance to study AID, we analyzed the ability to capture and produce reliable sequencing data for candidate genes. We first assessed the proportion of genes related to autoinflammatory disorders included in this panel. For that, we used 35 genes described to cause AID by the IUIS (Picard et al., 2018) plus 31 additional genes selected from the literature, for a total of 66 genes (Autoinflammatory candidate genes, Acg). We also analyzed 298 genes associated with other immune diseases by the IUIS (Other PID candidate genes, OPcg). 68% of the Acg (45/66) and 77% of the OPcg (231/298) are included in the TruSight One Panel (Fig. 2, Table S1).
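The inclusion percentages above are simple set intersections between a candidate gene list and the panel content. A minimal sketch, using hypothetical gene names and a hypothetical helper `panel_inclusion`, could look like this:

```python
def panel_inclusion(candidates, panel):
    """Return the candidate genes present in the panel and the percentage included."""
    candidate_set = set(candidates)
    included = candidate_set & set(panel)
    percent = 100 * len(included) / len(candidate_set) if candidate_set else 0.0
    return sorted(included), percent


# Toy example with placeholder gene names (not the real lists).
included, percent = panel_inclusion(
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_D"],
)
```

Applied to the counts reported in the text, 45 of 66 Acg corresponds to about 68%.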
The mean coverage per sample was 108x, with a maximum of 149x (sample A13) and a minimum of 70x (sample A10). We performed two different analyses in order to assess the sequencing performance for the genes included in the panel. First, we analyzed the depth of coverage (average number of reads for targeted positions). For this, we measured the mean coverage per gene (targeted sequence) and we classified it into three categories: high depth (> 50x), medium depth (20-50x) and low depth (< 20x) (Table S2). 95.56%, 95.67% and 95.88% of the genes were sequenced at high depth for Acg, OPcg, and the rest of the genes of the panel, respectively (Fig. 2). Two genes were sequenced at low depth among the Acg (PSMB8, PSMB9) and three on the OPcg list (CFB, TAP1 and TAP2).
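The three depth categories can be expressed as a small classification function. This is a sketch under the thresholds given in the text; the function names and the example gene labels are hypothetical.

```python
from collections import Counter


def depth_category(mean_depth: float) -> str:
    """Classify a gene by its mean depth: high (> 50x), medium (20-50x) or low (< 20x)."""
    if mean_depth > 50:
        return "high"
    if mean_depth >= 20:
        return "medium"
    return "low"


def summarize_depth(mean_depth_by_gene: dict) -> Counter:
    """Count how many genes fall into each depth category."""
    return Counter(depth_category(d) for d in mean_depth_by_gene.values())
```

For the Acg list, 43 of 45 genes at high depth gives the reported 95.56%.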
Next, we analyzed the horizontal coverage. We calculated the mean number of base pairs not covered in each gene for at least 80% of the patients and we classified it as complete (all bases sequenced), good (1-50 unsequenced bases) and poor (> 50 unsequenced bases) (Table S3). The horizontal coverage was complete or good for most of the Acg (95.56%, 43/45), as well as for OPcg (98.26%, 227/231). The genes with more than 50 unsequenced bases were P2RX7 and SH3BP2 of Acg, and C4B, CFB, TAP1 and TBX1 of OPcg. For the rest of the panel, 95.57% of the genes were complete (4337 out of 4538) (Fig. 2).
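A possible way to compute the horizontal coverage categories is sketched below. It rests on assumptions that the text does not spell out: a base is treated as not covered when its depth is zero, the per-gene value is the plain mean across patients, and the 80% patient criterion is not modelled. All function names are hypothetical.

```python
from statistics import mean


def uncovered_bases(per_base_depth) -> int:
    """Number of targeted positions with no reads (assumption: depth 0 means not covered)."""
    return sum(1 for d in per_base_depth if d == 0)


def mean_uncovered(per_patient_depths) -> float:
    """Mean number of uncovered bases for one gene across patients."""
    return mean(uncovered_bases(depths) for depths in per_patient_depths)


def horizontal_category(mean_uncovered_bases: float) -> str:
    """complete: no unsequenced bases; good: up to 50; poor: more than 50."""
    if mean_uncovered_bases == 0:
        return "complete"
    if mean_uncovered_bases <= 50:
        return "good"
    return "poor"
```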

Genetic variants in patients with suspected AID
We explored different possible scenarios to identify potential causal genetic variants in the 22 enrolled patients. For a dominant model, we first analyzed the presence of loss of function variants, which are a priori predicted to have an important effect on protein function and are therefore good candidates to be at the origin of genetic diseases. We included in this group nonsense variants (stop-codon), frameshift indels, and splice site disrupting variants (MacArthur et al., 2012; MacArthur and Tyler-Smith, 2010). We filtered out the variants with MAF > 1% in gnomAD (Lek et al., 2016) and we next analyzed the variants considering the gnomAD precomputed pLI. We also considered the GO term descriptions and kept those variants in genes related to the immune system or the inflammatory response. After this, we generated a list of 18 candidate LoF variants for Acg and OPcg, all of them heterozygous (Table S4). None of the variants was considered a potential candidate after the clinical review of the patient's phenotype.
We expanded the analysis to missense genetic variants, with a list of 116 mutations with MAF < 1% (Table S5). We used deleteriousness (CADD > 10) and conservation values (GERP > 2) to filter missense variants. We also considered variant features such as in silico analyses to prioritize all detected variants, and the inheritance pattern of each gene described by IUIS (Picard et al., 2018) and OMIM (http://www.omim.org/). We used essentiality and haploinsufficiency scores to better detect single heterozygous variants causing a gain of function effect. We propose two candidate heterozygous missense variants in NFKBIA (patient A11) and DNASE2 (patient A21). Both variants are novel and show high CADD and GERP scores, and detrimental effects were predicted by most in silico tools (Table 1).
We next considered an autosomal recessive model including both homozygous and compound heterozygous genotypes. No candidate homozygous variant was found in any candidate gene. For the compound heterozygote analysis, we considered missense and other functional variants together. Patient A10 has two variants located in UNC13D, a gene associated with hemophagocytic lymphohistiocytosis. Sieni et al. (2011) described a patient with these same variants and suggested that they could be considered as susceptibility factors. Due to the lack of parental samples, we were not able to test whether the variants are inherited from the same parent. We also detected several variants located in NOD2 in patients A19 and A20, a gene which plays an important role in autoinflammatory disorders (Negroni et al., 2018). A19 and A20 show a clinical picture concordant with alterations in this gene (Table 2). Based on pathogenicity scores and literature search, most of the NOD2 variants detected in patients A19 and A20 were considered to be most probably benign (Hugot et al., 2001), except the two missense variants shown in Table 1.
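The recessive screen described in this paragraph amounts to collecting homozygous calls and genes carrying two or more heterozygous calls in the same patient. The sketch below illustrates that grouping; the input tuple layout and the function name are hypothetical, and without parental samples a pair of heterozygous variants can only be flagged as a possible compound heterozygote because the phase is unknown.

```python
from collections import defaultdict


def recessive_candidates(calls):
    """Split calls into homozygous candidates and possible compound heterozygotes.

    `calls` is an iterable of (patient, gene, variant, genotype) tuples,
    with genotype equal to "het" or "hom" (hypothetical layout).
    """
    homozygous = []
    het_by_gene = defaultdict(list)
    for patient, gene, variant, genotype in calls:
        if genotype == "hom":
            homozygous.append((patient, gene, variant))
        elif genotype == "het":
            het_by_gene[(patient, gene)].append(variant)
    # Two or more heterozygous variants in one gene: phase must still be confirmed.
    possible_compound = {key: variants for key, variants in het_by_gene.items()
                         if len(variants) >= 2}
    return homozygous, possible_compound


# Example using variants named in the text.
hom, compound = recessive_candidates([
    ("A10", "UNC13D", "p.R727Q", "het"),
    ("A10", "UNC13D", "p.A59T", "het"),
    ("A11", "NFKBIA", "p.D31N", "het"),
])
```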
We also analyzed the presence of synonymous changes in candidate genes. Synonymous variants are traditionally excluded from disease genetics analysis, and are discarded when filtering for functional variants. However, synonymous variants have been reported to be related to disease in some cases, especially if they have extremely low allele frequency in databases and create novel, cryptic splice sites (Simons et al., 2015). As an example, a novel synonymous pathogenic variant has been reported as causal for IL-7R deficiency because of its effect on splicing (Gallego-Bustos et al., 2016). We filtered variants using a very restrictive allele frequency of MAF < 0.1% or not described in gnomAD. We used the TraP score (Gelfman et al., 2017) to assess their pathogenicity. All the values were below the established threshold of 0.446 for pathogenicity. The synonymous variant with the highest TraP score (0.389, probably benign) was found in NFKBIA in patient A11. As described above, an interesting missense variant was found in this same patient in the same gene.
Finally, we analyzed copy number variation using XHMM software (Fromer and Purcell, 2014). Massive parallel sequencing technologies are limited in their ability to detect structural variants due to the short length of reads. Therefore, many of our calls were discarded due to low-quality values. We removed CNVs present in more than 10% of the patients due to the lack of similarities in their clinical phenotypes. We also compared our candidates with a recent structural variation reference dataset (Audano et al., 2019) to discard the variants which have already been described at high frequencies in the population. After this, no candidate variants were found in Acg or OPcg.
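The cohort recurrence filter for CNVs can be illustrated with a few lines of code. This is a hedged sketch: the dictionary layout, the CNV labels and the function name `rare_cnvs` are hypothetical, and calls shared by exactly 10% of patients are assumed to be retained.

```python
def rare_cnvs(cnv_carriers: dict, n_patients: int, max_fraction: float = 0.10) -> dict:
    """Keep CNVs carried by at most `max_fraction` of the cohort.

    `cnv_carriers` maps a CNV label to the set of patients carrying it.
    """
    return {cnv: carriers for cnv, carriers in cnv_carriers.items()
            if len(carriers) / n_patients <= max_fraction}


# In a cohort of 22 patients, a CNV seen in 3 patients (13.6%) is removed,
# whereas one seen in 2 patients (9.1%) is kept.
kept = rare_cnvs(
    {"cnv_1": {"A1", "A2"}, "cnv_2": {"A1", "A5", "A9"}},
    n_patients=22,
)
```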

Somatic variation
Recent studies corroborated that gene mosaicism is an important mechanism in the pathogenesis of AID (Mensa-Vilaró et al., 2019). Thus, we considered the possibility of the disease being caused by a somatic variant. While germline variants are expected to be present in approximately 50% of the reads, somatic variants will typically depart from these expectations and will show lower variant allele frequencies (VAF). Because of the low number of reads supporting them, these variants are commonly discarded by variant calling software. With the aim of detecting possible somatic variants, we used VarScan2, which has shown good performance in this kind of analysis and allows the allelic imbalance thresholds to be lowered (Suzuki et al., 2017).
We generated a table with 12 variants located in Acg or OPcg (Table S6), two of them present in two of the patients. We performed ADS (see Methods) to validate these variants and to calculate their VAF. We included two previously reported somatic variants as positive controls (CP1 and CP2) (Mensa-Vilaró et al., 2019). We validated the presence of somatic variants in CP1 and CP2 with VAFs similar to those previously described. We also compared the real VAFs to the ones estimated computationally by GATK Haplotype Caller and VarScan2 (Fig. S1, panel A). The candidate somatic variants in enrolled patients presented VAFs between 1.25 and 9.42%. We normalized the values by comparing the frequencies of the alternative alleles to the other two non-reference alleles. For most of the samples, the alternative allele frequency was indistinguishable from that of the other non-reference alleles, suggesting a false positive. Only in patient A7, who carries a missense NLRP3 variant (p.G566D), did we detect a slight increase in the number of reads supporting the alternative allele when compared to the other two non-reference alleles. Due to the concordance between the patient's phenotype and the clinical pictures caused by described NLRP3 alterations, we performed ADS of this variant in sample A7 together with the previous CP1 and CP2 samples plus two additional DNA samples from healthy individuals (HI1 and HI2). We merged the values of each triplicate (see Methods) and of the two samples in each control group, classifying the results in three categories: A7, CP and HI. After normalization, we performed a t-test and calculated the p-values (Fig. S1, panel B). The percentage of reads supporting the alternative allele was 0.069%, 0.073% and 0.082% for A7, CP and HI respectively. The difference between groups was not significant (p-value > 0.05). This indicates that the slight increase of alternative reads in sample A7 that we saw in the first batch of ADS experiments was not caused by the presence of a somatic variant.
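The normalization step, in which the alternative allele is compared with the other two non-reference alleles, can be sketched as below, together with a two-sample t statistic. This is an illustrative reconstruction under assumptions: the function names are hypothetical, the background is taken as the mean fraction of the two remaining non-reference bases, and Welch's unequal-variance form of the t statistic is used because the text does not specify which t-test was applied. The p-value calculation is omitted.

```python
from math import sqrt
from statistics import mean, variance


def alt_and_background(counts: dict, ref: str, alt: str):
    """Return (alternative allele fraction, mean fraction of the other two non-reference bases)."""
    total = sum(counts.get(b, 0) for b in "ACGT")
    if total == 0:
        raise ValueError("no reads at this position")
    alt_fraction = counts.get(alt, 0) / total
    others = [b for b in "ACGT" if b not in (ref, alt)]
    background = mean([counts.get(b, 0) / total for b in others])
    return alt_fraction, background


def welch_t(sample_a, sample_b) -> float:
    """Welch's t statistic for two independent samples (assumed test variant)."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se
```

An alternative allele fraction close to the background fraction, as reported for most candidates, is what would be expected from sequencing errors rather than from a true somatic variant.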

Discussion
We performed targeted sequencing of ~4800 genes of medical interest in 22 samples from patients with suspected AID, mostly with early onset. We propose a candidate genetic variant in two of the patients, based on stringent filtering of DNA variants and the patients' clinical phenotypes. For three additional patients, we also propose candidate variants which may be acting as susceptibility factors for the disease.
Patient A11 was a Spanish male who suffered since birth from a pustular and systemic inflammatory disease resembling the deficiency of interleukin-1 receptor antagonist (DIRA). He unfortunately died soon after birth because of multiorgan failure. Our analysis revealed that the patient carried a novel heterozygous missense variant in NFKBIA (p.D31N). NFKBIA is an inhibitor of the NFKB family which acts as a regulatory element of the inflammatory response (Hayden and Ghosh, 2008). When proinflammatory stimuli are detected, NFKBIA is phosphorylated and degraded, leading to the nuclear translocation of NFKB (Ali et al., 2013). DiDonato et al. (1996) described that phosphorylation occurs at p.S32 and p.S36; however, they did not find evidence of a functional role for p.D31. Mutations in NFKBIA are associated with ectodermal dysplasia with immunodeficiency (Courtois et al., 2003), which correlates with our patient's phenotype. The patient presented with sterile periostitis, pustular skin, rash, and soft tissue swelling. We suspect that the variant p.D31N could be contributing to the disease in patient A11. This variant meets some of the standard criteria (Richards et al., 2015) to be classified as pathogenic: absence in databases; presence in a critical functional domain with reported pathogenic variants; location in a gene where missense variation is a common mechanism of disease; a significant lack of missense and LoF variants; deleterious effect according to in silico predictors; and a specific gene phenotype. We reported the variant to ClinVar (SCV001147010). Unfortunately, we were not able to perform segregation studies to validate the possible de novo origin because no additional patient or parental samples were available. Of interest, this patient also harbours a homozygous synonymous variant with a TraP score close to the pathogenicity threshold and a high CADD value of 17.04, located in a promoter region (Ensembl ID: ENSR00000067656).
Patient A21 is a Spanish male with early-onset lupus-like symptomatology. A novel heterozygous missense variant in DNASE2 was found in the patient (p.G322D). This variant may be pathogenic (Richards et al., 2015) due to its absence in databases, its in silico predictors, conservation values (Table 1), and reported pathogenic variants in the gene (Rodero et al., 2017). DNASE2 has recently been associated with inflammatory diseases (Rodero et al., 2017), and mutations in DNASE1 are well known to cause monogenic forms of systemic lupus erythematosus (Martínez Valle et al., 2008; Yasutomo et al., 2001), with a clinical phenotype partially overlapping with that described in this patient. In our patient, the disease seems to follow an autosomal dominant pattern of inheritance, as some close relatives are also affected (father, uncle and grandmother). This diverges from the described autosomal recessive cases. We performed Sanger sequencing in the patient (Fig. S2) and we propose the variant p.G322D as a good candidate to underlie the disease in patient A21. We reported the variant to ClinVar (SCV001147011).
Table 2. Clinical information of patients harboring variants likely to be causing the autoinflammatory disorder.

In addition to the analysis to detect disease-causing variants, we also considered the possibility of mutations contributing to the phenotype although not responsible for the whole clinical picture. Patient A10 is a 12-year-old girl from North Africa with systemic-onset juvenile idiopathic arthritis (JIA). She suffered episodes of macrophage activation syndrome (MAS), which are associated with UNC13D mutations. We found two variants in UNC13D which have already been reported by Sieni et al. (2011) in the analysis of haemophagocytic lymphohistiocytosis. Both variants are heterozygous and missense (p.R727Q and p.A59T), with low population allele frequencies (0.05% and 1.6% respectively) and high predicted impact values (CADD > 20). Although the patient's main clinical diagnosis is not MAS, her phenotype resembles the ones described in the work of Sieni et al. This led us to suspect that part of the phenotype may be caused by these missense variants. Of note, the presence of two or more causal genes for different Mendelian disorders has been reported in 4.9% of the 2076 molecularly diagnosed patients in an exhaustive analysis of more than 7000 cases (Posey et al., 2016). The case reported here would fit the situation where the two disease phenotypes in the same individual are partially overlapping.
We also detected many variants in the NOD2 gene in our cohort. NOD2 has a key role in immune function and inflammatory pathways. Numerous variants have been described in the gene in recent years, mainly associated with Blau syndrome and Crohn disease (Negroni et al., 2018). Of interest, two of the patients with NOD2 variants (A19 and A20) have early-onset inflammatory bowel disease, a phenotype which has already been associated with the NOD2 gene (Cho and Abraham, 2007; Negroni et al., 2018). Each of them carries one missense variant of uncertain significance, p.L221R in patient A19 and p.A728V in A20. Both variants have been reported by Hugot et al. (2001) in inflammatory bowel disease patients accompanied by other coding variants. We hypothesize a possible additive effect which would contribute to the final phenotype.
Of interest, in this work we show that, beyond the detection of germline genetic variants, targeted sequencing also allows the detection of somatic variation if relatively high sequencing depths are achieved. Thus, while no important differences are seen in the success rate of whole exome sequencing versus candidate gene approaches (see below), reducing the number of targeted genes allows the depth of coverage to be increased at still reasonable costs, which is a critical aspect for somatic variant detection. This can be of special interest in the case of PID in general and AID in particular, where somatic variants will probably be present in the blood or even saliva, the two most common starting materials for DNA extractions. We recommend using software that allows the thresholds on the number of reads supporting the alternative allele to be lowered (e.g. VarScan2), followed by ADS to validate candidate variants. ADS experiments increase the coverage of the targeted region from the range of hundreds to thousands, allowing us to discard false positives.
The proportion of patients with a proposed molecular diagnosis in this work is similar to other previous analyses in PID, both from whole genome and exome sequencing (see Introduction). However, the number of diagnosed patients is hardly comparable across studies because of the use of different filters and stringency criteria to consider a genetic variant as causal. This would ultimately require functional validation experiments, which are mostly performed in a reduced subset of the candidate variants. Also, the proportion of monogenic cases in a cohort depends on other features, such as disease severity, the age of onset and the disease prevalence (de Valles-Ibáñez et al., 2018). Thus, the different inclusion criteria of patients can also partially explain the differences in the success rates in detecting causal genes across studies. In addition to the clinical exome approach being limited to a particular gene set, other technical constraints include the non-inclusion of regulatory regions and limited power for structural variant detection. We note that some of these factors are also common to whole exome or even whole genome strategies. Beyond technical and analytical reasons, we should always consider the possibility that the disease follows non-monogenic models, such as a digenic pattern of inheritance (Ameratunga et al., 2018; de Valles-Ibáñez et al., 2018; Hoyos-Bachiloglu et al., 2017) or gene mosaicism (Mensa-Vilaró et al., 2019).
Overall, we propose that targeted sequencing approaches of candidate genes, such as the clinical exome, can be an appropriate tool for the genetic diagnosis of AID. The main limitation of targeted approaches is that the analysis is restricted to particular regions. However, this limitation to specific regions is also seen in several whole genome and exome data studies, where in spite of generating more complete data the analysis is often finally restricted to a set of candidate genes, at the cost of reducing sequencing depth (Meyts et al., 2016). In this sense, the design of specific gene panels for patients with similar disorders can represent an efficient strategy for diagnosis, which may even be able to capture the presence of somatic mosaicism, which is relevant in diseases such as AID. Of importance, sequencing coverage and depth analyses to detect genes that have not been properly covered should be performed to avoid false-negative results. Also, the creation of an adequate candidate gene list is critical for this approach and should be based on a strong knowledge of the disease pathophysiology and the use of available tools to describe functionally related genes. However, generating a comprehensive but relatively short list of candidate genes can be almost impossible for situations like this study, with an important level of clinical heterogeneity among patients even if their syndromes are included in the same group of disorders. In such situations, a more inclusive approach, such as the analysis of the ~4800 genes most probably related to genetic disease, can be an efficient strategy for maximizing the number of diagnosed patients while reducing the costs of sequencing and analysis compared with whole genome or exome approaches.

Declaration of competing interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.