LoxTnSeq: Transposon mutagenesis coupled with ultra-sequencing to study large random genome reductions

Rational engineering in synthetic biology requires preliminary knowledge of which genomic regions are dispensable. Typically, these efforts are guided by transposon mutagenesis studies coupled to ultra-sequencing (TnSeq), which determine single gene essentiality. However, epistatic interactions can rapidly alter these profiles post deletion, rendering these maps redundant. Here, we present LoxTnSeq, a new methodology to generate and catalogue libraries of genome reduction mutants. LoxTnSeq combines random integration of lox sites by transposon mutagenesis, and the generation of mutants via Cre recombinase, with ultra-sequencing. When LoxTnSeq was applied to the genome-reduced bacterium Mycoplasma pneumoniae, we obtained a mutant pool containing 285 unique deletions. These deletions ranged from <50 bp to 28Kb and together covered 21% of the total genome. LoxTnSeq also highlighted large regions of non-essential genes that could be removed simultaneously, and other similar regions that could not, providing a guide for future genome reductions.


Introduction
One of the core principles behind synthetic biology is the rational engineering of an organism to elicit a modification in phenotype orientated for a specific application (D'Halluin and Ruiter, 2013;Esvelt and Wang, 2013;Mol et al., 2018). The introduction of new genes, allowing for the generation of new proteins within a system, is well documented and has myriad applications. Famous examples include modifying bacteria to allow us to study their functions in greater detail, such as the production of fluorescent proteins (Prasher et al., 1992), or genetically modified bacteria capable of producing molecules for human use, such as insulin (Goeddel et al., 1979).
Live biotherapeutics have the potential to become key players in healthcare over the coming years. Engineered bacteria can be used as drug delivery systems, acting as a chassis into which therapeutic platforms can be plugged to activate new functions (Ausländer et al., 2012;Chi et al., 2019;Claesen and Fischbach, 2015;Hörner et al., 2012;Vickers et al., 2010). A chassis will need to display growth characteristics and phenotypes that fulfil biosafety requirements, yet which may be foreign or counterproductive to wild-type cells. For example, a limited ability to proliferate in situ, or to evade the immune system, might never be selected for in the organism's natural niche, but are features that might be of interest for a chassis strain. To elicit this change in phenotype, allowing the creation of an optimal chassis for a specific application, large-scale changes in genotype will have to occur. This will include both the removal of unwanted or unnecessary genomic regions, and the addition of new genes and functions (Folcher and Fussenegger, 2012;Ruder et al., 2011;Sung et al., 2016).
Fortunately, these two processes can work synergistically, with large scale genome reductions showing that the removal of superfluous genes can both increase protein production and develop beneficial characteristics for a cell. Sequential genome reduction has been an engineering tool in Bacillus subtilis, creating a species with a boosted protein production phenotype by systematically removing genes that hinder protein production, or divert energy and resources to less functionally useful applications (Ara et al., 2007).
Similarly, the removal of prophage elements in Pseudomonas putida resulted in a strain that demonstrated markedly higher tolerance to DNA damage (Martínez-García et al., 2015).
As more and more advanced engineering tools become available to researchers, the scope for genome engineering has similarly increased in scale (Annaluru et al., 2015;Cameron et al., 2014;Hsu et al., 2014). One of the grand long-term goals in systems and synthetic biology is the generation of a minimal chassis cell (Chi et al., 2019;Sung et al., 2016). While a true 'minimal cell' is almost impossible to define (Glass et al., 2017;Koonin, 2000), a general consensus has emerged around a cell with a reduced genome, capable of completing a specific task with as few superfluous functions as possible (Hutchison et al., 2016;Juhas et al., 2011;Zhang, 2010).
While naturally occurring 'minimal' bacteria do exist to some degree (Fadiel et al., 2007;Fraser et al., 1995;van Ham et al., 2003), with genomes as small as that of Mycoplasma genitalium, which codes for only 470 genes (Himmelreich et al., 1997), the average gene complement for a bacterial cell is roughly 5000 proteins, though this can vary by two orders of magnitude between the extremes (Land et al., 2015). Comparative genomics studies, along with functional considerations, give a hypothetical minimal genome in the region of 200-350 genes (Breuer et al., 2019;Dewall and Cheng, 2011;Gil et al., 2004;Glass et al., 2006;Koonin, 2000), so engineering a bacterium into a 'minimal chassis' requires large levels of genome reduction. On top of this, there is a further layer of small proteins and non-coding RNAs that are often overlooked, all of which have their own essentialities and interactions with the rest of the genome and cell (Lluch-Senar et al., 2015).
However, our understanding of epistatic networks, and of the complex web of interactions between gene circuits, is far from complete (Otwinowski et al., 2018;Sailer and Harms, 2017;Weinreich et al., 2013). This has big implications for large-scale genome reduction projects, as knowledge gleaned from previous studies identifying non-dispensable or essential (E) and dispensable or non-essential (NE) regions of DNA can become obsolete as soon as alterations are made to the genome. A good example of this can be found in the creation of JCVI-Syn2.0 by the team at the J. Craig Venter Institute. After the creation of their landmark JCVI-Syn1.0 strain (Gibson et al., 2010), random transposon mutagenesis was performed on the organism to identify the NE genes. The genome was then divided into 8 sections, and each section had its NE genes removed independently, while the other 7 sections were kept intact. Despite removing only NE genes, only 1 of the 8 configurations resulted in a viable cell (Hutchison et al., 2016). This is also exemplified by studying gene essentiality and metabolism in M. pneumoniae and Mycoplasma agalactiae. It was shown that in linear metabolic pathways producing an E metabolite, all genes were essential. However, when the E metabolite could be produced by two pathways, often both genes were classified as fitness (F) genes (Montero-Blay et al., 2020). Fitness genes, also known as quasi-essential genes (Glass et al., 2006), are nominally NE. However, while their loss is survivable by the bacterium, it imparts a severe growth defect. This is a key limitation in many genome reduction studies, as rationally designed reduction pathways often rely on data generated before genetic manipulation occurs. As such, any a priori assumptions about gene essentiality can potentially become redundant after any reductions have occurred.
Screens by transposon mutagenesis after every cycle of reductions can be applied (Hutchison et al., 2016), but this is both time and labour intensive, and relies on the assumption that genes act alone, not as part of a larger network. However, the mutation or loss of certain genes has been shown to modulate the essentiality of others, a phenomenon known as "bypass of essentiality". In the yeast Schizosaccharomyces pombe, 27% of E genes on chromosome II-L could be rendered NE when a different gene is deleted, mutated or overexpressed (Li et al., 2019). This in turn demonstrates that reducing genomes based on a priori assumptions may not lead to optimal designs, as our knowledge of both how gene networks interact with each other on an epistatic level, and which genes are truly E or NE under any given configuration, is still far from complete. Therefore, here we propose a new methodology, LoxTnSeq, to study the concept of genome reduction in an unbiased manner. By combining a randomised genome deletion protocol with ultra-sequencing, we can identify the large putative reductions that are possible within a genome. This protocol allowed us to delete large sections of DNA without biasing the results by any potentially misguided a priori assumptions.
We have applied the technique in the genome-reduced bacterium Mycoplasma pneumoniae, considered a model minimal cell. However, this protocol could be applied to other bacterial systems in order to obtain different bacterial chassis, and thus to develop different applications in the synthetic biology field.

Results
Obtaining a high resolution library of Lox mutants
Three different vectors, pMTnLox66Cm, pMTnLox71Tc and pMTnCreGm, were obtained to generate different libraries of transposon mutants. The first library of transposon mutants was obtained by transforming the M. pneumoniae M129 strain with the pMTnLox66Cm vector. This vector inserted a lox66 site randomly in the genome. After selection with chloramphenicol, we obtained the first mutant pool. To properly assess the coverage of the first transformation, pMTnLox66Cm transposon insertion sites were identified by HITS (Gawronski et al., 2009). Figure 1 shows the insertion pattern of the pMTnLox66Cm transposon.
The sequencing revealed 201891 unique insertion sites, which across the 816Kb genome corresponds to an insertion every ≈4 bases on average, similar to what has previously been described (Lluch-Senar et al., 2015).

Creation of a pool of genome reduced mutants
This first pool of mutants was then transformed with the pMTnLox71Tc vector. After applying HITS to this library, the number of unique insertions for the second transposon was 90530, suggesting that the pool of mutants was exhibiting some epistatic effects, with changes in essential regions due to previous transposon insertions. However, while there are fewer unique insertions in the second round, there is still a coverage of 1 insertion every 9 bases on average, indicating a very large pool of potential deletions. In this way, we generated a population carrying two lox sites randomly distributed across the whole genome.
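The insertion densities quoted for the two libraries follow directly from the genome size and the number of unique insertion sites. The short sketch below reproduces that arithmetic; the function name is illustrative, and the genome size is taken as the rounded 816Kb figure given in the text.

```python
# Rounded M. pneumoniae M129 genome size as quoted in the text (816Kb).
GENOME_SIZE_BP = 816_000

def mean_spacing(genome_size_bp: int, unique_insertions: int) -> float:
    """Average number of bases between unique insertion sites."""
    if unique_insertions <= 0:
        raise ValueError("unique_insertions must be positive")
    return genome_size_bp / unique_insertions

# Unique insertion counts reported for each transposon library.
first_round = mean_spacing(GENOME_SIZE_BP, 201_891)   # lox66 library
second_round = mean_spacing(GENOME_SIZE_BP, 90_530)   # lox71 library

print(f"lox66 library: one insertion every {first_round:.1f} bp")
print(f"lox71 library: one insertion every {second_round:.1f} bp")
```

Rounded to the nearest base, these values match the ≈4 and 9 base spacings reported for the first and second libraries.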
Of note, transposon insertions take place randomly not only in terms of chromosome position, but also with respect to their orientation. The relative orientation of the lox sites determines the mode of action of Cre, which catalyses the excision of the DNA flanked by the lox sites when they are in the same orientation, or the inversion of the region if the lox sites are placed in opposite directions, as shown in Figure 2. Therefore, a population of cells carrying pMTnLox66Cm and pMTnLox71Tc insertions would undergo either genome reduction or inversion, depending on the particular clone analysed, when subjected to the action of Cre recombinase.
Cells containing both lox sites were then transformed with the pMTnCreGm transposon, which lacks lox sites and carries the Cre recombinase and gentamicin resistance. As M. pneumoniae cannot support replicating plasmids, introducing the Cre via transposon ensured constitutive expression, and thus its selective effects were not lost after cell divisions. In previous work we have demonstrated that the action of the Cre recombinase on a single active lox site in M. pneumoniae produces a lethal effect on par with a double stranded break in the DNA (Shaw et al., under review). This was corroborated here, as surviving colonies were isolated and grown in media containing either gentamicin or chloramphenicol and tetracycline to assay which cells contained deletions and which cells contained inversions. 100/100 colonies picked grew in media containing gentamicin, indicating that the Cre/gentamicin transposon was present, yet 0/100 colonies grew in media with chloramphenicol or tetracycline. This indicates that 100% of the colonies contained a reduced genome and could no longer express the resistance genes on the lox transposons, and that cells carrying inversions were fully removed, as shown in Figure 2.
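The orientation rule described above can be written as a small decision function. This is only an illustrative sketch: the `LoxInsertion` type, its field names and the '+'/'-' strand encoding are assumptions introduced here, not part of the published protocol.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class LoxInsertion:
    """Hypothetical record of one lox-carrying transposon insertion."""
    position: int  # genome coordinate of the insertion (bp)
    strand: str    # '+' or '-' orientation of the lox site

def cre_outcome(a: LoxInsertion, b: LoxInsertion) -> str:
    """Return the Cre-mediated outcome for a pair of lox sites.

    Same orientation  -> excision of the flanked DNA.
    Opposite orientation -> inversion of the flanked DNA.
    """
    for site in (a, b):
        if site.strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {site.strand!r}")
    return "excision" if a.strand == b.strand else "inversion"

# Enumerate the four possible orientation combinations for one pair of sites.
outcomes = [
    cre_outcome(LoxInsertion(100_000, s1), LoxInsertion(110_000, s2))
    for s1, s2 in product("+-", repeat=2)
]
print(outcomes)
```

With random orientations, half of the combinations give excisions and half give inversions, which is consistent with the colony assay above recovering only cells that had lost both lox-linked resistance markers.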

Identification of random deletions
Genomic DNA from the gentamicin-resistant pool of cells was sequenced via the circularisation protocol described in Methods. A total of 1291712 reads were recovered. Due to the random nature of the circularisation protocol, not all reads could reach both inverted repeats of the transposons and give accurate insertion points for each transposon.
Therefore, to allow for as accurate a mapping as possible for those reads that do not include both insertion points, the genome was split into 50 bp bins, and reads were grouped into these bins. The putative deletions were then divided into those that contained a known E gene (1365, 83%) and those that did not (285, 17%). We found a background of deletions affecting E genes with few reads (41335 reads, 3% of total reads), which ranged in size from <50bp to over half the entire genome (see Figure 3), and which are probably an artefact resulting from the circularisation protocol for deep sequencing (see Methods). Those deletions affecting genomic regions with only NE genes were fewer in number and spanned smaller regions of the genome, with the longest continuously NE region being ≈30Kb. However, although they could also contain circularisation artefacts, they accounted for 1250337 reads (97% of total reads, see Figure 3).
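The binning and filtering steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes each read has already been reduced to a (start, end) pair of deletion coordinates, and the read coordinates and essential gene interval in the example are invented.

```python
from collections import Counter

BIN_SIZE = 50  # bp, the bin width used in the text

def bin_deletion(start: int, end: int, bin_size: int = BIN_SIZE) -> tuple[int, int]:
    """Map deletion coordinates to (start_bin, end_bin) indices."""
    return start // bin_size, end // bin_size

def group_reads(reads: list[tuple[int, int]]) -> Counter:
    """Count reads per binned deletion, so near-identical reads collapse together."""
    return Counter(bin_deletion(start, end) for start, end in reads)

def overlaps_essential(deletion_bins: tuple[int, int],
                       essential_genes: list[tuple[int, int]],
                       bin_size: int = BIN_SIZE) -> bool:
    """True if the binned deletion overlaps any essential gene (inclusive coords).

    Assumption: the deletion is treated as covering its bins completely.
    """
    start = deletion_bins[0] * bin_size
    end = (deletion_bins[1] + 1) * bin_size  # exclusive upper bound
    return any(start <= gene_end and gene_start < end
               for gene_start, gene_end in essential_genes)

# Toy data (hypothetical): three reads from one deletion, one read from another.
reads = [(1010, 5020), (1032, 5049), (1010, 5020), (20000, 26000)]
essential_genes = [(21000, 22000)]

counts = group_reads(reads)
ne_only = {d: n for d, n in counts.items() if not overlaps_essential(d, essential_genes)}
with_e = {d: n for d, n in counts.items() if overlaps_essential(d, essential_genes)}
print("NE-only deletions:", ne_only)
print("Deletions containing an E gene:", with_e)
```

Because coordinates are collapsed to bin indices, two deletions whose ends differ by less than one bin width are counted as a single event, which is the trade-off between mapping tolerance and resolution inherent to this approach.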
Using the read length as the threshold parameter, we performed a Receiver Operating Characteristic (ROC) curve approach to define high-confidence deletions (Figure 3B). This methodology allows the definition of a read count threshold maximizing the True Positive Rate (TPR, the percentage of actual deletions properly detected) against the False Positive Rate (FPR, the percentage of artefactual deletions wrongly detected as positive). We considered as actual or 'positive' those deletions which strictly covered NE regions, and as artefactual or 'negative' those which included E genes. Because most of the positive and negative deletions were represented by very few reads, we observed that no discrimination could be performed without previous filtering based on read count. Thus, we iterated the process with different pre-filtering values and defined the best condition as filtering for deletions with a read count >900, which returned 8 deletions with a TPR >75% and no false positives (Figure 3C). This set was classified as the 'gold standard' of deletions, as they had the highest level of reliability. When excluding all deletions that contained an essential gene, 285 unique deletions were identified and mapped to the M129 genome, detailed in Figure 4 below.
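To make the TPR/FPR trade-off concrete, the sketch below computes ROC points over candidate read-count thresholds and picks the lowest threshold that yields no false positives. It is a simplified stand-in that scores deletions directly by read count; the exact scoring and pre-filtering iterations used in the study are not reproduced, and the labelled deletions in the example are invented.

```python
def roc_points(deletions: list[tuple[int, bool]]) -> list[tuple[int, float, float]]:
    """Return (threshold, TPR, FPR) for each candidate read-count threshold.

    deletions: (read_count, is_positive) pairs, where 'positive' means the
    deletion covers only NE regions and 'negative' means it includes an E gene.
    A deletion is called 'detected' when read_count > threshold.
    """
    positives = sum(1 for _, is_pos in deletions if is_pos)
    negatives = len(deletions) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative deletion")
    points = []
    for threshold in sorted({count for count, _ in deletions} | {0}):
        tp = sum(1 for count, is_pos in deletions if is_pos and count > threshold)
        fp = sum(1 for count, is_pos in deletions if not is_pos and count > threshold)
        points.append((threshold, tp / positives, fp / negatives))
    return points

def lowest_clean_threshold(points: list[tuple[int, float, float]]) -> tuple[int, float, float]:
    """Lowest threshold with FPR == 0, i.e. the highest TPR without false positives."""
    return min((p for p in points if p[2] == 0.0), key=lambda p: p[0])

# Toy data (hypothetical read counts and labels).
toy = [(5000, True), (1200, True), (300, True), (2, True),
       (150, False), (4, False), (1, False)]

best = lowest_clean_threshold(roc_points(toy))
print(f"threshold > {best[0]}: TPR = {best[1]:.2f}, FPR = {best[2]:.2f}")
```

In this toy set the positive deletion with only 2 reads cannot be separated from the low-count negatives, which mirrors the observation that deletions represented by very few reads prevent clean discrimination.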
While the number of deletions within the gold standard is low, representing just 8 deletions (see Suppl. Table 1 and Figure 4), they are all deletions spanning multiple genes, all of which are clearly well tolerated by the cell, as the cells carrying them were able to proliferate within the pool.
However, as the data in Figure 3 show, the vast majority of the reads generated (97%) indicated deleted regions that did not contain an essential gene, and the remaining 3% that did were evenly distributed across the genome (see Suppl. Figure 1). Therefore, applying a slightly less stringent test for fidelity, and simply allowing all the deletions that do not remove a known E gene, we see a much larger coverage of deletions across the genome (see Figure 4).
Of the regions deleted, the largest was 28.7Kb, the smallest <50bp. 147 genes were deleted across the pool, accounting for 171.2Kb (21% of the genome), with the vast majority of the functions unknown. Of the genes deleted, only 29 (19.7%) had an ascribed name and function, 139 were annotated as NE, with the remaining 8 classified as F genes, according to the most recent essentiality data (Lluch-Senar et al., 2015). The full list of deletions can be found in Suppl. Table 1, and the genes deleted in Suppl. Table 2. Of the 259 genes that are classified as non-essential in M. pneumoniae (Lluch-Senar et al., 2015), we deleted 56%, with a mean deletion size of 7750 bp and median of 4750 bp, indicating the majority of genes deleted were removed as part of a larger region, not as single knockouts.
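Because many deletions in the pool overlap, the fraction of the genome removed across the pool is the size of the union of all deleted intervals, not the sum of their lengths. The sketch below shows one way to compute this; the helper names and the example intervals are illustrative, and intervals are assumed to be half-open [start, end) coordinates.

```python
GENOME_SIZE_BP = 816_000  # rounded genome size quoted in the text

def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])              # start a new block
    return [(s, e) for s, e in merged]

def covered_fraction(intervals: list[tuple[int, int]], genome_size: int) -> float:
    """Fraction of the genome covered by the union of the intervals."""
    total = sum(end - start for start, end in merge_intervals(intervals))
    return total / genome_size

# Toy deletions (hypothetical): the first two overlap and are counted once.
toy_deletions = [(1_000, 6_000), (4_000, 9_000), (20_000, 21_000)]
print(merge_intervals(toy_deletions))
print(f"{covered_fraction(toy_deletions, GENOME_SIZE_BP):.2%} of the genome")

# Sanity check of the figure reported in the text: 171.2Kb of 816Kb.
print(f"{171_200 / GENOME_SIZE_BP:.0%}")
```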
As validation, 4 regions with various characteristics were chosen from the list of putative deletions, labelled A, B, C, and D (see Figure 4, yellow bars). Region A contained seven non-essential genes (mpn096 to mpn102), was ≈10Kb in size, and was the most highly represented deletion within the pool, accounting for 964628 reads; it was the only validated region within the gold standard. Region B represented a smaller region, containing four non-essential genes (mpn397 to mpn400), with a size of ≈5Kb and covered by 593 reads. Region C represented one of the largest available deletions, and contained 19 non-essential genes and 1 fitness gene (mpn493 to mpn512), with a deleted area of ≈25Kb covered by 193 reads. Region D contained 6 NE genes (mpn368 to mpn373) within 7.9Kb, but was only represented in the population by a single read. Figure 5 shows clearly that the putative deletions in all four regions were represented in the pool of deletions, but not found in the WT cells. There are also multiple deletions visible in the deletion amplifications in conditions A, B and D, which were expected from the sequencing data. The most prominent deletion band in the three conditions was isolated and sequenced, and each showed the requisite genomic regions intersected by the lox72 site within the transposon inverted repeats.
Just as the distribution of read counts was not uniform across the dataset, neither was the distribution of deletions across the genome. There were clear hotspots present where multiple different deletions were found across the population, as shown by Figure 6.
The highlighted regions in Figure 6 show two of the strongest deletion hotspots, located at the 120Kb and 610Kb regions, respectively. These regions show multiple overlapping unique deletions with slightly different lox insertions, across various sizes and configurations. This indicates that these regions are amenable to multiple different deletions, and that there is little epistatic interaction between the genes present.
Contrary to this, Figure 4 shows a large NE region centred on the 200Kb region of the chromosome. This region contains 13 NE genes (mpn141 to mpn153), yet only 4 unique deletions were found within it, all of which were variations of a deletion from mpn146 to mpn152. Despite the majority of the genes within this region having no known function, essential or otherwise, deletions within this region appear to be incompatible with cellular survival. The only known functions are linked to cytoadherence, specifically P1 adhesins (Nakane et al., 2011;Xiao et al., 2015). The canonical P1 adhesin (mpn141) was not deleted, nor were any of the other major adhesin genes hmw1 (mpn447), hmw2 (mpn310) and hmw3 (mpn452).
However, 15/22 adherence proteins were deleted across the population, so why this section is more essential than the others is unclear.
The deleted genes were grouped by function according to COG category (Tatusov et al., 2000). The most commonly deleted functional category was M, genes involved in the composition and biogenesis of the cell membrane, accounting for 39% of deleted genes.
Following this were the genes of unknown function (COG category S), with 18%. In total, deleted genes involved in metabolism (COG categories E, F, G, I and P) composed 15% of all deleted genes, information storage & processing genes (COG categories K and L) composed 3%, and genes involved in cellular processes and signalling (COG categories D, M, N, O, T, U and V) composed 63% (see Suppl. Table 3 for full breakdown).
According to the database of E genes of 47 bacterial species generated by Shaw et al. (under revision), only 7 of the genes deleted were found to be E in any other species, and 98 of the 147 genes deleted had no homologs in the other species analysed. Of the deleted genes, the most highly conserved was mpn397 (guanosine-3'-5'-bis(diphosphate) 3'pyrophosphohydrolase spoT), which had a direct homolog in 46/47 species and was E in 26% of cases, and the gene that was most frequently E in the database was mpn324 (Ribonucleoside-diphosphate reductase subunit alpha nrdE), which had a direct homolog in 36/47 species and was essential in 54% of cases.

Discussion
The Cre/lox system has been used as a deletion system on many occasions, due to its ability to act in both prokaryotic and eukaryotic cells. In addition, the use of mutant lox sites to facilitate deletions that result in an inactive lox site has also been demonstrated in bacteria, for example to knock out single genes in series (Pan et al., 2011), or to knock out large but targeted genome regions using either targetrons (Cerisy et al., 2019) or recombineering (Xin et al., 2018).
There have also been previous studies using random integration of lox sites to create deletions within a bacterial genome. Multiple random deletions were observed in Corynebacterium glutamicum using a similar method, with loxP sites contained within transposons and randomly integrated into the genome and activated via a Cre suicide vector, ranging from 400 bp to 158 Kb in size (Tsuge et al., 2007). However, the system was not self-selective, with the authors commenting that only 1.5% of final colonies contained deletions. The rest of the colonies contained inversions, and only 42 unique deletions were characterised. The authors also note that of the 42 deletion strains they recovered, only 2 had growth rates stronger than the WT strain, and many showed severe fitness defects. It is reasonable to assume that there may have been other deletion strains that could not compete against the large number of cells that had inversions, and therefore increased fitness due to no gene loss, and thus were lost from the pool.
By ensuring that our system leads to full selection against inversions between the lox sites, we can attempt to prevent this issue of competition against quasi-WT cells, and thus potentially retain as many different deletions as possible. By looking at the read numbers of each unique deletion, we can also get an estimate of how many cells carrying it were in the population, and thus an estimation of its growth rate compared to the other clones.
This study also highlighted the utility of using the Cre recombinase as its own self-selective marker. The inversion between a left-mutant and a right-mutant lox site necessitated the creation of a loxP site (Ghosh and Van Duyne, 2002;Van Duyne, 2015), and thus we propose that the action of the Cre alone was enough to cause this self-selection and the retention of only those cells that contained a deletion resulting in an inactivated lox72. We have previously shown that a lethal phenotype is expressed in M. pneumoniae when the Cre recombinase acts upon a lone active lox site (Shaw et al., under revision), and this worked perfectly as a counter-selective method here. The one caveat to this method is that 50% of potential deletions were removed due to the formation of the loxP instead of the lox72. However, looking at the overall distribution of deletions in Figure 6, it is clear that the vast majority of putatively NE regions were deleted somewhere within our pool. Therefore, the high transposon density appears to be able to counter-balance this loss in efficiency.
Due to this high transposon density in the transformations, we achieved a very high coverage of deletions across the M. pneumoniae genome, resulting in the deletion of 21% of the genome across the pool, with deletions ranging from under 50bp to 28Kb. However, the distribution of these deletions is not uniform, with the six most represented deletions accounting for over 99% of the dataset. Whether this is a true reflection of the ratio of deletion mutants, or an unforeseen by-product of the sequencing and analysis pipeline is unclear. Despite this, we were able to validate deletions with low read numbers. Regions B and C consisted of 0.047% and 0.015% of the total reads respectively and were easily identified, and region D was isolated via PCR and sequenced, despite accounting for only a single read. This indicates that the full list of 285 deletions is likely reliable. It also indicates that there was probably a bias within the library preparation for the sequencing protocol that artificially inflated the most common deletions, thus skewing the final deletion ratio.
The high levels of variation within the larger regions also indicate that the pool is robust and contains multiple valid deletions, as they show that the variation of insertion sites for the original lox site transposons is as high as we expected. The concentration of deletions in hotspots is not due to a bottleneck caused by poor transformation efficiency in either of the lox insertion stages, as the variation in the hotspots shows multiple integrations, with a vast range in the size of the deletions across the general region. If the hotspots were caused by only a small number of transposons being present in one of the stages, the vast majority of the deletions would share a common end or starting point, which is not what we observe.
Instead, we are probably seeing those regions whose loss imparts the lowest reduction in cellular fitness. Looking at the region between bases 529000 and 630000 (indicated by the blue bars in Figure 4), we see a very high density of deletions. This region is almost entirely populated by NE genes (24 NE genes, 3 F genes), of which 17 have no clearly defined function (see Suppl. Table 2, mpn490 to mpn513).
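The argument against a transformation bottleneck made above can be phrased as a simple check: if a hotspot arose from only a few transposons in one round, most deletions in it would share a start or an end coordinate. The sketch below computes that shared-endpoint fraction; the function name and both example hotspots are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def max_shared_endpoint_fraction(deletions: list[tuple[int, int]]) -> float:
    """Fraction of deletions sharing the single most common start or end coordinate."""
    if not deletions:
        raise ValueError("no deletions supplied")
    starts = Counter(start for start, _ in deletions)
    ends = Counter(end for _, end in deletions)
    most_shared = max(starts.most_common(1)[0][1], ends.most_common(1)[0][1])
    return most_shared / len(deletions)

# Hypothetical hotspot produced by a bottleneck: every deletion shares one start site.
bottlenecked = [(100, 500), (100, 900), (100, 1500), (100, 2000)]
# Hypothetical hotspot with independent insertions at both ends.
diverse = [(100, 500), (150, 900), (300, 1500), (420, 2000)]

print(max_shared_endpoint_fraction(bottlenecked))
print(max_shared_endpoint_fraction(diverse))
```

A value close to 1 would point to a bottleneck, whereas a value close to 1/n for n deletions corresponds to the varied start and end points described for the hotspots here.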
Due to the uncertainty inherent to the data generated from the sequencing protocol, i.e. the majority of reads not containing both inverted repeat regions, we decided that splitting the genome into 50bp bins gave us enough specificity to map the deletions as accurately as possible. Due to this aggregation method, however, there could be many more deletions that differ from each other by fewer than 50 bp, and thus are missed from the analysis by being grouped with the other reads. This could mean that we are greatly underestimating the number of unique deletion events that occurred.
A major consideration with this protocol is its population-based, and thus competitive, nature.
The wide variety of deletion mutants that are created grow in direct competition with each other, and thus this protocol also allows us to partially select for those cells that are the most robust. While we only allowed the cells to grow for one passage in an attempt to minimise this as much as possible, the selection for faster growing mutants is inevitable, and it remains an inherent property of bacterial life that the fast-growers will proliferate at the expense of the slow growers. This is reflected in the essentiality of the genes that were deleted. Of the 147 genes, only 8 were classified as 'fitness' within the original genome. These genes were annotated as such because their loss in a single disruption imparted a severe growth restriction on the cells (Lluch-Senar et al., 2015). However, when they were lost as part of a larger deletion here, the cells could still compete with those that had only lost NE genes, clearly recovering some level of fitness compared with the loss of the single fitness gene alone. This could be due in part to the retention of non-adherent cells, which was not undertaken in previous essentiality studies in M. pneumoniae (Lluch-Senar et al., 2015). As such, phenotypes that imparted a loss of fitness in adherence could be well tolerated here.
Also, the major adhesin genes are known players in the cell division machinery (Krause and Balish, 2004), and while the P1 and hmw1 genes are NE, hmw2 is classified as F and hmw3 as E (Lluch-Senar et al., 2015). It is possible that the deletion of the large cytadherence operon between mpn141 and mpn153 also affected cell division and was thus not tolerated, while the deletion of other less important cytadherence genes was tolerable, due to the retention of planktonic cells within the protocol.
This implies that there are epistatic effects at work within the genome, as combinatorial gene deletions do not have the same fitness outcome as the sum of their individual deletions (Arnold et al., 2018;Domingo et al., 2019). As shown by the team at JCVI when creating the JCVI-Syn3.0 cell, simply deleting all the genes deemed non-essential from the genome is a non-viable method, as each gene loss has the potential to change the essentiality of multiple other genes, which constantly needs to be re-evaluated (Hutchison et al., 2016). By utilising a randomised large scale deletion protocol, not only can single genes be tested for their updated essentiality, but whole regions as well. This has the power to greatly increase the scope of genome reduction projects, by quickly identifying the largest areas that are amenable to deletion at any given time, and identifying those regions that, counterintuitively, may not be amenable to deletion. This is shown well by the deletions we see across the population. While the distribution of E and NE genes is fairly even in the M. pneumoniae genome, there are some clear islands of non-essentiality. We have shown that large deletions are possible in many of these islands, most notably the regions around 610Kb and 140Kb. Looking at Figure 4, the majority of the regions with multiple NE genes contain some level of deletion. However, there are also places where few if any deletions are observed, such as the clusters of NE genes at 244Kb, 370Kb and 490Kb. While the loss of any of these genes individually is possible for the cell, their combined loss appears to be lethal. Therefore, this tool can be used not only to find which regions of the genome are most amenable to large scale deletion, but also to identify which putative non-essential regions play host to the most epistatic interactions.
Furthermore, the competitive nature of the protocol allows the researcher to not only elucidate the larger regions that can be deleted, but also those with the least fitness expense in any given environment. If a cell is being minimised for a specific application, then genome reduced pools can be grown in the desired condition for as long as required, and the cells with the most viable reductions will outcompete those whose deletions are less viable, and give researchers insights into not only which genes provide a desirable phenotype in new conditions, but which operons and larger genome areas as well.
We found limited examples of this in our own study. Our pool of deletions was grown for just a single passage under standard laboratory conditions, and thus genes required for the M. pneumoniae cells to exhibit pathogenesis were not required. As such, 15 separate adhesion proteins (out of a total of 22) were among those deleted, as were nine putative restriction enzyme proteins. On top of this, the main virulence factor in M. pneumoniae, the CARDS toxin (Parrott et al., 2016;Waites and Talkington, 2004), was also among those genes deleted, along with many of the adhesins that are linked to pathogenesis (Parrott et al., 2016). This indicates that the protocol has the ability to be utilised as an attenuation process as well.
In line with this, the protocol also has potential for conversion into a multi-step protocol. The removal of the constitutive Cre transposon from the genome, or its replacement with a conditionally activated Cre could allow for multiple rounds of the technique to be utilised within a single cell. This could convert the technique from a screening tool to identify amenable deletions to a self-contained large scale genome minimisation technique, capable of deleting as many large and small genomic regions as the cell can endure. Coupled with its random nature, the technique could help researchers avoid unforeseen negative epistatic interactions and delete as much genetic material as is feasible within the cell, potentially paving the way for further elucidation of the minimal machinery needed for a cell to survive.
While single rounds of the protocol may only remove relatively small amounts of DNA in this case (our largest deletion of 28Kb accounts for 3% of the total M. pneumoniae genome), its unbiased and competitive nature makes LoxTnSeq an attractive prospect.
In conclusion, we present the LoxTnSeq protocol as a multi-purpose tool for synthetic biology. It is capable of deleting large regions of NE genes from a host genome and of identifying candidate regions for genome reduction based on the fitness that each deletion imparts on the cell. It also allows for more accurate essentiality maps based on the loss of multiple genes, and for the identification of non-essential regions that interact epistatically and may cause loss of viability if removed in part or as a whole, despite their constituent parts being deemed non-essential. We hope that these attributes will make it a useful contribution to the growing synthetic biology tool box.

Plasmid DNA
All plasmids were generated using the Gibson isothermal assembly method (Gibson et al., 2009). DNA was isolated from NEB 5-alpha Competent E. coli cells, and individual clones were selected for using LB + ampicillin plates (100 µg/ml). Correct ligation was confirmed via Sanger sequencing (Eurofins Genomics). A list of all plasmids, and primers used in their generation and sequencing can be found in Suppl. Tables 4 & 5. The plasmids used in this study were generated as follows:

pMTnLox71Tc
The plasmid pMTnTc was amplified separately via PCR using oligos 1 & 2, and 3 & 4. The samples were digested with DpnI and isolated via electrophoresis. Bands of approx. 4.2Kb and 2Kb respectively were isolated and annealed via Gibson ligation.

pMTnLox66Cm
The plasmid pMTnCat was amplified via PCR using oligos 5 & 6. The sample was digested with DpnI and isolated via electrophoresis. A band of approx. 5Kb was isolated and self-annealed via Gibson ligation.

pMTnCreGm
The plasmid pMTnCat was amplified via PCR using oligos 7 & 8, and the plasmid pGmRCre was amplified via PCR using oligos 9 & 10. The samples were digested with DpnI and isolated via electrophoresis. Bands of approx. 4.2Kb and 2.6Kb respectively were isolated and annealed via Gibson ligation.

Random genome reduction protocol
WT M129 cells were transformed according to the protocol outlined by Hedreyda et al. (1993). Cells were grown to mid-log phase, identified by the Hayflick media changing from red to orange. The media was decanted and the flask was washed 3x with

From the sequencing data, paired reads were extracted that contained the inverted repeat sequence from the transposons, a sequence of genomic DNA, the adapter sequence from the circularisation protocol and a second sequence of genomic DNA. We used basic bash tools to trim the adapter sequence and BlastN to map the reads against the M. pneumoniae M129 genome (Accession number: NC_000912). We then used custom Python scripts to detect deletion points by selecting reads in which the two halves of the read map to two different genomic loci.
The scripts required to run these processing steps can be found in https://github.com/CRG-CNAG/Fastq2LoxDel.
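The deletion-calling idea described above can be sketched as follows. This is a simplified, hypothetical illustration and not the code in the repository: it assumes that each half of a read has already been reduced to a single mapped genomic coordinate, it ignores strand and genome circularity, and the 50 bp minimum size and all function names are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Deletion = Tuple[int, int]  # (start, end) genomic coordinates in bp


def call_deletion(pos_a: int, pos_b: int, min_size: int = 50) -> Optional[Deletion]:
    """Return a candidate deletion if the two read halves map far enough apart.

    pos_a and pos_b are the mapped coordinates of the two halves of one read.
    min_size is an assumed threshold, not a documented parameter.
    """
    start, end = sorted((pos_a, pos_b))
    if end - start < min_size:
        return None  # both halves map to (almost) the same locus
    return (start, end)


def count_deletions(
    read_pairs: Iterable[Tuple[int, int]], min_size: int = 50
) -> Dict[Deletion, int]:
    """Count how many reads support each candidate deletion."""
    counts: Counter = Counter()
    for pos_a, pos_b in read_pairs:
        deletion = call_deletion(pos_a, pos_b, min_size)
        if deletion is not None:
            counts[deletion] += 1
    return dict(counts)


# Made-up coordinates: two reads support the same deletion, one read is discarded.
support = count_deletions([(1000, 5000), (5000, 1000), (200, 210)])
```

Counting the number of supporting reads per junction pair, rather than only recording presence, keeps the information needed to filter out poorly supported candidates at a later step.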

Validation of inversion removal
To ensure that the protocol only selected for cells that had undergone a deletion, the pool of cells containing all three transformations was grown to mid-log phase and isolated via the centrifugation protocol described above.