Lally, Patrick; Gómez-Romero, Laura; Tierrafría, Víctor H.; Aquino, Patricia; Rioualen, Claire; Zhang, Xiaoman; Kim, Sun-Young; Baniulyte, Gabriele; Plitnick, Jonathan; Smith, Carol; Babu, Mohan; Collado Vides, Pedro Julio; Wade, Joseph T.; Galagan, James E.
2025-09-08
2025-09-08
2025
Lally P, Gómez-Romero L, Tierrafría VH, Aquino P, Rioualen C, Zhang X, et al. Predictive biophysical neural network modeling of a compendium of in vivo transcription factor DNA binding profiles for Escherichia coli. Nat Commun. 2025 May 7;16(1):4255. DOI: 10.1038/s41467-025-58862-8
2041-1723
http://hdl.handle.net/10230/71149
The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We use these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We use BoltzNet to quantitatively design novel binding sites, which we validate with biophysical experiments on purified protein. We generate models for 124 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
application/pdf
eng
© The Author(s) 2025.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Predictive biophysical neural network modeling of a compendium of in vivo transcription factor DNA binding profiles for Escherichia coli
info:eu-repo/semantics/article
http://dx.doi.org/10.1038/s41467-025-58862-8
Gene regulation; Machine learning; Thermodynamics
info:eu-repo/semantics/openAccess