Jan Baumbach studied computer science at Bielefeld University in Germany. His research career started at Rothamsted Research in Harpenden (UK), where he worked on computational methods for the integration of molecular biology data. He returned to the Center for Biotechnology in Bielefeld for his PhD studies, where he developed CoryneRegNet. Afterwards, at the University of California at Berkeley, he worked in the Algorithms group of Richard Karp on Transitivity Clustering, a novel clustering framework for large-scale biomedical data sets. From March 2010, Jan was head of the Computational Systems Biology group at the Max Planck Institute for Informatics in Saarbrücken, Germany. In October 2012, he moved to the University of Southern Denmark as head of the Computational BioMedicine group, where his research concentrated on systems and network biomedicine. He was study program coordinator of the Computational BioMedicine program from 2015 to 2017. In January 2018, he moved to the Technical University of Munich as chair of Experimental Bioinformatics (ExBio), where he developed computational methods for systems medicine and novel federated AI approaches ensuring privacy by design. In January 2021, the lab relocated to the University of Hamburg, where Jan Baumbach became director of the Institute for Computational Systems Biology (CoSy.Bio). In 2023, he was appointed a Humboldt Scout through the Henriette Herz-Scouting-Program.
To share or not to share? Privacy-preserving AI in medicine
European Health Data Spaces, national digital health record archives, and similar initiatives aim to provide a mixture of legal and technical frameworks that make privacy-sensitive medical data available for data mining. The ultimate goal is to access the treasure of healthcare data still hidden behind legal barriers in order to train prognostic models for personalized medicine, from disease management to individualized drug repurposing prediction. The biggest roadblocks are the GDPR and cyber security. In this talk, we will discuss federated learning technology that, coupled with other privacy-enhancing technologies, allows for secure multi-center data mining collaboration. Specifically, we will demonstrate that it provides results as accurate as those of centralized solutions. We will discuss concrete applications for multi-centric genome-wide association studies; for metagenomics, transcriptomics, and proteomics analysis, including batch effect correction; and for survival time analysis. One application involved >1,000 hospitals in North America; another involves >100,000 European screening participants. Finally, we will discuss remaining cyber security aspects, limitations, and future prospects of federated learning in healthcare data mining.