Spotlight on early-career researchers: a Neuronet interview with Colin Birkenbihl

In this Neuronet interview, we speak to Colin Birkenbihl, a doctoral candidate at the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) in Bonn, Germany. Over the last three years, Colin has been involved with a number of Innovative Medicines Initiative (IMI) neurodegeneration projects, including AETIONOMY, EPAD and RADAR-AD. We asked Colin to tell us about his research, career and experiences of working on these projects.

Could you tell us about your career to date?

I actually started my career as a biologist. In the 5th semester of my BSc, I took a course in bioinformatics and discovered that I was more interested in the computational side of things than biology and laboratory research. Developing this interest, I did a Master’s degree in life science informatics at B-IT (the Bonn-Aachen International Center for Information Technology), and for my PhD I am now focusing on statistics, machine learning and data science. I am currently in the second year of my PhD, supervised by Prof. Holger Fröhlich of the data science and AI group at Fraunhofer SCAI.

What is data science, and how can it tell us more about neurodegenerative diseases?

There are so many definitions for data science! For me, the essence of data science is finding solutions for problems by using data. Data can come from many different sources: for meteorologists it may be weather data, while in other fields it may be market research and prospect research data provided by life science consulting agencies such as Voicentric, covering a wide range of scientific applications. For our work, we use data from clinical research on Alzheimer’s disease (AD) and other neurodegenerative disorders. Mathematics allows us to extract and describe the patterns and signals found in data at a fundamental level, and in a very precise way. So, once the data has been properly curated, cleaned and harmonised, we use statistical methods and computational algorithms to develop models that can predict, for example, the progression of disease, and identify its molecular causes.

What has been your experience of working on IMI neurodegeneration projects?

I’ve been involved with IMI neurodegeneration projects since my Master’s, when I started working on the AETIONOMY project with Prof. Martin Hofmann-Apitius. It was a very exciting time for someone who was still working towards their Master’s thesis: Prof. Hofmann-Apitius gave me the opportunity to attend general assembly meetings, discuss my work with leaders in the field, and interact with researchers in industry and academia. I have also worked with EPAD data, using it for our disease modelling studies and aiming to understand how and why AD progresses from pre-symptomatic AD to dementia. Building on my work in AETIONOMY and other projects, we have recently launched AData(Viewer), an online tool that maps the landscape of AD cohort data, including EPAD, EMIF-1000 and AddNeuroMed, which was funded by the precursor to the IMI.

Together, these experiences have shown me the value and importance of collaboration between the private and public sectors: both sides have blind spots, but they can be covered by sharing expertise.

As a data scientist studying AD, what do you feel are the biggest challenges for working with clinical AD datasets?

The first, and main, challenge is data access. There are some fantastic data resources out there, such as ADNI (the Alzheimer’s Disease Neuroimaging Initiative), but as data scientists we need comparable datasets so we can validate findings from ADNI using data from other cohorts. Gaining access to data is not always straightforward, as it involves complex legal agreements, and not everyone is willing to share their data. Another challenge is the usability of data. To analyse data across cohorts, the data need to be interoperable both on a semantic level, with the same variables and naming conventions, and on a statistical level, with comparable value representations and distributions. Often, we have to do a lot of work cleaning and harmonising datasets, mapping variables and using mathematical transformations. A particular challenge for the AD field is the need for lengthy follow-up, in order to track the progression of a disease that develops over years and decades. However, not all datasets have the same length – or breadth – of follow-up, making it harder to combine and compare them.
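To make the two levels of interoperability concrete, here is a minimal Python sketch of one possible harmonisation step. It is not the workflow used at Fraunhofer SCAI: the cohort names, column names, values and the choice of a within-cohort z-score are all assumptions made purely for illustration.

```python
from statistics import mean, pstdev

# Hypothetical mapping from cohort-specific column names to shared names.
VARIABLE_MAP = {
    "cohort_a": {"MMSE_total": "mmse", "age_years": "age"},
    "cohort_b": {"mmse_score": "mmse", "AgeAtVisit": "age"},
}


def rename_variables(records, mapping):
    """Semantic step: keep only mapped variables, under their shared names."""
    return [
        {mapping[key]: value for key, value in row.items() if key in mapping}
        for row in records
    ]


def zscore(records, variable):
    """Statistical step: rescale one variable to mean 0 and standard
    deviation 1 within a cohort (assumes the values are not all equal)."""
    values = [row[variable] for row in records]
    mu, sigma = mean(values), pstdev(values)
    return [{**row, variable: (row[variable] - mu) / sigma} for row in records]


def harmonise(cohorts):
    """Rename, rescale and stack all cohorts, tagging each row with its origin."""
    combined = []
    for name, records in cohorts.items():
        renamed = rename_variables(records, VARIABLE_MAP[name])
        scaled = zscore(renamed, "mmse")
        combined.extend({**row, "cohort": name} for row in scaled)
    return combined


# Made-up example records, not real patient data.
cohorts = {
    "cohort_a": [
        {"MMSE_total": 28, "age_years": 70},
        {"MMSE_total": 24, "age_years": 75},
    ],
    "cohort_b": [
        {"mmse_score": 20, "AgeAtVisit": 68},
        {"mmse_score": 26, "AgeAtVisit": 72},
    ],
}

harmonised = harmonise(cohorts)
```

Standardising within each cohort is only one of many possible transformations, and it has a cost: it removes genuine differences in average scores between cohorts, so whether it is appropriate depends on the question being asked.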

Where do you see yourself in 5-10 years?

I was drawn to data science and scientific research as they satisfy my curiosity and allow me to work on solving problems every day. Every scientific question I tackle is like a puzzle I need to solve, and solving the puzzle gives me one more piece of the jigsaw, building towards an understanding of the big picture – in my case, AD and dementia. In my career to date, I have been presented with new research opportunities that I hadn’t previously considered, but which drew me down different paths that have been both interesting and productive. So, I know that I want to continue along the research path, whether it be in industry or in academia. I want to continue learning and satisfying my curiosity.

Selected Papers

Birkenbihl, C., Westwood, S., Shi, L., Nevado-Holgado, A., Westman, E., Lovestone, S., on behalf of the AddNeuroMed Consortium, & Hofmann-Apitius, M. (2021). ANMerge: A Comprehensive and Accessible Alzheimer’s Disease Patient-Level Dataset. Journal of Alzheimer’s Disease, 79(1), 423–431.

Birkenbihl, C., Salimi, Y., Domingo‐Fernández, D., Lovestone, S., AddNeuroMed consortium, Fröhlich, H., Hofmann‐Apitius, M., the Japanese Alzheimer’s Disease Neuroimaging Initiative, & the Alzheimer’s Disease Neuroimaging Initiative. (2020). Evaluating the Alzheimer’s disease data landscape. Alzheimer’s & Dementia: Translational Research & Clinical Interventions, 6(1).

Birkenbihl, C., Emon, M. A., Vrooman, H., Westwood, S., Lovestone, S., on behalf of the AddNeuroMed Consortium, Hofmann-Apitius, M., Fröhlich, H., & Alzheimer’s Disease Neuroimaging Initiative. (2020). Differences in cohort study data affect external validation of artificial intelligence models for predictive diagnostics of dementia - lessons for translation into clinical practice. EPMA Journal, 11(3), 367–376.
