Evaluating the Alzheimer’s disease data landscape

Longitudinal cohort studies are of particular value for research on progressive neurodegenerative diseases like Alzheimer’s disease (AD), as they obtain clinical samples and data from a defined group of participants at regular intervals over time. However, it can be hard to compare or pool data across different cohort studies, a step that is essential for understanding whether research results are reproducible, because studies are often designed differently and provide varying levels of access to patient-level data.

To assess the AD data landscape, Colin Birkenbihl and colleagues analysed datasets from 9 cohort studies, including the EMIF-1000 and EPADv1500 studies. Summarising the data parameters and describing how they overlap between studies, they observed fairly large biases towards high levels of education and, in particular, a strong bias towards white/Caucasian ethnicity. Scoring different data parameters based on accessibility, the researchers observed that while some straightforward modalities were accessible across all studies (e.g. sex, age, education), others were much more heterogeneous, such as imaging data, lifestyle parameters and fluid biomarker samples, indicating a lack of interoperability across datasets and cohorts. In addition, the extent of longitudinal follow-up and sampling varied extensively between studies and data modalities, with a particular paucity in the MRI and CSF biomarker categories. All the analyses reported in the article have been made available through “ADataViewer”, an interactive web application developed by Fraunhofer SCAI that displays the findings in data availability maps.

https://adata.scai.fraunhofer.de/