TrueImpactDataset is a dataset of scholarly publications designed to facilitate the development and validation of new methods in the area of research evaluation. The dataset consists of research publications of two types: research papers that are considered seminal work in their area, and papers that provide a survey (a literature review) of a research area. It also contains related metadata, including DOIs, titles, authors and abstracts.


Download the whole dataset as an archive. The dataset currently contains:

  • 314 research publications — 166 labeled as seminal and 148 labeled as survey
  • Metadata: DOIs and URLs, titles, authors, years of publication, and citation counts from Google Scholar and Web of Science
  • Abstracts
  • Citing and cited references from Web of Science
  • Metadata from Mendeley including reader information
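
A quick sanity check after downloading is to tally the seminal and survey labels. The sketch below is only illustrative: it assumes the publication list is available as a CSV file with a column named label, and both the CSV format and the column names doi and label are assumptions, not the documented layout of the archive. The function count_labels is a hypothetical helper, and the DOIs in the sample are placeholders.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="label"):
    """Count how many rows carry each label value (case-insensitive).

    `label_column` is an assumed column name; adjust it to match
    the actual file in the archive.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column].strip().lower() for row in reader)


# Tiny inline sample standing in for the real file (placeholder DOIs).
sample = io.StringIO(
    "doi,label\n"
    "10.0000/example-1,seminal\n"
    "10.0000/example-2,survey\n"
    "10.0000/example-3,Seminal\n"
)

counts = count_labels(sample)
print(counts["seminal"], counts["survey"])
```

Run against the full dataset (by passing an opened file instead of the inline sample), the tallies should match the figures listed above: 166 seminal and 148 survey publications, 314 in total.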


Additional files

Below you can download the survey form we used to collect the data, the original unprocessed responses we received, and a version of the responses file with additional columns containing the metadata we collected.


All Python source code we used to clean and analyze the data is available on GitHub.


  • Drahomira Herrmannova, Knowledge Media Institute, The Open University, UK
  • Robert M. Patton, Oak Ridge National Laboratory, USA
  • Petr Knoth, Knowledge Media Institute, The Open University, UK
  • Christopher G. Stahl, Oak Ridge National Laboratory, USA