Eclipse Scava Datasets

This website hosts the open datasets generated in the course of the Crossminer research project. It includes various pieces of data retrieved from the Eclipse forge in CSV and JSON formats, and each dataset has an R Markdown document describing its content and providing hints about how to use it. The examples provided mainly use the R statistical analysis software.

All datasets are published under the Creative Commons Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0).

All data is anonymised. Please see the dedicated document to learn more about privacy and the anonymisation mechanism.
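
For readers who do not use R, the CSV and JSON files mentioned above can also be read with other tools. The following is a minimal Python sketch using only the standard library. The helper names (read_csv_rows, read_json_records) and the sample columns (id, status) are hypothetical and chosen for illustration only; they are not taken from the actual datasets, whose structure is described in each dataset's R Markdown document.

```python
import csv
import io
import json


def read_csv_rows(text):
    """Parse CSV text with a header row into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse JSON text and always return a list of records.

    A single top-level object is wrapped in a list so callers can iterate uniformly.
    """
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Hypothetical sample content, NOT real data from the datasets.
sample_csv = "id,status\n1,open\n2,closed\n"
sample_json = '[{"id": 1, "status": "open"}]'

rows = read_csv_rows(sample_csv)
records = read_json_records(sample_json)

print(len(rows), "CSV rows; first status:", rows[0]["status"])
print(len(records), "JSON records; first id:", records[0]["id"])
```

To work with a downloaded file, the same helpers can be fed the result of open(path, encoding="utf-8").read(), assuming the file is UTF-8 encoded.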

Eclipse projects

We generate full data extracts of a set of Eclipse projects, including data sources like:

These datasets are updated weekly, at 2am on Sunday.
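
When scheduling a download, it can help to know when the next refresh is due. The sketch below computes the first Sunday at 02:00 after a given moment. It assumes the schedule stated above and makes no assumption about the server's time zone, which this page does not state: the result is in whatever time zone the input uses. The function name next_weekly_update is hypothetical.

```python
from datetime import datetime, timedelta


def next_weekly_update(now):
    """Return the first Sunday 02:00 strictly after `now`.

    The time zone of the server is not documented here, so the result
    is simply expressed in the same (naive or aware) zone as `now`.
    """
    # datetime.weekday(): Monday is 0, Sunday is 6.
    days_ahead = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=2, minute=0, second=0, microsecond=0
    )
    # If it is already Sunday 02:00 or later, move to the following week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


print(next_weekly_update(datetime.now()))
```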


AERI Stacktraces

The AERI stacktraces dataset is a list of exceptions encountered by users in the Eclipse IDE, as collected by the AERI system. The Automated Error Reporting Initiative (AERI) system was developed by the people at Code Trails and gathers information about exceptions. It is installed by default in the Eclipse IDE and has helped hundreds of projects better support their users and resolve bugs. This dataset is a dump of all records over a couple of years, with useful information about the exceptions and their environment.

The dataset was last updated on 2018-02-11.



More information about the AERI system can be found on the Code Trails website.

Eclipse mailing lists

The Eclipse mailing lists dump is an extract of all emails posted on the Eclipse mailing lists.

More information can be found on the official Eclipse page for mailing lists.