Previous work has highlighted that data scientists may use version control systems like GitHub differently. In this project, you will leverage the GitHub API to extract information about data science projects in multiple languages and compare their version control operations. Expect to assess repository types and user activity, perform topic modelling on commit messages and issues, and examine participation graphs between developers. In addition, you will be part of an ethics application so that developers can be surveyed through an anonymous online survey.
Note: This project is open and recruiting students.
- Programming knowledge, preferably either Python or R. Other languages are welcome but not needed.
- Knowledge of (or willingness to quickly learn) how to use APIs to download data.
- Demonstrated academic writing skills.
- Excellent attention to detail.
Please take a look at Dr Vidoni's papers here: https://melvidoni.rbind.io/project/2020-rse/
- Empirical Software Engineering. Mixed-Methods. Developers Survey.
- Natural Language Processing
- Data Science Software, Scientific Software
- Developers' Challenges