Survey of Automated Agents and Spam on GitHub

Jan Hensel

2024-03-15

Abstract

As the user base of GitHub grows, the social coding platform faces increasing challenges from spam and malicious automation targeting those users. This paper investigates the different types of automated agents on GitHub, both helpful bots and harmful actors. Through the analysis of two recent examples of malicious social automation, the paper explores some of the methods used by malicious actors on GitHub and evaluates the responsive measures by the platform regarding these example cases. The examples analyzed are two distinct spam campaigns observed on GitHub in early 2024 and the analysis shows that users are clearly targeted algorithmically and that the targeting method correlates with some topical interests such as cryptocurrencies as well as certain social niches related to those same preferences. In connecting these practical instances of automated agents on GitHub to the terminology of their academic study in the social sciences, a necessary basis for discussing automation on GitHub is established. As the work ultimately intends to highlight the relevance of GitHub for the field of study of automated agents, this basis should provide future researchers a point of entry for more in-depth research on social automation on GitHub.

BibTeX entry:
@misc{hensel2024sbotsgh,
  title={Survey of Automated Agents and Spam on GitHub},
  author={Hensel, Jan},
  date={2024-03-15},
  url={https://hensel.dev/papers/github-sbots-analysis-2024}
}

Some Context for the Paper

While studying computer science, I also took some courses outside of that realm. One such course was in social media and communication studies, a field fairly closely related to sociology, on the topic of social bots, i.e., automated agents acting in social networks.

While I do not use many social platforms, one sneakily social platform that I do use a lot is GitHub. Looking into it, I found that people had already analyzed and codified GitHub as a "social platform" years prior, but I could not find any paper discussing its social automation.

Malicious social automation in particular is ever-present, and I quickly found a spam campaign to analyze. As luck would have it, just a few days into data gathering I found a second campaign that seems to be related to the first, as its methodology was strikingly similar.

In this paper, which tries to fit in with the social sciences context for which it was written, I therefore relate GitHub as a social platform, along with its types of automation, to the terminology used in the study of automated social agents (social bots), and I analyze those two spam campaigns in some detail.

Data?

If you are a researcher or otherwise interested in the data, I might be able to help you; feel free to reach out.

Methodological sidenote

My methodology for data gathering was a bit haphazard (in my honest opinion), and I wrote a brief blog post on some of its complications. That post, however, is intended to be a fun little primer on concurrency.

If you happen to have any questions about how to get data from GitHub at some (moderate) scale and would like some advice, feel free to reach out as well. I'm hardly an expert, but I do understand that getting data from an unfamiliar platform can be daunting, especially for non-programmers.