2024-03-15
@misc{hensel2024sbotsgh,
  title={Survey of Automated Agents and Spam on GitHub},
  author={Hensel, Jan},
  date={2024-03-15},
  url={https://hensel.dev/papers/github-sbots-analysis-2024}
}
While studying computer science, I also took some courses outside of that realm. One such course was in the field of social media and communication studies, a field somewhat closely related to sociology, on the topic of social bots, i.e., automated agents acting in social networks.
While I do not use many social platforms, one sneakily social platform that I do use a lot is GitHub. Looking into it, I found that people had already analyzed and codified GitHub as a "social platform" years prior, but I could not find any paper discussing its social automation.
Of course, malicious social automation in particular is ever-present, and I quickly found a spam campaign to analyze. As luck would have it, just a few days into data gathering I found a second campaign that seems to be related to the first, as its methodology was strikingly similar.
In this paper, which tries to fit the social sciences context for which it was written, I therefore relate GitHub as a social platform, along with its types of automation, to the terminology used in the study of automated social agents (social bots) and analyze those spam campaigns in some detail.
If you are a researcher or otherwise interested in the data, I might be able to help you; feel free to reach out.
My methodology for data gathering was a bit haphazard (in my honest opinion), and I wrote a brief blog post on some of its complications; however, it is intended to be a fun little primer on concurrency.
If you happen to have any questions about how to get data from GitHub at some (moderate) scale and would like some advice, feel free to reach out as well. I'm hardly an expert, but I do understand that especially for non-programmers it might be daunting to try and get data from a foreign platform.