Fakestagram – How bots and fake accounts simulate a vivid world in social media

For a long time, I avoided Instagram for the most part. Not because I’m a notorious social media hater or have privacy concerns, but mainly because I hadn’t found content there that appealed to me. This changed when I helped to build up a social media presence and it became part of my daily habit to check what other people are posting. Very quickly, I came across an interesting fact: people with no particularly interesting content have thousands or even tens of thousands of followers. I’ve seen a girl who presented an avocado (which she apparently eats daily) on Instagram. Just a picture of an avocado. Every. Single. Day. Tens of thousands of followers! How could this be? Does it only take one avocado to send the Instagram crowd into ecstasy? Is this my great chance to finally find an audience for my best friend, the apple I eat every day? I took a closer look at the followers of the avocado girl, and the very first one looked like this:

Well, maybe this is a real person who just never posts anything and loves to insult others in his Instagram bio. But let’s be realistic: we all know this is a fake account. I continued scrolling, but it didn’t get better; roughly 9 out of 10 followers showed all the signs of a fake account: a strange name, no posts, no bio, no profile picture and an absurd number of random accounts they follow.

It’s well known that there are fake accounts on social media platforms; Facebook reported that 3-4% of the active users on their platform are fake. But how are those fake accounts distributed? I clicked on the list of the 173 people the account “mk6vwgli14” is following and saw this:

All of them looked like real people, and all of them are influencers for various topics. But wait, is there a pattern behind it? I clicked on one of these real people and checked the followers. And again, 9 out of 10 followers looked suspicious. My interest was awakened and I stretched my fingers for some coding work. I wanted to find out: did I just stumble across a few bad examples, or is this a general phenomenon? In other words: how real is Instagram?

Methodology

It’s not always easy to decide whether an account is real or not; the definition of “real” alone is squishy: should an account that uses a portrait of someone else and never posts anything be considered real or fake? Take a look at this user:

Our friend Daniel Eduardo took a photo of Chinese actress Kan Qing-Zi and exclusively follows burger and food accounts. It could be a real (but shy) person who really loves burgers, is active on Instagram daily looking at delicious patties, and will happily try out a new restaurant whenever he sees a burger ad. Or it could be an inactive account, created a long time ago and de facto worthless for advertising companies, as the person behind it will likely never see any ads, let alone buy the product. While in this case only Instagram can evaluate how active and “valuable” the account is, there are accounts that are clearly bots, just there to boost the follower count of others. One of these services is called Famoid; they offer 25 free Instagram followers for any account you wish, and I gave it a try to understand what a typical bot account looks like. Here is an example of such a bot you can get as a follower for your account when you sign up for Famoid:

After investigating several of these accounts, I noticed repeating patterns and weighted the importance of 5 features that indicate an account is not real:

  • The account has 3 or fewer posts (weight: 20%)
  • The followers-to-follow ratio is below 0.03 (weight: 30%)
  • The username is strange (we’ll come to that in a second; weight: 20%)
  • The bio has 2 words or fewer (weight: 15%)
  • The account has no profile pic (weight: 15%)

If an account fulfills more than 50% of these characteristics (weighted), I consider it fake. The username in particular is interesting because I noticed that the naming differs quite a bit between known real and known fake accounts: real people tend to combine parts of their actual names or add suffixes like “_official”, while bot accounts use more random combinations of characters and numbers. I decided to train a quick LSTM network in PyTorch that takes single characters as input and predicts how likely it is that the name belongs to a real person. The model’s accuracy of ~78% seemed promising enough to use this feature for the prediction in combination with the other characteristics described above (you can find the model code here; this sample learns to classify the gender of a name, but it can easily be adapted to distinguish real from fake usernames when training data is available).
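To make the username feature more concrete, here is a minimal sketch of such a character-level classifier in PyTorch. The architecture and hyperparameters are my own assumptions, not the exact model from the linked sample (which classifies the gender of names); a real/fake version would simply be trained on labelled usernames instead.

```python
import torch
import torch.nn as nn

# Characters we expect in Instagram usernames; index 0 is reserved for padding.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789._"
CHAR_TO_IDX = {c: i + 1 for i, c in enumerate(VOCAB)}

class UsernameClassifier(nn.Module):
    """Predicts the probability that a username belongs to a real person."""

    def __init__(self, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB) + 1, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, char_ids):                  # char_ids: (batch, seq_len)
        embedded = self.embed(char_ids)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (1, batch, hidden_dim)
        return torch.sigmoid(self.out(hidden[-1])).squeeze(-1)

def encode(username, max_len=30):
    """Maps a username to a fixed-length tensor of character indices."""
    ids = [CHAR_TO_IDX.get(c, 0) for c in username.lower()[:max_len]]
    ids += [0] * (max_len - len(ids))
    return torch.tensor(ids).unsqueeze(0)         # shape: (1, max_len)

model = UsernameClassifier()
# After training on labelled usernames, a call like this would return the
# probability that "mk6vwgli14" looks like a real person's name:
print(model(encode("mk6vwgli14")).item())
```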

The next step was to write a small script that takes the followers of an account and performs the fake check. This is tricky, as fully automated scraping of Instagram is not allowed according to the terms of use, and libraries like this one don’t seem to work reliably because Instagram changes its (unofficial) API quite frequently. In the end I decided to go a semi-automated route: listing the followers manually (which took some time) and automating the fake check, roughly as sketched below.
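This is only a sketch of the idea: the CSV column names are assumptions about how I noted down the follower data, `username_realness` stands in for the LSTM model above, and the weights and the 50% threshold are the ones listed in the methodology.

```python
import csv

WEIGHTS = {
    "few_posts": 0.20,        # 3 or fewer posts
    "low_ratio": 0.30,        # followers-to-follow ratio below 0.03
    "strange_name": 0.20,     # username looks machine-generated
    "short_bio": 0.15,        # bio has 2 words or fewer
    "no_profile_pic": 0.15,   # default avatar
}

def fake_score(account, username_realness):
    """Sums the weights of all characteristics the account fulfills."""
    score = 0.0
    if account["posts"] <= 3:
        score += WEIGHTS["few_posts"]
    if account["followers"] / max(account["following"], 1) < 0.03:
        score += WEIGHTS["low_ratio"]
    if username_realness(account["username"]) < 0.5:
        score += WEIGHTS["strange_name"]
    if len(account["bio"].split()) <= 2:
        score += WEIGHTS["short_bio"]
    if not account["has_profile_pic"]:
        score += WEIGHTS["no_profile_pic"]
    return score

def fake_ratio(path, username_realness):
    """Reads a manually exported follower list (CSV) and returns the share
    of accounts whose weighted fake score exceeds 50%."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    fakes = 0
    for row in rows:
        account = {
            "username": row["username"],
            "posts": int(row["posts"]),
            "followers": int(row["followers"]),
            "following": int(row["following"]),
            "bio": row["bio"],
            "has_profile_pic": row["has_profile_pic"] == "1",
        }
        if fake_score(account, username_realness) > 0.5:
            fakes += 1
    return fakes / len(rows)
```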

Results

I decided to run the fake check on 1,000 randomly selected followers of each of the 5 individual accounts with the most followers on Instagram, as well as 4 random “influencers”:

Username       | Followers   | Fake ratio | Sample followers that were predicted as unreal
@cristiano     | 257,679,192 | 5.27%      | gjsjfjdjei, android15304, jameshappyboy1128
@arianagrande  | 218,594,895 | 10.0%      | rada5743j, djonki17, rayan_as26
@therock       | 214,312,892 | 6.53%      | hwd7211, mmn11677, fyslhsn278
@kimkardashian | 202,146,368 | 29.16%     | yadavsiddu41, eliza.beth3466, fl0werlliv
@selenagomez   | 206,389,053 | 12.5%      | ramosbandy, pling__157

As we can see, Cristiano Ronaldo’s account has a predicted fake follower rate of roughly 5%, meaning that, following the characteristics above, 5 out of 100 followers are probably not real. It is absolutely possible (and even expected) that the model misclassified some real users as fake (so-called false positives), so a rate of 5% suggests there are hardly any fake followers on his account. The situation looks quite different for @kimkardashian, and the difference to @cristiano is astonishing: almost 30% of her followers showed the suspicious patterns of unreal accounts. While it’s perfectly possible that, for example, “yadavsiddu41” (one of her followers) is a real person, it is at least doubtful how active this account really is.

The difference between celebrity accounts and other influencers with far fewer followers is also interesting. I randomly picked 4 influencers with a follower count between 50,000 and 200,000 and performed the fake check again:

Account | Followers | Fake ratio | Sample followers that were predicted as unreal
(A)     | 63,547    | 10.52%     | don.pal, hansalexcisflores, bonjovistan
(B)     | 80,947    | 50.2%      | 52831l, 3d.enzo, josh111140
(C)     | 183,352   | 37.5%      | _alexnunes22_, tom.floyd.56232, z.ohara19
(D)     | 194,983   | 36.1%      | s_akihiro.8, sufiank_, solidgon7

Influencer (B), for example, has several advertising partnerships with companies. If 50% of her followers are fake accounts, she basically presents the products to bots that will never buy them, or in other words: the majority of the ad money the company pays this influencer is wasted. Wait, I hear you say, companies don’t rely on follower counts alone; they also check the “engagement rate”, which is the ratio of comments and likes a user’s postings get compared to the follower count. Yes, but this can be faked as well. For a small donation you can buy as many likes and comments as you wish; comments can be either customized or created randomly. In the latter case those are mostly predefined text blocks like “Wow I really love this” or arrays of emojis: “❤️❤️❤️❤️”.
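For a sense of scale, here is a back-of-the-envelope calculation of that engagement rate with entirely made-up numbers (they do not belong to any of the accounts above):

```python
# Hypothetical post: 2,400 likes and 80 comments on an account with 80,000 followers.
likes, comments, followers = 2400, 80, 80000
engagement_rate = (likes + comments) / followers
print(f"{engagement_rate:.1%}")   # 3.1%, and every part of this number can be bought
```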

Conclusion

What does this small experiment tell us? Inactive accounts, fake accounts and bots are way more common on Instagram than Instagram wants to admit. They claim that they actively delete fake accounts, which may be true to some extent. But there is a conflict of interest: fake accounts don’t hurt the platform. Companies are more willing to spend ad money on Instagram if they assume that a lot of people are seeing their ads. Without Instagram-internal data it is impossible to decide with certainty which accounts are real users and which are just bots, but the absurdly high number of bots on social media would also explain why companies like Uber can stop their online ad campaigns and hardly see any difference.

There is also a socio-critical aspect here: as a lot of influencers buy fake followers to appear way more “popular” than they are, an expectation is created that you have to have thousands of followers to be “worth it”. People create a popular virtual ego of themselves, a social fiction constructed of fake followers and bought likes. This may also increase the pressure on other people to keep up with those “popular” accounts, and they may feel like failures if they do not achieve similarly high follower counts as their peers. I’m surprised that this phenomenon is not discussed more broadly. It’s not enough to dismiss fake followers in social media as a marginal phenomenon: these platforms allow a world of active, real users to be simulated that does not exist, and we should really think about the value we place on them.
