Fakestagram – How bots and fake accounts simulate a vivid world on social media

For a long time, I mostly avoided Instagram. Not because I’m a notorious social media hater or have privacy concerns, but mainly because I hadn’t found content there that appealed to me. This changed when I helped build up a social media presence and it became part of my daily routine to check what other people were posting. Very quickly, I came across an interesting observation: people with no particularly interesting content have thousands or even tens of thousands of followers. I’ve seen a girl who presented an avocado (which she apparently eats daily) on Instagram. Just a picture of an avocado. Every. Single. Day. Tens of thousands of followers! How could this be? Does it only take one avocado to send the Instagram crowd into ecstasy? Is this my great chance to finally find an audience for my best friend, the apple I eat every day? I took a closer look at the followers of the avocado girl, and the very first one looked like this:

Well, maybe this is a real person who just never posts anything and loves to insult others in his Instagram bio. But let’s be realistic: we all know this is a fake account. I continued scrolling, but it didn’t get better; roughly 9 out of 10 followers showed all the signs of a fake account: a strange name, no posts, no bio, no profile picture and an absurd number of random people they follow.

It’s well known that there are fake accounts on social media platforms; Facebook reported that 3-4% of the active users on its platform are fake. But how are those fake accounts distributed? I clicked on the list of the 173 people the account “mk6vwgli14” is following and saw this:

All of them looked like real people, and all of them are influencers for various topics. But wait, is there a pattern behind it? I clicked on one of these real people and checked their followers. And again, 9 out of 10 followers looked suspicious. My interest was awakened and I stretched my fingers for some coding work. I wanted to find out: did I just stumble across a few bad examples, or is this a general phenomenon? In other words: how real is Instagram?

Methodology

It’s not always easy to decide whether an account is real or not; the definition of “real” alone is squishy: should an account that uses a portrait of someone else and never posts anything be considered real or fake? Take a look at this user:

Our friend Daniel Eduardo took a photo of the Chinese actress Kan Qing-Zi and exclusively follows burger and food accounts. It could be a real (but shy) person who really loves burgers, is active daily on Instagram looking at delicious patties and will happily try out a new restaurant whenever he sees a burger ad. Or it could be an inactive account, created a long time ago and de facto worthless for advertisers, as the person behind it will likely never see any ads, let alone buy a product. While in this case only Instagram can evaluate how active and “valuable” the account is, there are accounts that are clearly bots, just there to boost the follower count of others. One of these services is called Famoid; they offer 25 free Instagram followers for any account you wish, and I gave it a try to understand what a typical bot account looks like. Here is an example of such a bot you can get as a follower when you sign up for Famoid:

After investigating several of these accounts, I saw the following repeating patterns and weighted the importance of 5 features that indicate an account may not be real:

  • The account has 3 or fewer posts (weight: 20%)
  • The followers-to-following ratio is below 0.03 (weight: 30%)
  • The username is strange (we’ll come to that in a second; weight: 20%)
  • The bio has 2 words or fewer (weight: 15%)
  • The account has no profile picture (weight: 15%)

If an account fulfills more than 50% of these characteristics (by weight), I consider it fake. The username in particular is interesting because I noticed that the naming differs considerably between known real and known fake accounts: real people tend to combine parts of their real names or add suffixes like “_official”, while bot accounts use more random combinations of characters and numbers. I decided to train a quick LSTM network in PyTorch that takes single characters as input and predicts how likely it is that the name belongs to a real person. The model’s accuracy of ~78% seemed promising enough to use this feature for the prediction in combination with the other characteristics described above (you can find the model code here; this sample learns to classify the gender of a name, but it can easily be adapted to distinguish real from fake usernames once the training data is available).
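To make the weighted check concrete, here is a minimal sketch of how it could look in Python. This is illustrative rather than my exact script: the account fields and the name_is_strange predicate (which would wrap the LSTM model mentioned above) are assumptions.

```python
# Hypothetical sketch of the weighted fake-account check described above.
WEIGHTS = {
    "few_posts": 0.20,       # 3 or fewer posts
    "low_ratio": 0.30,       # followers-to-following ratio below 0.03
    "strange_name": 0.20,    # the LSTM flags the username as generated
    "short_bio": 0.15,       # bio has 2 words or fewer
    "no_profile_pic": 0.15,  # default avatar
}

def fake_score(account, name_is_strange):
    """Return the weighted fake score (0..1) for an account dict."""
    following = max(account["following"], 1)  # avoid division by zero
    checks = {
        "few_posts": account["posts"] <= 3,
        "low_ratio": account["followers"] / following < 0.03,
        "strange_name": name_is_strange(account["username"]),
        "short_bio": len(account["bio"].split()) <= 2,
        "no_profile_pic": not account["has_profile_pic"],
    }
    return sum(WEIGHTS[key] for key, hit in checks.items() if hit)

def is_probably_fake(account, name_is_strange):
    # An account is flagged as fake if it fulfills more than 50% by weight.
    return fake_score(account, name_is_strange) > 0.5
```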

The next step was to write a small script that gets the followers of an account and performs the fake check. This is tricky, as fully automated scraping of Instagram is not allowed according to the terms of use, and libraries like this one don’t seem to work reliably because Instagram changes its (unofficial) API quite frequently. In the end I went a semi-automated route: listing the followers manually (which took some time) and automating the fake check.
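The automated part can then be run over the manually collected list. Here is a sketch under the assumption that the follower data was exported to a CSV file (the column names are made up, and is_probably_fake is the function from the sketch above):

```python
import csv

def check_followers(path, name_is_strange):
    """Apply the fake check to a manually exported follower list."""
    flagged, total = [], 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            account = {
                "username": row["username"],
                "posts": int(row["posts"]),
                "followers": int(row["followers"]),
                "following": int(row["following"]),
                "bio": row["bio"],
                "has_profile_pic": row["has_profile_pic"] == "1",
            }
            if is_probably_fake(account, name_is_strange):
                flagged.append(account["username"])
    return len(flagged) / total, flagged

# Example: fake_ratio, suspicious = check_followers("followers.csv", name_is_strange)
```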

Results

I decided to do the fake check for 1,000 randomly selected followers of the 5 individual accounts with the most followers on Instagram, as well as of 4 random “influencers”:

| Username | Followers | Fake ratio | Sample followers that were predicted as unreal |
| --- | --- | --- | --- |
| @cristiano | 257,679,192 | 5.27% | gjsjfjdjei, android15304, jameshappyboy1128 |
| @arianagrande | 218,594,895 | 10.0% | rada5743j, djonki17, rayan_as26 |
| @therock | 214,312,892 | 6.53% | hwd7211, mmn11677, fyslhsn278 |
| @kimkardashian | 202,146,368 | 29.16% | yadavsiddu41, eliza.beth3466, fl0werlliv |
| @selenagomez | 206,389,053 | 12.5% | ramosbandy, pling__157 |

As we can see, Cristiano Ronaldo’s account has a predicted fake-follower rate of roughly 5%, meaning that, by the characteristics above, 5 out of 100 followers are probably not real. It is entirely possible (and even expected) that the model misclassified some real users as fake (so-called “false positives”), so a rate of 5% suggests there are hardly any fake followers on his account. The situation looks quite different for @kimkardashian, and the difference to @cristiano is astonishing: almost 30% of her followers showed suspicious patterns of unreal accounts. While it’s perfectly possible that, for example, “yadavsiddu41” (one of her followers) is a real person, it is at least doubtful how active this account really is.

The difference between celebrity accounts and other influencers with far fewer followers is also interesting. I randomly picked 4 influencers with follower counts between 50,000 and 200,000 and performed the fake check again:

| Account | Followers | Fake ratio | Sample followers that were predicted as unreal |
| --- | --- | --- | --- |
| (A) | 63,547 | 10.52% | don.pal, hansalexcisflores, bonjovistan |
| (B) | 80,947 | 50.2% | 52831l, 3d.enzo, josh111140 |
| (C) | 183,352 | 37.5% | _alexnunes22_, tom.floyd.56232, z.ohara19 |
| (D) | 194,983 | 36.1% | s_akihiro.8, sufiank_, solidgon7 |

Influencer (B), for example, has several advertising partnerships with companies. If 50% of her followers are fake accounts, she basically presents the products to bots who will never buy them; in other words, the majority of the ad money the company pays this influencer is wasted. Wait, I hear you say, companies don’t rely on followers alone; they also check the “engagement rate”, which is the ratio of comments and likes a user’s postings get compared to the follower count. Yes, but this can be faked as well. For a small fee you can buy as many likes and comments as you wish; comments can be either customized or created randomly. In the latter case they are mostly predefined text blocks like “Wow I really love this” or arrays of emojis (“❤️❤️❤️❤️”).
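For reference, the engagement rate is usually computed along these lines (a simplified sketch; exact definitions vary between marketing tools):

```python
def engagement_rate(avg_likes_per_post, avg_comments_per_post, followers):
    # Interactions per post relative to audience size. Bought likes and
    # comments inflate the numerator, so this metric can be gamed just
    # like the follower count itself.
    return (avg_likes_per_post + avg_comments_per_post) / followers
```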

Conclusion

What does this small experiment tell us? Inactive accounts, fake accounts and bots are way more common on Instagram than Instagram wants to admit. They claim to actively delete fake accounts, which may be true to some extent. But there is a conflict of interest: fake accounts don’t hurt the platform. Companies are more willing to spend ad money on Instagram if they assume that a lot of people see their ads. Without Instagram-internal data it is impossible to decide with certainty which accounts are real users and which are just bots, but the absurdly high number of bots on social media would also explain why companies like Uber can stop their online ad campaigns and hardly see any difference.

There is also a socio-critical aspect here: as many influencers buy fake followers to appear far more “popular” than they are, an expectation is created that you have to have thousands of followers to be “worth it”. People create a popular virtual ego of themselves, a social fiction constructed of fake followers and bought likes. This may also increase the pressure on other people to keep up with those “popular” accounts, and they may feel like failures if they do not achieve similarly high follower counts as their peers. I’m surprised that this phenomenon is not discussed more broadly. It’s not enough to dismiss fake followers in social media as a marginal phenomenon: these platforms allow the simulation of a world of active, real users that does not exist, and we should really think about the value we place on them.

Technology makes it easier for conversations to lose their soul

Recently, my colleague Peter was promoted and happily announced it on LinkedIn. Within a very short time, an interesting phenomenon could be observed under his post: the comments section filled with the following text:

What happened here is clear: LinkedIn offers the option to reply to posts with ready-made text modules. A change of job title is interpreted as positive news, so the module “Congrats X” is suggested, and many colleagues made use of it. Now, this can be dismissed as a harmless way of quickly expressing congratulations in the hustle and bustle of everyday life, with technology helping us do so in digital form. And yet there is something disturbing about this robotic form of communication, and we should discuss its long-term impact more widely.

The generation of text or text modules is not limited to the LinkedIn microcosm. Especially since the development of BERT and the transformer architecture for neural networks, and the hype around GPT-3, more and more large and small companies have begun to integrate language generation into their products or to build entire business models around it. Gmail, for example, offers “Smart Compose”, which automatically completes text in emails, and the function is surprisingly good:

In this application email, Gmail not only continuously suggested text modules; at the end it even automatically suggested the subject “I want to be part of your team!”. This is astonishing, but it also raises the question of what value this rather good subject line has if Gmail suggests it to all the other applicants too and the recruiter’s inbox fills up with emails titled “I want to be part of your team!”. Although Gmail offers the option to turn on personalization for Smart Compose, which is supposed to learn your writing style, it is questionable how well a language model with billions of parameters, trained on billions of texts from the internet, can really acquire a personal writing style.
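To get a feeling for how such suggestions come about, here is a rough sketch using a public model. Gmail’s Smart Compose is proprietary; GPT-2 via the Hugging Face transformers library merely illustrates the general idea of continuing a text prefix with a language model.

```python
from transformers import pipeline

# Continue a text prefix with a small public language model.
generator = pipeline("text-generation", model="gpt2")

prefix = "Thank you for considering my application. I am convinced that"
suggestions = generator(prefix, max_new_tokens=15,
                        num_return_sequences=3, do_sample=True)
for s in suggestions:
    print(s["generated_text"])
```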

The topic seems even more disconcerting when you look at the development of the last few years, in which many companies have had the idea of replacing their customer support with chatbots. Anyone who has developed chatbots knows that there is no special artificial intelligence behind them: sentences typed by users are assigned to so-called “intents”, and programmers must manually attach prefabricated answers or follow-up processes to each intent. It is true that AI language models are often used to map sentences to intents, but that’s where things go wrong more than once: if the language model predicts the wrong intent for a sentence, this can lead to curious misinterpretations that would not occur if one were actually communicating with a human.

Source: https://www.userlike.com/en/blog/chatbot-fails
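The intent mechanism described above can be sketched in a few lines: user input is mapped to the most similar example sentence and answered with a prefabricated response. The intents and answers here are made up; production systems use more sophisticated language models for the matching, but the principle is the same.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

INTENTS = {
    "opening_hours": ["when are you open", "what are your opening hours"],
    "cancel_order": ["cancel my order", "I want to cancel"],
}
ANSWERS = {
    "opening_hours": "We are open Mon-Fri, 9am-6pm.",
    "cancel_order": "I have started the cancellation process for you.",
}

examples, labels = [], []
for intent, sentences in INTENTS.items():
    examples.extend(sentences)
    labels.extend([intent] * len(sentences))

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(examples)

def reply(user_input):
    sims = cosine_similarity(vectorizer.transform([user_input]), matrix)[0]
    # If the wrong intent wins, the bot still answers with its canned
    # response – exactly the failure mode shown in the screenshot above.
    return ANSWERS[labels[sims.argmax()]]

print(reply("can I still cancel?"))
```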

The problem with chatbots is that they simulate human communication but are not on the same communicative level as humans. Computers can be operated flawlessly with mouse and keyboard: selecting a date via a date picker, for example, or filling in the field “Departure airport” are unambiguous instructions. The written text “Flight Seattle 1-7”, on the other hand, can be interpreted ambiguously (is it July 1st or January 7th? A human would either assume the nearest date or ask a clarifying question whenever they “feel” the situation is unclear).

It quickly leads to frustration if we’re forced to communicate with a system in written language when we know it wouldn’t be necessary and a few mouse clicks would do the trick just as well; even more so when the system misinterprets our commands. Language always expresses emotions and feelings, and just by the tone or choice of our words, our counterpart is able to infer how we feel. This interpretation, in turn, will likely make our conversation partner adapt their words and choose them according to the situation. Overall, this adaptation of words and tone to reflect what we talk about gives a conversation its “soul” and shows, for example, that we’re genuinely happy for a person who got a new job. There is a different emotion in “Fantastic that you made this career move, Peter, let’s toast to it at the next meeting” than in “Congrats Peter”, “Congrats Peter”, “Congrats Peter”. This emotion is what makes language alive. If technology causes it to disappear, we should ask ourselves whether this is a positive development.

Why you will never become rich and famous like Silicon Valley Entrepreneurs

Admit it: you dream of building that one thing that will make you financially independent for the rest of your life. Millions of young people admire those glamorous entrepreneurs on a mission to change the world, and a lot of them tell the same story: starting from basically nothing, in a garage, I just had this great idea, then I worked really hard and now I’m a millionaire. The implication: everyone can do this, just be smart and work hard. If you dig deeper, however, it becomes obvious that most of these dishwasher-to-millionaire stories are either pure lies or at least sugarcoated. I picked three entrepreneurs to underline my point.

Let’s start with Mark Zuckerberg. His career began at the relatively unknown Ardsley High School. Two years later he moved on to the prestigious Phillips Exeter Academy – a school with good connections to Harvard. Of course, to be accepted into Phillips Exeter Academy you need a certain level of intelligence and preferably a few school prizes already. But this is true for many people who never make it anywhere near such a school. One big reason is the annual tuition of over $40,000 – no real hurdle for young students with high-earning parents (Zuckerberg’s father is a dentist).

As luck would have it, Zuckerberg was eventually accepted into Harvard, where he met the Winklevoss brothers, from whom he took the initial idea for Facebook. A lot has already been written about Facebook, especially about why it became the largest social network in the world, but one key aspect to keep in mind is that Facebook was at first only accessible to Harvard students: in the early days you needed a Harvard email address to register. This is important to understand because it created an aura of elitism around the site. I’m pretty sure there are not many places on earth where a network like Facebook would have taken off this rapidly, simply because being a member of an “Ivy League school circle” was reason enough for people to join. Once you can show investors that your site attracts people rapidly, it’s basically just a question of time until money flows in; this was especially true in the years before and after the dot-com bubble. In conclusion: yes, Zuckerberg is certainly smart – but there are a lot of smart people in this world who never make it. Intelligence combined with being in the right place with the right amount of money, however, can bring you far.

But even if you are actually not smart enough to create what you want to create, you can become a billionaire (at least theoretically) – let me introduce you to Elizabeth Holmes. Holmes was the founder of Theranos, a company that claimed to be able to perform dozens of different blood tests using just a single drop of blood – something that is, as of now, considered impossible. Holmes had no clue how to do this, but she was a student at Stanford at the time and convinced her Stanford professor Channing Robertson to help her. With her big blue eyes, blond hair, big red lips and the persuasive idea of changing the world, how could you say no? I don’t want to imply that she bewitched her professor solely with her appearance, but many people who met Holmes in person agreed that it had a certain influence on people. Now, if you have a Stanford professor on your team, you can be sure he has good connections to various other influential people. Imagine you’re an investor and get a call from a Stanford professor telling you about an exciting new startup in the healthcare business. And now imagine you get the same phone call from a professor at an African no-name university. We know the answer: if it’s coming from Stanford, it must be something serious, right?

As most of you probably know, Theranos was a fraud – the methods Holmes described never worked. But just because she was at Stanford and could literally walk into Robertson’s office, she instantly got an enormous competitive advantage over all the other founders out there. How did she get this advantage in the first place? Holmes comes from a wealthy family (surprise!) and her parents made sure she attended Stanford’s summer school – if you apply to Stanford and casually mention in your application that your dad is a Vice President at a big US company and that you already know Stanford because you’ve spent summer school on campus, the admissions committee will likely decide in your favor.

So attending a well-known university is like a self-fulfilling prophecy: people will believe you’re smart and know what you’re doing just because an admissions committee once decided you’re worth it. People will give you more attention and are more likely to hire you or give you money, which then leads to the assumption: of course you make more money – you went to Stanford. At least since the college admissions scandal of 2019 we know that you can basically buy your way into elite universities, and it would be naïve to think that this practice has stopped or will stop in the future.

“But what about Elon Musk?” I hear you say. He wasn’t rich and didn’t come from a wealthy family, right? Wrong! Musk moved from South Africa to Canada in 1989, where he started studying at Queen’s University, and moved to the United States after two years to study at the University of Pennsylvania – Ivy League again, so let’s take a closer look at what happened. Immigrating to Canada or the United States is not easy. It has become harder today, but it was already challenging in 1989. Musk had a competitive advantage because his mother had Canadian citizenship, and he got a Canadian passport rather quickly. He later received a scholarship for the University of Pennsylvania, but it’s not reported how he got it. Musk was without doubt a good student, as former classmates and friends have stated. But again: this is true for many people, and many don’t get scholarships or into elite universities. So was it just pure luck? Did the scholarship judges already see Musk’s high potential and therefore decide in his favor? Unlikely – if it were that easy to predict a person’s career, we would not need committees; we could simply write an algorithm that does the job.

Even though it’s unclear how exactly Musk got to Pennsylvania, it’s well documented that he founded his first company, Zip2, with his brother Kimbal. To start a business you need at least a bit of seed capital, and after his studies Musk apparently had six figures of student debt to pay back. Where did the money come from? Apparently, Musk’s father gave his boys almost $30,000 to start Zip2. Depending on your background you may think $30,000 is not much, but for a lot of families it would be unaffordable to just blow $30,000 on the crazy ideas of their children. Also, $30,000 is not much if you have to rent an office and buy groceries, computer supplies (which Zip2 needed) and other equipment. If you want to survive, you need money. If you don’t have money, you have to spend time working for someone who is willing to give it to you, with the consequence that you have less time in the day for your own company. The Zip2 story seems incomplete; most reports are content to simply recite the claim “Musk’s father gave his boys $30,000 to build Zip2 and 4 years later it was acquired by Compaq for $307 million”. It was not that easy – it never is. Elon Musk undoubtedly worked long and hard, which, for most people, is already a good enough explanation of why he became rich. But again: working hard is not a unique selling point of Musk, and neither is intelligence; a lot of people work hard and are smart.

The list could go on forever. No matter which wealthy person you pick, chances are that this person already had money or connections before they became rich:

Bill Gates? His mother Mary was a good friend of John Opel – chairman of IBM. It makes things a lot easier to sell something to IBM when your mom knows the chairman.

Jeff Bezos? Made a solid middle six-figure salary as a Vice President at Bankers Trust before he founded Amazon. It’s way easier to try out risky new things with a full bank account.

If you make a low wage, you’ll never be able to build the next Amazon, simply because you cannot afford the risk of failing. But how do you get started and make money if you can’t take the risk to make money? That’s exactly the chicken-and-egg dilemma. That’s the vicious circle and the reason why most people will never get rich. It’s a fact not many want to hear, because it’s better for our peace of mind to keep hoping we’ll get rich some day. Some poor or middle-class people try to break through this barrier, fail and basically go bankrupt. Of course, you’ll never hear of those people; we fall for survivorship bias when we look at all those rich people and conclude that we just have to be smart, work really hard, and we’ll become rich too.

So what can you do? Just enjoy life. Don’t fall for the claim that working hard will make you rich; it won’t. If you find yourself at the bottom of the ladder, you’ll probably never become rich, but you can still grow your fortune a tiny bit and leave more to your descendants than you received from your ancestors. And slowly, over multiple generations, your family can become wealthy and smooth the way for a great-great-great-grandchild to become insanely rich.

How can Computers learn to think? From “Machine Learning” to “Machine Thinking”

What is good? And what is evil? Think about this for a couple of seconds: think about how you define those two terms and how you learned to distinguish them.

This is not a trivial question; in fact, the concept of good and evil and how to define these terms has been on people’s minds for thousands of years. Every religion has found its very own definitions and metaphors. In Christianity, for example, the Bible clearly defines who is good and who is evil: God and Jesus, as well as all who follow their rules for living, are good. All who turn away from God, abhor the Ten Commandments or let themselves be seduced by the devil are evil. Being evil, however, is not an irreversible road to ruin; the Bible teaches that true regret of one’s evil deeds and the forgiveness of sins through God can lead back to the “good” way.

In Buddhism, the distinction between good and evil is much less sharp than in Christianity. Ask yourself: is a person who kills someone else but truly regrets it afterwards fundamentally good or evil? Is a person who steals regularly to feed his children good or evil? In Buddhism, a person is not good or evil per se; only their actions are. And the actions we put into practice will eventually fall back upon us in the future – this is called karma.

Now what does this have to do with computers? Imagine you create a robot that has awareness, a mind. Imagine further that this robot asks itself fundamental questions like: Who am I? Where do I come from? How would this robot be able to learn about the concepts of good and evil? Of course, we could just teach it what we consider good and evil, but wouldn’t that automatically restrict the robot’s potential? Maybe there is a much better definition or a better way to think about good and evil – if we teach a robot what good and evil is, it will never be able to find a (possibly) better approach. And this is exactly the situation we’re facing in the field of Artificial Intelligence (AI). AI is mostly driven by Machine Learning (ML) at the moment, and ML needs a lot of data to learn. Machine Learning is in fact nothing more than finding the right parameters in an equation to calculate a result. In supervised machine learning, we give a computer some input (for example, a lot of images of dogs) and the expected output (the tag “dog” on these images). In unsupervised learning, however, we don’t tell the algorithm which outcome we expect; we just give it some input data and the computer tries to cluster this information. Here’s an image to visualize this: the top row represents supervised learning, the bottom unsupervised learning. Note that in unsupervised learning the algorithm is able to cluster images of the same species. However, it is not able to put the tag “dog” on them, as this was never taught to the algorithm (in supervised learning, it was).
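The difference can be demonstrated with a toy example, using points instead of animal images (the data and tags are made up, and scikit-learn stands in for whatever algorithm one would actually use):

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised: the expected output (the tags) is provided during training,
# so the model can answer with a known tag.
clf = LogisticRegression().fit(X, y)
print("supervised prediction:", clf.predict(X[:1]))

# Unsupervised: the algorithm only groups similar inputs. The clusters
# carry no names, so it can never output the tag "dog" by itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("unsupervised clusters:", km.labels_[:5])
```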

Now here’s the thing: although it looks as if the computer is actually able to “think” about how it arranges the images in unsupervised learning, it is not. In the end, the computer only compares pixel values and recognizes that giraffes typically have an elongated thing (the neck, obviously) and therefore pushes all images with this feature into one group. If people are faced with the same grouping problem, they classify these images differently: they typically associate emotions with pictures and think of giraffes not only as “elongated features”, but see Africa, a safari or other animals that share the giraffe’s habitat in their mind’s eye. And this is of fundamental importance: Machine Learning means adjusting the parameters of a fixed formula. The process of thinking would mean changing and adapting this formula on a regular basis. If we want to get further with Artificial Intelligence, the question arises whether we should move from Machine Learning to Machine Thinking. Machine Thinking would mean, for example, a program adapting its own source code to become better. Machine Thinking is all about reacting to the environment and adaptation. This sounds similar to Reinforcement Learning (RL), but in RL a human still has to tell the algorithm what goal to reach and which actions help to achieve it.
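To make the “fixed formula” point concrete: below is a tiny, illustrative example that fits the parameters of the fixed formula y = w·x + b by gradient descent. The learning never touches the formula itself; only w and b change.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 100)  # data from a hidden line

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    error = (w * x + b) - y
    w -= lr * 2 * (error * x).mean()  # gradient of the squared error w.r.t. w
    b -= lr * 2 * error.mean()        # ... and w.r.t. b

print(f"learned w={w:.2f}, b={b:.2f}")  # close to the hidden 3.0 and 0.5
```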

The ultimate goal would be a computer that could answer questions like “What is evil?” without first having to learn the answer from humans. This sounds like “Deep Thought”, the supercomputer in The Hitchhiker’s Guide to the Galaxy, and in fact that’s the goal: a thinking computer. Even though we are not there yet, Machine Learning and especially Reinforcement Learning are already big steps in the right direction. With Machine Thinking, we could bring AI to the next level.

Why human beings in computer animated movies still don’t look like real humans

Disney’s latest animated movie “Moana” is – once again – a big seller: with a budget of 150 million USD, the box office revenue already exceeded 500 million after one month. If you take a closer look at the protagonists Moana, Tui or Tala, you’ll notice that they look like cartoon characters, not like real human beings. On the one hand this is not exceptional, as almost every human being in a Disney movie looks like a cartoon character. On the other hand, most computer game creators try to design their protagonists as realistically as possible, and with modern technology and today’s computing power it is easily possible to create movies with realistic human beings. So why do animated Disney characters still not look like real humans?

[Image: Moana]

The reason for this is the so-called Uncanny Valley effect. This phenomenon describes that people find highly abstracted, completely artificial figures more appealing and acceptable than figures that are almost, but not quite, realistic. In other words: there is no linear correlation between anthropomorphism and people’s empathic reaction. The following diagram tries to visualize this:

[Diagram: the Uncanny Valley effect – affinity vs. human likeness]
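In case the diagram doesn’t render, the typical shape of the curve can be sketched like this (the curve is hand-made for illustration, not based on measured data):

```python
import numpy as np
import matplotlib.pyplot as plt

likeness = np.linspace(0, 1, 200)
# Affinity rises with human likeness, dips sharply near "almost human"
# and recovers for fully realistic depictions.
affinity = 1.2 * likeness - 1.5 * np.exp(-((likeness - 0.8) ** 2) / 0.004)

plt.plot(likeness, affinity)
plt.axvspan(0.75, 0.85, alpha=0.2, label="uncanny valley")
plt.xlabel("human likeness")
plt.ylabel("affinity / acceptance")
plt.legend()
plt.show()
```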

As you can see, the movie “The Polar Express” is named as an example of the Uncanny Valley effect. This film used a technique called motion capture: the actors were filmed first, and their appearance, facial expressions and gestures were then transferred to the digital figures.

[Image: The Polar Express]

With a budget of 170 million USD, the movie “only” brought in a revenue of 300 million, which is okay but not extraordinary. Other animated movies that used motion capture (e.g. Tintin) achieved similar results. The Uncanny Valley effect could be an explanation for these average box office results, even though it remains a hypothesis and is not fully proven.