## Sunday, August 12, 2012

### Distribution of Friends on Facebook

I looked at the distribution of my friends' friend counts, expecting to see just another example of a Pareto distribution. Little did I know...

The distribution has a long tail, but once the number of friends hits 150, it drops off quickly.

I thought, OK, that's because they are my friends and not the general population. I am more likely to be friends with someone who has many friends than with someone who has just a few. Hence the tail of my sample must be clipped.
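
This is the friendship paradox: when you look at people's friends, a person with k friends shows up k times, so friends' friend counts are sampled in proportion to friend count and their average is pulled upward. A minimal sketch on a tiny hypothetical graph (the names and edges are invented for illustration, not Facebook data):

```python
from statistics import mean

# Hypothetical undirected friendship graph stored as adjacency sets.
# "hub" stands in for a well-connected person; everyone else has few friends.
graph = {
    "hub": {"a", "b", "c", "d", "e"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
    "d": {"hub"},
    "e": {"hub"},
}


def mean_friend_count(g):
    """Average friend count of a uniformly chosen person."""
    return mean(len(friends) for friends in g.values())


def mean_friends_of_friends(g):
    """Average friend count seen by walking every person's friend list.

    Each friendship is visited from both ends, so a person with k friends
    is counted k times: sampling is proportional to friend count.
    """
    return mean(len(g[f]) for friends in g.values() for f in friends)


print("average person:", mean_friend_count(graph))
print("average friend:", mean_friends_of_friends(graph))
```

In this toy graph the average person has 2 friends, but the average friend has 3: the sum of squared friend counts divided by the sum of friend counts, 36/12.
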

Next I sampled people sharing a common name and got the same strange result. I thought, OK, that's because Facebook truncates the results to the first 300 people, ranked by how likely they are to be my friends. Hence the plot over-represents people with many friends.

But even when I sampled a rare name, I still got a deformed curve. I concluded that it could be because of a few power users with many friends. According to Dunbar's number, our brains are designed to accommodate up to 150 friends with whom we maintain a stable relationship, and if we cross this limit, we begin to neglect some of them. Hence I filtered out all people with more than 150 friends.
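
The post doesn't say how the Pareto curve was fitted. One common approach, assumed here rather than taken from the post, is the continuous maximum-likelihood estimate of the shape parameter, alpha = n / sum(ln(x_i / x_min)), applied after dropping everyone above 150 friends. The function name, the `x_min` choice and the sample counts are hypothetical:

```python
import math


def fit_pareto_alpha(counts, x_min, cutoff=150):
    """Estimate the Pareto shape parameter alpha by maximum likelihood.

    Keeps only counts with x_min <= x <= cutoff, mirroring the filter above.
    Simplifications: counts are treated as continuous, and the upper cutoff
    is ignored in the likelihood (a truncated-Pareto fit would model it).
    """
    kept = [x for x in counts if x_min <= x <= cutoff]
    log_sum = sum(math.log(x / x_min) for x in kept)
    if log_sum == 0:
        raise ValueError("need at least one count above x_min")
    return len(kept) / log_sum


# Hypothetical friend counts; 500 and 1200 are "power users" that get filtered out.
friend_counts = [10, 20, 40, 80, 500, 1200]
alpha = fit_pareto_alpha(friend_counts, x_min=10)
print(f"estimated alpha: {alpha:.3f}")
```

Here only the four counts up to 150 are used, giving alpha = 4 / (6 ln 2), roughly 0.96.
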

And guess what: the Pareto curve fit the data perfectly!

Takeaways from this exercise:
1. Be careful about the population you sample. If you ask people on a bus how far they are travelling, you will get a higher average than if you ask people waiting at the stops along its route, because passengers on long trips spend more time on board and are more likely to be asked. The article *Why Your Friends Have More Friends Than You Do* describes this effect nicely.
2. Be careful about the extremes. They are likely to be biased in some way.
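
The bus example is length-biased sampling (the inspection paradox). A minimal sketch, assuming the bus moves at constant speed and riders are surveyed at a uniformly random moment, so a passenger travelling d km is d times as likely to be on board as one travelling 1 km; the trip lengths are invented:

```python
def mean_at_station(trip_lengths):
    """Every traveller is asked once (e.g. while boarding): a plain average."""
    return sum(trip_lengths) / len(trip_lengths)


def mean_on_bus(trip_lengths):
    """Passengers already on board at a random moment are asked.

    Assumes constant speed, so time on board is proportional to distance
    and each trip is weighted by its own length.
    """
    return sum(d * d for d in trip_lengths) / sum(trip_lengths)


trips = [1, 1, 2, 8]  # hypothetical trip lengths in km
print("asked at the stops:", mean_at_station(trips))
print("asked on the bus:", mean_on_bus(trips))
```

With these trips the average at the stops is 3 km, while the on-bus average is 70/12, about 5.8 km.
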

Edit: after writing this article, I found a similar article from Facebook itself, and they observed the same things I did!