The distribution has a long tail, but once the number of friends hits 150 it drops off quickly.
I thought: OK, it's because they are my friends and not the general population. I am more likely to be friends with someone who has many friends than with someone who has just a few. Hence my sample must be skewed.
But if I sample people with a rare name instead, I still get a deformed curve. I concluded that it could be because of some power users with many friends. According to Dunbar's number, our brains are built to handle up to about 150 friends with whom we maintain a friendly relationship. If we cross this limit, we begin to neglect some of our friends. Hence I filtered out everyone with more than 150 friends.
And guess what, a Pareto curve fit the data perfectly!
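For anyone who wants to try the same filter-then-fit step, here is a minimal sketch. It is not the code I used on the real data: the friend counts are synthetic, and the minimum of 10 friends, the shape of 1.5, and the helper `fit_pareto_alpha` are all made up for illustration. The fit is a plain maximum-likelihood estimate of the Pareto shape parameter.

```python
import math
import random


def fit_pareto_alpha(values, x_min):
    """Maximum-likelihood estimate of the Pareto shape parameter alpha.

    Uses alpha = n / sum(ln(x_i / x_min)) over all values >= x_min.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value above x_min")
    return len(tail) / log_sum


random.seed(0)

# Synthetic friend counts (assumption): Pareto with shape 1.5, minimum 10.
X_MIN = 10
friend_counts = [X_MIN * random.paretovariate(1.5) for _ in range(10_000)]

# Drop the "power users" above the Dunbar limit, as described in the text.
DUNBAR_LIMIT = 150
filtered = [c for c in friend_counts if c <= DUNBAR_LIMIT]

alpha_all = fit_pareto_alpha(friend_counts, X_MIN)
alpha_filtered = fit_pareto_alpha(filtered, X_MIN)

print(f"kept {len(filtered)} of {len(friend_counts)} people")
print(f"alpha on all data:      {alpha_all:.2f}")
print(f"alpha on filtered data: {alpha_filtered:.2f}")
```

Note that cutting off the top of the sample pushes the estimated alpha up, so the fitted curve describes the filtered population, not the original one.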
Takeaways from this exercise:
- Be careful about the population you sample. If you ask people on a bus how far they travel, you will get a higher average than if you ask people at the bus station. An article called Why Your Friends Have More Friends Than You Do describes this nicely.
- Be careful about the extremes. They are likely to be biased in some way.
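The bus example from the first point can be checked with a few lines of arithmetic. This is only a sketch with made-up trip lengths: it assumes that a passenger travelling n stops is n times as likely to be on board when you ask, so the on-bus survey weights each trip by its length.

```python
import random

random.seed(1)

# Made-up trip lengths in stops, one per passenger boarding at the station.
trip_lengths = [random.randint(1, 20) for _ in range(1000)]

# Asking at the station: every passenger is counted exactly once.
station_mean = sum(trip_lengths) / len(trip_lengths)

# Asking on the bus: a passenger riding n stops is on board for n segments,
# so each trip is weighted by its own length (length-biased sampling).
on_bus_mean = sum(n * n for n in trip_lengths) / sum(trip_lengths)

print(f"average asked at the station: {station_mean:.1f} stops")
print(f"average asked on the bus:     {on_bus_mean:.1f} stops")
```

The on-bus average comes out higher whenever the trip lengths are not all identical, which is the same effect that makes your friends look more popular than you.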