Sunday, April 20, 2014

Hacker in the browser and other hacking ideas

If I were a hacker with the ability to modify web pages in the browser, I would modify the top of Wikipedia and ask people to donate. People accustomed to donating would donate again, but this time into my pocket. If it were synced with the real campaign, it would be awesome.

Or if I were Microsoft and detected that two people communicating over Skype (or another messenger) are in a relationship, I would suggest that the man send roses to his fiancée. It would be just a small bunch of roses, but delivery would be guaranteed within the next 30 minutes to the fiancée’s current estimated location.

And to be even more devilish, I would use the fiancée as the vector. I would show her a pop-up saying: “Do you want to test your boyfriend? Show him an advertisement with one of the following bouquets!” She would be presented with three simple options, like snowdrops, violas and sunflowers. She would be thinking about the ad while chatting with the boyfriend, and in the end she would click on one of the bouquets, because who would not like to test the love of their beloved? And to make sure that the boyfriend does not disappoint her (men are notoriously unreliable), she would nag him to buy the flowers until she actually got the bouquet (women can get really persistent when they decide to get something). The poor boyfriend would then be forced to buy the overpriced bouquet, because who the hell should bear all that moaning of the fiancée about some stupid inedible vegetable? The sooner he gets it over with, the better.

The boyfriend would be under pressure to buy that specific bouquet. But it’s hard for men to multitask (searching for a cheaper delivery while chatting with the fiancée). And it would be ridiculously hard to find a delivery service with that specific flower (Why the hell didn’t she want roses?!) and even harder to find one able to deliver the bouquet in less than 30 minutes. Simply Machiavellian.

Saturday, February 15, 2014

Camera sensor

Recently I noticed that someone had patented a layout of sensors on a camera chip that is better tuned to the sensitivity of the eye. Specifically, the patented chip combines colorless sensors with color sensors. This combination makes sense, since the human eye is more sensitive to luminance than to color. Furthermore, the resulting pictures are less noisy because colorless sensors do not filter the light.

Nevertheless, the presented design doesn’t exactly follow the sensitivity of the human eye. Hence I predict that sooner or later someone will patent the right proportion of sensors without specifying the exact geometric layout. Indeed, it’s possible that the exact layout of the sensors will be random (while preserving the right proportions).
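To illustrate what a random layout with fixed proportions could look like, here is a minimal sketch. The sensor labels and the 4:2:1:1 ratio are purely hypothetical placeholders, not the proportions from any patent or from measurements of the eye; the function name `random_mosaic` is likewise made up for this example.

```python
import random

def random_mosaic(rows, cols, ratio, seed=0):
    """Fill a rows x cols grid with sensor labels in random positions.

    `ratio` maps a label to an integer weight; the exact proportions are
    preserved because the cells are built first and only then shuffled.
    """
    unit = [label for label, weight in ratio.items() for _ in range(weight)]
    n = rows * cols
    if n % len(unit) != 0:
        raise ValueError("grid size must be a multiple of the ratio total")
    cells = unit * (n // len(unit))
    random.Random(seed).shuffle(cells)  # random layout, fixed counts
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]

# Hypothetical ratio, chosen only for illustration.
ASSUMED_RATIO = {"clear": 4, "green": 2, "red": 1, "blue": 1}

mosaic = random_mosaic(4, 8, ASSUMED_RATIO)
for row in mosaic:
    print(" ".join(label[0].upper() for label in row))
```

Shuffling a pre-built list, rather than drawing each cell independently, is what keeps the proportions exact instead of only approximate.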

Sunday, February 9, 2014

Confession


Mrs. H., I hold one thing against you. Because we wrote composition exercises week after week, I developed an addiction. For example, I lie down in bed but cannot fall asleep, because in my mind I keep improving some story. And the only thing that helps is to get up, sit down at the desk and write it all out. Only once the thoughts have been exported to paper, and the export has been validated as successful, can I turn to the originally intended activity, sleep. For me, writing it out (vypsat) and sleeping it off (vyspat) often mean the same thing.

Or I am with friends and want to have fun, but some thought keeps returning to my mind like a fly to excrement. And the only way to get rid of it is to tell it to my friends or to paper. If only they were proper thoughts that would amuse the company. But no, those thoughts are fit only for paper. Perhaps we should have conversed more and written less.

The difference between statistics, machine learning, data mining and data science

Originally there was just statistics: a method for summarizing huge populations with two numbers, the mean and the variance. With a bit of exaggeration, the whole of statistics operates with just these two numbers. With them you can compute significance, confidence intervals, correlation, regression and much more. Back then it was an amazing success: you could work with millions of records on a single piece of paper.
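As a small illustration of how far the mean and the variance go, the sketch below computes an approximate 95% confidence interval for the mean using the normal approximation. The sample values are made up, and the constant 1.96 assumes a sample large enough for that approximation to be reasonable.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    variance = statistics.variance(sample)  # sample variance, n - 1 denominator
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width

# Made-up measurements, for illustration only.
heights_cm = [172.0, 168.5, 181.2, 175.4, 169.9, 178.3, 174.1, 171.6]

low, high = mean_confidence_interval(heights_cm)
print(f"approximate 95% interval for the mean: [{low:.1f}, {high:.1f}]")
```

Only `n`, the mean and the variance enter the formula; the individual records are not needed once those are known.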

But with the dawn of computers, people became less limited in the amount of computation that was deemed practical, and they started to think big. What if we worked with the whole population distribution? Or if we ran these old, trivial statistical tests on this huge pile of data, wouldn’t we find something? These two questions stood at the beginning of two fields: machine learning, which explores the former question, and data mining, which explores the latter.

Statisticians with access to computers started to pull nasty tricks like bootstrapping to narrow confidence intervals. Traditional statisticians felt threatened and accused the computer statisticians of cheating. Later on, the computer statisticians persuaded the traditional ones of the validity of the approach, and statisticians accepted bootstrapping as a useful tool. But disagreements like this led to the divergence of machine learning from statistics.
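For readers who have not met bootstrapping, here is a minimal sketch of the percentile bootstrap, one common variant: resample the data with replacement many times, recompute the statistic each time, and read the interval off the sorted estimates. The data, the function name `bootstrap_ci` and the default of 2000 resamples are assumptions made for this example.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.median, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample draws n values from the sample with replacement.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Made-up waiting times in minutes, for illustration only.
waiting_times = [3.1, 4.7, 2.2, 9.8, 5.0, 3.9, 12.4, 4.1, 6.3, 2.8]

low, high = bootstrap_ci(waiting_times)
print(f"bootstrap interval for the median: [{low}, {high}]")
```

The appeal is that the same few lines work for the median or any other statistic for which no convenient closed-form interval exists, at the price of a lot of computation.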

Similarly, the guys and gals from data mining were targets of many attacks, because data mining allowed scientific articles to be produced at an amazing pace: what would have taken the whole life of a respected scientist could be done in less than 5 minutes by a stupid computer. Unfortunately, in this case the contempt was deserved, because many results of data mining were false positives. Later on, data miners learned to use the Bonferroni correction and to validate results to decrease the rate of false positives, but the damage was done. Both statisticians and machine learners started to look down upon data miners as kids who had learned a few tricks, which they applied without any deeper understanding.
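The Bonferroni correction mentioned above is simple enough to sketch in a few lines: when m hypotheses are tested at once, each p-value is compared against alpha / m instead of alpha. The p-values below are invented for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which tests stay significant after the Bonferroni correction."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)  # stricter per-test threshold
    return [p <= threshold for p in p_values]

# Invented p-values from five tests run on the same pile of data.
p_values = [0.001, 0.012, 0.03, 0.049, 0.2]

naive = [p <= 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)

print("significant without correction:", sum(naive))
print("significant with Bonferroni:   ", sum(corrected))
```

With five tests the per-test threshold drops from 0.05 to 0.01, so findings that looked significant in isolation no longer pass.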

With the rise of the Internet, access to data became simpler and the biggest time burden shifted from data collection to data processing. The methods invented by machine learners were hopelessly slow on data from the Internet and phones, and even the methods employed by data miners were too slow to be executed on whole datasets. This change of paradigm led to a return to the roots of statistics, where people first formulate hypotheses and then study the data to confirm or reject them. But because the focus shifted from the correctness of methods (they have been proved many times since then) to the efficient computation of trivial algorithms, a new group of statisticians with a computer science background emerged. Nowadays we call this group data scientists.

PS: A quick guide to telling the fields apart based on keywords:
  1. State space -> artificial intelligence
  2. Significance -> statistics
  3. Maximum a posteriori (MAP) -> machine learning
  4. Algorithmic efficiency (O-notation) -> theoretical computer science
  5. Cross-validation -> data mining

A sigh

Father is obsessed with vintage cars (which Mother familiarly calls wrecks), vacuum cleaners, televisions and measuring apparatus. Mother, in turn, is obsessed with flowers (which Father familiarly calls weeds), porcelain and glass. And both tend to expand their collections, even at the expense of the other. So when one of them goes away for a while, the other takes advantage and shifts the demarcation line. For example, when Mother leaves for a week, Father acquires a new wreck and, as if nothing had happened, parks it on Mother’s lawn. On her return Mother wrings her hands, because the vintage wreck has wet itself and left a little puddle of oil, so even once the wreck is moved, the lawn is already ruined. And vice versa: when Father leaves, Mother throws out the old tires and plants a full-grown tree in their place. Father then wails that those were still good tires and that he needs them. But because he loves Mom, he leaves the tree there and only surrounds it with new tires, so the tree twists as it tries to reach the light through the tires.

I wish my parents would never die, because the ethical disposal of the accumulated property would be exhausting.

Friday, February 7, 2014

My knowledge of RapidMiner

My knowledge of operators in RapidMiner:
  • Process Control (10/39)
  • Utility (7/54)
  • Repository Access (2/6)
  • Import (2/28)
  • Export (1/18)
  • Data Transformation (42/115)
  • Modeling (27/66)
  • Evaluation (11/32)
Overall, I know around 28% (102/358) of operators in RapidMiner.
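The overall figure can be double-checked with a few lines; the counts are copied from the list above, and the variable names are just for this sketch.

```python
# (known, total) operator counts per RapidMiner group, copied from the list.
known_and_total = {
    "Process Control": (10, 39),
    "Utility": (7, 54),
    "Repository Access": (2, 6),
    "Import": (2, 28),
    "Export": (1, 18),
    "Data Transformation": (42, 115),
    "Modeling": (27, 66),
    "Evaluation": (11, 32),
}

known = sum(k for k, _ in known_and_total.values())
total = sum(t for _, t in known_and_total.values())
share = known / total

for group, (k, t) in known_and_total.items():
    print(f"{group:<20} {k:>3}/{t:<3} {k / t:6.1%}")
print(f"{'Overall':<20} {known:>3}/{total:<3} {share:6.1%}")
```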

Wednesday, January 29, 2014