Datafilos: March 2015

Sunday, March 29, 2015

Artificial intelligence

People like being scared. That is why fairy tales for children are so brutal. Teenagers watch horror movies and adults watch the TV news.

So when global warming and genetically modified crops stopped drawing attention during the economic crisis, the search was on for another bogeyman that had not yet gone stale. Artificial intelligence became that bogeyman.

Public figures use it to draw attention to themselves. The media have something to write about. People are entertained. And the best part is that the fears will eventually be confirmed.

When artificial intelligence was starting out, it was assumed that computers would understand spoken language within five years. Ten years at the latest. But it took fifty years before computers learned merely to transcribe speech into text. And if the best predictor of the future is the past, it will take another fifty years before we can say without blushing that we have artificial intelligence.

And fifty years is the optimum for a panic. It is short enough for us to care, because it will affect our children. On the other hand, it is long enough for the panic to have time to grow.

Steve must be having a great time :).

Friday, March 6, 2015

How to deal with overfitting

1) Measure it (with cross-validation...)
2) Decrease it
- Get more data
- Decrease the size of the hypothesis space (for example, decrease the degree of the polynomial in regression, limit the decision tree size, assume attribute independence in Bayes...)
- Introduce bias (for example, L1 or L2 regularization in regression, ensembles of different classifiers, the operator's background knowledge...)
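
The checklist above can be tried out on a toy problem. What follows is only a minimal sketch under my own assumptions: the data (a noisy sine), the helper names `fit_poly`, `mse` and `cross_validate`, the fold layout and the penalty values are all made up for illustration, and the L2 penalty is applied to every coefficient including the intercept to keep the code short. It measures overfitting as the gap between training and validation error, then shrinks it by lowering the polynomial degree or by adding L2 regularization.

```python
import math
import random

def fit_poly(xs, ys, degree, l2=0.0):
    """Solve (X^T X + l2 * I) w = X^T y, where X[i][j] = xs[i] ** j."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) + (l2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / a[i][i]
    return w

def mse(w, xs, ys):
    """Mean squared error of the polynomial with coefficients w."""
    preds = [sum(c * x ** k for k, c in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

def cross_validate(xs, ys, degree, l2=0.0, folds=5):
    """Return (mean training error, mean validation error) over the folds."""
    train_err = valid_err = 0.0
    for f in range(folds):
        held_out = set(range(f, len(xs), folds))  # every folds-th point
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in held_out]
        valid = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in held_out]
        train_x, train_y = zip(*train)
        valid_x, valid_y = zip(*valid)
        w = fit_poly(train_x, train_y, degree, l2)
        train_err += mse(w, train_x, train_y) / folds
        valid_err += mse(w, valid_x, valid_y) / folds
    return train_err, valid_err

# Toy data (an assumption for the demo): a noisy sine on [-1, 1].
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(40)]
ys = [math.sin(3.0 * x) + random.gauss(0.0, 0.2) for x in xs]

for degree, l2 in [(1, 0.0), (3, 0.0), (9, 0.0), (9, 0.1)]:
    train_err, valid_err = cross_validate(xs, ys, degree, l2)
    print(f"degree={degree} l2={l2}: train={train_err:.4f} valid={valid_err:.4f}")
```

A large gap between the two printed errors is the symptom to look for. The training error alone always looks better for the bigger model, which is exactly why it has to be compared against held-out data.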

Sunday, March 1, 2015

Why I gave up on writing a thorough SQL translator

While it's easy to translate the common stddev_samp into stdev when MSSQL is used, the situation can get more complicated. The first level of complication is time data types. Consider adding a month to a date in several databases:

MSSQL: DATEADD(MONTH, @amount, @date)
MySQL: DATE_ADD(@date, INTERVAL @amount MONTH)
Oracle: ADD_MONTHS(@date, @amount)
PostgreSQL: (@date + INTERVAL '@amount MONTH')

Now it's no longer a mere find and replace (once we get rid of entities): the function name, the argument order and even the way the amount is written all differ.
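
One way to handle the date example above is a template per dialect instead of a name-for-name replacement. This is only a sketch: the dictionary `ADD_MONTHS_TEMPLATES`, the function `add_months` and the column name `order_date` are hypothetical names of mine, and `{date}` and `{amount}` stand in for `@date` and `@amount`. It also assumes the amount is an integer literal, because the PostgreSQL form puts it inside a quoted interval string, where a column expression would not work.

```python
# Templates copied from the list above; {date} and {amount} replace @date and @amount.
ADD_MONTHS_TEMPLATES = {
    "mssql": "DATEADD(MONTH, {amount}, {date})",
    "mysql": "DATE_ADD({date}, INTERVAL {amount} MONTH)",
    "oracle": "ADD_MONTHS({date}, {amount})",
    "postgresql": "({date} + INTERVAL '{amount} MONTH')",
}

def add_months(dialect: str, date_expr: str, amount: int) -> str:
    """Render 'date plus N months' as SQL text for one dialect."""
    try:
        template = ADD_MONTHS_TEMPLATES[dialect.lower()]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    # int() enforces the integer-literal assumption stated above.
    return template.format(date=date_expr, amount=int(amount))

for dialect in ADD_MONTHS_TEMPLATES:
    print(dialect, "->", add_months(dialect, "order_date", 1))
```

Every new function needs its own row in a table like this for every database, which is where the work starts to pile up.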

The second level of complication is missing functions, such as correlation, which then has to be spelled out from simpler aggregates:

(Avg(@numericalColumn * @timeColumn) - Avg(@numericalColumn) * Avg(@timeColumn)) / (StdDev_Pop(@numericalColumn) * StdDev_Pop(@timeColumn)) "@columnName"

It's verbose, but doable. However, there are so many functions, in Oracle for example, that it would be too much work for one person to reach completeness. And if I can't do it myself, I have to rely on the work of others. And if I am relying on the work of others, I have to make it as approachable as possible.
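
The correlation expansion above can be checked numerically outside the database. The sketch below uses made-up data and my own helper names (`moment_correlation`, `pearson`). It computes the expression built from averages and compares it with Pearson's r computed from the definition. The two agree when the standard deviations are population ones. With sample standard deviations (the n - 1 divisor, as in StdDev_Samp) the result comes out scaled by (n - 1) / n, which matters little for large tables but is visible on small ones.

```python
import math

def moment_correlation(xs, ys, sample_stddev=False):
    """(avg(x*y) - avg(x)*avg(y)) / (sd(x) * sd(y)), as in the SQL expansion."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    ddof = 1 if sample_stddev else 0  # 1 = sample stddev, 0 = population stddev
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - ddof))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - ddof))
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

def pearson(xs, ys):
    """Pearson's r straight from the definition, used as the reference."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up sample data for the check.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

print("reference r:        ", pearson(xs, ys))
print("population stddevs: ", moment_correlation(xs, ys))
print("sample stddevs:     ", moment_correlation(xs, ys, sample_stddev=True))
```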