Datafilos: October 2019

Monday, October 28, 2019

Student performance

Once, a professor told me: "A student must be either smart or diligent to perform well". At the time, I thought that was an oversimplification. But I was wrong.

When I built a model that predicts whether a student passes to the next term based on their past performance (~100 features), the model had AUC=0.8. Not bad, but not useful either. What struck me: the model could be simplified to 2 features:

Performance in an off-topic but obligatory course that requires nothing more than handing in simple homework on time (the teachers realized that the course is off-topic for these students and treated it more as a recruitment opportunity than as an opportunity to filter out bad students).
Performance in a mathematical logic course.

This simplified model had AUC=0.78, which is not much worse than 0.8 of the full model. When visualized, the model looked like:

On the horizontal axis, we have the performance in the off-topic course, which tests the student's diligence. On the vertical axis, we have the performance in mathematical logic, which tests the student's intelligence.

There are 3 interesting takeaways:

The professor was right. A student has to be either smart or diligent in order to perform well. And while I was right as well - it is a simplification - it is also an extremely practical and accurate simplification.
The student does not have to be smart and diligent in order to perform well. It is enough if the student is smart or diligent. This is something that I didn't expect. But the shape of the decision boundary speaks for itself: the North-West and South-East corners are green, not red. And the decision boundary is convex, not concave. Hence, this takeaway cannot be dismissed just by saying that the teachers are "too soft".
The decision boundary looks like an arch and not like a line. This is also surprising, because it suggests that whenever we have multiple unrelated scores, we should move away from Manhattan space (where we just sum the scores) to Euclidean space (where we first square the scores). Some universities already do something like that - if an applicant is outstandingly good at sport or art, the applicant is preferred. But the scores from Math and Languages are still generally just summed.
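The difference between summing and squaring the scores can be sketched in a few lines. This is only an illustration, not the model from the post: the two students and their scores (normalized to 0..1) are made up, and the function names are hypothetical.

```python
import math

def manhattan_score(scores):
    """Combine scores by summing them (the 'Manhattan' way)."""
    return sum(scores)

def euclidean_score(scores):
    """Combine scores by squaring, summing and taking the root (the 'Euclidean' way)."""
    return math.sqrt(sum(s * s for s in scores))

# Made-up students: (diligence score, logic score), both normalized to 0..1.
specialist = (0.9, 0.1)   # very diligent, weak in logic
generalist = (0.5, 0.5)   # average in both

for name, scores in (("specialist", specialist), ("generalist", generalist)):
    print(name, manhattan_score(scores), round(euclidean_score(scores), 3))
```

Under the plain sum, both students look the same. Under the Euclidean score, the specialist ranks higher, which is the "smart or diligent" behavior: a threshold on this score draws an arch-shaped boundary instead of a straight line.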

Sunday, October 27, 2019

File format for Helmer GPS locator

The file format of 'p_data.txt' is CSV. Unfortunately, the columns are not labeled. The inferred meaning of the columns follows, in the hope that it will save someone a few minutes:

IMEI
The operation mode (monitor/tracker)
Timestamp in YYMMDDHH24MISS format
?
?
Time in HH24MISS format
?
Latitude in Degrees Decimal Minutes format
Latitude hemisphere (N/S)
Longitude in Degrees Decimal Minutes format
Longitude hemisphere (E/W)
Speed
Direction in degrees
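A minimal parsing sketch based on the column list above. Several things are assumptions: the column names are my own labels, the delimiter is taken to be a comma, the coordinates are assumed to be packed NMEA-style as (d)ddmm.mmmm, and the sample line is made up for illustration (it is not real device output).

```python
from datetime import datetime

# Our own labels for the inferred columns; "unknown_*" marks the '?' columns.
COLUMNS = [
    "imei", "mode", "timestamp", "unknown_1", "unknown_2", "time",
    "unknown_3", "lat_ddm", "lat_hemi", "lon_ddm", "lon_hemi",
    "speed", "direction",
]

def ddm_to_decimal(ddm, hemisphere):
    """Convert degrees-decimal-minutes to signed decimal degrees.

    Assumes NMEA-style packing (d)ddmm.mmmm, which is not confirmed.
    """
    value = float(ddm)
    degrees = int(value // 100)          # leading digits are whole degrees
    minutes = value - degrees * 100      # the rest is decimal minutes
    decimal = degrees + minutes / 60
    return -decimal if hemisphere in ("S", "W") else decimal

def parse_line(line):
    """Split one CSV line into a dict and add decoded timestamp and coordinates."""
    row = dict(zip(COLUMNS, line.strip().split(",")))
    row["timestamp"] = datetime.strptime(row["timestamp"], "%y%m%d%H%M%S")
    row["lat"] = ddm_to_decimal(row["lat_ddm"], row["lat_hemi"])
    row["lon"] = ddm_to_decimal(row["lon_ddm"], row["lon_hemi"])
    return row

# Made-up line for illustration only.
sample = "123456789012345,tracker,191027103000,x,x,103000,x,5005.2500,N,01425.5000,E,0.0,0"
fix = parse_line(sample)
print(fix["timestamp"], fix["lat"], fix["lon"])
```

The speed and direction columns are left as strings, because the unit of the speed is not known.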

Tuesday, October 22, 2019

Braess's Paradox

Braess's paradox states that when we add a new path into a network, it may decrease the total flow through the network.

The paradox can be nicely illustrated with springs. However, beware that Nature's illustration of the spring experiment is flawed:

The lengths of the springs should be different when we cut the rope (while the lengths of the remaining ropes should remain unchanged). The corrected illustration:

According to Wikipedia, the paradox can also be observed in electrical networks "at low temperatures using a scanning gate microscopy":

But you don't need fancy technology to observe the paradox in electrical circuits. Nature's article actually got close to a "simple circuit":

But it requires a source of constant current. A simpler schematic, which uses a source of constant voltage, is:

We just need 2 light bulbs, 2 resistors, a 9V battery (or any other source of constant voltage) and an ammeter to measure the change in the current. Optionally, we may include a button or a switch. But a piece of wire is enough.

How does it work? Incandescent bulbs have an interesting property: when they are cold, they behave almost like a short circuit. Only once they heat up does their resistance rise to the nominal value (the diagram is for a 60W light bulb):

The nominal resistance of the light bulb in our schematic is 5V/(0.45W/5V)=56Ω. When the button is open, the nominal current through a single light bulb is 9V/(56Ω+68Ω)=0.073A. Once we close the button, more current flows through the light bulbs than through the resistors, because the light bulbs have a nominal resistance of 56Ω while the resistors have 68Ω. But the increased current through the light bulbs heats up the filaments, the light bulb resistance increases, and the total current through the network decreases.
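The nominal values above can be rechecked with a few lines of arithmetic. The variable names are my own. The calculation treats the bulb as a fixed resistor at its nominal value, which, as explained above, a real filament is not.

```python
SUPPLY_V = 9.0        # battery voltage
BULB_V = 5.0          # rated voltage of the bulb
BULB_W = 0.45         # rated power of the bulb
RESISTOR_OHM = 68.0   # resistor value used in the example

# Rated current of the bulb: I = P / V
bulb_rated_current = BULB_W / BULB_V

# Nominal (hot) resistance of the bulb: R = V / I
bulb_nominal_ohm = BULB_V / bulb_rated_current

# Button open: one bulb in series with one resistor across the supply.
# The bulb resistance is rounded to whole ohms, as in the text.
open_branch_current = SUPPLY_V / (round(bulb_nominal_ohm) + RESISTOR_OHM)

print(f"bulb resistance: {bulb_nominal_ohm:.1f} ohm")
print(f"current through one bulb, button open: {open_branch_current:.3f} A")
```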

Measured currents through the whole network for different resistors:

resistance [Ω]	open [mA]	closed [mA]
32	160	160.5
46	140	140
68	125	124
100	112	113

The paradox was observed for only one resistor value (68Ω). The optimal resistance for maximizing the effect is somewhere around 62Ω. If you have a tandem potentiometer that can withstand ~1 Watt (e.g. an old wire-wound potentiometer), I would be happy to hear what the actual optimal value and the size of the effect are.
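The measurements can be checked mechanically: the paradox shows up wherever closing the button lowers the total current. A small sketch with the values copied from the table (the variable names are my own):

```python
# (resistor [ohm], current with button open [mA], current with button closed [mA])
measurements = [
    (32, 160.0, 160.5),
    (46, 140.0, 140.0),
    (68, 125.0, 124.0),
    (100, 112.0, 113.0),
]

# Closing the button adds a path; the paradox is when the total current drops anyway.
for ohm, open_ma, closed_ma in measurements:
    delta = closed_ma - open_ma
    print(f"{ohm:>4} ohm: change after closing = {delta:+.1f} mA")

paradox_resistors = [ohm for ohm, open_ma, closed_ma in measurements if closed_ma < open_ma]
print("paradox observed for:", paradox_resistors)
```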

Applications: A teaser at a school open day or science fair. You ask visitors to guess when the circuit is going to consume more energy:

When the button is visibly closed and the light bulbs are shining brightly?
Or when the button is visibly open and the light bulbs are just dim?

The paradoxical answer is that the circuit consumes more energy when the light bulbs are dim.

To puzzle the visitors even more, you may stress that the circuit contains only passive components and that the trick is not in the power supply.