Datafilos

Tuesday, February 18, 2020

Civilization 6

One of the most criticized features of Civ 5 and 6 is the "one unit per tile" (1UPT) limitation. In older versions of Civ, it was possible to stack an unlimited number of units on a single tile.

The benefit of the change is more opportunities for tactics. For example, it is possible to replay the Battle of Marathon in the Jadwiga's Legacy scenario - a single well-placed pikeman can indefinitely block a mountain pass against a horde of knights, because only a single knight can attack the pikeman per turn. Or you may place a city between two large lakes, like real-world Madison, Wisconsin, making it difficult to "flank" the city and put it under siege. And of course, 1UPT fixes the infamous Stack of Doom issue.

On the other hand, it makes logistics difficult. It happened to me multiple times that I ordered a unit to move forward, only to watch in horror as it moved backward because all the "resting places" in the forward direction were already occupied by my units. A simple mitigation of this nuisance is to introduce a keyboard shortcut particularly popular in Microsoft Word: Ctrl+Z. The implementation can take inspiration from the turn-based strategy The Battle for Wesnoth, which gives the user the opportunity to take back a misclick. To prevent players from abusing the feature, it is not possible to take back a move that uncovers part of the map or lifts the fog of war. Wesnoth also prevents you from taking back a move that results in combat. Since Civ 6 still uses randomness in the damage calculation, this limitation would have to be ported as well. The long loading times because of a silly misclick would then be mostly a thing of the distant past...
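The undo rule described above can be sketched in a few lines. This is only my illustration, not how Wesnoth or Civ implement it: the names `Move` and `UndoStack` are hypothetical, and clearing the whole history after an irreversible move is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Move:
    unit: str
    src: tuple
    dst: tuple
    reveals_map: bool = False  # uncovers part of the map or lifts the fog of war
    is_combat: bool = False    # the outcome involves randomness


@dataclass
class UndoStack:
    history: list = field(default_factory=list)

    def record(self, move: Move) -> None:
        if move.reveals_map or move.is_combat:
            # The player gained information or rolled the dice:
            # nothing up to this point may be taken back (assumption).
            self.history.clear()
        else:
            self.history.append(move)

    def undo(self):
        """Return the move to revert, or None when nothing can be undone."""
        return self.history.pop() if self.history else None


stack = UndoStack()
stack.record(Move("warrior", (0, 0), (0, 1)))
stack.record(Move("scout", (5, 5), (5, 6), reveals_map=True))
stack.record(Move("archer", (2, 2), (2, 3)))
last = stack.undo()     # the archer move can be taken back
nothing = stack.undo()  # None: the scout move blocks any further undo
```

The game would then move the returned unit back from `dst` to `src`; everything before the scouting move stays final.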

Another irritating thing is that sometimes a victory (or loss) is inevitable, but you still have to get through many turns to get there. Some people enjoy this part of the game. I find it boring. It would be nice to have something like "auto combat" or "quick battle" from Heroes of Might & Magic III, which would allow you to watch the march to victory from the comfort of your seat, or to just skip to the victory screen because you really need some sleep. Didn't get the outcome you expected? No worries, just press the sweet Ctrl+Z combo and show the computer how it should have been done.

The biggest advantage of having the "fast end" option is that, as a developer, you do not have to stress so much about fine-tuning the winning conditions. Does the game slip too frequently into a war of attrition? Just add the "fast end" option. Problem mitigated. Release the game. And leave the rebalancing to the expansion pack.

And finally, the game logic for city states and civilizations should be the same thing. City states are great. But I want to be able to:

Befriend not only city states but also civilizations with amenities.
Get unique bonuses not only from being suzerain of city states, but also from being a friend to civilizations.

And conversely, civilizations are great, but I want to be able to:

Select not only civilizations that appear in the game, but also city states.

All these issues can be nicely resolved by adding a switch to the "create new game" GUI next to each civilization, which would "cripple" the selected civilization into a city state. This switch would:

Remove the leader and her/his traits. But the civilization traits would stay there.
Limit the number of cities to one.
Adjust things like the loyalty bonus to actually make the city states work.

Sunday, December 15, 2019

How to kill processes without the necessary privileges

Windows has one strange property: shutdown is not an atomic operation. Hence, if you do not have the privilege to terminate programs (like an antivirus on a corporate machine) but still have the privilege to shut the machine down (quite common on laptops), you may still succeed in killing the unwanted processes.

The procedure:

Open Excel.
Invoke Windows shutdown.
Windows will tell you that Excel has unsaved documents. Do nothing. Just wait until all unwanted processes are killed.
Cancel the shutdown.

How to mitigate this security weakness:

Run regedit
Go to HKEY_CURRENT_USER\Control Panel\Desktop
Set AutoEndTasks to 1
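
If you prefer a script over clicking through regedit, the same change can be expressed as a `reg add` command. A minimal sketch: the helper name `auto_end_tasks_command` is mine, and I assume the value is stored as a string (REG_SZ). The snippet only builds and prints the command, it does not touch the registry.

```python
def auto_end_tasks_command(value: int = 1) -> list:
    """Build a `reg add` command that sets AutoEndTasks for the current user."""
    return [
        "reg", "add", r"HKCU\Control Panel\Desktop",
        "/v", "AutoEndTasks",
        "/t", "REG_SZ",  # assumption: the value is stored as a string
        "/d", str(value),
        "/f",            # overwrite an existing value without prompting
    ]


command = auto_end_tasks_command()
print(" ".join(command))
```

On a Windows machine, the list can be passed to `subprocess.run(command, check=True)`.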

Friday, November 22, 2019

Old products were reliable, the new ones not so much

I hear this line pretty frequently. But the data do not support this statement. One of the possible explanations of the belief that older products were more reliable is selection bias.

Products have a variable lifespan. Hence, when we use a century-old item, the item does not break easily, because it has been stress-tested for a century and the weaklings were weeded out a long time ago. On the other hand, when we use a new item, there is a good chance that it won't survive long, because it hasn't been stress-tested and weeded out for a century like the old products that have survived to this day.

Another factor is the variable variance of durability. When the number of distinct manufacturers that make a product is high and the manufacturers are independent of each other (e.g.: they use only local resources), we may expect high variance in the durability of the products. On the other hand, if there are just a few manufacturers, or if they all use the same components or design, we may expect low variance in the durability of the products. Hence, some of the products made during the high-variance period are likely to have outstanding durability (just as some of them likely had terribly low durability). The "issue" with new products is that we currently live in a fairly globalized world and many technologies that we use daily have been commoditized (standardized and made widely available). Hence, many new items that we use daily have a fairly predictable lifespan, one that does not span centuries (because that would be overkill). Consequently, we may sometimes find "indestructible" items that were created when the technology was new. But keep in mind that just as "indestructible" items were produced, there were also "rubbish" items, which were quickly thrown out.

Overall, the data suggest that the quality of manufacturing keeps improving over time. But thanks to selection bias and variable variance, the reverse appears to hold from the point of view of the user.

Addendum: We could model it analytically. For simplicity, assume that the time to product failure follows a log-normal distribution (I picked this distribution because it fulfills three basic properties: it does not allow negative values, it has a long tail and people are familiar with it).

Selection bias can then be illustrated as the difference between the probability that a new product fails in the next 10 years and the probability that a 100-year-old product fails in the next 10 years. The first probability is going to be large, because the log-normal distribution is "fat" at the beginning. But once we get to the tail, the density is small and nearly flat (its derivative is close to 0). In other words, if a product survives 100 years, it is actually more likely to fail after the next 10 years than during them.
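
To make the comparison concrete, here is a small numerical sketch. The parameters are made up purely for illustration (a median lifespan of 20 years and a wide spread, sigma = 1.5), and the function names are mine.

```python
import math


def lognormal_cdf(t: float, mu: float, sigma: float) -> float:
    """Probability that the product has failed by age t (in years)."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def failure_within(age: float, window: float, mu: float, sigma: float) -> float:
    """P(fails within the next `window` years | survived to `age`)."""
    survived = 1.0 - lognormal_cdf(age, mu, sigma)
    failed_in_window = lognormal_cdf(age + window, mu, sigma) - lognormal_cdf(age, mu, sigma)
    return failed_in_window / survived


MU = math.log(20.0)  # hypothetical: median lifespan of 20 years
SIGMA = 1.5          # hypothetical: wide spread of lifespans

p_new = failure_within(0.0, 10.0, MU, SIGMA)
p_old = failure_within(100.0, 10.0, MU, SIGMA)
print(f"new product: {p_new:.2f}, 100-year-old product: {p_old:.2f}")
```

With these parameters, the new product fails within 10 years in roughly a third of the cases, while the century-old survivor fails in roughly a tenth. Other parameters will give other numbers.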

The variable variance can be illustrated with the observation that whenever an engineer doesn't know how to accurately estimate something, they prefer to overestimate the parameters and build the thing robustly. However, sometimes the initial design has a flaw that reduces the lifespan of the product (hence, the peak is wide - the product can last long but can also fail quickly). The next phase is fixing these flaws, while everything else is left as before (the fat at the beginning is removed but the fat tail is preserved - this is the period from which we observe many "eternal" products). Over time, the product is price-optimized (the fat tail is removed - the products have a predictable lifespan without extremes and they all look like garbage in comparison to the eternal products of the past).

Thursday, November 14, 2019

An app for climbing shoe recommendation

One of the most important factors of a good climbing shoe is a good fit. Unfortunately, human feet vary greatly. Feet vary not only in length, but also in the length/width and length/height ratios, in toe lengths, and in deviations like bunions, hammer toes and so on. In the case of a walking shoe, a single "size" measure is enough to guarantee a good enough fit, i.e.: the shoe doesn't slip but it also doesn't hurt anywhere. But a single measure is enough only because in a walking shoe we tolerate large gaps between the foot and the shoe (e.g.: between the toes and the shoe). In a climbing shoe, each such empty space degrades the climbing performance, because the foot does not have good contact with the rock at that particular "empty" spot. Hence, power climbers generally prefer as snug a fit as they can handle.

Ideally, an experienced shop assistant should be able to recommend a well-fitting climbing shoe based on a look at the client's foot: a shoe that provides a snug fit on the client's feet without causing deformities. But my experience is that the salesperson is commonly (and naturally) biased toward the shoes that fit them well. Only exceptionally do you encounter an expert who can overcome the bias. But these experts generally (and naturally) work for some brand, and if this brand does not make shoes for your type of feet, you are out of luck.

Hence my proposal: a phone app that would take a photo of your feet (a self-photo while you are standing barefoot on the floor), perform some rudimentary calculations (like the length-to-width ratio, the ratio of individual toe lengths to the foot length,...), provide some result illustrations in order to persuade the users that the app actually does something (e.g.: overlay the user's foot outline on a prototypical foot) and display shoes ranked from the best fit to the worst fit.

There are three obstacles to getting it working:

Data collection
Monetization
Machine learning

Data collection

First, you need some data to feed the recommendation system. The best option would be to contact some climbing shoe manufacturers, present your aim to them and ask them for their shoe profiles. I do not think that the biggest manufacturers like La Sportiva are going to be supportive (they may see it as too risky). But the small manufacturers may see it as a good opportunity to show progressiveness and improve their visibility without risking too much. Plus, because they are small, they can make their decision quickly. And finally, there are many small manufacturers and they vary wildly - it would be surprising if none of them were eccentric enough to provide you with the data (or allow you to take photos of the shoe lasts...).

But the initial measurements are just the beginning. You also have to track what people actually buy and what they return, and alter the recommendations based on that.

Monetization

Second, if you want the app to be successful in the long term, it has to make money. Otherwise it will eventually die due to technological obsolescence (and your lack of motivation to keep it alive). In this case, the monetization model is simple: a shop/brand that can provide a good recommendation will sell more. The reasoning is simple: whenever customers have doubts about which product to buy, they prefer to postpone their action. And whenever they postpone their action, you risk that they will eventually perform the action (the purchase) somewhere else or that they will not perform it at all (they stick with their current shoes or they even completely abandon climbing). Hence, the app should handle purchases and returns in order to collect data and make money.

Machine learning

The app must be simple and fast to use. Hence, it is a good idea not to require any measurement with a ruler - a single photo of the foot/feet should be enough.
But how to process the photo? I would simply train a convolutional neural network to identify which pixels belong to a foot and which to the background. I would collect training images of feet from the internet and build the ground truth masks either in Photoshop (for tough photos) or with local thresholding methods (Sauvola thresholding for photos with a clean background). Since everything relies on getting a good outline of the foot, I would also instruct the users to take the photo of their feet on a plain white background like a sheet of paper or a wall. The paper has the advantage that you can identify its format (A4 or letter) from its side ratio and then use the known dimensions for foot size estimation (toe to heel) and for perspective correction (the camera is not always at the same position and does not always point in the same direction), based on the knowledge that the paper should have right angles.
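
The paper trick can be sketched as follows. I assume the photo has already been perspective-corrected and that the side lengths of the paper and the toe-to-heel distance were measured in pixels. The function names and the pixel values are hypothetical.

```python
# Standard sheet dimensions in millimetres (short side, long side).
PAPER_SIZES_MM = {"A4": (210.0, 297.0), "Letter": (215.9, 279.4)}


def identify_paper(short_px: float, long_px: float) -> str:
    """Pick the paper format whose side ratio is closest to the measured one."""
    ratio = long_px / short_px
    return min(
        PAPER_SIZES_MM,
        key=lambda name: abs(PAPER_SIZES_MM[name][1] / PAPER_SIZES_MM[name][0] - ratio),
    )


def foot_length_mm(foot_px: float, short_px: float, long_px: float) -> float:
    """Convert a toe-to-heel distance in pixels to millimetres."""
    paper = identify_paper(short_px, long_px)
    mm_per_px = PAPER_SIZES_MM[paper][1] / long_px
    return foot_px * mm_per_px


# Hypothetical measurements from a perspective-corrected photo.
paper = identify_paper(short_px=1050.0, long_px=1485.0)
length = foot_length_mm(foot_px=1300.0, short_px=1050.0, long_px=1485.0)
print(paper, round(length, 1))
```

The two formats have side ratios of about 1.41 (A4) and 1.29 (letter), so they are easy to tell apart as long as the perspective correction is reasonably accurate.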

Once we have an outline of the foot (let's say of the right foot), we should normalize the outline. This is important for visualization (the overlay of the user's foot on the prototypical foot) and for feature extraction. I would simply fit an affine transformation of an ideal foot outline to the obtained outline in OpenCV (or whatever your favourite tool is).

Once we have the user's normalized foot outline, we can overlay it on the inner shape of the shoe. Ideally, there should be a perfect overlap. And the measure of the overlap can be used for ranking the shoes.
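
One possible overlap measure is intersection over union (my choice for this sketch, other measures may work better). Masks are represented as sets of pixel coordinates, and the shoe names and rectangle "outlines" are toy stand-ins.

```python
def overlap_score(foot: set, shoe: set) -> float:
    """Intersection over union of two pixel sets; 1.0 means a perfect overlap."""
    union = foot | shoe
    if not union:
        return 0.0
    return len(foot & shoe) / len(union)


def rank_shoes(foot: set, shoes: dict) -> list:
    """Return shoe names ordered from the best fit to the worst fit."""
    return sorted(shoes, key=lambda name: overlap_score(foot, shoes[name]), reverse=True)


def rectangle(width: int, height: int) -> set:
    """Toy stand-in for a normalized outline mask: a filled rectangle."""
    return {(x, y) for x in range(width) for y in range(height)}


foot = rectangle(4, 10)
shoes = {
    "narrow": rectangle(3, 10),
    "matching": rectangle(4, 10),
    "wide": rectangle(6, 10),
}
ranking = rank_shoes(foot, shoes)
print(ranking)
```

Note that this measure penalizes a too-tight and a too-loose shoe symmetrically, which is probably not what a climber wants. That is one of the things the sales and return data could correct.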

Later on, once we collect enough data from sales and return records, we could even retrain the convolutional network to rank the shoes directly from the foot photo. The idea is that the photo may contain more information than the outline alone, and that the neural network could be better at deciding which parts of the foot tolerate overly tight/loose fits and which not so much.

Evaluation

At the beginning, the goal will be to get repeatable results. I.e.: when we snap two photos of the same foot, we expect to get the same foot outline and the same shoe ranking. On the other hand, when we snap two wildly different feet, we expect different outlines and different shoe rankings.

Later on, we can simply maximize profit (sale margin minus the returns).

Edit

It looks like there is already at least one company that takes body measurements through a camera: Menro, which makes smart suits. As a size reference, they use a sheet of A4 paper with 2 corners blacked out to get a good contrast against a (likely white-painted) wall.

Monday, October 28, 2019

Student performance

Once, a professor told me: "A student must be either smart or diligent to perform well". Back then, I thought that it was an oversimplification. But I was wrong.

When I built a model that predicts whether a student passes to the next term based on their past performance (~100 features), the model had AUC=0.8. Not bad, but not useful either. What struck me: the model could be simplified to 2 features:

Performance in an off-topic but obligatory course that does not require anything more than handing in simple homework on time (the teachers realized that the course was off-topic for these students and treated it more like a recruitment opportunity than an opportunity to filter out bad students).
Performance in a mathematical logic course.

This simplified model had AUC=0.78, which is not much worse than 0.8 of the full model. When visualized, the model looked like:

On the horizontal axis, we have the performance from the off-topic course, which exercises student's diligence. On the vertical axis, we have the performance from mathematical logic, which exercises student's intelligence.

There are 3 interesting takeaways:

The professor was right. A student has to be either smart or diligent in order to perform well. And while I was right as well - it is a simplification - it is also an extremely practical and accurate simplification.
The student does not have to be smart and diligent in order to perform well. It is enough if the student is smart or diligent. This is something that I didn't expect. But the shape of the decision boundary speaks for itself: the North-West and South-East corners are green, not red. And the decision boundary is convex, not concave. Hence, this takeaway cannot be dismissed just by saying that the teachers are "too soft".
The decision boundary looks like an arc and not like a line. This is also surprising, because it suggests that whenever we have multiple unrelated scores, we should move away from Manhattan space (where we just sum the scores) to Euclidean space (where we first square the scores and then sum them). Some universities already do something like that - if an applicant is outstandingly good at sports or art, the applicant is preferred. But the scores from Math and Languages are still generally just summed.
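
A toy comparison of the two ways of combining scores. The students and their scores are invented for illustration, and both scores are assumed to be scaled to the 0-1 range.

```python
import math


def manhattan_score(diligence: float, intelligence: float) -> float:
    """Plain sum of the scores."""
    return diligence + intelligence


def euclidean_score(diligence: float, intelligence: float) -> float:
    """Square the scores, sum them and take the square root."""
    return math.hypot(diligence, intelligence)


# Hypothetical students: (diligence, intelligence)
students = {"specialist": (0.9, 0.1), "generalist": (0.5, 0.5)}

for name, scores in students.items():
    print(name, round(manhattan_score(*scores), 3), round(euclidean_score(*scores), 3))
```

Under the plain sum, both students tie. Under the Euclidean score, the student who excels in a single dimension comes out ahead, which matches the "smart or diligent" shape of the decision boundary.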

Sunday, October 27, 2019

File format for Helmer GPS locator

The file format of 'p_data.txt' is CSV. Unfortunately, the columns are not labeled. The inferred meaning of the columns follows, in the hope that it will save someone a few minutes:

IMEI
The operation mode (monitor/tracker)
Timestamp in YYMMDDHH24MISS format
?
?
Time in HH24MISS format
?
Latitude in Degrees Decimal Minutes format
Latitude hemisphere (N/S)
Longitude in Degrees Decimal Minutes format
Longitude hemisphere (E/W)
Speed
Direction in degrees
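
A minimal parser sketch based on the column list above. The column names are mine, the sample line is made up (not taken from a real device), and I assume that the coordinates are packed as (d)ddmm.mmmm, as in NMEA sentences.

```python
import csv
from datetime import datetime

COLUMNS = [
    "imei", "mode", "timestamp", "unknown_1", "unknown_2", "time", "unknown_3",
    "latitude", "lat_hemisphere", "longitude", "lon_hemisphere", "speed", "direction",
]


def ddm_to_decimal(value: str, hemisphere: str) -> float:
    """Convert (d)ddmm.mmmm plus a hemisphere letter to signed decimal degrees."""
    number = float(value)
    degrees = int(number // 100)
    minutes = number - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_line(line: str) -> dict:
    """Parse one line of p_data.txt into a dictionary of typed values."""
    row = dict(zip(COLUMNS, next(csv.reader([line]))))
    return {
        "imei": row["imei"],
        "mode": row["mode"],
        "timestamp": datetime.strptime(row["timestamp"], "%y%m%d%H%M%S"),
        "latitude": ddm_to_decimal(row["latitude"], row["lat_hemisphere"]),
        "longitude": ddm_to_decimal(row["longitude"], row["lon_hemisphere"]),
        "speed": float(row["speed"]),
        "direction": float(row["direction"]),
    }


# Made-up record for illustration only.
sample = "123456789012345,tracker,191027143000,,,143000,,5005.2500,N,01425.5000,E,12.5,270"
record = parse_line(sample)
print(record["timestamp"], record["latitude"], record["longitude"])
```

The unknown columns are skipped, and the unit of the speed column is left uninterpreted because I do not know it.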

Tuesday, October 22, 2019

Braess's Paradox

Braess's paradox states that when we add a new path into a network, it may decrease the total flow through the network.

The paradox can be nicely illustrated with springs. However, beware that Nature's illustration of the spring experiment is flawed:

The lengths of the springs should be different when we cut the rope (while the lengths of the remaining ropes should remain unchanged). The corrected illustration:

On Wikipedia, you can read that you can also observe it in electrical networks "at low temperatures using a scanning gate microscopy":

But you don't need a fancy technology to observe the paradox in electrical circuits. Nature's article actually got close to a "simple circuit":

But it requires a source of constant current. A simpler schema, which utilizes a source of constant voltage, is:

We just need 2 light bulbs, 2 resistors, a 9V battery (or any other source of constant voltage) and an ammeter to measure the change in the current. Optionally, we may include a button or a switch, but a piece of wire is enough.

How does it work? Incandescent bulbs have an interesting property: when they are cold, they work almost like a short-circuit. Only once they heat up, their resistance increases to the nominal value (the diagram is for a 60W light-bulb):

The nominal resistance of the light bulb in our circuit is 5V/(0.45W/5V)=56Ω. When the button is open, the nominal current through a single light bulb is 9V/(56Ω+68Ω)=0.073A. But once we close the button, more current goes through the light bulbs than through the resistors, because the light bulbs have a nominal resistance of 56Ω while the resistors have 68Ω. The increased current through the light bulbs heats up the filaments, the light bulb resistance increases and the total current through the network decreases.
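
The arithmetic from the previous paragraph, spelled out. The variable names are mine, and the bulb is treated as a fixed resistor at its nominal operating point, which is exactly the simplification that the paradox breaks.

```python
BULB_VOLTAGE = 5.0    # V, nominal bulb voltage
BULB_POWER = 0.45     # W, nominal bulb power
SUPPLY_VOLTAGE = 9.0  # V, battery
RESISTOR = 68.0       # ohm

# Nominal bulb current and resistance: I = P / U, R = U / I
bulb_current = BULB_POWER / BULB_VOLTAGE
bulb_resistance = BULB_VOLTAGE / bulb_current

# Button open: one bulb in series with one resistor across the battery.
branch_current = SUPPLY_VOLTAGE / (bulb_resistance + RESISTOR)

print(f"bulb resistance: {bulb_resistance:.0f} ohm")
print(f"branch current: {branch_current:.3f} A")
```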

Measured currents through the whole network for different resistors:

Resistance [Ω]	Open [mA]	Closed [mA]
32	160	160.5
46	140	140
68	125	124
100	112	113

The paradox was observed for only one value of the resistors' resistance (68Ω). The optimal resistance for maximizing the paradox effect is somewhere around 62Ω. If you have a tandem potentiometer that can withstand ~1 Watt (e.g.: an old wire-wound potentiometer), I would be happy to hear what the actual optimal value and the size of the effect are.
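
The measured table can be checked mechanically. The sketch below flags the rows where closing the button lowered the total current and converts the currents to power, assuming that the supply holds 9V under load.

```python
SUPPLY_VOLTAGE = 9.0  # V, assumption: the battery voltage does not sag under load

# (resistance in ohm, current with the button open in mA, current with it closed in mA)
measurements = [
    (32, 160.0, 160.5),
    (46, 140.0, 140.0),
    (68, 125.0, 124.0),
    (100, 112.0, 113.0),
]

# The paradox shows up where the extra path lowers the total current.
paradox_rows = [r for r, open_ma, closed_ma in measurements if closed_ma < open_ma]

for r, open_ma, closed_ma in measurements:
    open_w = SUPPLY_VOLTAGE * open_ma / 1000.0
    closed_w = SUPPLY_VOLTAGE * closed_ma / 1000.0
    print(f"{r:>3} ohm: open {open_w:.3f} W, closed {closed_w:.3f} W")

print("paradox observed at:", paradox_rows)
```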

Applications: A teaser at a school open day or science fair. You ask visitors to guess when the circuit is going to consume more energy:

When the button is visibly closed and the light bulbs are shining brightly?
Or when the button is visibly open and the light bulbs are just dim?

The paradoxical answer is that the circuit consumes more energy when the light bulbs are dim.

To puzzle the visitors even more, you may stress that the circuit contains only passive components and that the trick is not in the power supply.