Tuesday, June 25, 2019

Drug discount regulation

This is a reaction to The insulin racket: why a drug made free 100 years ago is recently expensive.

The problem that my proposal aims to fix:
The difference between the "official list price" (which has gone up) and the "price actually paid by large providers" (which has gone down).

The proposal:
The proposal calls for the same price of a drug for everyone (in the US), regardless of whether the buyer is an uninsured person or a large insurance company.

Admittedly, some order-quantity discounts would be permissible, but only up to a capped rate. E.g.: a 10% discount for ordering a million pills is OK, but a 99% discount is not.

Comparison to alternatives:
"Pre-restitution" could solve the issue as well. But it is hardly politically acceptable (the last time a pharmaceutical company was nationalized in the US was, I think, during the Second World War). Discount regulation, on the other hand, preserves private ownership.

A price cap could work as well. But it is tough to set a fair price, particularly for new drugs. While discount regulation does not eliminate the risk of bad pricing, it shifts the responsibility for good pricing to the drug manufacturers. Hence, discount regulation is less risky for lawmakers than a price cap. And drug manufacturers are better positioned to quickly adjust a bad price, as the price itself remains unregulated.

Single-payer healthcare could work as well. The issue is that it is a big change. And big changes are risky. Admittedly, the discount cap is a big change as well, but it would negatively impact just "pharmacy benefit managers", while single-payer healthcare would negatively impact both pharmacy benefit managers and insurance companies.

Additional advantages of the discount regulation:
  1. In principle, the proposal is item-agnostic. While it makes sense to initially limit the scope of the bill to a few troublesome drugs in order to limit the impact of unintended consequences, there is no evident reason why the bill should not, in the end, apply to all treatments.
  2. It eliminates a lot of "unproductive" hassle associated with price negotiation. Essentially, it could shrink the pharmacy benefit manager industry (they would still exist - medical care is not just about the cost of treatments but also about cost-benefit analysis, which remains a valuable tool when multiple treatments are available).

What it does not attempt to solve:
The proposal aims to lower the gap between the price of a drug for an uninsured person and the price for an insurance company. But it does not attempt to decrease the price of the drug for the insurance company. I.e.: a drug that is already too expensive for an insurance company will remain too expensive. But that is politically acceptable because the drug will remain expensive for _everyone_.

Corner scenarios:
People are extremely creative in avoiding the law. Hence, the law would have to strike a balance between the generality of the formulation and enforceability.
Examples that should be illegal:
  1. The manufacturer decides to stick with the high price. But to compensate the insurance company, it provides some drugs "that are close to expiration" for free.
  2. The manufacturer decides to stick with the high price. But from time to time it decreases the price just for a moment, until it signs a contract with an insurance company.
  3. The manufacturer decides to stick with the high price, but only for the US. Hence, the insurance companies will simply import the drug at a much lower price.
As I am not skilled in law-speak, I do not provide a suggested formulation of the law. Nevertheless, it is reasonable to assume that the amount of text necessary to describe the discount cap is going to be smaller than in the case of a price cap (as a price cap has to provide a unique cap for each drug).


Friday, March 22, 2019

How to fix backlight bleed

Sometimes a laptop display gets damaged by pressure or water, and the damage manifests itself as bright spots:

The best way to get rid of the white speckles is to replace the display. But that can cost as much as a new laptop. A fast and free mitigation is to use a dark theme throughout the system - the bright spots are not as irritating on a dark background as they are on a bright one. But if we are determined to stick with the light theme, we can overlay the screen with an image that is dark where the speckles are and transparent everywhere else. Note that this solution is not perfect - when you see bright spots on the display, it is because the backlight layer got damaged. And because there is some nonzero space between the backlight and the pixels, we can get a perfect alignment only for a single observation point - when we move our head a bit, the overlay image gets slightly misaligned and the edges of the speckles become visible. Furthermore, it took me a few hours to get acceptable results with this approach.

Walkthrough for using the overlay method:
    1) Get an application that can permanently overlay an image over the screen. On MacOS, I used Uberlayer.
    2) Take a good photo of the display with a completely white background. The camera should be roughly at the position from which you commonly look at the screen. Also, use a longer exposure (e.g.: 1 second) in order to avoid capturing display refresh artifacts (they look like a moiré pattern on my display). Hence, a tripod can be handy. Finally, it can be better to take the photo in a darkened room in order to avoid capturing reflections on the display.
    3) Convert the photo to shades of gray.
    4) Write down the position of the bright spots on the photo in pixels.
    5) Write down the position of the bright spots on your display in pixels. I used Ruler.
    6) Align the photo to the screen with a combination of RANSAC and projective transformation.
    7) Homogenize the illumination of the photo.
    8) Invert the color of the image - bright spots will become dark spots.
    9) Set the transparency.
  10) Use the generated image as the screen overlay.

The script in Matlab:
% Load data
original = rgb2gray(imread('photo.tif'));

% Location of spots on the photo (as read from: imshow(original))
% From: left x top.
photo = [
    556     1125    % the brightest speckle
    101     961     % the single speckle on the left
    61      1578    % the bottom left single pixel
    2422    1161    % right: the top bright spot
    2465    1216    % right: the lowest bright spot on right
    1065    698     % middle: the brightest speckle (north west)
    15      31      % corners...
    2545    1637
    22      1630
    2548    67
];

% Location of bright points on the display
% Averaged from 2 measures
screen = [
    302.5   611     % the brightest speckle
    47      523     % the single speckle on the left
    23      870     % the bottom left single pixel
    1367.5  627     % right: the top bright spot
    1393.5  658.5   % right: the lowest bright spot on right
    589     368     % middle: the brightest speckle (north west)
    1       1       % corners...
    1440    900
    1       900
    1440    1
];

% Visualization
figure('Name', 'Photo')
imshow(original);
hold on
plot(photo(:,1), photo(:,2), 'or')

% Map photo to screen coordinates
rng(2001); % RANSAC is stochastic (it may exclude outliers) -> set seed
[tform, inlierpoints1, inlierpoints2] = estimateGeometricTransform(photo, screen, 'projective', 'MaxDistance', 3);

% Transform the photo
outputView = imref2d(size(original));
image = imwarp(original, tform, 'OutputView', outputView, 'FillValues', 255);

% Crop the image to the size of the screen
image = image(1:900, 1:1440);

% Plot the aligned photo
figure('Name', 'Photo after transformation')
imshow(image)
hold on
plot(screen(:,1), screen(:,2), 'ob')

%% Illumination homogenization
% First, we remove the white speckles. See:
%   https://www.mathworks.com/help/images/correcting-nonuniform-illumination.html
se = strel('disk', 25);
background = imopen(image, se);

% Gaussian smoothing creates nicely smooth transitions (at least with doubles)
smoothed = imgaussfilt(double(background), 25);

% Subtract background illumination from the image to get the foreground
foreground = double(image) - smoothed;

% Remove too dark pixels (essentially the leaking border)
threshold = median(foreground(:)) + std(foreground(:));
borderless = max(threshold, foreground);
imagesc(borderless)

%% Invert the colors
result = uint8(255-(borderless-threshold));
imshow(result)

%% Set transparency
% The overlay color itself is (nearly) black; the correction comes from the alpha channel
black = uint8(ones(900, 1440));

% Scale the leak intensity to [0, 1]: brighter leaks get a more opaque overlay
maximum = max(borderless(:));
minimum = min(borderless(:));
alpha = (borderless-minimum)/(maximum-minimum);

%% Store the image
imwrite(black, 'overlay.png', 'Alpha', alpha)

Sunday, January 6, 2019

Matlab

It took me a while to realize why I reproduce results from articles in Matlab (and not in Java, Python, or R).

Here are the reasons:
  1. Matlab has succinct syntax for matrix operations, which is similar enough to equations in articles (this is where Java gets beaten).
  2. It is extremely simple to get example data from an article into Matlab code - you just copy-paste the data from a table in the article, surround it with square brackets, and that's it. You have a matrix. There is no need to separate the numbers with commas (but you can - they are optional). There is no need to wrap the individual rows in brackets. It just works (see the snippet below). This is where Python and R get beaten.
Who would guess that great copy-paste support would affect the choice of the language...
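
For illustration, a table copied from a (hypothetical) article can be pasted between square brackets as-is - whitespace and newlines are enough to delimit the matrix:

% Values pasted directly from a paper's table (made-up numbers)
X = [
    1.2   3.4   5.6
    7.8   9.0   1.1
];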

Another nice thing is that Matlab (and R in RStudio) autocompletes paths to files and directories - once again, it is a small thing, but it helps to avoid typos.

Sunday, December 30, 2018

Beta-binomial distribution

The beta-binomial distribution is a distribution which is not commonly taught at universities. However, it is a nice distribution to know. Examples of usage (a minimal sketch of the first example follows the list):
  1. Estimating the popularity of a product based on impression and purchase counts.
  2. Estimating the quality of a baseball player (section 7.2) based on hit and at-bat counts.
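
For the first example, a minimal sketch of the idea (the counts are made up, and the beta prior is fitted by a simple method of moments rather than a full beta-binomial maximum-likelihood fit):

% Made-up impression and purchase counts for four products
impressions = [1000; 50; 200000; 10];
purchases   = [ 100;  9;  21000;  3];

% Fit a beta prior to the raw purchase rates by the method of moments
rate = purchases ./ impressions;
m = mean(rate);
v = var(rate);
a = m * (m * (1 - m) / v - 1);        % prior alpha
b = (1 - m) * (m * (1 - m) / v - 1);  % prior beta

% The posterior mean shrinks products with few impressions toward the prior
popularity = (purchases + a) ./ (impressions + a + b);
disp([rate, popularity])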

Wednesday, December 26, 2018

Classification with high cardinality attributes

There are many ways to deal with high-cardinality categorical attributes, like zip code, in binary classification. If the classifier directly supports high-cardinality attributes, like CatBoost or mixed models, there is nothing we have to do - we pass the attribute to the classifier and have a nice day. However, if the model does not ingest high-cardinality attributes, we have a problem. And maybe surprisingly, we have a problem not only when we use a classifier that does not accept (polynomial) categorical attributes at all, like SVM or logistic regression, but also when we use a classifier based on decision trees, like random forest or gradient boosted trees (CatBoost is a noteworthy exception), because decision tree implementations can frequently deal with only up to 64 categories in a categorical attribute (the exact threshold depends on the implementation). When the categorical attribute has more categories, the attribute gets ignored.

A common remedy, when we have a high-cardinality attribute and the classifier does not accept it, is one-hot encoding (or dummy coding). But it creates a large matrix. And when the classifier does not have great support for sparse matrices (some implementations of kernel-based algorithms like SVM have it), larger datasets become uncomputable with this approach (a small illustration of the blow-up follows).
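
To illustrate the size of the one-hot matrix (the sizes are made up; dummyvar is from the Statistics and Machine Learning Toolbox):

% A zip-code-like attribute: 100 rows, indices drawn from 40000 categories
g = randi(40000, 100, 1);

% Dense one-hot encoding: one column per category index, mostly zeros.
% At 1e6 rows x 4e4 categories x 8 bytes this would be roughly 320 GB of doubles.
onehot_dense = dummyvar(g);

% A sparse encoding stores only the nonzeros, but the classifier must support it
onehot_sparse = sparse((1:numel(g))', g, 1);
whos onehot_dense onehot_sparse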

But not all encodings that convert high-cardinality attributes into numerical attributes result in an explosion of attributes. One such encoding is the estimation of the conditional probability of the label y given the categorical value x_i: p(y|x_i) for each category i. That means a single categorical attribute gets replaced with only a single numerical attribute. However, this approach is not without its own issues. Namely, we have to be able to estimate this conditional probability from very few samples. The following list contains a few methods that can be used to estimate the conditional probability (a minimal sketch of a smoothed estimate follows the list):
  1. Use empirical estimate (simple average of the label, assuming y takes values {0, 1}).
  2. Use empirical estimate with Laplace correction (as used in naive Bayes).
  3. Use leave-one-out (as used in CatBoost).
  4. Calculate lower confidence bound of the estimate, assuming Bernoulli distribution.
  5. Shrink the estimates toward the global label average, as in equations 3 and 4.
  6. Shrink the estimates, assuming Gaussian distributions, as in equations 3 and 5.
  7. Use m-probability estimate, as in equation 7.
  8. Shrink the estimates, assuming beta-binomial distribution.
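
A minimal sketch of a smoothed estimate in the spirit of points 5 and 7 (the toy data, the smoothing strength m, and the variable names are illustrative assumptions, not taken from any of the referenced papers):

% Toy data: a high-cardinality categorical attribute and a binary label
zip = categorical({'10001'; '10001'; '94103'; '94103'; '94103'; '60601'});
y   = [1; 0; 1; 1; 0; 1];

m = 10;                          % smoothing strength (a hyperparameter)
prior = mean(y);                 % global label average

idx    = double(zip);            % integer code of each category
sums   = accumarray(idx, y);     % label sum per category
counts = accumarray(idx, 1);     % sample count per category

% Smoothed p(y|category): categories with few samples shrink toward the prior
p = (sums + m * prior) ./ (counts + m);

% The single numerical attribute that replaces the categorical one
encoded = p(idx);
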
Experimental results
Testing AUC for points {1,2,7,8} from running 9 different classifiers on 32 datasets:
Conclusion
At alpha=0.05, there is no significant difference.

The impression of machine learning in R

One professor once said that whenever a student asks him a question, he immediately knows whether the student studies economics, statistics, or computer science. If a student asks him what the return on investment is, the student studies economics. If a student asks what the sample size is, the student studies statistics. And if a student asks about an edge scenario, the student is a computer scientist.
From the current state of algorithms in R it is evident that many authors are statisticians but not programmers. On the good side, whenever I need an implementation of some obscure statistical algorithm, I know that either I get it in R or I am out of luck. However, package authors in R frequently forget to check for common edge scenarios like division by zero.

I got burned so many times in the past that I have assembled a table with my prior belief that an implementation is not going to blow up when I throw my dirty data at it:
  1. Core R functions like c(): 100%
  2. Core library functions like chisq.test(): 98%
  3. A function in a library implementing a single algorithm: 70%
  4. A function in a library implementing many algorithms: 40%


Friday, December 7, 2018

Hegemon scenario for Civ 3

The best Civilization 3 contrib scenario that I have ever played is, without hesitation, Hegemon. This is a scenario that you will want to play multiple times, each time with a different nation. Notes for one of the weakest nations, Ithaka, follow.

While Ithaka has a strong start, their power is limited:
  1. While they have an awesome unique wonder, which is buildable from the beginning, they don't have any other unique building (not even a tribal camp!).
  2. While they start with 4 unique units prebuilt, they are not able to create additional unique units.
  3. While the motherland is fertile, it does not contain an Island resource, which is necessary for trade network construction. Hence, forget about trading strategic and luxury resources with other nations.
  4. The city of Ithaka does not lie on a river, while many other capitals do. Hence, the growth of Ithaka is capped until an aqueduct is discovered and built.
  5. Ithaka belongs to the Island Greeks. And the Island Greeks (together with the Lesser Greeks) have units inferior to everyone else's (with a few exceptions during the early and mid-game when their units are just as good as everyone else's).

Nevertheless, contrary to what is said in the guide, it is possible to win with Ithaka.
  1. Ithaka starts with 2 prebuilt boats. These boats allow you to quickly uncover the whole coastline and make highly profitable discovery trades.
  2. Thanks to the unique wonder, Ithaka has a lead in production capacity. Use it to build unit-generating units, because Ithaka's own units suck. And use the generated units for an early attack on weaker nations. There are four goals:
    1. Get their cities. Economically, it is cheaper to build attack units and conquer cities than to build settlers.
    2. Get workers from their settlers. They are an easy target.
    3. Get a leader in order to build an army. The army will then allow you to conquer the well-defended capitals of stronger nations (e.g.: Sparta).
    4. If you are not able to eliminate some nation completely, deny them access to copper, as copper is necessary for building many mid-game units.
  3. Rush the conquest. Ithaka is greatly positioned to take down nations in the order in which they get their unique units. Sparta has early unique units, so take down Sparta before they start to spawn them. Then there are Athens and Thebes with their mid-game unique units. And finally the Macedonians with their late unique units.
  4. Focus on nations that have freshly built a wonder and take it from them (it is frequently cheaper to conquer a city with a wonder than to build the wonder).