I once built a predictive model that estimates whether a student passes to the next term based on their past performance (~100 features). The model reached AUC=0.8. Not bad, but not useful either. What struck me was that the model could be simplified to just 2 features:
- Performance in an off-topic but obligatory course that requires nothing more than handing in simple homework on time. (The teachers realized that the course was off-topic for these students and treated it as a recruitment opportunity rather than as an opportunity to filter out bad students.)
- Performance in a mathematical logic course.
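For readers less familiar with the metric: AUC can be read as the probability that a randomly chosen student who passed gets a higher model score than a randomly chosen student who failed, so AUC=0.8 means the model orders such a pair correctly about 80% of the time. Below is a minimal pure-Python sketch of that pairwise definition. The function name `pairwise_auc` and the toy scores are hypothetical and only illustrate the metric; they are not taken from the original model.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n * m), which is fine for
    illustration; a rank-based implementation is preferable for large data.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            correct += 1.0   # passing student ranked above failing student
        elif p == n:
            correct += 0.5   # tie: half credit
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for students who passed and who failed.
passed = [0.9, 0.8, 0.4]
failed = [0.5, 0.3]
print(pairwise_auc(passed, failed))
```

Here 5 of the 6 (passed, failed) pairs are ordered correctly, so the sketch reports an AUC of 5/6.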
On the horizontal axis, we have the performance in the off-topic course, which reflects the student's diligence. On the vertical axis, we have the performance in mathematical logic, which reflects the student's intelligence.
There are 3 interesting takeaways:
- The professor was right: a student has to be either smart or diligent in order to perform well. I was right as well, since this is a simplification, but it is an extremely practical and accurate one.
- The student does not have to be both smart and diligent in order to perform well. It is enough to be smart or diligent. This is something I didn't expect, but the shape of the decision border speaks for itself: the North-West and South-East corners are green, not red, and the decision border is convex, not concave. Hence, this takeaway cannot be dismissed simply by saying that the teachers are "too soft".
- The decision border looks like an arch, not like a line. This is also surprising, and it suggests that whenever we combine multiple unrelated scores, we should move away from the Manhattan (L1) distance, where we simply sum the scores, to the Euclidean (L2) distance, where we square the scores before summing them (and then take the square root). Some universities already do something similar: if an applicant is outstanding in sport or art, the applicant is preferred. But the scores from Math and Languages are still generally just summed.
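To make the difference between the two aggregations concrete, here is a minimal sketch that scores two hypothetical applicants on two scores normalized to [0, 1]: a specialist who excels in one dimension, (1.0, 0.0), and a generalist who is average in both, (0.5, 0.5). The function names and the applicants are illustrative assumptions, not data from the model above.

```python
import math


def manhattan_score(scores):
    """Plain sum of the scores (the L1 norm, for non-negative scores)."""
    return sum(scores)


def euclidean_score(scores):
    """Square root of the sum of squared scores (the L2 norm)."""
    return math.sqrt(sum(s * s for s in scores))


# Hypothetical applicants: (math_logic, diligence_course), both in [0, 1].
specialist = (1.0, 0.0)  # outstanding in one dimension only
generalist = (0.5, 0.5)  # average in both dimensions

for name, s in [("specialist", specialist), ("generalist", generalist)]:
    print(name, manhattan_score(s), round(euclidean_score(s), 3))
```

Under the plain sum, both applicants tie at 1.0. Under the Euclidean norm, the specialist still scores 1.0 while the generalist drops to about 0.707, so a single threshold on the Euclidean norm favors applicants who excel in one dimension. Points with equal Euclidean score lie on a quarter circle around the origin, resembling the arch-shaped border, whereas points with equal sums lie on a straight line.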