Datafilos: How to identify faulty sensors in home automation

Home automation tends to be unreliable because it is composed of many individual components. If each component has 99.9% reliability and we have 100 statistically independent components, then there is only a 0.999^100 ≈ 90.5% chance that everything works. That is not much. Hence, we need to monitor the components and alert when they break down.
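The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name `chance_all_work` is made up for illustration, and it assumes the components fail independently, as stated above.

```python
def chance_all_work(reliability: float, components: int) -> float:
    """Probability that all independent components work at the same time."""
    return reliability ** components


# 100 components, each 99.9% reliable
print(f"{chance_all_work(0.999, 100):.1%}")  # 90.5%
```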

I use five types of heuristics to trigger an alert that there is an issue with a sensor. For illustration, I will explain them using a wireless battery-powered thermometer:

Too low/too high. When the temperature in a boiler is above 90 °C or below 30 °C, it is a sign of trouble.
Unavailable. When the temperature reading is unavailable for 15 minutes, the battery in the sensor has probably died or the sensor has water damage.
Stuck. When the temperature reading AND the signal strength (RSSI) are both stuck at the same value for over 24 hours, the battery voltage is too low or the sensor needs to be restarted. I like to combine multiple readings from the same device to decrease the probability of false alerts in cases where an unchanging value is plausibly legitimate. For example, the temperature in my basement is constant as long as no one opens the basement doors. Hence, if I checked only the temperature, I would get false alerts. On the other hand, the battery-powered thermometer next to the server is so close that its RSSI is always at the maximum value. Hence, an alert based on RSSI alone would also give false alerts. By requiring temperature AND RSSI to be stuck, I can use the same code for all my thermometers without false alerts, and I do not have to think about whether a particular sensor has the more variable temperature or RSSI.
Rapid change. Generally, when the temperature is increasing too quickly, it might be a sign of fire, or a sign that the battery is almost empty and the sensor has become erratic.
Noisiness. When the standard deviation of the temperature over the last hour is too high, the battery is probably dying.
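As a minimal sketch, the five heuristics could be expressed in Python roughly as follows. The `Reading` class, the `check_sensor` function, the alert names, and the rapid-change and noise thresholds are illustrative assumptions. Only the 30/90 °C limits, the 15 minutes, and the 24 hours come from the list above.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import List, Optional, Sequence


@dataclass
class Reading:
    timestamp: float              # seconds, e.g. Unix time
    temperature: Optional[float]  # degrees Celsius; None if unavailable
    rssi: Optional[float]         # signal strength, e.g. in dBm


def check_sensor(
    readings: Sequence[Reading],
    now: float,
    low: float = 30.0,               # boiler limit from the text
    high: float = 90.0,              # boiler limit from the text
    unavailable_s: float = 15 * 60,  # 15 minutes, from the text
    stuck_s: float = 24 * 3600,      # 24 hours, from the text
    max_rise_per_min: float = 2.0,   # assumed threshold, tune per sensor
    max_stdev: float = 1.5,          # assumed threshold, tune per sensor
) -> List[str]:
    """Return the names of all heuristics that fire for one sensor."""
    valid = sorted(
        (r for r in readings if r.temperature is not None),
        key=lambda r: r.timestamp,
    )

    # Unavailable: no usable reading within the last 15 minutes.
    if not valid or now - valid[-1].timestamp > unavailable_s:
        return ["unavailable"]

    alerts: List[str] = []
    latest = valid[-1]

    # Too low/too high: the latest reading is outside the expected range.
    if not (low <= latest.temperature <= high):
        alerts.append("out_of_range")

    # Stuck: temperature AND RSSI both unchanged over the whole window.
    recent = [r for r in valid if r.timestamp >= now - stuck_s]
    history_long_enough = valid[0].timestamp <= now - stuck_s
    if (
        history_long_enough
        and len({r.temperature for r in recent}) == 1
        and len({r.rssi for r in recent}) == 1
    ):
        alerts.append("stuck")

    # Rapid change: temperature rising too fast between the last two readings.
    if len(valid) >= 2:
        prev = valid[-2]
        minutes = (latest.timestamp - prev.timestamp) / 60
        if minutes > 0:
            rise = (latest.temperature - prev.temperature) / minutes
            if rise > max_rise_per_min:
                alerts.append("rapid_change")

    # Noisiness: standard deviation over the last hour is too high.
    last_hour = [r.temperature for r in valid if r.timestamp >= now - 3600]
    if len(last_hour) >= 2 and pstdev(last_hour) > max_stdev:
        alerts.append("noisy")

    return alerts


# Example: a sensor frozen at the same temperature and RSSI for 25 hours,
# with one reading every 10 minutes.
frozen = [Reading(i * 600, 60.0, -60.0) for i in range(151)]
print(check_sensor(frozen, now=150 * 600))  # ['stuck']
```

Note that a constant temperature with a varying RSSI does not trigger the stuck alert in this sketch, which mirrors the basement example above.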

Nevertheless, my single favorite alert trigger for network-based sensors is a check of whether the sensor's web page loads. It has happened to me multiple times that a sensor was answering pings but was otherwise unresponsive. Hence, a ping is not sufficient. However, a simple HTTP status code check has so far worked reliably and universally across all my network devices.
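A status code check of this kind can be sketched with Python's standard library alone. The function name `web_page_ok`, the 5-second timeout, and the choice to treat only 2xx responses as healthy are assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request


def web_page_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the device's web page answers with a 2xx status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # urlopen raises HTTPError for 4xx/5xx responses and URLError for
        # refused connections; both are subclasses of OSError, as are
        # socket timeouts. HTTPException covers malformed responses.
        return False
```

A scheduler could call something like `web_page_ok("http://192.168.1.50/")` every few minutes and raise an alert after several consecutive failures; the address here is only a placeholder.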

Does it mean ping alerts are useless? No. Once I had a faulty device, so I filed a warranty claim. The claim was denied because "I have unreliable network and should hire professionals to fix it". So I presented them with the historical ping and HTTP status code logs for the device. The device was answering pings, but the web page returned 404. This was enough for them to accept the warranty claim. If I hadn't had the ping logs, they could have claimed that the Ethernet cable was faulty... But the fact that ping had worked continuously for a week silenced them. Hence, having multiple alert triggers, even if they partially overlap, paid off for me.
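The way the two checks complement each other can be sketched as a small decision function. The function name `diagnose` and the wording of the three outcomes are assumptions for illustration; the logic simply encodes the reasoning from the warranty story above.

```python
from typing import Optional


def diagnose(ping_ok: bool, http_status: Optional[int]) -> str:
    """Combine a ping result with an HTTP status code.

    http_status is None when no HTTP response was received at all.
    """
    if not ping_ok:
        # Nothing answers: the network, the cable, or the power may be at fault.
        return "unreachable"
    if http_status is not None and 200 <= http_status < 300:
        return "healthy"
    # The network path works, but the device itself does not serve its page.
    return "device_fault"


print(diagnose(True, 404))  # device_fault
```

Logging both inputs with timestamps, not just the final verdict, is what makes the history usable as evidence later.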

Thursday, December 25, 2025
