pondělí 25. listopadu 2024

Better debugging UI for CTE

One of the great advancements in SQL was introduction of Common Table Expressions (CTE), which made creation of long queries without creating intermediate tables on disk (or temporary tables if the database supports it) convenient.

However, debugging of long CTEs is difficult. To find the bug, I currently use bisection. I modify the query to show the first rows of some intermediate CTE somewhere in the middle of the query. If the CTE result looks OK, the bug is somewhere in the second part of the query. If the result does not look OK, I know that the bug is in the first half of the query. And this process repeats recursivelly, until I pinpoint the single offending CTE.

This approach works well on quickly running queries. However, once each query execution takes long time, the debugging drags as it requires multiple executions to just find the offending CTE, let alone to fix the CTE.

A possible mitigation is to enhance the query editor. Just like some query editors allow you to fold CTEs, the editor should present these information for each CTE:

  1. the CTE's row count
  2. an icon, which on click opens the CTE result

These informations should appear in the editor progressively, as they get available. With this approach, identification of the offending CTE should require, at most, 1 complete query execution. If you are lucky, just by looking at the row count of the intermediate CTEs, you spot the issue while the query is still running.

Technical limitations:

  1. The database ought to support temporary tables. This is needed for a simple implementation of the CTEs' result preview.
  2. While in the debugging mode, optimisation techniques that work across CTEs, like predicate pushdown, will not be applicable. This is acceptable as programmers expect a slow-down in the debug mode. However, it means that there has to be an additional "Debug" button next to the ordinary "Run" button. These two modes can't be mixed.

Future enhancements:
Do we want to store whole CTE results or only the top 1000 rows? If we store whole tables, we can use that to warm start the query execution, once we modify the query. The disadvantage of storing the whole results is that even a single query in the debugging mode can exhaust disk space and consequently prematurelly terminate, while in the ordinary run mode the query may finnish without any issue. Hence, the limit mode will likely have to be implemented. Nevertheless, for testing of the core idea, the limit/warm-start functionality is not needed.

Žádné komentáře:

Okomentovat