|
Business Intelligence Series |
Introduction
You received a (long) query for troubleshooting, reviewing, conversion or any similar tasks. In addition, you don’t know much about the underlying table structure or business logic. So, what do you do then? For sure two things are intuitively clear: you don’t need to panic and, understanding the query may help you in your task. Understanding the query, it seems such a simple statement, though there is more to it. Here are some points on how to approach a query.
State your problem “A problem well stated is a problem half solved” (Charles F. Kettering). Before performing any work, check what’s requested from you, whether you are having the information required for the task(s) ahead, for example documentation, valid examples, all code, etc. If something is missing, don’t hesitate to request all the information you need. While waiting for information, you can continue with next steps. As we don’t live in a perfect world, there will be also cases in which you’ll have to fill the gaps by yourself by performing additional research/work. When troubleshooting is important to understand what’s wrong and, when possible, have data against which to compare query’s output.
Save the work
Even if you are having a copy of the query somewhere on the server, save the previous version of the query and, when possible, use versioning. It might seem a redundant task, however the fact is that you never know when you need to refer to it and, as you’ll see next, it can/should be used as a baseline for validating the changes. In case you haven’t saved the query, check whether your RDBMS is tracking metadata about the queries run and, if the metadata were not reset in the meantime, you might be lucky enough to find a copy of your query.
I found that is important to save the daily work, the various analysis performed in order to understand a query, the various versions and even the data used for testing. All this work could help you letter to review what you made, the steps you missed, you can reuse one of the queries for further work, etc.
Break down
When the query is too complex, it could be useful to break the query into chunks that could be run and understood in isolation. Typically such chunks derive from query’s structure (e.g. inline queries, subqueries derived from unions). I found that often, focusing only a chunk of a query help isolating issues.
Restructure
Many programmers still write queries using the old non-ANSI joining syntax in which the join constraints appear in the WHERE clause, making the understanding and troubleshooting of a query more difficult. Often I found myself in the position of transforming first a query to ANSI SQL syntax, before performing further work on it. It’s actually a good occasion to gain a first understanding of query’s structure, but I’d prefer not to do it so often. In addition, during restructure phase it makes sense to differentiate between the join and filter constraints, this helping isolating the issue(s).
Check cardinalities
Wrong join constraints lead to duplicates or fewer records than expected, such differences being difficult to track when the variances in the numbers of records are quite small. Even if RDBMS come in developers’ help by providing metadata about the join relations, the columns and predicates participating in a join are not always so easy to identify. Therefore, in order to address this issue, it’s needed to check the constraints between any two tables between participating in a join. Sometimes, when the query is based on the table with the lowest level of detail, it can be enough to check the variations of the number of records.
Check filter constraints
Filter constraints are maybe more difficult to identify, especially when is needed to reengineer the logic built in applications. Many of the filter constraints are logical, though when you have no documentation about the schemas, is like rambling in the dark, having to check real examples and identify the various values and the impact they have on the behavior of your report.
Validate changes
So, you made the changes, everything looks perfect. Is it so? Often your intuition might tell you that the logic of a query is correct, though as software is not based on magic, at least not all the time, check some of the records to assure that the data are rendered as expected, check totals, compare the current with previous version, identify variations, etc. You don’t need to use all the technique you know, but to choose the best and minimal set of tools that allows you to validate the query.
Perform refactoring
Refactoring, the way to (continuous) code improvement, should become part of each developer’s philosophy about programming. A query, as any other piece of code, is rarely perfect as technical and factual knowledge is relative, features get deprecated and new techniques are introduced. On the other side, there is an old saying in IT – don’t change something that’s already working, so, there should be kept a balance between the two – the apparent and needed for change.
Document
I hope it’s not the case to stress the importance of documentation. From versioning to logic description, it’s a good practice to document the important parts of your work, especially the parts that will facilitate later work.