28 January 2010

Query Writing Best Practices

    In general, when creating queries, the following best practices should be considered:
- Use ANSI-compliant syntax as much as possible.
- Use aliases for all the tables.
- Design for performance, reusability and security.
- Reduce unnecessary network traffic.
- Group together the attributes coming from the same table.
- Rename columns to avoid confusion, but don’t overdo it.
- Use a uniform coding style, formatting and naming conventions.
- Use indexed join predicates.
- Learn about the strengths/weaknesses of each feature before using it.
- Use versioning & keep older versions.
- Test the queries.
- Write unit tests first [2].
- Handle missing values (NULLs).
- Document database objects inline and in dedicated documents (e.g. Data Dictionaries, Functional Specifications).
- Refactor code.
- Use SQL tuning tools.
- Write tiny chunks of code: encapsulate formulas and business logic in functions [2], avoid inline scalar functions.
- Code defensively: use exception handling, consider all scenarios.
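
A few of the practices above (table aliases, renamed columns, attributes grouped by table, an indexed join predicate and explicit NULL handling) can be sketched in one small query. This is only an illustration: the `customers` and `orders` tables, their columns and the index name are hypothetical, and SQLite (through Python's `sqlite3` module) is assumed merely so the example can be run as-is.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        region      TEXT              -- may be NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount      REAL NOT NULL
    );
    -- Index supporting the join predicate used below.
    CREATE INDEX ix_orders_customer_id ON orders (customer_id);

    INSERT INTO customers VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', NULL);
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

# Aliases for every table, explicit column list grouped by source table,
# renamed output columns, and COALESCE to handle the missing region.
query = """
    SELECT c.customer_id,
           c.name                        AS customer_name,
           COALESCE(c.region, 'Unknown') AS region,
           SUM(o.amount)                 AS total_amount
    FROM customers AS c
         JOIN orders AS o
           ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, c.region
    ORDER BY c.customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The customer without a region is reported as 'Unknown' rather than silently carrying a NULL into the result.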

Things to avoid:
- Complex expressions in search conditions [1].
- Join predicates on expressions [1].
- Expressions over columns in local predicates [1].
- Data type mismatches on join columns [1].
- Non-equality join predicates [1].
- Unnecessary outer joins [1].
- Redundant predicates [1].
- Multiple aggregations with DISTINCT.
- Building queries dynamically unless necessary.
- Techniques that cause full-table scans: functions that cannot use indexes or perform poorly with them, wildcards at the beginning of a search pattern.
- Using more attributes/records than needed (particular case: using * instead of specifying the attributes).
- Using nested views.
- Relying entirely on the code created by wizards and other automation tools.
- UNION unless duplicates really need to be removed: use UNION ALL otherwise.
- UNIONs instead of conditional code (e.g. CASE, DECODE) or self-joins.
- Using temporary tables.
- Server side cursors.
- Procedural queries (e.g. loops, cursors) rather than using set-based queries.
- Redundant logic/code.
- Negations on constraints.
- Code facilitating SQL injection: use parameterized objects.
- Hard-coding values.
- Undocumented functionality.
- Using GROUP BY on the final query when it could be used in a sub-query.
- Multiple self-joins/joins to same table instead of GROUP BY.
- Repetitive calls to the same function and same parameters.
- Using constants in the ORDER BY clause.
- Creating too many versions of the same query.
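
Two of the items above are easy to demonstrate: code that facilitates SQL injection, and UNION used where UNION ALL would do. The sketch below is a minimal illustration under stated assumptions: the `users` table and the sample input are hypothetical, and SQLite via Python's `sqlite3` module stands in for whatever database is actually in use.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, login TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
""")

malicious = "x' OR '1'='1"  # input crafted to alter the WHERE clause

# To avoid: user input concatenated into the statement text.
unsafe_sql = "SELECT user_id FROM users WHERE login = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Preferred: a placeholder, so the value is bound as data, not parsed as SQL.
safe_rows = conn.execute(
    "SELECT user_id FROM users WHERE login = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))

# UNION removes duplicates (extra work for the engine); UNION ALL does not.
union_rows = conn.execute(
    "SELECT login FROM users UNION SELECT login FROM users"
).fetchall()
union_all_rows = conn.execute(
    "SELECT login FROM users UNION ALL SELECT login FROM users"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

With the concatenated statement the crafted input matches every row, while the parameterized version matches none. When the inputs are known not to overlap, or duplicates are acceptable, UNION ALL spares the engine the duplicate-elimination step.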

    Please note that every situation is different: a technique not recommended in general can sometimes perform better than the alternatives (e.g. recursive simulation with a temporary table vs. hierarchical self-joins on SQL Server 2000), while in other cases several aspects need to be weighed, for example the trade-off between performance and reusability. Even though most database vendors adhere to the ANSI SQL standard, each feature can be implemented differently and vendors may provide additional functionality, so specific best practices may apply to each feature and vendor.

References:
[1] IBM. (????). Best Practice - Writing and Tuning Queries for Optimal Performance. [Online] Available from: http://www.ibm.com/developerworks/wikis/display/data/Best+Practice+-+Writing+and+Tuning+Queries+for+Optimal+Performance (Accessed: 27 January 2010)
[2] Oracle. (2009). Cleaning Up PL/SQL Practices, by S. Feuerstein. [Online] Available from: http://www.oracle.com/technology/oramag/oracle/04-mar/o24tech_plsql.html (Accessed: 27 January 2010)

Microsoft TechNet. (2010). SQL Server - Best Practices. [Online] Available from: http://technet.microsoft.com/en-us/sqlserver/bb671430.aspx (Accessed: 27 January 2010)
