17 April 2010

Learning SQL

Introduction

    One of the topics that appears in many database related groups or forums is from where and how new comers should start to learn SQL (Structured Query Language) in general, and specific vendor SQL features in particular. Two years back, in Learning something new post from my programming-grasps blog attempt, I wrote down my thoughts on how to approach learning new subjects in general, many of the respective observations could be applied to SQL learning too, and in addition could be added also domain-specific recommendations. I would say that when discussing a topic like learning it makes sense to approach the subject from users level of knowledge and skill-set, different recommendations could apply for beginners, intermediate and expert level. No, also when you are an expert the learning process doesn’t stops there as SQL and RDBMS (Relational Database Management System) evolve with time, new features and techniques being added, same as new contexts of applicability could emerge (e.g. very large databases, geographic information systems, Web-related SQL, etc.).

The Beginner Level

    Approaching a topic from beginner’s perspective is never an easy task given the fact that many new comers don’t know from where to start, what to learn first, what RDBMS to target. I was and I’m still at that point as new subjects appear on my learning schedule. One important decision is what RDBMS solution to choose, and for this you might need to consider several aspects: market share, goals/objectives, functional area (e.g. database administration, database development, data warehousing, business intelligence, ETL, data quality), personal preferences, salaries and jobs postings, UI, features, tools, learning curve, performance, programmability, extensibility, integration and compatibilities with other technologies, availability of documentation, books, tutorials and other learning material, etc.

     I hope I’m not annoying anybody when I’m saying that Oracle, Microsoft and IBM are the leaders in RDBMS and related data technologies, this being also the ranking in what concerns the market share and job postings (see for example B. Nicolich’s SQL Server Job Trends article). If you aren’t having any preferences I would recommend you to focus on at least one of the RDBMS of the mentioned vendors: Oracle, SQL Server and MySQL. Unfortunately, given researches’ subjectivity, I can’t delimitate which of the three has better performance or is better to use, being important the context in which the studies are made, the overall architecture and their particularities, etc. Having a Microsoft technologies background I like SQL Server’s easy to use UI and integrated solution approach it provides, on the other hand Oracle seems to provide more SQL-related features (e.g. functions), while MySQL is still trying to catch the other two vendors and become more mature. All three vendors implement SQL ANSI 92 standard, so learning SQL on one of the platforms would allow you to write queries also on the other platforms, the difference residing in the functions used, DML (Data Manipulation Language) or DDL (Data Definition Language) syntax and specific functionality, architecture and tools.

   For SQL Server and Oracle you could download the trial versions, or opt for the Express versions that could be used free but with certain constraints, while for MySQL you could download one of the free community editions, quite stable solutions. If SQL Server comes with a nice Management Studio that could be used to write queries and administrate your databases, for Oracle and MySQL you might have to consider tools like Toad for OracleOracle or SQL Developer, respectively Toad for MySQL, that allow to be more productive on the respective platforms. If you are intending to use a RDBMS more for personal use (e.g. data analysis), what I call a personal database, then MS Access and MS Excel could prove to be enough, especially when is needed to develop personal or small scale solutions in an organization with a limited knowledge on SQL, though a more evolved RDBMS (actually MS Excel is not a RDBMS but it could be used as a data source or data analysis tool) is recommended when providing solution for multiple users.

    All three vendors provide rich documentation, with a plus for SQL Server and Oracle, through the online MSDN or Technet, respectively Oracle Database Documentation Library, and MySQL Reference Manual, the knowledge provided by the documentation being complemented by the many published books, magazines, tutorials, blogs, webcasts, podcasts, studies, best practices and other type of technical literature, many of them easily retrievable with just a simple search using your favorite search engine, important being to know what you are looking for. You have to consider that the technical documentation focuses more on the features, syntax and technical details, while for specific purposes like development, architecture integration or problem solving you might have to consider the other sources of knowledge with a plus for books, tutorials and blogs, the later two being actually more up-to-date than the books, given their shorter “time to market” and informal character.  If your interest in SQL and databases derives from pure educational purposes then you could also consider the free online course materials (presentations, video lectures) posted within initiatives like the ones of YouTube EDU or MIT Open CourseWare, many other lecture notes being available online from various well-known universities. Other important source of knowledge that gains field in the past years is the one of social and professional networks centered around the important technologies, so joining such a network could bring some benefit but you don’t expect miracles!

     In what concerns the books maybe it makes sense to list at least my preferences given the huge volume of books on this topic. I’m a fan of “for Dummies” series, therefore I find Allen Taylor’s SQL for Dummies and its more extensive version SQL All-In-One Desk Reference for Dummies or Database Development for Dummies as a good starting point. If you are targeting SQL Server then you might need to check also Microsoft SQL Server 2008 For Dummies, Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies, Microsoft SQL Server 2005 Programming For Dummies or Microsoft SQL Server 2005 Express Edition For Dummies. In exchange the list of choices for Oracle or MySQL is quite smaller, as far I know there are only two books on the market on Oracle - Oracle PL/SQL For Dummies and Oracle 11g For Dummies, while for MySQL the books from “for Dummies” cycle target the use of MySQL in combination with other scripting programming languages: Apache, MySQL, and PHP Web Development All-in-One Desk Reference, PHP & MySQL For Dummies or PHP & MySQL Everyday Apps For Dummies. Another popular series are the “for Beginners” or “Beginning” series, having a more narrow focus, for example Beginning T-SQL with Microsoft SQL Server 2005 and 2008 or Beginning PL/SQL: From Novice to Professional could be two of the books you could start with.

       Choosing the right book is highly dependent on what you want to achieve, your learning style and author preferences, books’ availability and costs, in general each professional having his/her own list of preferences. Years back I was quite stressed because many of the examples written in books were not working, nowadays hopefully that changed positively, in addition many publishers making the code available through books’ supporting sites or publishers websites together with a few sample chapters from the respective books. If you don’t want to buy the whole book or you’d like to read it online, you could become member of online libraries like Safari Books Online, Skillsoft’s Books 24x7, eBrary or SpringerLink. Many books are starting to become available for purchasing in PDF directly from publishers or in proprietary format on digital book readers like Amazon’s Kindle or Sony’s Reader, the digital books being in theory with 10-30% cheaper than the paperback products.

The Intermediate Level

    The intermediate level is a relative point in the evolution of a professional, in theory this level being reached after several projects or prolonged use of SQL in various scenarios. During this stage many developers are starting to approach topics like performance or best practices, going thus above the basics, or interest themselves in taking a certification, attempting thus to approach the whole range of topics related to databases. Some of the books listed above qualify as well to the intermediate or expert levels, it depends also on books’ level of detail and approach on the given topics. The books’ series closer to this level are the ones for Professionals, for example Pro SQL Server 2008 Relational Database Design and Implementation, Pro T-SQL 2008 Programmer’s Guide, Professional Oracle Programming, Pro MySQL or Professional MySQL.

     For those targeting the MCTS certification, it makes sense to consider the Self-Paced Training Kit Exams: (
Exam 70-432) Microsoft SQL Server 2008-Implementation and Maintenance, (Exam 70-433) Microsoft SQL Server 2008-Database Development, while for those approaching the MCITP certification then MCITP SQL Server 2005 Database Developer All-in-One Exam Guide is quite useful. As I haven’t approached the Oracle or MySQL certifications, I think it’s more indicated to do some research by yourself. 

The Expert Level

    At this level developers are starting to go deeper into the database internals, learning how things are handled in the background, differentiate between multiple solutions and choose the optimum solution. Books like Microsoft SQL Server 2008 Internals, SQL Server 2008 Query Performance Tuning Distilled or Professional SQL Server 2008 Internals and Troubleshooting shouldn’t miss from a developer’s library! I would expect there are similar books also for Oracle and MySQL, but as I’m not aware of them so I will stop here.

    At this level a developer could also approach the database topics from a higher level, making an incursion in the mathematical/conceptual theory of relational databases, books like Information Modeling and Relational Databases or Applied Mathematics for Database Professionals could become really handy in understanding such topics.

Note:

    Most probably there are even better books out there though, as I highlighted above, these are my recommendations based on the books that have fallen in my hand along the years. In many cases, instead of buying a book, I’m looking on whether I could find on the web the information I need, though the quality of the material varies, sometimes same as the quality of the books. There are even cases when the information are contradictory, however in most of the cases I would go with the experts that have the information from the source. 

No comments: