About Me

IT Professional with more than 16 years of experience in IT, especially in the full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Monday, January 02, 2017

Lessons Learned - Documentation

Introduction


“Documentation is a love letter that you write to your future self.”
Damian Conway

    For programmers, as well as for other professionals who write code, documentation might seem a waste of time, an effort few are willing to make. On the other side, documenting important facts can save time and provide a useful base for building one’s own and others’ knowledge. I sometimes found out the hard way what I needed to document. With the hope that others will benefit from my experience, here are my lessons learned:

 

Lesson #1: Document the tasks you work on


“The more transparent the writing, the more visible the poetry.”
Gabriel Garcia Marquez


   Personally, I like to keep a list of what I worked on, on a daily basis – typically nothing more than a 3-5 word description of the task, who requested it, and, if applicable, the corresponding project, CR or ticket. I do it because it makes it easier to track my work over time, especially when I have to retrieve some piece of information that is documented in detail somewhere else.

   Within the same list one can also track the effective time worked on a task, though I sometimes find it difficult, especially when working on several tasks simultaneously. In theory this can be used to estimate further similar work. One can also use a categorization, distinguishing for example between the various types of work: design, development, maintenance, testing, etc. This approach offers finer granularity, especially in estimations, though more work is needed to track the time accurately. Therefore track the information that is worth tracking, as long as there is value in it.
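
   Just as an illustration, such a log can be kept in anything from a text file or spreadsheet to a small table; below is a minimal sketch in T-SQL, where the table and column names are only hypothetical placeholders:

-- a minimal, illustrative task log (table and column names are hypothetical)
CREATE TABLE dbo.TaskLog (
    TaskLogId    int IDENTITY(1,1) NOT NULL PRIMARY KEY
  , WorkDate     date NOT NULL            -- the day the work was performed
  , Description  nvarchar(100) NOT NULL   -- 3-5 word summary of the task
  , RequestedBy  nvarchar(50) NULL        -- who requested the task
  , Reference    nvarchar(50) NULL        -- project, CR or ticket
  , Category     nvarchar(30) NULL        -- e.g. design, development, testing
  , HoursWorked  decimal(4,2) NULL        -- effective time, if worth tracking
);

-- example entry
INSERT INTO dbo.TaskLog (WorkDate, Description, RequestedBy, Reference, Category, HoursWorked)
VALUES ('20170102', N'Corrected sales report', N'J. Doe', N'CR-1234', N'maintenance', 1.5);

   A simple query over such a table is then enough to retrieve what was done for a given requester, ticket or period.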

   Documenting tasks offers not only easier retrieval and a base for accurate estimations, but also visibility into my work, for me as well as, if necessary, for others. In addition, it can be a useful sensemaking tool over time.

Lesson #2: Document your code


“Always code as if the guy who ends up maintaining your code will be
a violent psychopath who knows where you live.”
Damian Conway

    Opinions are split over the need to document code. There are people who advise against it, and probably one of the most frequent reasons is rooted in the Agile methodology. I have to stress that Agile values “working software over comprehensive documentation”, a fact that doesn’t imply the total absence of documentation. There are also other reasons frequently advanced, like “there’s no need to document something that’s already self-explanatory” (as good code should be), “no time for it”, etc. Probably in each statement there is some grain of truth, especially when considering the fact that in software engineering there are so many requirements for documentation (see e.g. ISO/IEC 26513:2009).

   Without diving too deep into the subject: document what is worth documenting; however, this needs to be regarded from a broader perspective, as there might be other people who need to review, modify and manage your code.

    Documenting code isn’t limited to the code that is part of the “deliverables”, but extends also to intermediary code written for testing or other activities. Personally, I find it useful to save within the same file all the scripts developed within the same day. When some piece of code has a “definitive” character, I save it individually for reuse or faster retrieval, typically with a meaningful name that facilitates the file’s retrieval. With the code it may also help to provide some metadata, like a short description and purpose (who requested it and when).
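
   Just to illustrate the idea, such metadata can take the form of a short comment header at the top of the script; the fields and values below are only an example of what I find useful, not a standard:

/*
   Description : correction of duplicate vendor records (illustrative example)
   Purpose     : one-time cleanup in the test environment
   Requested by: J. Doe (Purchasing), 2017-01-02
   Scope       : intermediary script, not part of the deliverables
   Notes       : assumes the hypothetical staging table dbo.VendorStaging exists
*/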

   Code versioning can be used as a tool to facilitate the process, though not everything is worth versioning.

 

Lesson #3: Document all issues, as well as the steps used for troubleshooting and fixing them


“It’s not an adventure until something goes wrong.”
Yvon Chouinard

   Independently of the types of errors occurring while developing or troubleshooting code, one of their common characteristics is that errors tend to recur. Therefore I found it useful to document all the errors I got in terms of screenshots, ways to fix them (including workarounds) and sometimes also the steps followed in order to troubleshoot the problem.

   Considering that issues are often rooted in programming fallacies or undocumented behavior, there is almost always something to learn from one’s own as well as from others’ errors. In fact, that was the reason why I started the “SQL Troubles” blog – as a way to document some of the issues I encountered, to provide others some help, and, why not, to get some feedback.

 

Lesson #4: Document software installations and changes in configurations


   At least for me, this lesson is rooted in the fact that years back, quite often, release candidate as well as final software was not that easy to install, one having to deal with various installation errors rooted in OS or component incompatibilities, invalid or missing permissions, or unexpected presumptions made by the vendor (e.g. default settings). Over the years installation became smoother, though such issues still occur. Documenting the installation in terms of screenshots of the setup settings allows repeating the steps later. It can also provide a base for further troubleshooting when the configuration within the software changes, or serve as evidence when something goes wrong.


   Talking about changes occurring in the environment, more than once I found myself troubleshooting something that stopped working, only to discover that something had changed in the environment. It’s useful to document the changes occurring in an environment, an importance stressed also in the “Configuration Management” section of ITIL® (Information Technology Infrastructure Library).
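
   On the SQL Server side, a small step in this direction is to save from time to time a snapshot of the instance-level settings; the following minimal query is only a sketch that can be extended to database-scoped settings or other components:

-- snapshot of the instance-level settings, to be kept as documentation
SELECT name           -- setting name (e.g. 'max server memory (MB)')
     , value          -- configured value
     , value_in_use   -- value currently in effect
     , description    -- short description of the setting
FROM sys.configurations
ORDER BY name;

   Saving the output to a dated file gives a simple baseline to compare against when something stops working.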

 

Lesson #5: Document your processes


“Verba volant, scripta manent.” Latin proverb
"Spoken words fly away, written words remain."

    In process-oriented organizations one has the expectation that the processes are documented. One can find that it’s not always the case, some organizations relying on common or individual knowledge about the various processes. Or it might happen that the processes aren’t documented to the level of detail needed. What one can do is to document the processes from one’s own perspective, to the level of detail needed.

 

Lesson #6: Document your presumptions


“Presumption first blinds a man, then sets him a running.”
Benjamin Franklin

   Probably this is more a Project Management related topic, though I find it useful also when coding: define your presumptions/expectations upfront – where the libraries should lie, the type and format of the content, the files’ structure, the output, and so on. Even if a piece of software is expected to be a black box with inputs and outputs, at least the input, the output and the expectations about the environment need to be specified upfront.
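
   To make the idea more concrete on the coding side, such presumptions can be written down and even enforced at the beginning of a script; here is a minimal sketch with hypothetical object and column names:

-- Presumption: the (hypothetical) staging table exists in the current database
IF OBJECT_ID(N'dbo.StagingOrders', N'U') IS NULL
   RAISERROR (N'Presumption not met: dbo.StagingOrders is missing.', 16, 1);
GO

-- Presumption: the staging data refer to a single company
IF (SELECT COUNT(DISTINCT CompanyCode) FROM dbo.StagingOrders) > 1
   RAISERROR (N'Presumption not met: more than one company in the staging data.', 16, 1);
GO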

 

Lesson #7: Document your learning sources


“Intelligence is not the ability to store information, but to know where to find it.”
Albert Einstein

    Computer specialists are heavily dependent on the internet to keep up with the advances in the field, best practices, methodologies, techniques, myths, and other knowledge. Even if one learns something, over time the degree of retention varies, and it can decrease significantly if the knowledge wasn’t used for a long time. Nowadays, with a quick search on the internet one can find (almost) everything, though the available content varies in quality and coverage, and it might be difficult to find the same piece of information again. Therefore, independently of the type of source used for learning, I found it useful to also document the information sources.

 

Lesson #8: Document the known as well as the unknown

 

“A genius without a roadmap will get lost in any country but an average person
with a roadmap will find their way to any destination.”
Brian Tracy

   Over the years I found it useful to map and structure the learned content for further review, sometimes considering only key information about the subject, like definitions, applicability, limitations, or best practices, while other times providing also a level of depth that allows me and others to memorize and understand the topic. As part of the process I attempted to keep the copyright attributions, just in case I need to refer to the source later. Together with what I learned, I considered also the subjects that I still have to learn and review for further understanding. This provides a good way to map what is known as well as what isn’t known. One can use for this a rich text editor or knowledge mapping tools like mind maps or concept maps.


Conclusion


    Documentation isn’t limited to pieces of code or software; it extends also to the knowledge one acquires, its sources, what it takes to troubleshoot the various types of issues, and the work performed on a daily basis. Documenting all these areas of focus should be done based on the principle: “document everything that is worth documenting”.

Saturday, November 05, 2016

System.OutOfMemoryException in SQL Server Management Studio and other 32-bit Drawbacks

    This week I was playing with a few datasets downloaded from the web on various topics, trying to torture the data until they confessed something. A few of the datasets were prepared for loading into a MySQL database as individual INSERT INTO statements, containing between 100,000 and a few million records. While looking at the big but slim datasets in SSMS (SQL Server Management Studio) and reconciling the differences between MySQL and SQL Server, I got the System.OutOfMemoryException exception several times, SSMS crashing once or twice. That might be expected, given the number of records, though I was surprised to get the same error message while executing the INSERT INTO statements for one of the smallest datasets, which had about 300,000 records:


    "An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown"


    KB 2874903 brings some light into the topic – SSMS is still a 32-bit process and thus limited to 2 GB of memory. The KB offers three methods to avoid the issue. The first two, outputting the query results to text or to a file, didn’t work. The third method, based on the sqlcmd utility, worked smoothly with a syntax like the one below:

sqlcmd -i "<file_name.sql>" -d "<database name>"
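
    For instance, a call like the one below (the server, database and file names are just placeholders) connects with the current Windows account and redirects the output to a file, so nothing needs to be loaded into SSMS at all:

sqlcmd -S ".\SQLEXPRESS" -E -d "TestDB" -i "C:\Temp\inserts.sql" -o "C:\Temp\inserts_log.txt"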

    So it doesn’t matter that you have a supercomputer and that working with big datasets becomes a necessity nowadays; this limitation can make data loading just a little bit more complicated. On one side, it’s true that when dealing with such datasets it’s probably recommended to use sqlcmd directly to execute the scripts. On the other side, independently of this type of problem, and even if understandable from the need of keeping backwards compatibility with 32-bit platforms/solutions, it’s hard to digest the fact that Microsoft keeps some of its products 32-bit based while SQL Server targets 64-bit platforms. One has the same problem when using BIDS (Business Intelligence Development Studio), developing SSRS, SSIS or SSAS solutions under 32-bit and having maybe to deploy the code as 64-bit (e.g. SQL Server Agent). From my point of view, most of the issues I had were when dealing with proprietary drivers like the ones for Oracle or even for MS Office. In addition, in SSIS there can be features that are only available in 32-bit versions, or that have limitations on 64-bit computers (see [5]). As it seems, the SQL Server Data Tools (SSDT) will have similar drawbacks…


   Anyway, the sqlcmd utility saved the day with a minimum of overhead. Unfortunately it’s not always that easy to solve the compatibility issues between 32-bit and 64-bit software and platforms.

Resources:

[1] Microsoft Support (2013) KB 2874903: "System.OutOfMemoryException" exception when you execute a query in SQL Server Management Studio https://support.microsoft.com/en-us/kb/2874903

[2] MSDN (2016) SQL Server 2016: sqlcmd Utility https://msdn.microsoft.com/en-us/library/ms162773.aspx

[3] MSDN (2016) SQL Server 2016: Use the sqlcmd Utility https://msdn.microsoft.com/en-us/library/ms180944.aspx

[4] MSDN (2012) Introducing Business Intelligence Development Studio  https://msdn.microsoft.com/en-us/library/ms173767.aspx

[5] SQL Server 2008 R2: 64 bit Considerations for Integration Services https://technet.microsoft.com/en-us/library/ms141766(v=sql.105).aspx

Wednesday, March 02, 2016

Self-Service BI


Introduction


    According to Gartner, the world's leading information technology research and advisory company, Self-Service BI (aka self-service analytics, ad-hoc analysis, personal analytics), SSBI for short, is a “form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support” [1].

    Reading between the lines, SSBI presumes the existence of an infrastructure made of tools to support it (aka self-service BI tools), direct or indirect access to raw data and/or data models for the users, and the skillset needed in order to work with the data and answer business problems/questions.

A Little History

     The concept of self-service is not new; it just got “rebranded” and transformed into a business opportunity. The need for business users to perform ad-hoc analyses was always there in organizations, especially in the ones not having the right infrastructure for harnessing their data. Ever since the 90s, with the appearance of products like MS Excel or MS Access, users in many organizations were forced by the state of the art to learn how to use such products in order to get the answers they needed from the data. Users started building personal solutions, many of them temporary, intended to fill the reporting gaps organizations had. With a little effort and a relatively small investment, users had the possibility of playing with the data, understanding the data, identifying and solving problems in the business. They thus acquired a certain level of business expertise and data awareness, becoming valuable resources in the organization.

     With time such solutions grew in scope and data volume, gained broader visibility and reached deeper into organizations, some of them becoming team, departmental or cross-departmental solutions. What grows uncontrolled over time starts to have a negative impact on the environment. First, the management of the tools became a problem, because the solutions needed to be backed up and maintained regularly; then other problems started to surface: data security, inefficient data processing as increasing volumes of data were processed on local computers and transferred over the network, duplicated data and effort, different versions of reality as different numbers were reported – numbers reflecting different definitions, different knowledge about the business, or different data-analysis skillsets. Management needed a more consolidated and standardized effort in order to address these problems. Organizations were forced to embrace, or embraced on their own, the idea of investing money in modern BI solutions, in more powerful servers capable of handling a larger number of requests, in flexible data models that facilitate data consumption, in data quality initiatives. Thus, through various projects, a considerable number of such solutions were converted into more standardized and performant BI solutions, with the IT department in control of the changes and new requests.

Back to Present

    With IT in control of the reporting requirements, the business is forced to rely on the rapidity with which IT is able to address new requirements. Some organizations acquired internal resources in order to build reports and the afferent infrastructure in-house, others created partnerships with vendors, or approached a combination of the two. As the volume of requirements isn’t uniform over time, the business often has to wait several days between the time a requirement was addressed to IT and the time a solution was provided. In business terms, a few days of waiting for data can equate to the loss of an opportunity, or a decision taken too late – a decision that could have a broader impact.

     A few years ago things started to change, when the ad-hoc analysis concept was rebranded as self-service and surfaced as a trend. This time vendors like Qlik, Tableau, MicroStrategy or Microsoft, some of the main SSBI vendors, are offering easy-to-use and functionality-rich tools for data integration, visualization and discovery, tools that reflect the advances made in graphics, data storage and processing technologies (e.g. in-memory databases, parallel processing). With just a few drag-and-drops, users are able to display details, aggregate data, and identify trends and correlations in the data. In addition, the tools can make use of the existing data models available in data warehouses, data marts and other types of data repositories, including the rich set of open data available on the web.

Looking at the Future

   Like its predecessors, SSBI seems to address primarily data analysts and data-aware business users; however, in time it is expected to be adopted by more organizations and to become more mature where already adopted. Of course, some of the problems from the early days will most likely resurface, though through governance, better architectures and tools, integration with other BI capabilities, trainings and awareness most of the problems will be overcome. Most likely there will also be organizations in which SSBI will fail. In the end, each organization will need to find for itself the value of SSBI.

Resources:
[1] Gartner (2016) Self-Service Analytics [Online] Available from: http://www.gartner.com/it-glossary/self-service-analytics
[2] Gartner (2016) Magic Quadrant for Business Intelligence and Analytics Platforms, by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich [Online] Available from: https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204&st=sb

Saturday, February 27, 2016

2¢ on BI Myths: Business Intelligence is Complex

Introduction

    While looking over the “Business Intelligence Concepts and Platform Capabilities” Coursera MOOC resources for Module 2, I ran into two similar articles, from Solutions Review and Information Age respectively. What caught my attention was the easiness with which the “BI is complex” myth is approached in both columns.

    According to the two sources, the capabilities of today's BI tools have “enabled business users to easily identify and present trends in an impactful way” [1], and “do not require an expert at the helm” [2]. It thus became simpler for users to independently query data and create interactive reports and presentations [2]. In both columns one can read between the lines that the simplicity of using BI tools is equivalent to negating the complexity of BI, which from my point of view is false. In fact, what is regarded here are especially the self-service BI tools, in trend nowadays, that allow users to easily perform ad-hoc analyses with minimal involvement from IT. Self-service BI is only a subset of what BI means for organizations, and just one capability of the many BI capabilities an organization needs in theory, even if some organizations might use it extensively.

Beyond the Surface

    A BI tool is not a BI solution per se, even if many generic BI solutions for different systems are available out of the box. This is one of the biggest confusions managers, users and, unfortunately, also BI professionals make. A BI tool offers the technological basis for creating a BI infrastructure, though it comes with no guarantees. It takes a well-defined IT and business strategy, one or more successful projects, and skillful developers and users in order to harness the BI investment.

    On the other side, it’s also true that organizations can obtain results from less, though BI doesn’t equate with just any ad-hoc analysis performed by users, even if they use BI tools for the purpose. BI is not only about tools, reporting and revealing trends in the data. BI often implies holistic knowledge about the business and a certain data awareness, without which users will start aggregating and comparing apples with pears and wondering why they taste and look different.

    If everything were so simple, then why do so many BI projects fail to deliver what’s expected? Why do so many managers complain that they don’t have the data they need, when they need it? Sure, maybe the problem lies in over-complexifying the whole BI landscape and treating everything from a high level, though that’s most likely not it.

It’s a Teamwork Knowledge Game

    BI is, or needs to be, oriented toward monitoring and problem solving. This requires a deep understanding of the processes and the business. There are business users and also BI professionals who don’t have the knowledge needed to approach a business problem. One can see that from the premises they start with, the questions they raise, the data they consider, the models they build, and the results.

    From a BI professional’s perspective, even if one has broad knowledge about various businesses, one often lacks insight into a given business. BI professionals can seldom provide adequate BI solutions without input and feedback from the business. Some BI professionals rely too much on their own knowledge, just as the business sometimes expects maximum output from BI professionals while providing a minimum of input.

    As for business users, quite often their focus and knowledge cover only the data boundaries of their department, while many problems extend beyond those boundaries. They know facts that are not necessarily reflected in the data. Even if they are closer to the data than other parties, they often still lack some data awareness (including statistical awareness) needed to approach problems.

    Somebody once said ironically, when talking about users’ data and problem-solving skills, that “not everybody is a Bill Gates or Steve Jobs”. Continuing the idea, one can’t expect users to act as such. For sure there are many business users who are better problem solvers than BI consultants, though on the other side one can’t expect the average business user to have the same skillset as an experienced BI consultant. This is in fact one of the problems of self-service BI. Probably with time and effort organizations will develop such resources, though some help from BI professionals will still be needed. Without good cooperation between the business and BI professionals, an organization might not get the hoped-for results when investing in BI.

More on Complexity

    The complexity arises when one tries to do more with the data, especially data found in raw form. Usually the complexity of raw data can be addressed by building a logical or physical model that allows easier consumption of the data. This is the point where users find themselves overwhelmed, because it requires good knowledge of the physical data model and its semantics, the technical knowledge to build models, and the skills to reengineer the logic available in the source systems. These are the themes BI professionals are supposed to excel in. Talking about models, they are the most difficult part to build because they reflect various segments of the business – they reflect a breakdown of the complexity. It’s also the point where many BI projects fail, as the built models don’t reflect reality or aren’t capable of answering business questions.

    Coming back to the two columns, I have to point out that the complexity of a subject or domain can’t be judged based on how easy it is to approach basic tasks. The complexity typically lies beyond the basics, when one dives into the details. In the case of BI, the complexity starts when one attempts to mix various technologies and knowledge domains to model and solve daily business problems in an integrated, holistic, aligned, consistent and cost-effective manner. The more technologies, knowledge domains and constraints one has to consider, the more complex the BI landscape and solutions become.

    On the other side, this doesn’t mean that the BI infrastructure can’t be simplified, or that BI can’t rely heavily or exclusively on self-service BI solutions. However, each strategy has its advantages and disadvantages, and one most likely has to consider both sides of the coin in the process. And self-service BI has its own trade-offs – weaknesses that can be transformed into strengths with time.

Conclusion

   When one considers the capabilities of today's BI tools, ad-hoc analyses are relatively easy to perform and can lead to results, though such analyses don’t equate with BI, and the simplicity with which they are performed doesn’t necessarily imply that BI is simple as a whole. When one considers the complexity of today's businesses, the more one dives into the various problems a business has, the more complex the BI landscape seems. In the end, it’s within each organization’s power to simplify and harmonize its BI infrastructure to a degree that its business goals aren’t affected negatively.


Resources
[1] Information Age (2015) 5 Myths about Intelligence, by Ben Rossi, [Online] Available from: http://www.information-age.com/technology/information-management/123460271/5-myths-about-business-intelligence 
[2] SolutionsReview (2015) Top 5 Business Intelligence Myths Revealed, by Timothy King, [Online] Available from: http://solutionsreview.com/business-intelligence/top-5-business-intelligence-myths-revealed
[3] Gartner (2016) Magic Quadrant for Business Intelligence and Analytics Platforms, by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich [Online] Available from: https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204&st=sb 
[4] Coursera (2016) Business Intelligence Concepts, Tools, and Applications MOOC, led by Jahangir Karimi, University of Colorado, [Online] Available from: https://www.coursera.org/learn/business-intelligence-tools

Friday, May 29, 2015

Keeping Current or the Quest to Lifelong Learning for IT Professionals

Introduction

    The pace with which technologies and the business change becomes faster and faster. If 5-10 years back a vendor needed 3-5 years before coming out with a new edition of a product, nowadays a new edition is released every 1-2 years. The release cycles become shorter and shorter, vendors having to keep up with the changing technological trends. Changing trends allow other vendors to enter the market with new products, thus increasing the competition and the need for responsiveness from the other vendors. On one side, the new tools/editions bring new functionality which mainly addresses technical and business requirements. On the other side, existing tool functionality gets deprecated and superseded by other functionality. Knowledge isn’t limited to the use of tools, but extends also to the methodologies, procedures, best practices or processes used to make the most of the respective products. Furthermore, the value of some tools increases when they are mixed, with flexible infrastructures relying on the right mix of tools working together.

    For an IT person, keeping current with the advances in technologies is a major requirement. First of all, because knowing modern technologies is a ticket to a good and/or better paid job. Secondly, because many organizations try to incorporate in their IT infrastructure modern tools that would allow them to increase ROI and achieve further benefits. Thirdly, because, as I’d like to believe, most IT professionals are eager to learn new things and keep up with the novelties. Being an adept of the continuous learning philosophy is also a way to keep the brain challenged – a different type of challenge than the one we meet in daily tasks.

Knowledge Sources

    Face-to-face or computer-based trainings (CBTs) are the old-fashioned ways of keeping up to date with the advances in technologies, though paradoxically not all organizations can afford to train their IT employees. While CBTs are affordable, face-to-face trainings are quite expensive for the average IT person; therefore the IT professional has to reorient himself to other sources of knowledge. Fortunately, many important vendors like Microsoft or IBM provide, in one form or another, through knowledge bases (KBs), tutorials, forums, presentations and blogs, a wide range of resources that can be used for learning. Similar resources also exist from other parties, directly or indirectly interested in growing the knowledge pool.

    Nowadays, reading a book or following a course isn’t a requirement for learning a subject anymore. Blogs, tutorials, articles and other similar types of material can help more. Through their subject-oriented focus, they can bring some clarity in a small unit of time. Often they come with references to further materials, bring fresh perspectives, and are months or even years ahead of books or courses. Important professionals in the field can be followed on blogs, Twitter, LinkedIn, YouTube and other social media platforms. Seeing what topics they are interested in, how they code, what they think, maybe how they think – some even share their expertise ad-hoc when asked – all of this can help an IT professional considerably, if he knows how to take advantage of these modern facilities.

    MOOCs have started to approach IT topics, and further topics that can come in handy for an IT professional. Most of them are free, or a small fee is required for some of them, especially if the participants’ identity needs to be verified. Such courses are a valuable source of information. The participant can see how such a course is structured, what topics are approached, and what minimal knowledge base is required; the material is almost the same as in a normal university course, and in the end it’s not the piece of paper with the testimonial that’s important, but the change in perspective obtained by taking the course. In addition, the MOOC participant can interact with people with similar interests, collaborate with them on projects, and, why not, something useful can come out of it. Through MOOCs or direct vendor initiatives, free or freeware versions of software are available, sometimes with the whole functionality available for personal use. The professional is therefore no longer dependent on the software he can use only at work. New possibilities open up for the person who wants to learn.

Maximizing the Knowledge Value

    Despite the considerable number of knowledge resources, for an IT professional the most important part of his experience comes from hands-on experience acquired on the job. If the knowledge is not rooted in hands-on experience, it remains purely theoretical, with minimal value. Therefore, in order to maximize the value of his learning, an IT professional has to attempt to use his knowledge in practice as much and as soon as possible. One way to increase the value of experience is to be involved in projects dealing with new technologies or challenges that would allow a professional to further extend his knowledge base. Sometimes we can choose such projects or gain exposure to the technologies, though other times no such opportunities can be seized or identified.

    Probably an IT professional can use in his daily duties 10-30% of what he has learned. This percentage can, however, be increased by getting involved in other types of personal or collective (open source or work) projects. This would allow exploring the subjects from another perspective. Considering that many projects involve overtime, and that many professionals also have a rich personal life, it may look difficult to do that, though it is not impossible.

    Even if not achievable on a regular basis, a professional can allocate 1-3 hours per week from his working time for learning something new. It can be something that would help his organization directly or indirectly, though sometimes it pays off to learn technologies that have nothing to do with the actual job. Somebody may argue that the respective hours are not “billable”, that they are a waste of time and other resources, that the technologies are not available, that there are lots of due tasks, etc. With a little benevolence and the right argumentation, such criticism can also be silenced. The arguments can, for example, be based on the fact that a skilled professional can become more productive with time, and that a small investment in knowledge can later have a bigger benefit for both parties – employee and employer. An older study showed that when IT professionals were given some freedom to approach personal projects at work, and to use some time for their own benefit, the value they brought to the organization increased. There are companies, like Google, that made a philosophy out of this type of work.

    A professional can also allocate 1-3 hours from his free time, while commuting or during other similar activities. Reading something before going to bed or as relaxation after work can prove to be a good way to shut down the brain from the daily problems. Where there’s interest in learning something new, a person will find the time, no matter how busy his schedule is. It’s important, however, to do that on a regular basis, and with time the hours and the knowledge accumulate.

    It’s also important to have a focused effort that will bring some kind of benefit. Learning just for the sake of learning brings little return on investment for a person if it’s not adequately focused. For sure it’s interesting and fun to browse through different topics, and it’s even recommended to do so occasionally, though in the long run, if a person wants to increase the value of his knowledge, he needs somehow to focus the knowledge within a given direction and apply it.

    Direction is obtained by choosing a career or learning path, and focusing on the directly or indirectly related topics that belong to that path. Focusing on the subjects related to a career path allows us to build our knowledge further on existing knowledge, understanding a topic fully. On the other side, focusing on other areas of applicability not directly linked with our professional work can broaden our perspective by looking at one topic from another topic’s perspective. This can be achieved, for example, by joining the knowledge base of a hobby we have with that of our professional work. In certain configurations, new opportunities for joint growth can be identified.

    The value of knowledge increases primarily when it’s used in day-to-day scenarios (a form of learning by doing). It would be useful, for example, for a professional to start a project that can bring some kind of benefit. It can be something simple, like building a web page or a full website, an application that processes data, a solution based on a mix of technologies, etc. Such a project would allow simulating day-to-day situations to some degree, where the professional is forced to use and question some aspects, to deal with situations that can’t be found in a textbook or other learning material. If such a project can bring a material benefit, the value of the knowledge increases even more.

    Another way to integrate the accumulated knowledge is through blogging and problem solving. Topic- or problem-oriented blogging allows externalizing a person’s knowledge (aka tacit knowledge), putting knowledge in new contexts within a small focused unit of work, doing some research and seeing how others think about the same topic/problem, getting feedback, correcting or improving some aspects. It’s also a way of documenting the various problems identified while learning or performing a task. Blogging helps a person improve his written communication skills and his vocabulary, and with a little more effort it can also be a calling card for his professional experience.

    Trying to apply new knowledge in hands-on trainings, tutorials or by writing a few lines of code to test functionality and its applicability, as well as structuring newly learned material into notes in the form of text or knowledge maps (e.g. concept maps, mind maps, causal maps, diagrams, etc.), allows learners to actively learn the new concepts, increasing the material’s overall retention. Even if notes and knowledge maps don’t apply the learned material directly, they offer a new way of structuring the content and resources for further enrichment and review. Applied individually, but especially when combined, the different types of active learning help maximize the value of knowledge with a minimum of effort.

Conclusion

    The bottom line: given the fast pace with which new technologies enter the market and the business environment evolves, an IT professional has to keep himself up to date with today's technologies. He now has more means than ever to do that – affordable computer-based trainings, tutorials, blogs, articles, videos, forums, studies, MOOCs and other types of learning material allow IT professionals to approach a wide range of topics. Through active, focused, sustainable and hands-on learning we can maximize the value of knowledge, and in the end it depends on each of us how we use the available resources to make the most of our learning experience.

Saturday, November 10, 2012

Data Quality: Data Migration’s Perspective – Part I: A Bird’s-Eye View

    Imagine you just finished a Data Migration (DM) project: everything went smoothly, the data were loaded into the new system with a minimal number of issues, inherent sometimes to such complex projects, the users started to use the new system, everybody seemed to be satisfied, and a few weeks later, within the company, rumors propagate with the speed of light – “the migrated data are wrong”, “the new system can’t be used”, “IT did a bad job”, “we have to go back to the previous system”, and so on. The panic propagates, a few heads fall, the business tries to revert to the old system, but there’s a lot of new data available in the new system and it’s not so trivial to move the data back to the old one; in the meantime other rumors appear, and… it’s just a scenario, but this could happen to any company if the appropriate measures weren’t taken at the right time. What could help a company when something like this happens?! A good Plan B, aka a good Migration Fallback Plan/Policy, but that’s something nobody would like to resort to except in extreme situations.

    A common approach to any type of project, as well as to a DM project, is to identify and mitigate the risks before or during the project. That’s something I started to do a few days ago: to prepare a list of the risks associated with DM projects. For this exercise I tried to remember what went wrong in previous similar projects I worked on and to figure out what else could go wrong. Some online resources helped me refresh my memory too, and I think I also found two or three things I hadn’t really thought about. My attempt was primarily focused on the type of problem mentioned above – minimizing the risk of not having the right data when the new system goes live. Before jumping into the topic, I would like to sketch the bigger picture, as I perceive it.

    Having the right data when a system goes live primarily means having good Data Quality (DQ) in the target system after the data were migrated! As a DM is the best exemplification of the GIGO (Garbage In, Garbage Out) principle, in order to have good DQ in the target it is important to handle DQ at the latest during the DM project. That’s essential and common sense – you can’t expect to have good data in the new system when there’s a lot of garbage in the old one. So, a DM and the Risk Management for such a project should be built around this. In fact, not having a DQ initiative or project within a DM project is one of the most important risks a company can take. Maybe in a small DM a DQ initiative isn’t necessary, though when the data are important for your company, DQ is a must! In addition, DQ assessments have to be performed in alignment with the new system, and not the old one. Even if the data have good quality in the old system, the quality of your data after the DM will be judged in relation to the new system. This is a requirement that can be easily overlooked and whose implications can be misunderstood!

    Many think that DQ is a one-time activity: we do it for a DM project, we’ll have quality data, and we’ll never have to care about their quality anymore. Totally false! DQ has to be part of a broader strategy, call it Data Governance, Master Data Management, Data Management or any other initiative in which data plays an important role. DQ is an ongoing, iterative and consolidated effort; it doesn’t end after the DM but continues for the whole data life-cycle, as long as the data have value for the organization. It doesn’t help if the data have high quality when they are migrated and a few weeks or months later the overall quality and the trust in the data have decreased considerably.

    Keeping an acceptable level of DQ must be part of an organization’s strategy, and a culture toward DQ must be built. People need to be aware of the importance of having good quality data, and especially of the consequences of having bad quality data. DQ doesn’t concern only the owners or stewards of the data, or the people working with the data; it concerns the whole organization, because decisions are made based on those data, processes are changed and improved, and an organization’s performance is often judged based on data. The quality of data is a matter of perception – of how users see the quality of the data in relation to the needs they have, and the needs change over time. Primarily, being aware of what good DQ means and which are an organization’s needs in respect to data is also a way of minimizing the negative perception of the data, of gaining trust in the data and a solid basis on which decisions can be made. Secondarily, these organizational data needs have to be addressed in a DM; they are the success factors upon which the success of a DM project is judged.

    For sure, considerable costs are associated with DQ initiatives and with everything related to data, which doesn’t always represent a direct cost component in the products or services handled by an organization. Considering that not all data have the same importance for an organization, it makes sense to prioritize the DQ effort as a whole and the data cleaning needs in particular: the focus should be on the data with the highest impact, and with time to tackle data with lower and lower impact. An equilibrium must be found between the DQ costs and the value of the data. Most probably it is important to spend resources on raising people’s awareness in respect to DQ early, rather than on cleaning the data retroactively later. It also makes sense to invest in tools that help clean the data using automated or semi-automated methods, though some manual/visual control needs to be in place too.

    DQ, and the way the problems associated with it are tackled, depends a lot on an organization’s internal kitchen – people, partners, organization, strategy, maturity, culture, geography, infrastructure, methodologies used, etc. What matters is how the various aspects of an organization are aligned in order to take advantage of one of the most important assets an organization has – its data! For this it is important to adopt methodologies that support DQ, and to align and tweak them as needed, in order to make the most of your data! But before or while doing that, remember that a DM is an organization’s opportunity to change the quality of its data and its data strategy!

Thursday, October 04, 2012

Business Rules – An Introduction

    "Business rules" seems to be a recurring theme these days – developers, DBAs, architects, business analysts, IT and non-IT professionals talk about the necessity to enforce them in data and semantic models, information systems, processes, departments or whole organizations. They seem to affect the important layers of an organization. In fact the same business rule can affect multiple levels either directly, or indirectly through the hierarchical or networked structure of causality it belongs to. When considered all the business rules, the overall picture can become very complex. The fact that there are multiple levels of interconnected layers, with applications and implications at macro or micro level, makes the complexity to fight back because in order to solve business-specific problems often you have to go at least one level above the level where the problems were defined, or to simplify the problems to a level of detail that allows to tackled.

    The Business Rules Group defines a business rule as "a statement that defines or constrains some aspect of the business" [1], a definition which seems closer to the vocabulary of IT people. Ronald G. Ross, in his book Principles of the Business Rule Approach, defines it as "a directive intended to influence or guide business behavior" [2], a definition closer to the vocabulary of HR people. In fact, the two definitions are quite similar, highlighting the constraining or guiding role of business rules. They also raise an important question – can everything that is catalogued as a constraint or guideline be considered a business rule? In theory yes; in practice there are constraints and guidelines that have a different impact on the business, so depending on the context they need to be considered or not. What to consider is itself an art, which adds to the art of problem solving.

    Besides identification, neither the definition nor the management of business rules seems an easy task. R.G. Ross considers that business rules need to be written and made explicit, expressed in plain language, independent of procedures and workflows, built on facts, motivated by identifiable and important business factors, accessible to authorized parties, specific, single-sourced, managed, specified by the people who have the relevant knowledge, and they should guide or influence behavior in desired ways [2]. This summarizes the various aspects that need to be considered when defining and managing business rules. Many organizations seem to be challenged by this, and it can indeed be challenging when business management maturity is lacking.

    Many business rules already exist in the functional and technical specifications written for the various software products built on request, in the documentation of purchased software, in processes, procedures, standards, internally defined and externally enforced policies, and in the daily activities and knowledge exchanged or held by workers. Sure, the formulations existing in such resources need to be enhanced and aggregated in order to be brought to the status of a business rule. And here comes the difficulty, as iterative work needs to be performed in order to bring them to the level indicated by R.G. Ross. For sure Ross’ specifications are idealistic, though they offer a “framework” for defining business rules. As concerns their management, there is a lot to be done within an organization, as this aspect needs to be integrated with the other activities and strategies existing in an organization.

    Often, when an important initiative, better said a project, starts within an organization, the lack of upfront defined and understood business rules is felt in particular. Such events trigger the identification and elicitation of business rules; they are addressed in documentation and remain buried there. It is also true that it’s difficult to build a business case for further processing of business rules. An argument could be the costs associated with decisional mistakes made from not knowing the existing rules, though that’s something difficult to quantify and make visible in an organization. In the end, most probably an organization will recognize the value of business rules only when it has reached a certain level of maturity.

References:
[1] Business Rules Group (2000) Defining Business Rules - What Are They Really? [Online] Available from: http://businessrulesgroup.org/first_paper/BRG-whatisBR_3ed.pdf
[2] Ronald G. Ross (2003) Principles of the Business Rule Approach. Addison Wesley. ISBN: 0-201-78893-4.