06 October 2007

🏗️Software Engineering: Users (Just the Quotes)

"Computers do not decrease the need for mathematical analysis, but rather greatly increase this need. They actually extend the use of analysis into the fields of computers and computation, the former area being almost unknown until recently, the latter never having been as intensively investigated as its importance warrants. Finally, it is up to the user of computational equipment to define his needs in terms of his problems, In any case, computers can never eliminate the need for problem-solving through human ingenuity and intelligence." (Richard E Bellman & Paul Brock, "On the Concepts of a Problem and Problem-Solving", American Mathematical Monthly 67, 1960)

"The most important property of a program is whether it accomplishes the intention of its user." (C Anthony R Hoare, Communications of the ACM, 1969)

"In computer design three levels can be distinguished: architecture, implementation and realisation; for the first of them, the following working definition is given: The architecture of a system can be defined as the functional appearance of the system to the user, its phenomenology. […] The inner structure of a system is not considered by the architecture: we do not need to know what makes the clock tick, to know what time it is. This inner structure, considered from a logical point of view, will be called the implementation, and its physical embodiment the realisation." (Gerrit A Blaauw, "Computer Architecture", 1972)

"There always is an architecture, whether it is defined in advance - as with modern computers - or found out after the fact - as with many older computers. For architecture is determined by behavior, not by words. Therefore, the term architecture, which rightly implies the notion of the arch, or prime structure, should not be understood as the vague overall idea. Rather, the product of the computer architecture, the principle of operations manual, should contain all detail which the user can know, and sooner or later is bound to know." (Gerrit A Blaauw, "Computer Architecture", 1972)

"Models are models of something, namely, [they are] reflections, representations of natural and artificial originals, that can themselves be models again. […] Models, in general, do not cover all the attributes of the originals they are representing, but only those [attributes] that seem relevant to the actual model creators and/or model users." (Herbert Stachowiak, "Allgemeine Modelltheorie", 1973)

"Models are not assigned per se uniquely to their originals. They perform their replacement function: a) for definite – cognitive and/or handling, model-using – subjects, b) within definite time intervals, c) under restrictions of definite operations of thought or fact. […] Models are not only models of something. They are also models for somebody, a human or an artificial model user. They perform thereby their functions in time, within a time interval. And finally, they are models for a definite purpose." (Herbert Stachowiak, "Allgemeine Modelltheorie", 1973)

"Two of the most difficult areas of data-base management are the design of an information structure and the reduction of that structure to a data structure which is compatible with and managed by the DBMS. […] Data-base management systems are tools to be applied by the users of these systems to build an accurate and useful model of their organization and its information needs. To accomplish this, the information structure must accurately define and characterize the items of data and the relations among them that are of interest to the users. This is no small task, for it demands a knowledge of the organization and the distribution of information among its various parts." (Robert W Taylor & Randall L Frank, "CODASYL Data-Base Management Systems", 1976)

"The utility of a language as a tool of thought increases with the range of topics it can treat, but decreases with the amount of vocabulary and the complexity of grammatical rules which the user must keep in mind. Economy of notation is therefore important." (Kenneth E Iverson, "Notation as a Tool of Thought", 1979)

"People’s mental models are apt to be deficient in a number of ways, perhaps including contradictory, erroneous, and unnecessary concepts. As designers, it is our duty to develop systems and instructional materials that aid users to develop more coherent, useable mental models. As teachers, it is our duty to develop conceptual models that will aid the learner to develop adequate and appropriate mental models. And as scientists who are interested in studying people’s mental models, we must develop appropriate experimental methods and discard our hopes of finding neat, elegant mental models, but instead learn to understand the messy, sloppy, incomplete, and indistinct structures that people actually have." (Donald A Norman, "Some Observations on Mental Models" [in "Mental Models"], 1983)

"Program designers have a tendency to think of the users as idiots who need to be controlled. They should rather think of their program as a servant, whose master, the user, should be able to control it. If designers and programmers think about the apparent mental qualities that their programs will have, they'll create programs that are easier and pleasanter - more humane - to deal with." (John McCarthy, "The Little Thoughts of Thinking Machines", Psychology Today,v1983)

"A database management system is a collection of interrelated files and a set of programs that allow users to access and modify these files. A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data is stored and maintained. However, in order for the system to be usable, data must be retrieved efficiently. This concern has lead to the design of complex data structures for the representation of data in the database. Since many database systems users are not computer-trained, the complexity is hidden from them through several levels of abstraction in order to simplify their interaction with the system." (Henry F. Korth & Abraham Silberschatz, "Database System Concepts" 2nd Ed., 1991)

"Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large." (Jakob Nielsen, "Usability Engineering", 1993)

"The system should always keep users informed about what is going on, through appropriate feedback within reasonable time." (Jakob Nielsen, "Usability Engineering", 1993)

"Users can work with analysts and object designers to formulate and tune system requirements. People from business, analytical and object design disciplines can come together, learn from each other and generate meaningful descriptions of systems that are to be built. Each participant and each project has slightly different concerns and needs. Practical application of use cases can go a long way to improve our ability to deliver just what the customer ordered. (Rebecca Wirfs-Brock, "Designing scenarios: Making the case for a use case framework", 1993)

"Users often do not know what is good for them. […] Users have a very hard time predicting how they will interact with potential future systems with which they have no experience. […] Furthermore, users will often have divergent opinions when asked about details of user interface design." (Jakob Nielsen, "Usability Engineering", 1993)

"Users often raise questions that the development team has not even dreamed of asking. This is especially true with respect to potential mismatches between the users' actual task and the developers' model of the task. Therefore, users should be involved in the design process through regular meetings between designers and users. Users participating in a system design process are sometimes referred to as subject matter experts, or SMEs." (Jakob Nielsen, "Usability Engineering", 1993)

"Users are not designers, so it is not reasonable to expect them to come up with design ideas from scratch. However, they are very good at reacting to concrete designs they do not like or that will not work in practice. To get full benefits from user involvement, it is necessary to present these suggested system designs in a form the users can understand." (Jakob Nielsen, "Usability Engineering", 1993)

"The next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better." (Eric S Raymond, "The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary", 1999)

"Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging." (Eric S Raymond, "The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary", 1999)

"Ultimately, users visit your website for its content. Everything else is just the backdrop." (Jakob Nielsen, "Designing Web Usability", 1999)

"The physical design process is a key phase in the overall design process. It is too often ignored until the last minute in the vain hope that performance will be satisfactory. Without a good physical design, performance is rarely satisfactory and throwing hardware at the problem is rarely completely effective. There is no substitute for a good physical design, and the time and effort spent in the physical design process will be rewarded with an efficient and well-tuned database, not to mention happy users!" (Ken England, "Microsoft SQL Server 2000 Performance Optimization and Tuning Handbook", 2001)

"As the least conscious layer of the user experience, the conceptual model has the paradoxical quality of also having the most impact on usability. If an appropriate conceptual model is faithfully represented throughout the interface, after users recognize and internalize the model, they will have a fundamental understanding of what the application does and how to operate it." (Bob Baxley, "Making the Web Work: Designing Effective Web Applications", 2002)

"The aim of architectural design is to prepare overall specifications, derived from the needs and desires of the user, for subsequent design and construction stages. The first task for the architect in each design project is thus to determine what the real needs and desires of the user are […]" (George J Klir & Doug Elias, "Architecture of Systems Problem Solving" 2nd Ed, 2003)

"A road plan can show the exact location, elevation, and dimensions of any part of the structure. The map corresponds to the structure, but it's not the same as the structure. Software, on the other hand, is just a codification of the behaviors that the programmers and users want to take place. The map is the same as the structure. […] This means that software can only be described accurately at the level of individual instructions. […] A map or a blueprint for a piece of software must greatly simplify the representation in order to be comprehensible. But by doing so, it becomes inaccurate and ultimately incorrect. This is an important realization: any architecture, design, or diagram we create for software is essentially inadequate. If we represent every detail, then we're merely duplicating the software in another form, and we're wasting our time and effort." (George Stepanek, "Software Project Secrets: Why Software Projects Fail", 2005)

"Abstractions matter to users too. Novice users want programs whose abstractions are simple and easy to understand; experts want abstractions that are robust and general enough to be combined in new ways. When good abstractions are missing from the design, or erode as the system evolves, the resulting program grows barnacles of complexity. The user is then forced to master a mass of spurious details, to develop workarounds, and to accept frequent, inexplicable failures." (Daniel Jackson, "Software Abstractions", 2006)

"Developing fewer features allows you to conserve development resources and spend more time refining those features that users really need. Fewer features mean fewer things to confuse users, less risk of user errors, less description and documentation, and therefore simpler Help content. Removing any one feature automatically increases the usability of the remaining ones." (Jakob Nielsen, "Prioritizing Web Usability", 2006)

"The role of conceptual modelling in information systems development during all these decades is seen as an approach for capturing fuzzy, ill-defined, informal 'real-world' descriptions and user requirements, and then transforming them to formal, in some sense complete, and consistent conceptual specifications." (Janis A Burbenko jr., "From Information Algebra to Enterprise Modelling and Ontologies", Conceptual Modelling in Information Systems Engineering, 2007)

"Users may be annoyed by bugs, and software developers may be disappointed by their inability to perfect their work, and managers may be frustrated by the unreliability of their plans. But in the end, none of that matters as much as the simple fact that software does not work the way we think, and until it does, it is not worth trying to perfect." (Scott Rosenberg, "Dreaming in Code", 2007)

"We tend to form mental models that are simpler than reality; so if we create represented models that are simpler than the actual implementation model, we help the user achieve a better understanding. […] Understanding how software actually works always helps someone to use it, but this understanding usually comes at a significant cost. One of the most significant ways in which computers can assist human beings is by putting a simple face on complex processes and situations. As a result, user interfaces that are consistent with users’ mental models are vastly superior to those that are merely reflections of the implementation model." (Alan Cooper et al,  "About Face 3: The Essentials of Interaction Design", 2007)

"Writing the spec, a document that lays out copiously detailed instructions for the programmer, is a necessary step in any software building enterprise where the ultimate user of the product is not the same person as the programmer. The spec translates requirements - the set of goals or desires the software developer’s customers lay out - into detailed marching orders for the programmer to follow." (Scott Rosenberg, "Dreaming in Code", 2007)

"Conceptual models are best thought of as design-tools - a way for designers to straighten out and simplify the design and match it to the users’ task-domain, thereby making it clearer to users how they should think about the application. The designers’ responsibility is to devise a conceptual model that seems natural to users based on the users’ familiarity with the task domain. If designers do their job well, the conceptual model will be the basis for users’ mental models of the application." (Jeff Johnson & Austin Henderson, "Conceptual Models", 2011)

"A conceptual model of an interactive application is, in summary: the structure of the application - the objects and their operations, attributes, and relation-ships; an idealized view of the how the application works – the model designers hope users will internalize; the mechanism by which users accomplish the tasks the application is intended to support." (Jeff Johnson & Austin Henderson, "Conceptual Models", 2011)

"The conceptual model is not the users’ mental model of the application. […] users of an application form mental models of it to allow them to predict its behavior. A mental model is the user’s high-level understanding of how the application works; it allows the user to predict what the application will do in response to various user-actions. Ideally, a user’s mental model of an application should be similar to the designers’ conceptual model, but in practice the two models may differ significantly. Even if a user’s mental model is the same as the designer’s conceptual model, they are distinct models." (Jeff Johnson & Austin Henderson, "Conceptual Models", 2011)

"Heuristics are simplified rules of thumb that make things simple and easy to implement. But their main advantage is that the user knows that they are not perfect, just expedient, and is therefore less fooled by their powers. They become dangerous when we forget that." (Nassim N Taleb, "Antifragile: Things that gain from disorder", 2012)

"If the user can’t understand it, the design and the designer have failed." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"Successful information design in movement systems gives the user the information he needs - and only the information he needs - at every decision point." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"For an infrequent action to become a habit, the user must perceive a high degree of utility, either from gaining pleasure or avoiding pain." (Nir Eyal, "Hooked: How to Build Habit-Forming Products", 2014)

"To change behavior, products must ensure the user feels in control. People must want to use the service, not feel they have to." (Nir Eyal, "Hooked: How to Build Habit-Forming Products", 2014) 

"User habits are a competitive advantage. Products that change customer routines are less susceptible to attacks from other companies." (Nir Eyal, "Hooked: How to Build Habit-Forming Products", 2014)

"Users who continually find value in a product are more likely to tell their friends about it." (Nir Eyal, "Hooked: How to Build Habit-Forming Products", 2014) 

"When designers intentionally trick users into inviting friends or blasting a message to their social networks, they may see some initial growth, but it comes at the expense of users' goodwill and trust. When people discover they've been duped, they vent their frustration and stop using the product." (Nir Eyal, "Hooked: How to Build Habit-Forming Products", 2014)

"Once we understand our user's mental model, we can capture it in a conceptual model. The conceptual model is a representation of the mental model using elements, relationships, and conditions. Our design and final system will be the tangible result of this conceptual model." (Pau Giner & Pablo Perea, "UX Design for Mobile, 2017)

"A 'stream' is the continuous flow of work aligned to a business domain or organizational capability. Continuous flow requires clarity of purpose and responsibility so that multiple teams can coexist, each with their own flow of work. A stream-aligned team is a team aligned to a single, valuable stream of work; this might be a single product or service, a single set of features, a single user journey, or a single user persona." (Matthew Skelton & Manuel Pais, "Team Topologies: Organizing Business and Technology Teams for Fast Flow", 2019)

"Documentation is a practice concerned with all the processes involved in transferring documents from sources to users." (Brian C Vickery)

"This is generally true: any sizeable piece of program, or even a complete program package, is only a useful tool that can be used in a reliable fashion, provided that the documentation pertinent for the user is much shorter than the program text. If any machine or system requires a very thick manual, its usefulness becomes for that very circumstance subject to doubt!" (Edsger W. Dijkstra, "On the reliability of programs")

02 October 2007

🏗️Software Engineering: Agile Methods (Just the Quotes)

"Agile development methodologies promise higher customer satisfaction, lower defect rates, faster development times and a solution to rapidly changing requirements. Plan-driven approaches promise predictability, stability, and high assurance. However, both approaches have shortcomings that, if left unaddressed, can lead to project failure. The challenge is to balance the two approaches to take advantage of their strengths and compensate for their weaknesses." (Barry Boehm & Richard Turner, "Observations on balancing discipline and agility", Agile Development Conference, 2003)

"It is a myth that we can get systems 'right the first time'. Instead, we should implement only today’s stories, then refactor and expand the system to implement new stories tomorrow. This is the essence of iterative and incremental agility. Test-driven development, refactoring, and the clean code they produce make this work at the code level." (Robert C Martin, "Clean Code: A Handbook of Agile Software Craftsmanship", 2008)

"Agile approaches to software development consider design and implementation to be the central activities in the software process. They incorporate other activities, such as requirements elicitation and testing, into design and implementation. By contrast, a plan-driven approach to software engineering identifies separate stages in the software process with outputs associated with each stage." (Ian Sommerville, "Software Engineering" 9th Ed., 2011)

"Agile methods universally rely on an incremental approach to software specification, development, and delivery. They are best suited to application development where the system requirements usually change rapidly during the development process. They are intended to deliver working software quickly to customers, who can then propose new and changed requirements to be included in later iterations of the system. They aim to cut down on process bureaucracy by avoiding work that has dubious long-term value and eliminating documentation that will probably never be used." (Ian Sommerville, "Software Engineering" 9th Ed., 2011)

"Agile development methods require a disciplined approach to ensure that customer feedback, continuous testing, and iterative development actually lead to frequent deliveries of working, valuable software." (Michael Hüttermann et al, "DevOps for Developers", 2013)

"The advantages of Agile processes, including Scrum and Kanban (a method for delivering software with an emphasis on just-in-time delivery), are often nullified because of the obstacles to collaboration, processes, and tools that are built up in front of operations." (Michael Hüttermann et al, "DevOps for Developers", 2013)

"Many problems stem from a premature attempt at scaling Agile within the organization. The nature of the transformation is such that it is unrealistic to plan upfront for an 18-month organization-wide change program to go from status quo to continuous delivery. People try nevertheless, and when the outcomes don’t materialize, they say Agile doesn’t work." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"This is what the Agile Manifesto means when it says responding to change over following a plan. To maximize adaptability, it is essential to have good, fast feedback loops. This is why there is so much emphasis on iterative development." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"Heart of Agile is a meme. Heart of Agile is four words stripped down to nothing. It contains only four words – collaborate, deliver, reflect, improve." (Alistair Cockburn, [interview] 2017)

23 September 2007

💎SQL Reloaded: 6 out of 49 (Part 4: More Basic Data Analysis)

In a post I read long time ago, it was mentioned that’s interesting to see in a data set in which intervals the data falls. It was recommended to split the interval in 10 buckets, though we don’t know which is the right most extremity for our interval. We could use the maximum value:

  -- get maximum number of consecutive drawings for which any number haven’t appeared 
SELECT Max(DifferenceWeeks) 
FROM dbo.vLottoConsecutiveDrawings  

But this doesn’t guarantees that there will be cases in which the extremity is exceeded. From my point of view, if the maximum is 63, then I’ll take the extremity 70 and split it in 10 buckets: first interval will be 1 … 7, second 8 … 14, etc. Thus, I’m taking the GROUP query I used in the previous copy and add to it additional logic for the split: 

SELECT Number 
, Min(NullIf(DifferenceWeeks, 0)) Minimum 
, Max(NullIf(DifferenceWeeks, 0)) Maximum 
, Avg(NullIf(DifferenceWeeks, 0)) Average 
, count(*) NumberDrawings 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 1 AND 7 THEN 1 END) Range1_7 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 8 AND 14 THEN 1 END) Range8_14 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 15 AND 21 THEN 1 END) Range15_21 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 22 AND 28 THEN 1 END) Range22_28 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 29 AND 35 THEN 1 END) Range29_35 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 36 AND 42 THEN 1 END) Range36_42 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 43 AND 49 THEN 1 END) Range43_49 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 50 AND 56 THEN 1 END) Range50_56 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 57 AND 63 THEN 1 END) Range57_63 
, SUM(CASE WHEN DifferenceWeeks BETWEEN 64 AND 70 THEN 1 END) Range64_70 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 1 AND 7 THEN 1 END)/count(*) Percentage1_7 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 8 AND 14 THEN 1 END)/count(*) Percentage8_14 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 15 AND 21 THEN 1 END)/count(*) Percentage15_21 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 22 AND 28 THEN 1 END)/count(*) Percentage22_28 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 29 AND 35 THEN 1 END)/count(*) Percentage29_35 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 36 AND 42 THEN 1 END)/count(*) Percentage36_42 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 43 AND 49 THEN 1 END)/count(*) Percentage43_49 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 50 AND 56 THEN 1 END)/count(*) Percentage50_56 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 57 AND 63 THEN 1 END)/count(*) Percentage57_63 
, 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 64 AND 70 THEN 1 END)/count(*) Percentage64_70 
FROM dbo.vLottoConsecutiveDrawings 
GROUP BY Number 
ORDER BY Number 

As can be seen I just summed the number of cases in which a value fallen in the given interval:

SUM(CASE WHEN DifferenceWeeks BETWEEN 1 AND 7 THEN 1 END) Range1_7     

Additionally, I wanted to see what was the percentage for a value to fall in a given interval from the total number of occurrences: 
    
 100*SUM(CASE WHEN DifferenceWeeks BETWEEN 1 AND 7 THEN 1 END)/count(*) Percentage1_7 

How can we benefit from such output?     
We can check for each number what’s the bucket with the highest percentage, usually there are only 1-2 intervals which qualify. The percentage the is key in finding useful information. For example if there is only one interval with a 75% occurrences, then most probably the number will appear in a drawing from that interval, if the number didn’t appeared in the respective interval, then the probability for that number to appear increases. When we have multiple intervals with a considerable high percentage, if the right extremity of the highest interval is exceeded, the chances for a number to appear are considerably higher.

Previous Post <<||>> Next Post

💎SQL Reloaded: 6 out of 49 (Part 3: Basic Data Analysis)

The most basic information one can get is the number of drawings in which a number occurred, respectively which was its frequency:

-- number & frequency drawings by number
DECLARE @NumberDrawings int = 2000
SELECT Number 
, Cast(100.0*count(*)/@NumberDrawings as decimal(10,2)) Frequency 
, count(*) NumberDrawings 
FROM dbo.vLottoNumbers 
GROUP BY Number 
ORDER BY Number  

One of the interesting things to see in a data set of drawings is the distance between two drawings in which appears the same numbers. It doesn’t look too easy to compare two consecutive records within the same query; actually this can be done easy with the help of DENSE_RANK function (works only on SQL Server 2005+ and Oracle), which ranks the data within a partition, thus if two values are identical, they have the same ranking. Let’s see how DENSE_RANK function works: 

-- ranked drawings by number
SELECT Number 
, DrawingDate 
, DENSE_RANK() OVER (PARTITION BY Number ORDER BY DrawingDate) Ranking 
FROM dbo.vLottoNumbers   

The partition is created by Number, while the ranking is delimited by DrawingDate. Now all we have to do is to join two such queries by Number with two consecutive Rankings: 

--creating the view 
CREATE VIEW dbo.vLottoConsecutiveDrawings 
AS 
-- consecutive drawings by numbers
WITH DAT 
AS (
	SELECT Number     
	, DrawingDate     
	, DENSE_RANK() OVER (PARTITION BY Number ORDER BY DrawingDate) Ranking   
	FROM dbo.vLottoNumbers  
)
SELECT A.Number 
, A.DrawingDate 
, B.DrawingDate NextDrawingDate 
, IsNull(DateDiff(d, A.DrawingDate, B.DrawingDate), 0) DifferenceDays 
, IsNull(DateDiff(wk, A.DrawingDate, B.DrawingDate), 0) DifferenceWeeks 
FROM DAT A
     LEFT JOIN DAT B 
	   ON A.Number = B.Number 
	  AND A.Ranking = B.Ranking - 1 

The first interesting thing to find out is:     
  • What’s the average number of occurrences?     
  • What’s the minimum/maximum number of consecutive drawings in which the number hasn’t appeared?  
  • How many times a number appeared?      
The following query answers to these questions, by doing a simple grouping by Number over vLottoConsecutiveDrawings output:

-- minimum/maximum differences between drawings
SELECT Number 
, Min(NullIf(DifferenceWeeks, 0)) Minimum 
, Max(NullIf(DifferenceWeeks, 0)) Maximum 
, Avg(NullIf(DifferenceWeeks, 0)) Average 
, count(*) NumberDrawings 
FROM dbo.vLottoConsecutiveDrawings 
GROUP BY Number 
ORDER BY Number    

The output doesn’t give much information, but it’s a good start. The study interval can be broken down by using the DrawingDate in GROUP and/or WHERE clause: 

-- minimum/maximum differences between drawings by year
SELECT Year(DrawingDate) DrawingYear 
, Number 
, Min(NullIf(DifferenceWeeks, 0)) Minimum 
, Max(NullIf(DifferenceWeeks, 0)) Maximum 
, Avg(NullIf(DifferenceWeeks, 0)) Average 
, count(*) NumberDrawings 
FROM vLottoConsecutiveDrawings 
--WHERE Year(DrawingDate) IN (2000, 2001) 
GROUP BY Number 
, Year(DrawingDate) 
ORDER BY Number, DrawingYear

It would be also interesting to know how many numbers fall on a given row or column within the 7x7 matrix:

-- aggregating data by drawing & row
SELECT LN.DrawingDate 
, SUM(CASE WHEN LN.Number BETWEEN 1 AND 7 THEN 1 ELSE 0 END) Bucket1
, SUM(CASE WHEN LN.Number BETWEEN 8 AND 14 THEN 1 ELSE 0 END) Bucket2
, SUM(CASE WHEN LN.Number BETWEEN 15 AND 21 THEN 1 ELSE 0 END) Bucket3
, SUM(CASE WHEN LN.Number BETWEEN 22 AND 28 THEN 1 ELSE 0 END) Bucket4
, SUM(CASE WHEN LN.Number BETWEEN 29 AND 35 THEN 1 ELSE 0 END) Bucket5
, SUM(CASE WHEN LN.Number BETWEEN 36 AND 42 THEN 1 ELSE 0 END) Bucket6
, SUM(CASE WHEN LN.Number BETWEEN 43 AND 49 THEN 1 ELSE 0 END) Bucket7
FROM dbo.vLottoNumbers LN 
GROUP BY LN.DrawingDate 

-- aggregating data by drawing & column
SELECT LN.DrawingDate 
, SUM(CASE WHEN LN.Number %7 = 1 THEN 1 ELSE 0 END) Bucket1
, SUM(CASE WHEN LN.Number %7 = 2 THEN 1 ELSE 0 END) Bucket2
, SUM(CASE WHEN LN.Number %7 = 3 THEN 1 ELSE 0 END) Bucket3
, SUM(CASE WHEN LN.Number %7 = 4 THEN 1 ELSE 0 END) Bucket4
, SUM(CASE WHEN LN.Number %7 = 5 THEN 1 ELSE 0 END) Bucket5
, SUM(CASE WHEN LN.Number %7 = 6 THEN 1 ELSE 0 END) Bucket6
, SUM(CASE WHEN LN.Number %7 = 7 THEN 1 ELSE 0 END) Bucket7
FROM dbo.vLottoNumbers LN 
GROUP BY LN.DrawingDate 

The buckets can be aggregated as follows (observe the replacement of SUM with MAX):
 
-- consolidated drawings by 7 buckets (row)
 SELECT SUM(Bucket1) Bucket1
, SUM(Bucket2) Bucket2
, SUM(Bucket3) Bucket3
, SUM(Bucket4) Bucket4
, SUM(Bucket5) Bucket5
, SUM(Bucket6) Bucket6
, SUM(Bucket7) Bucket7
 FROM (
     -- aggregating data by drawing & row
	 SELECT LN.DrawingDate 
	, Max(CASE WHEN LN.Number BETWEEN 1 AND 7 THEN 1 ELSE 0 END) Bucket1
	, Max(CASE WHEN LN.Number BETWEEN 8 AND 14 THEN 1 ELSE 0 END) Bucket2
	, Max(CASE WHEN LN.Number BETWEEN 15 AND 21 THEN 1 ELSE 0 END) Bucket3
	, Max(CASE WHEN LN.Number BETWEEN 22 AND 28 THEN 1 ELSE 0 END) Bucket4
	, Max(CASE WHEN LN.Number BETWEEN 29 AND 35 THEN 1 ELSE 0 END) Bucket5
	, Max(CASE WHEN LN.Number BETWEEN 36 AND 42 THEN 1 ELSE 0 END) Bucket6
	, Max(CASE WHEN LN.Number BETWEEN 43 AND 49 THEN 1 ELSE 0 END) Bucket7
	 FROM dbo.vLottoNumbers LN 
	 GROUP BY LN.DrawingDate 
  ) DAT

Happy coding!

💎SQL Reloaded: 6 out of 49 (Part 2: Modeling the Data)

In a previous post I created a random set of data and stored them in LottoExtracts table, the first step is done, now we have to model the data in a form which could be useful for our analysis. Thus it makes sense to split the table in two tables:     
  • LottoExtracts containing mainly the DrawingDate and eventually additional information about extraction (e.g. Country, System, etc.). For simplicity I’m including only the Drawing Date.     
  • LottoNumbers containing the numbers and their order in extraction. 
Here are the tables:
 
-- dropping the LottoDrawings table
DROP TABLE IF EXISTS dbo.LottoDrawings
GO

--creating the LottoDrawings table 
CREATE TABLE dbo.LottoDrawings( 
  DrawingID int IDENTITY(1,1) NOT NULL
 , DrawingDate smalldatetime NULL) 
GO

 -- dropping the LottoNumbers table
DROP TABLE IF EXISTS dbo.LottoNumbers

 --creating LottoNumbers table 
 CREATE TABLE dbo.LottoNumbers(NumberID int IDENTITY(1,1) NOT NULL
 , DrawingID int NULL
 , Position smallint NULL
 , Number smallint NULL)   

 And, the view which will be used as source for analysis: 

 --creating vLottoNumbers view 
 CREATE VIEW dbo.vLottoNumbers AS 
 -- consolidated drawings by number
 SELECT LN.NumberID 
 , LN.Position 
 , LN.Number 
 , LN.DrawingID 
 , LD.DrawingDate 
 , Year(LD.DrawingDate) DrawingYear , Month(LD.DrawingDate) DrawingMonth 
 , DatePart(dd, LD.DrawingDate) DrawingDay 
 , DatePart(wk, LD.DrawingDate) DrawingWeek 
 FROM dbo.LottoNumbers LN 
      JOIN dbo.LottoDrawings LD 
	    ON LN.DrawingID = LD.DrawingID    

Now that we have the objects to model the data, we need to populate the tables with the data from LottoExtracts table: 

--populating LottoDrawings table 
INSERT dbo.LottoDrawings (DrawingDate) 
SELECT DISTINCT DrawingDate 
FROM dbo.LottoExtracts 

--populating the LottoNumbers table - first drawing number 
INSERT LottoNumbers (DrawingID, Position, Number) 
SELECT DrawingID, 1, N1 
FROM dbo.LottoExtracts LE 
     JOIN dbo.LottoDrawings LD 
	   ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0  

The above query inserted only the first number of the extraction, in order to insert the other 5 numbers, all you have to do is to replace 1 with 2 in the second line and run the query again, the insert for second number becomes: 

--populating the LottoNumbers table – second drawing number 
INSERT LottoNumbers (DrawingID, Position, Number) 
SELECT DrawingID, 2, N2 
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0

Here's the code for the remaining numbers:

--populating LottoNumbers table – the remaining drawing numbers
INSERT LottoNumbers (DrawingID, Position, Number) 
SELECT DrawingID, 3, N3 
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0
UNION ALL 
SELECT DrawingID, 4, N4
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0
UNION ALL 
SELECT DrawingID, 5, N5 
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0
UNION ALL 
SELECT DrawingID, 6, N6
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0

Checking the output, you should have 12000 records: 

--checking the data 
SELECT * 
FROM dbo.vLottoNumbers 
ORDER BY DrawingDate
, Position

Now we have the data for analysis (see the next post).

Happy coding!

Previous Post <<||>> Next Post

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.