30 November 2021

Data Science: Regularization (Definitions)

"It is a formal concept based on fuzzy topology that removes geometric anomalies on fuzzy regions." (Markus Schneider, "Fuzzy Spatial Data Types for Spatial Uncertainty Management in Databases", 2008)

"It is any method of preventing overfitting of data by a model and it is used for solving ill-conditioned parameter-estimation problems." (Cecilio Angulo & Luis Gonzalez-Abril, "Support Vector Machines", 2009)

"Optimization of both complexity and performance of a neural network following a linear aggregation or a multi-objective algorithm." (M P Cuéllar et al, "Multi-Objective Training of Neural Networks", 2009)

"Including a term in the error function such that the training process favours networks of moderate size and complexity, that is, networks with small weights and few hidden units. The goal is to avoid overfitting and support generalization." (Frank Padberg, "Counting the Hidden Defects in Software Documents", 2010)

"It refers to the procedure of bringing in additional knowledge to solve an ill-posed problem or to avoid overfitting. This information appears habitually as a penalty term for complexity, such as constraints for smoothness or bounds on the norm." (Vania V Estrela et al, "Total Variation Applications in Computer Vision", 2016)

"This is a general method to avoid overfitting by applying additional constraints to the model that is learned. A common approach is to make sure the model weights are, on average, small in magnitude." (Rayid Ghani & Malte Schierholz, "Machine Learning", 2017)

"Regularization is a method of penalizing complex models to reduce their variance. Specifically, a penalty term is added to the loss function we are trying to minimize [...]" (Chris Albon, "Machine Learning with Python Cookbook", 2018)

"Regularization, generally speaking, is a wide range of ML techniques aimed at reducing overfitting of the models while maintaining theoretical expressive power." (Jonas Teuwen & Nikita Moriakov, "Convolutional neural networks", 2020)

29 November 2021

Systems Engineering: Chaos Theory (Definitions)

"A scientific approach – research effort which is based on examining behaviors of nonlinear dynamical systems, which are highly sensitive to their initial conditions." (Utku Köse & Ahmet Arslan, "Chaotic Systems and Their Recent Implementations on Improving Intelligent Systems", 2014)

"Study of deterministic behaviours that depend on initial conditions in physical, natural and social sciences." (Ayşe G Gözüm, "Evaluating HRM Functions within the Context of Chaos and Complexity Theory", 2016)

"The mathematical framework for understanding irregular and erratic fluctuations in economic cycles, financial markets, weather, other complex phenomenon, or non-linear systems with many variables." (Kijpokin Kasemsap, "Utilizing Complexity Theory and Complex Adaptive Systems in Global Business", 2016)

"The study of the behavior of dynamical systems that are highly sensitive to initial conditions." (Rohnn B Sanderson, "Understanding Chaos as an Indicator of Economic Stability", 2016)

"The theory that emerged from mathematics and used widely by other disciplines which concentrates on the dynamical systems." (Çağlar Doğru, "Leader-Member Exchange and Transformational Leadership in Chaos and Complexity", 2016)

"A field of study that explains nonlinear or dynamical systems." (Sharon E Norris, "Examining the Strategic Leadership of Organizations Using Metaphor: Brains and Flux-Interconnected and Interlocked", 2017)

"Chaos theory is the branch of mathematics deals with complicated linear dynamic systems." (Anandkumar R &  Kalpana R, "A Review on Chaos-Based Image Encryption Using Fractal Function", 2020)

"Suggests a randomness of understanding around complex patterns. These may be described as dynamic systems that reflect irregularities and is extremely sensitive to negligible fluctuations or moderations in situation." (Caroline M Crawford et al, "Social Learning Through a Participative Storytelling Framework: Rethinking the Essence of Course Engagement", 2021)

"Chaos theory is a branch of mathematics focusing on the study of chaos - dynamical systems whose random states of disorder and irregularities are governed by underlying patterns and deterministic laws that are highly sensitive to initial conditions." (Nima Norouzi, "Criminal Policy, Security, and Justice in the Time of COVID-19", 2022)

04 April 2021

Lean Management: Between Value and Waste I (Introduction)


Independently on whether Lean Management is considered in the context of Manufacturing, Software Development (SD), Project Management (PM) or any other business-related areas, there are three fundamental business concepts on which the whole scaffolding of the Lean philosophies is built upon, namely the ones of value, value stream and waste. 

From an economic standpoint, value refers to the monetary worth of a product, asset or service (further referred as product) to an organization, while from a qualitative perspective, it refers to the perceived benefit associated with its usage. The value is thus reflected in the costs associated with a product’s delivery (producer’s perspective), respectively the price paid on acquiring it and the degree to which the product can fulfill a demand (customer’s perspective).

Without diving too deep into theory of product valuation, the challenges revolve around reducing the costs associated with a product’s delivery, respectively selling it to a price the customer is willing to pay for, typically to address a given set of needs. Moreover, the customer is willing to pay only for the functions that satisfy the needs a product is thought to cover. From this friction of opposing driving forces, a product is designed and valued.

The value stream is the sequence of activities (also steps or processes) needed to deliver a product to customers. This formulation includes value-added and non-value-added activities, internal and external customers, respectively covers the full lifecycle of products and/or services in whatever form it occurs, either if is or not perceived by the customers.  

Waste is any activity that consumes resources but creates no value for the customers or, generally, for the stakeholders, be it internal or external. The waste is typically associated with the non-added value activities, activities that don’t produce value for stakeholders, and can increase directly or indirectly the costs of products especially when no attention is given to it and/or not recognized as such. Therefore, eliminating the waste can have an important impact on products’ costs and become one of the goals of Lean Management. Moreover, eliminating the waste is an incremental process that, when put in the context of continuous improvement, can lead to processes redesign and re-engineering.

Taiichi Ohno, the ‘father’ of the Toyota Production System (TPS), originally identified seven forms of waste (Japanese: muda): overproduction, waiting, transporting, inappropriate processing, unnecessary inventory, unnecessary/excess motion, and defects. Within the context of SD and PM, Tom and Marry Poppendieck [1] translated the types of wastes in concepts closer to the language of software developers: partially done work, extra processes, extra features, task switching, waiting, motion and, of course, defects. To this list were added later further types of waste associated with resources, confusion and work conditions.

Defects in form of errors and bugs, ineffective communication, rework and overwork, waiting, repetitive activities like handoffs or even unnecessary meetings are usually the visible part of products and projects and important from the perspective of stakeholders, which in extremis can become sensitive when their volume increases out of proportion.

Unfortunately, lurking in the deep waters of projects and wrecking everything that stands in their way are the other forms of waste less perceivable from stakeholders’ side: unclear requirements/goals, code not released or not tested, specifications not implemented, scrapped code, overutilized/underutilized resources, bureaucracy, suboptimal processes, unnecessary optimization, searching for information, mismanagement, task switching, improper work condition, confusion, to mention just the important activities associated to waste.

Through their elusive nature, independently on whether they are or not visible to stakeholders, they all impact the costs of projects and products when the proper attention is not given to them and not handled accordingly.

Lean Management - The Waste Iceberg

[1] Mary Poppendieck & Tom Poppendieck (2003) Lean Software Development: An Agile Toolkit, Addison Wesley, ISBN: 0-321-15078-3

01 April 2021

SQL Reloaded: Processing JSON Files with Complex Structure in SQL Server 2016+

Unfortunately (or fortunately, for the challenge-searchers), not all JSON data files have a simple (matrix) structure, while the data might not even have a proper (readable) definition. It's the case of the unemployment data provided by the Cologne municipality (source). However with a language page translator and some small effort one can identify the proximate data definition:

Source FieldTarget FieldData Type

As previously stated (see post), it makes sense to build the logic over several iterations, making first sure that the references to file's columns were used correctly (observe the way the various elements were referenced in the queries):

, DAT.Nummer
, DAT.Name
, DAT.ALO_Total
FROM OPENROWSET (BULK 'D:\data\Arbeitsmarkt Statistik Koeln Stadtteil.json',CODEPAGE='65001', SINGLE_CLOB)  as jsonfile 
     CROSS APPLY OPENJSON(BulkColumn,'$.features')
	  ObjectId int '$.properties.OBJECTID'
	, Nummer int '$.properties.NUMMER'
	, Name nvarchar(max) '$.properties.NAME'
	, ALO_Total int '$.properties.AM_ALO_INSG_AA'
	, ALO_SGB2 int '$.properties.AM_ALO_SGB2_AA'
) AS DAT; 

Output (first 13 records):

The logic seems to work, however the German umlauts aren't displayed as expected ('Lövenich', when it should have been 'Lövenich'). This is caused by the differences in character sets. An easy way to address this is to use a functions which does the conversion (see the dbo.ReplaceCodes2Umlauts UDF from an older post). 

By applying the function on the Names, adding the further columns and an INSERT clause, the query becomes:
-- importing the JSON file
, DAT.Nummer
, dbo.ReplaceCodes2Umlauts(DAT.Name) Name
, DAT.ALO_Total
, DAT.ALO_Under25
, DAT.ALO_Total_Perc
, DAT.ALO_Under25_Perc
, DAT.ALO_Total_Hist
, DAT.ALO_Under25_Hist
, DAT.ALO_Total_HistPerc
, DAT.ALO_SGB2_HistPerc
, DAT.ALO_Under25_HistPerc
, DAT.SVB_Total
, DAT.SVB_Women
, DAT.SVB_German
, DAT.SVB_Under25Yo
, DAT.SVB_Over55Yo
, DAT.SVB_Total_Perc
, DAT.SVB_Men_Perc
, DAT.SVB_Women_Perc
, DAT.SVB_German_Perc
, DAT.SVB_Under25Yo_Perc
, DAT.SVB_Over55Yo_Perc
, DAT.SVB_Total_Hist
, DAT.SVB_Men_Hist
, DAT.SVB_Women_Hist
, DAT.SVB_German_Hist
, DAT.SVB_Under25Yo_Hist
, DAT.SVB_Over55Yo_Hist
, DAT.SVB_Total_HistPerc
, DAT.SVB_Men_HistPerc
, DAT.SVB_Women_HistPerc
, DAT.SVB_German_HistPerc
, DAT.SVB_Under25Yo_HistPerc
, DAT.SVB_Over55Yo_HistPerc
, DAT.Shape_Area 
, DAT.Shape_Len 
INTO dbo.Unemployment_Cologne
FROM OPENROWSET (BULK 'D:\data\Arbeitsmarkt Statistik Koeln Stadtteil.json',CODEPAGE='65001', SINGLE_CLOB)  as jsonfile 
     CROSS APPLY OPENJSON(BulkColumn,'$.features')
	  ObjectId int '$.properties.OBJECTID'
	, Nummer int '$.properties.NUMMER'
	, Name nvarchar(max) '$.properties.NAME'
	, ALO_Total int '$.properties.AM_ALO_INSG_AA'
	, ALO_SGB2 int '$.properties.AM_ALO_SGB2_AA'
	, ALO_Under25 int '$.properties.AM_ALO_UNTER25_AA'
	, ALO_Total_Perc float '$.properties.AM_ALO_INSG_AP'
	, ALO_SGB2_Perc float '$.properties.AM_ALO_SGB2_AP'
	, ALO_Under25_Perc float '$.properties.AM_ALO_UNTER25_AP'
	, ALO_Total_Hist int '$.properties.AM_ALO_INSG_HA'
	, ALO_SGB2_Hist int '$.properties.AM_ALO_SGB2_HA'
	, ALO_Under25_Hist int '$.properties.AM_ALO_UNTER25_HA'
	, ALO_Total_HistPerc float '$.properties.AM_ALO_INSG_HP'
	, ALO_SGB2_HistPerc float '$.properties.AM_ALO_SGB2_HP'
	, ALO_Under25_HistPerc float '$.properties.AM_ALO_UNTER25_HP'
	, SVB_Total int '$.properties.AM_SVB_INSG_AA'
	, SVB_Men int '$.properties.AM_SVB_MANN_AA'
	, SVB_Women int '$.properties.AM_SVB_FRAU_AA'
	, SVB_German int '$.properties.AM_SVB_DEUTSCH_AA'
	, SVB_AUSLAND int '$.properties.AM_SVB_AUSLAND_AA'
	, SVB_Under25Yo int '$.properties.AM_SVB_U25J_AA'
	, SVB_Over55Yo int '$.properties.AM_SVB_UEBER55J_AA'
	, SVB_Total_Perc float '$.properties.AM_SVB_INSG_AP'
	, SVB_Men_Perc float '$.properties.AM_SVB_MANN_AP'
	, SVB_Women_Perc float '$.properties.AM_SVB_FRAU_AP'
	, SVB_German_Perc float '$.properties.AM_SVB_DEUTSCH_AP'
	, SVB_AUSLAND_Perc float '$.properties.AM_SVB_AUSLAND_AP'
	, SVB_Under25Yo_Perc float '$.properties.AM_SVB_U25J_AP'
	, SVB_Over55Yo_Perc float '$.properties.AM_SVB_UEBER55J_AP'
	, SVB_Total_Hist int '$.properties.AM_SVB_INSG_HA'
	, SVB_Men_Hist int '$.properties.AM_SVB_MANN_HA'
	, SVB_Women_Hist int '$.properties.AM_SVB_FRAU_HA'
	, SVB_German_Hist int '$.properties.AM_SVB_DEUTSCH_HA'
	, SVB_AUSLAND_Hist int '$.properties.AM_SVB_AUSLAND_HA'
	, SVB_Under25Yo_Hist int '$.properties.AM_SVB_U25J_HA'
	, SVB_Over55Yo_Hist int '$.properties.AM_SVB_UEBER55J_HA'
	, SVB_Total_HistPerc float '$.properties.AM_SVB_INSG_HP'
	, SVB_Men_HistPerc float '$.properties.AM_SVB_MANN_HP'
	, SVB_Women_HistPerc float '$.properties.AM_SVB_FRAU_HP'
	, SVB_German_HistPerc float '$.properties.AM_SVB_DEUTSCH_HP'
	, SVB_AUSLAND_HistPerc float '$.properties.AM_SVB_AUSLAND_HP'
	, SVB_Under25Yo_HistPerc float '$.properties.AM_SVB_U25J_HP'
	, SVB_Over55Yo_HistPerc float '$.properties.AM_SVB_UEBER55J_HP'
	, Shape_Area float '$.properties."SHAPE.AREA"'
	, Shape_Len float '$.properties."SHAPE.LEN"'
) AS DAT; 

Once the data made available, one can go on and discover the data and the relationships existing between the various columns. 

Happy coding!

SQL Reloaded: Processing JSON Files with Flat Matrix Structure in SQL Server 2016+

Besides the CSV format, many of the data files made available under the open data initiatives are stored in JSON format, which makes data more difficult to process, even if JSON offers a richer structure that goes beyond the tabular structure of CSV files. Fortunately, starting with SQL Server 2016, JSON became a native format, which makes the processing of JSON files relatively easy, the easiness with which one can process the data depending on how they are structured.

Let’s consider as example a JSON file with the world population per country and year that can be downloaded from DataHub (source). The structure behind resembles a tabular model (see the table on the source website), having a flat structure. Just export the data to a file with the JSON extension (e.g. ‘population-figures-by-country.json’) locally (e.g. ‘D:/Data’). The next step is to understand file’s structure. Some repositories provide good documentation in this respect, though there are also many exceptions. Having a JSON editor like Visual Studio which reveals the structure makes easier the process. 

As in the case of CSV files, is needed to infer the data types. There are two alphanumeric fields (Country & Country Code), while the remaining fields are numeric. The only challenge raised by the data seems to be the difference in format between the years 2002 and 2015 in respect to the other years, as the values of the former contain a decimal after comma. All the numeric values should have been whole values. 

It’s recommended to start small and build the logic iteratively. Therefore, for the first step just look at files content via the OPENROWSET function:

-- looking at the JSON file 
FROM OPENROWSET (BULK 'D:\data\population-figures-by-country.json', SINGLE_CLOB)  as jsonfile 

In a second step one can add the OPENJSON function by looking only at the first record: 

-- querying a json file (one record)
FROM OPENROWSET (BULK 'D:\data\population-figures-by-country.json', SINGLE_CLOB)  as jsonfile 
     CROSS APPLY OPENJSON(BulkColumn,'$[0]')

In a third step one can add a few columns (e.g. Country & Country Code) to make sure that the select statement works correctly. 

-- querying a json file (all records, a few fields)
SELECT Country 
, CountryCode 
FROM OPENROWSET (BULK 'D:\data\population-figures-by-country.json', SINGLE_CLOB)  as jsonfile 
     CROSS APPLY OPENJSON(BulkColumn,'$')
 WITH ( 
  Country nvarchar(max) '$.Country'
, CountryCode nvarchar(3) '$.Country_Code'
) AS DAT; 

In a next step can be added all the columns and import the data in a table (e.g. dbo.CountryPopulation) on the fly: 

-- importing a json file (all records) on the fly
, DAT.CountryCode
, DAT.Y1960
, DAT.Y1961
, DAT.Y1962
, DAT.Y1963
, DAT.Y1964
, DAT.Y1965
, DAT.Y1966
, DAT.Y1967
, DAT.Y1968
, DAT.Y1969
, DAT.Y1970
, DAT.Y1971
, DAT.Y1972
, DAT.Y1973
, DAT.Y1974
, DAT.Y1975
, DAT.Y1976
, DAT.Y1977
, DAT.Y1978
, DAT.Y1979
, DAT.Y1980
, DAT.Y1981
, DAT.Y1982
, DAT.Y1983
, DAT.Y1984
, DAT.Y1985
, DAT.Y1986
, DAT.Y1987
, DAT.Y1988
, DAT.Y1989
, DAT.Y1990
, DAT.Y1991
, DAT.Y1992
, DAT.Y1993
, DAT.Y1994
, DAT.Y1995
, DAT.Y1996
, DAT.Y1997
, DAT.Y1998
, DAT.Y1999
, DAT.Y2000
, DAT.Y2001
, Cast(DAT.Y2002 as bigint) Y2002
, Cast(DAT.Y2003 as bigint) Y2003
, Cast(DAT.Y2004 as bigint) Y2004
, Cast(DAT.Y2005 as bigint) Y2005
, Cast(DAT.Y2006 as bigint) Y2006
, Cast(DAT.Y2007 as bigint) Y2007
, Cast(DAT.Y2008 as bigint) Y2008
, Cast(DAT.Y2009 as bigint) Y2009
, Cast(DAT.Y2010 as bigint) Y2010
, Cast(DAT.Y2011 as bigint) Y2011
, Cast(DAT.Y2012 as bigint) Y2012
, Cast(DAT.Y2013 as bigint) Y2013
, Cast(DAT.Y2014 as bigint) Y2014
, Cast(DAT.Y2015 as bigint) Y2015
, DAT.Y2016
INTO dbo.CountryPopulation
FROM OPENROWSET (BULK 'D:\data\population-figures-by-country.json', SINGLE_CLOB)  as jsonfile 
     CROSS APPLY OPENJSON(BulkColumn,'$')
 WITH ( 
  Country nvarchar(max) '$.Country'
, CountryCode nvarchar(3) '$.Country_Code'
, Y1960 bigint '$.Year_1960'
, Y1961 bigint '$.Year_1961'
, Y1962 bigint '$.Year_1962'
, Y1963 bigint '$.Year_1963'
, Y1964 bigint '$.Year_1964'
, Y1965 bigint '$.Year_1965'
, Y1966 bigint '$.Year_1966'
, Y1967 bigint '$.Year_1967'
, Y1968 bigint '$.Year_1968'
, Y1969 bigint '$.Year_1969'
, Y1970 bigint '$.Year_1970'
, Y1971 bigint '$.Year_1971'
, Y1972 bigint '$.Year_1972'
, Y1973 bigint '$.Year_1973'
, Y1974 bigint '$.Year_1974'
, Y1975 bigint '$.Year_1975'
, Y1976 bigint '$.Year_1976'
, Y1977 bigint '$.Year_1977'
, Y1978 bigint '$.Year_1978'
, Y1979 bigint '$.Year_1979'
, Y1980 bigint '$.Year_1980'
, Y1981 bigint '$.Year_1981'
, Y1982 bigint '$.Year_1982'
, Y1983 bigint '$.Year_1983'
, Y1984 bigint '$.Year_1984'
, Y1985 bigint '$.Year_1985'
, Y1986 bigint '$.Year_1986'
, Y1987 bigint '$.Year_1987'
, Y1988 bigint '$.Year_1988'
, Y1989 bigint '$.Year_1989'
, Y1990 bigint '$.Year_1990'
, Y1991 bigint '$.Year_1991'
, Y1992 bigint '$.Year_1992'
, Y1993 bigint '$.Year_1993'
, Y1994 bigint '$.Year_1994'
, Y1995 bigint '$.Year_1995'
, Y1996 bigint '$.Year_1996'
, Y1997 bigint '$.Year_1997'
, Y1998 bigint '$.Year_1998'
, Y1999 bigint '$.Year_1999'
, Y2000 bigint '$.Year_2000'
, Y2001 bigint '$.Year_2001'
, Y2002 decimal(19,1) '$.Year_2002'
, Y2003 decimal(19,1) '$.Year_2003'
, Y2004 decimal(19,1) '$.Year_2004'
, Y2005 decimal(19,1) '$.Year_2005'
, Y2006 decimal(19,1) '$.Year_2006'
, Y2007 decimal(19,1) '$.Year_2007'
, Y2008 decimal(19,1) '$.Year_2008'
, Y2009 decimal(19,1) '$.Year_2009'
, Y2010 decimal(19,1) '$.Year_2010'
, Y2011 decimal(19,1) '$.Year_2011'
, Y2012 decimal(19,1) '$.Year_2012'
, Y2013 decimal(19,1) '$.Year_2013'
, Y2014 decimal(19,1) '$.Year_2014'
, Y2015 decimal(19,1) '$.Year_2015'
, Y2016 bigint '$.Year_2016'
) AS DAT; 

As can be seen the decimal values were converted to bigint to preserve the same definition. Moreover, this enables data processing later, as no additional (implicit) conversions are necessary. 

Also, the columns’ names were changed either for simplification/convenience or simply taste. 

Writing such a monster query can be time-consuming, though preparing the metadata into Excel can decrease considerably the effort. With copy-past and a few tricks (e.g. replacing values, splitting columns based on a delimiter) one can easily prepare such a structure:

Source fieldTarget fieldDataTypeValueImport ClauseSelect Clause
CountryCountrynvarchar(max) emen Rep., Country nvarchar(max) '$.Country', DAT.Country
Country_CodeCountryCodenvarchar(3) YEM, CountryCode nvarchar(3) '$.Country_Code', DAT.CountryCode
Year_1960Y1960bigint5172135, Y1960 bigint '$.Year_1960', DAT.Y1960
Year_1961Y1961bigint5260501, Y1961 bigint '$.Year_1961', DAT.Y1961
Year_1962Y1962bigint5351799, Y1962 bigint '$.Year_1962', DAT.Y1962
Year_1963Y1963bigint5446063, Y1963 bigint '$.Year_1963', DAT.Y1963
Year_1964Y1964bigint5543339, Y1964 bigint '$.Year_1964', DAT.Y1964
Year_1965Y1965bigint5643643, Y1965 bigint '$.Year_1965', DAT.Y1965
Year_1966Y1966bigint5748588, Y1966 bigint '$.Year_1966', DAT.Y1966
Year_1967Y1967bigint5858638, Y1967 bigint '$.Year_1967', DAT.Y1967
Year_1968Y1968bigint5971407, Y1968 bigint '$.Year_1968', DAT.Y1968
Year_1969Y1969bigint6083619, Y1969 bigint '$.Year_1969', DAT.Y1969
Year_1970Y1970bigint6193810, Y1970 bigint '$.Year_1970', DAT.Y1970
Year_1971Y1971bigint6300554, Y1971 bigint '$.Year_1971', DAT.Y1971
Year_1972Y1972bigint6407295, Y1972 bigint '$.Year_1972', DAT.Y1972
Year_1973Y1973bigint6523452, Y1973 bigint '$.Year_1973', DAT.Y1973
Year_1974Y1974bigint6661566, Y1974 bigint '$.Year_1974', DAT.Y1974
Year_1975Y1975bigint6830692, Y1975 bigint '$.Year_1975', DAT.Y1975
Year_1976Y1976bigint7034868, Y1976 bigint '$.Year_1976', DAT.Y1976
Year_1977Y1977bigint7271872, Y1977 bigint '$.Year_1977', DAT.Y1977
Year_1978Y1978bigint7536764, Y1978 bigint '$.Year_1978', DAT.Y1978
Year_1979Y1979bigint7821552, Y1979 bigint '$.Year_1979', DAT.Y1979
Year_1980Y1980bigint8120497, Y1980 bigint '$.Year_1980', DAT.Y1980
Year_1981Y1981bigint8434017, Y1981 bigint '$.Year_1981', DAT.Y1981
Year_1982Y1982bigint8764621, Y1982 bigint '$.Year_1982', DAT.Y1982
Year_1983Y1983bigint9111097, Y1983 bigint '$.Year_1983', DAT.Y1983
Year_1984Y1984bigint9472170, Y1984 bigint '$.Year_1984', DAT.Y1984
Year_1985Y1985bigint9847899, Y1985 bigint '$.Year_1985', DAT.Y1985
Year_1986Y1986bigint10232733, Y1986 bigint '$.Year_1986', DAT.Y1986
Year_1987Y1987bigint10628585, Y1987 bigint '$.Year_1987', DAT.Y1987
Year_1988Y1988bigint11051504, Y1988 bigint '$.Year_1988', DAT.Y1988
Year_1989Y1989bigint11523267, Y1989 bigint '$.Year_1989', DAT.Y1989
Year_1990Y1990bigint12057039, Y1990 bigint '$.Year_1990', DAT.Y1990
Year_1991Y1991bigint12661614, Y1991 bigint '$.Year_1991', DAT.Y1991
Year_1992Y1992bigint13325583, Y1992 bigint '$.Year_1992', DAT.Y1992
Year_1993Y1993bigint14017239, Y1993 bigint '$.Year_1993', DAT.Y1993
Year_1994Y1994bigint14692686, Y1994 bigint '$.Year_1994', DAT.Y1994
Year_1995Y1995bigint15320653, Y1995 bigint '$.Year_1995', DAT.Y1995
Year_1996Y1996bigint15889449, Y1996 bigint '$.Year_1996', DAT.Y1996
Year_1997Y1997bigint16408954, Y1997 bigint '$.Year_1997', DAT.Y1997
Year_1998Y1998bigint16896210, Y1998 bigint '$.Year_1998', DAT.Y1998
Year_1999Y1999bigint17378098, Y1999 bigint '$.Year_1999', DAT.Y1999
Year_2000Y2000bigint17874725, Y2000 bigint '$.Year_2000', DAT.Y2000
Year_2001Y2001bigint18390135, Y2001 bigint '$.Year_2001', DAT.Y2001
Year_2002Y2002decimal(19,1) 18919179.0, Y2002 decimal(19,1) '$.Year_2002', Cast(DAT.Y2002 as bigint) Y2002
Year_2003Y2003decimal(19,1) 19462086.0, Y2003 decimal(19,1) '$.Year_2003', Cast(DAT.Y2003 as bigint) Y2003
Year_2004Y2004decimal(19,1) 20017068.0, Y2004 decimal(19,1) '$.Year_2004', Cast(DAT.Y2004 as bigint) Y2004
Year_2005Y2005decimal(19,1) 20582927.0, Y2005 decimal(19,1) '$.Year_2005', Cast(DAT.Y2005 as bigint) Y2005
Year_2006Y2006decimal(19,1) 21160534.0, Y2006 decimal(19,1) '$.Year_2006', Cast(DAT.Y2006 as bigint) Y2006
Year_2007Y2007decimal(19,1) 21751605.0, Y2007 decimal(19,1) '$.Year_2007', Cast(DAT.Y2007 as bigint) Y2007
Year_2008Y2008decimal(19,1) 22356391.0, Y2008 decimal(19,1) '$.Year_2008', Cast(DAT.Y2008 as bigint) Y2008
Year_2009Y2009decimal(19,1) 22974929.0, Y2009 decimal(19,1) '$.Year_2009', Cast(DAT.Y2009 as bigint) Y2009
Year_2010Y2010decimal(19,1) 23606779.0, Y2010 decimal(19,1) '$.Year_2010', Cast(DAT.Y2010 as bigint) Y2010
Year_2011Y2011decimal(19,1) 24252206.0, Y2011 decimal(19,1) '$.Year_2011', Cast(DAT.Y2011 as bigint) Y2011
Year_2012Y2012decimal(19,1) 24909969.0, Y2012 decimal(19,1) '$.Year_2012', Cast(DAT.Y2012 as bigint) Y2012
Year_2013Y2013decimal(19,1) 25576322.0, Y2013 decimal(19,1) '$.Year_2013', Cast(DAT.Y2013 as bigint) Y2013
Year_2014Y2014decimal(19,1) 26246327.0, Y2014 decimal(19,1) '$.Year_2014', Cast(DAT.Y2014 as bigint) Y2014
Year_2015Y2015decimal(19,1) 26916207.0, Y2015 decimal(19,1) '$.Year_2015', Cast(DAT.Y2015 as bigint) Y2015
Year_2016Y2016bigint27584213, Y2016 bigint '$.Year_2016', DAT.Y2016

Based on this structure, one can add two further formulas in Excel to prepare the statements as above and then copy the fields (last two columns were generated using the below formulas): 

=", " & TRIM(B2) & " " & C2 & " '$." & TRIM(A2) & "'" 
=", DAT." & TRIM(B2)

Consuming data in which the values are stored in a matrix structure can involve further challenges sometimes, even if this type of storage tends to save space. For example, adding the values for a new year would involve extending the table with one more column, while performing calculations between years would involve referencing each column in formulas. Therefore, transforming the data from a matrix to a normalized structure can have some benefit. This can be achieved by writing a query via the UNPIVOT operator:

-- unpivoting the data 
, RES.CountryCode
, Cast(Replace(RES.[Year], 'Y', '') as int) [Year]
, RES.Population
--INTO dbo.CountryPopulationPerYear
( -- basis data
	SELECT Country
	, CountryCode
	, Y1960, Y1961, Y1962, Y1963, Y1964, Y1965, Y1966, Y1967, Y1968, Y1969
	, Y1970, Y1971, Y1972, Y1973, Y1974, Y1975, Y1976, Y1977, Y1978, Y1979
	, Y1980, Y1981, Y1982, Y1983, Y1984, Y1985, Y1986, Y1987, Y1988, Y1989
	, Y1990, Y1991, Y1992, Y1993, Y1994, Y1995, Y1996, Y1997, Y1998, Y1999
	, Y2000, Y2001, Y2002, Y2003, Y2004, Y2005, Y2006, Y2007, Y2008, Y2009
	, Y2010, Y2011, Y2012, Y2013, Y2014, Y2015, Y2016
	FROM dbo.CountryPopulation
UNPIVOT  -- unpivot logic
   (Population FOR [Year] IN  (Y1960, Y1961, Y1962, Y1963, Y1964, Y1965, Y1966, Y1967, Y1968, Y1969
, Y1970, Y1971, Y1972, Y1973, Y1974, Y1975, Y1976, Y1977, Y1978, Y1979
, Y1980, Y1981, Y1982, Y1983, Y1984, Y1985, Y1986, Y1987, Y1988, Y1989
, Y1990, Y1991, Y1992, Y1993, Y1994, Y1995, Y1996, Y1997, Y1998, Y1999
, Y2000, Y2001, Y2002, Y2003, Y2004, Y2005, Y2006, Y2007, Y2008, Y2009
, Y2010, Y2011, Y2012, Y2013, Y2014, Y2015, Y2016)

Also this can be performed in two steps, first preparing the query, and in a final step inserting the data into a table (e.g. dbo.CountryPopulationPerYear) on the fly (re-execute the previous query after uncommenting the INSERT clause to generate the table). 

--reviewing the data 
SELECT Country
, CountryCode
, AVG(Population) AveragePopulation
, Max(Population) - Min(Population) RangePopulation
FROM dbo.CountryPopulationPerYear
WHERE [Year] BETWEEN 2010 AND 2019
GROUP BY Country
, CountryCode
ORDER BY Country

On the other side making comparisons between consecutive years is easier when using a matrix structure: 

--reviewing the data 
SELECT Country
, CountryCode
, Y2016
, Y2010
, Y2010-Y2010 [2016-2010]
, Y2011-Y2010 [2011-2010]
, Y2012-Y2011 [2011-2011]
, Y2013-Y2012 [2011-2012]
, Y2014-Y2013 [2011-2013]
, Y2015-Y2014 [2011-2014]
, Y2016-Y2015 [2011-2015]
FROM dbo.CountryPopulation
ORDER BY Country

Unless the storage space is a problem, in theory one can store the data in both formats as there can be requests which can benefit from one structure or the other. 

Happy coding!
Related Posts Plugin for WordPress, Blogger...