SQL Troubles: transformations

Showing posts with label transformations. Show all posts

01 September 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 83: Mapping the Future - A 25-Year Evolution of AI)

Prompt Engineering Series

Prompt: "write a post of 600 words that depicts a map that reflects the evolution of AI for the next 25 years"

Introduction

Artificial Intelligence (AI) is no longer a distant dream - it’s a living, learning force reshaping our world. But what lies ahead? If we were to chart a map of AI’s evolution over the next 25 years, it would resemble a dynamic landscape of breakthroughs, ethical crossroads, and societal transformation.

Let’s take a journey through this imagined terrain.

Phase 1: 2025–2030 - The Age of Specialization

In the next five years, AI will become deeply embedded in vertical industries:

Healthcare: AI will assist in diagnostics, drug discovery, and personalized treatment plans.
Finance: Predictive models will dominate risk assessment, fraud detection, and algorithmic trading.
Education: Adaptive learning platforms will tailor content to individual student needs.

This phase is marked by narrow intelligence - systems that excel in specific domains but lack general reasoning. The focus will be on trust, transparency, and explainability, as regulators begin to demand accountability for AI-driven decisions.

Phase 2: 2030–2035 - The Rise of Generalization

By the early 2030s, we’ll witness the emergence of Artificial General Intelligence (AGI) prototypes - systems capable of transferring knowledge across domains.

Key developments will include:

Unified models that can write code, compose music, and conduct scientific research.
Self-improving architectures that optimize their own learning processes.
Human-AI collaboration frameworks where machines act as creative partners, not just tools.

This era will challenge our definitions of intelligence, creativity, and even consciousness. Ethical debates will intensify around autonomy, rights, and the boundaries of machine agency.

Phase 3: 2035–2040 - The Cognitive Convergence

As AGI matures, AI will begin to mirror human cognitive functions more closely:

Emotional modeling: AI will simulate empathy, persuasion, and social nuance.
Meta-reasoning: Systems will reflect on their own limitations and biases.
Synthetic consciousness debates: Philosophers and technologists will grapple with whether machines can possess subjective experience.

This phase will see AI integrated into governance, law, and diplomacy. Machines may advise on policy, mediate conflicts, or even represent interests in global forums.

Phase 4: 2040–2045 - The Age of Integration

AI will no longer be a separate entity - it will be woven into the fabric of everyday life:

Neural interfaces will allow direct brain-AI communication.
Smart environments will anticipate human needs and adapt in real time.
Digital twins will simulate entire cities, economies, and ecosystems for planning and resilience.

The line between human and machine cognition will blur. Society will face existential questions: What does it mean to be human in a world of synthetic minds?

Phase 5: 2045–2050 - The Post-Intelligence Era

By mid-century, AI may surpass human intelligence in most domains. This doesn’t mean obsolescence - it means redefinition.

Collective intelligence: Humans and machines will form hybrid networks of decision-making.
Ethical superstructures: Global frameworks will govern AI rights, responsibilities, and coexistence.
Cultural transformation: Art, philosophy, and identity will evolve in response to machine creativity and cognition.

This era is not about domination - it’s about symbiosis. The map ends not with a singularity, but with a new equilibrium.

Navigating the Map

This 25-year map is speculative, but it’s grounded in current trajectories. It’s a tool for reflection, planning, and imagination. As we move forward, the challenge isn’t just technological - it’s philosophical, ethical, and deeply human.

AI’s evolution will be shaped not just by algorithms, but by our choices. The map is ours to draw - and to walk.

Just try the prompt on Copilot or your favorite AI-powered assistant! Have you got a different/similar result? How big or important is the difference? Any other thoughts?
Just share the link to the post with me and I'll add it to this post as a resource!

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate feature's ability to answer standard general questions, independently on whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

21 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 1: An Introduction)

Graphical Representation Series

Introduction

Creating simple charts or more complex data visualizations may appear trivial for many, though their authors shouldn't forget that readers have different backgrounds, degrees of literacy, many of them not being maybe able to make sense of graphical displays, at least not without some help.

Beginners start with a limited experience and build upon it, then, on the road to mastery, they get acquainted with the many possibilities, a deeper sense is achieved and the choices become a few. Independently of one's experience, there are seldom 'yes' and 'no' answers for the various choices, but everything is a matter of degree that varies with one's experience, available time, audience's expectations, and many more aspects might be considered in time.

The following questions are intended to expand, respectively narrow down our choices when dealing with data visualizations from a data professional's perspective. The questions are based mainly on [1] though they were extended to include a broader perspective.

General Questions

Where does the data come from? Is the source reliable, representative (for the whole population in scope)? Is the data source certified? Are yhe data actual?

Are there better (usable) sources? What's the effort to consider them? Does the data overlap? To what degree? Are there any benefits in merging the data? How much this changes the overall picture? Are the changes (in trends) explainable?

Was the data collected? How, from where, and using what method? [1] What methodology/approach was used?

What's the dataset about? Can one recognize the data, the (data) entities, respectively the structures behind? How big is the fact table (in terms of rows and columns)? How many dimensions are in scope?

What transformations, calculations or modifications have been applied? What was left out and what's the overall impact?

Any significant assumptions were made? [1] Were the assumptions clearly stated? Are they entitled? Is it more to them?

Were any transformation applied? Do the transformations change any data characteristics? Were they adequately documented/explained? Do they make sense? Was it something important left out? What's the overall impact?

What criteria were used to include/exclude data from the display? [1] Are the criteria adequately explained/documented? Do they make sense?

Are similar data publicly available? Is it (freely) accessible/usable? To what degree? How much do the datasets overlap? Is there any benefit to analyze/use the respective data? Are the characteristics comparable? To what degree?

Dataviz Questions

What's the title/subtitle of the chart? Is it meaningful for the readers? Does the title reflect the data, respectively the findings adequately? Can it be better formulated? Is it an eye-catcher? Does it meet the expectations?

What data is shown? Of what type? At what level is the data aggregated?

What chart (type) is being used? [1] Are the readers familiar with the chart type? Does it needs further introduction/clarifications? Are there better means to represent the data? Does the chart offer the appropriate perspective? Does it make sense to offer different (complementary) perspective(s)? To what degree other perspectives help?

What items of data do the marks represent? What value associations do the attributes represent? [1] Are the marks visible? Are the marks adequately presented (e.g. due to missing data)?

What range of values are displayed? [1] What approximation the values support? To what degree can the values be rounded without losing meaning?

Is the data categorical, ordinal or continuous?

Are the axes property chosen/displayed/labeled? Is the scale properly chosen (linear, semilogarithmic, logarithmic), respectively displayed? Do they emphasize, diminish, distort, simplify, or clutter the information?

What features (shapes, patterns, differences or connections) are observable, interesting or vital for understanding the chart? [1]

Where are the largest, mid-sized and smallest values? (aka ‘stepped magnitude’ judgements). [1]

Where lie the most/least values? Where is the average or normal? (aka ‘global comparison’ judgements)” [1] How are the values distributed? Are there any outliers present? Are they explainable?

What features are expected or unexpected? [1] To what degree are they unexpected?

What features are important given the subject? [1]

What shapes and patterns strike readers as being semantically aligned with the subject? [1]

What is the overall feeling when looking at the final result? Is the chart overcrowded? Can anything be left out/included?

What colors were used? [1] Are the colors adequately chosen, respectively meaningful? Do they follow the general recommendations?

What colors, patterns, forms do readers see first? What impressions come next, respectively last longer?

Are the various elements adequately/intuitively positioned/distinguishable? What's the degree of overlapping/proximity? Do the elements respect an intuitive hierarchy? Do they match readers' expectations, respectively the best practices in scope? Are the deviations entitled?

Is the space properly used? To what degree? Are there major gaps?

Know Your Audience

What audience targets the visualization? Which are its characteristics (level of experience with data visualizations; authors, experts or casual attendees)? Are there any accidental attendees? How likely is the audience to pay attention?

What is audience’s relationship with the subject matter? What knowledge do they have or, conversely, lack about the subject? What assistance might they need to interpret the meaning of the subject? Do they have the capacity to comprehend what it means to them? [1]

Why do the audience wants/needs to understand the topic? Are they familiar, respectively actively interested or more passive? Is it able to grasp the intended meaning? [1] To what degree? What kind of challenges might be involved, of what nature?

What is their motivation? Do they have a direct, expressed need or are they more passive and indifferent? Is it needed a way to persuade them or even seduce them to engage? [1] Can this be done without distorting the data and its meaning(s)?

What are their visualization literacy skill set? Do they require assistance perceiving the chart(s)? Are they sufficiently comfortable with operating features of interactivity? Do they have any visual accessibility issues (e.g. red–green color blindness)? Do they need to be (re)factored into the design? [1]

Reflections

What has been learnt? Has it reinforced or challenged existing knowledge? [1] Was new knowledge gained? How valuable is this knowledge? Can it be reused? In which contexts?

Do the findings meet one's expectations? To what degree? Were the expectations entitled? On what basis? What's missing? What's gaps' relevance?

What feelings have been stirred? Has the experience had an impact emotionally? [1] To what degree? Is the impact positive/negative? Is the reaction entitled/explainable? Are there any factors that distorted the reactions? Are they explainable? Do they make sense?

What does one do with this understanding? Is it just knowledge acquired or something to inspire action (e.g. making a decision or motivating a change in behavior)? [1] How relevant/valuable is the information for us? Can it be used/misused? To what degree?

Are the data and its representation trustworthy? [1] To what degree?

Previous Post <<||>> Next Post

References:
[1] Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019

01 April 2024

📊R Language: Data Transformations (Part I: Temperatures' comparison between F° and C°)

The time series used for weather analysis use either Fahrenheit (F°) or Celsius (C°) for the temperature values. Looking at the A and B plots below that represent the values of the same dataset in F°, respectively C°, there seems to be no difference between the two plots independently on whether one works with F° or C°, however the scales are different. Once one uses the same scale for both values (see C) the plots are distorted according to the formula used for transformation.

Comments:
(1) Typically, it makes sense to adapt the temperature scale to the audience, though on the Web there will be always a mix of audiences (and that's why weather websites allow to choose one of the values).
(2) Not starting from 0 might show in the end the same trend at same scale, though the behavior can change occasionally. As long as the Y-axis is correctly labeled, this shouldn't be a problem. Conversely, it's better to control the scale and provide the min-max values for the axis accordingly.
(3) When creating such plots, it's important to be aware of the distortion that might be introduced by transformations. For linear transformations of the type a*x+b, the value of the "a" coefficient tells how much the resulting values are stretched or contracted.

I used as exemplification the airquality dataset which contains data for 1973, the temperature being given in F°. Unfortunately, the dataset contains only the day and the month, so the date must be constructed and added to the dataset. For simplification, I've added the calculated temperature in C° as column as well:

#reviewing the data
help("airquality")

#preparing the data
head(airquality)
airquality$date <- with(airquality, as.Date(ISOdate(1973, Month, Day))) #adding the date
airquality$TempC <- with(airquality, (Temp - 32) * 5/9) #adding the temperature in C°
head(airquality)

And, here's the code used to generate the plots:

#Temperatures' comparison between F° and C°
par(mfrow = c(2,2)) #1x2 matrix display

plot(airquality$date, airquality$Temp, ylab="Temperature (F°)", xlab="date", type="l", col="blue", main="A")

plot(airquality$date, airquality$TempC, ylab="Temperature (C°)", xlab="date", type="l", col="brown", main="B")

plot(airquality$date, airquality$Temp, ylab="Temperature (F°) vs (C°)", xlab="date", ylim=c(0,100), type="l", col="blue", main="C")
lines(airquality$date, airquality$TempC, col="brown")

# using inline formula
plot(airquality$date, (airquality$Temp - 32) * 5/9, ylab="(Temp-32)*5/9", xlab="date", ylim=c(0,100), type="l", col="brown", main="D")

mtext("© sql-troubles@blogspot.com @sql_troubles, 2024", side = 1, line = 4, adj = 1, col = "dodgerblue4", cex = .7)
title("Temperatures' comparison between F° and C°", line = -1, outer = TRUE)

In the fourth plot I directly used the formula for transforming the values from F° and C°. If the values based on the formula need to be used repeatedly, it's probably better to add a column to the dataset.

Unfortunately, the standard library has its limitations when creating visualizations. While writing this post I tried to work also with the plotly library, which offers a richer set of tools and can be used to create wonderful visualizations (though it proves also more complex to use).

install.packages("plotly")
library("plotly")

Here's the code used to plot the below graphic (the points have labels, much like in Power BI):

fig <- plot_ly(airquality, type = 'scatter', mode = 'lines+markers')%>%
  add_trace(x = ~date, y = ~Temp, name = 'Temp (F)')%>%
  add_trace(x = ~date, y = ~TempC, name = 'Temp (C)')%>%
  layout(showlegend = F, title="Temperatures' comparison between K° and C°")

fig

The temperatures via Plotly

Happy coding!

Previous Post <<||>> Next Post

15 November 2018

🔭Data Science: Transformations (Just the Quotes)

"Logging size transforms the original skewed distribution into a more symmetrical one by pulling in the long right tail of the distribution toward the mean. The short left tail is, in addition, stretched. The shift toward symmetrical distribution produced by the log transform is not, of course, merely for convenience. Symmetrical distributions, especially those that resemble the normal distribution, fulfill statistical assumptions that form the basis of statistical significance testing in the regression model." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Logging skewed variables also helps to reveal the patterns in the data. […] the rescaling of the variables by taking logarithms reduces the nonlinearity in the relationship and removes much of the clutter resulting from the skewed distributions on both variables; in short, the transformation helps clarify the relationship between the two variables. It also […] leads to a theoretically meaningful regression coefficient." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithmic transformation serves several purposes: (1) The resulting regression coefficients sometimes have a more useful theoretical interpretation compared to a regression based on unlogged variables. (2) Badly skewed distributions - in which many of the observations are clustered together combined with a few outlying values on the scale of measurement - are transformed by taking the logarithm of the measurements so that the clustered values are spread out and the large values pulled in more toward the middle of the distribution. (3) Some of the assumptions underlying the regression model and the associated significance tests are better met when the logarithm of the measured variables is taken." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)

"Compound errors can begin with any of the standard sorts of bad statistics - a guess, a poor sample, an inadvertent transformation, perhaps confusion over the meaning of a complex statistic. People inevitably want to put statistics to use, to explore a number's implications. [...] The strengths and weaknesses of those original numbers should affect our confidence in the second-generation statistics." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"All forms of complex causation, and especially nonlinear transformations, admittedly stack the deck against prediction. Linear describes an outcome produced by one or more variables where the effect is additive. Any other interaction is nonlinear. This would include outcomes that involve step functions or phase transitions. The hard sciences routinely describe nonlinear phenomena. Making predictions about them becomes increasingly problematic when multiple variables are involved that have complex interactions. Some simple nonlinear systems can quickly become unpredictable when small variations in their inputs are introduced." (Richard N Lebow, "Forbidden Fruit: Counterfactuals and International Relations", 2010)

"Either a logarithmic or a square-root transformation of the data would produce a new series more amenable to fit a simple trigonometric model. It is often the case that periodic time series have rounded minima and sharp-peaked maxima. In these cases, the square root or logarithmic transformation seems to work well most of the time." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Transformations of data alter statistics. For example, the mean of a data set can be found, but it is not easy to relate the mean of a data set to the mean of the logarithm of that data set. The median is far friendlier to transformations. If the median of a data set is found, then the logarithm of the data set is analyzed; the median of the log transformed data will be the log of the original median." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Transforming data to measurements of a different kind can clarify and simplify hypotheses that have already been generated and can reveal patterns that would otherwise be hidden." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"Feature generation (or engineering, as it is often called) is where the bulk of the time is spent in the machine learning process. As social science researchers or practitioners, you have spent a lot of time constructing features, using transformations, dummy variables, and interaction terms. All of that is still required and critical in the machine learning framework. One difference you will need to get comfortable with is that instead of carefully selecting a few predictors, machine learning systems tend to encourage the creation of lots of features and then empirically use holdout data to perform regularization and model selection. It is common to have models that are trained on thousands of features." (Rayid Ghani & Malte Schierholz, "Machine Learning", 2017)

"Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. 'Structure' has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Symmetries directly point to invariants, which pinpoint intrinsic properties of the data and of the background empirical domain of interest. As our data models change, so too do our perspectives on analysing data." (Fionn Murtagh, "Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics", 2018)

"Many statistical procedures perform more effectively on data that are normally distributed, or at least are symmetric and not excessively kurtotic (fat-tailed), and where the mean and variance are approximately constant. Observed time series frequently require some form of transformation before they exhibit these distributional properties, for in their 'raw' form they are often asymmetric." (Terence C Mills, "Applied Time Series Analysis: A practical guide to modeling and forecasting", 2019)

24 December 2011

📉Graphical Representation: Transformations (Just the Quotes)

"Data should not be forced into an uncomfortable or improper mold. For example, data that is appropriate for line graphs is not usually appropriate for circle charts and in any case not without some arithmetic transformation. Only graphs that are designed to fit the data can be used profitably." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"The prevailing style of management must undergo transformation. A system cannot understand itself. The transformation requires a view from outside. The aim [...] is to provide an outside view - a lens - that I call a system of profound knowledge. It provides a map of theory by which to understand the organizations that we work in." (Dr. W. Edwards Deming, "The New Economics for Industry, Government, Education", 1994)

"The real value of dashboard products lies in their ability to replace hunt‐and‐peck data‐gathering techniques with a tireless, adaptable, information‐flow mechanism. Dashboards transform data repositories into consumable information." (Gregory L Hovis, "Stop Searching for Information Monitor it with Dashboard Technology," DM Direct, 2002)

"Data is transformed into graphics to understand. A map, a diagram are documents to be interrogated. But understanding means integrating all of the data. In order to do this it’s necessary to reduce it to a small number of elementary data. This is the objective of the 'data treatment' be it graphic or mathematic." (Jacques Bertin [interview], 2003)

"A grammar of graphics facilitates coordinated activity in a set of relatively autonomous components. This grammar enables us to develop a system in which adding a graphic to a frame (say, a surface) requires no adjustments or changes in definitions other than the simple message 'add this graphic'. Similarly, we can remove graphics, transform scales, permute attributes, and make other alterations without redefining the basic structure."(Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Any conclusion drawn from an analysis of a transformed variable must be retranslated into the original domain - which is usually not an easy task. A special handling of outliers, be it a complete removal, or just visual suppression such as hot-selection or shadowing, must have a cogent motivation. At any rate, transformations of data are usually part of a data preprocessing step that might precede a data analysis. Also it can be motivated by initial findings in a data analysis which revealed yet undiscovered problems in the dataset." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"It is the responsibility of the ‘transformer’ to understand the data, to get all necessary information from the expert, to decide what is worth transmitting to the public, how to make it understandable, how to link it with general knowledge or with information already given in other charts. In this sense, the transformer is the trustee of the public." (Marie Neurath & Robin Kinross, "The transformer: principles of making Isotype charts", 2009)

"Data captures actions and characteristics of the real world and transforms them into something that can be examined and explored after the fact." (Zach Gemignani et al, "Data Fluency", 2014)

"Many statistical procedures perform more effectively on data that are normally distributed, or at least are symmetric and not excessively kurtotic" (fat-tailed), and where the mean and variance are approximately constant. Observed time series frequently require some form of transformation before they exhibit these distributional properties, for in their 'raw' form they are often asymmetric." (Terence C Mills, "Applied Time Series Analysis: A practical guide to modeling and forecasting", 2019)

"Knowing the semantics of your data helps with sensible data transformations." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Data becomes more useful once it’s transformed into a data visualization or used in a data story. Data storytelling is the ability to effectively communicate insights from a dataset using narratives and visualizations. It can be used to put data insights into context and inspire action from your audience. Color can be very helpful when you are trying to make information stand out within your data visualizations." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

SQL Troubles

Pages

01 September 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 83: Mapping the Future - A 25-Year Evolution of AI)

21 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 1: An Introduction)

01 April 2024

📊R Language: Data Transformations (Part I: Temperatures' comparison between F° and C°)

15 November 2018

🔭Data Science: Transformations (Just the Quotes)

24 December 2011

📉Graphical Representation: Transformations (Just the Quotes)

About Me