SQL Troubles: IoT


08 March 2025

🏭🎗️🗒️Microsoft Fabric: Real-Time Intelligence (RTI) [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Real-Time Intelligence [RTI]

[def]

{goal} provide a complete real-time SaaS platform within MF

{benefit} helps gain actionable insights from data, with the ability to ingest, transform, query, visualize, and act on it in real time [4]

{goal} provides a single place for data-in-motion

{benefit} allows users to pull event streams from the Real-Time hub

provides a single data estate for data in motion, simplifying the ingestion, curation, and processing of streaming data from Microsoft and external sources [4]
empowers users to extract insights and visualize data in motion [1]

{goal} enable rapid solution development

{benefit} provides a range of no-code, low-code and pro-code experiences for various scenarios [4]

everything from business insight discovery to complex stream processing, and application and model development [4]

{goal} enable real-time AI insights

{benefit} scales beyond human monitoring and drives actions with built-in, automated capabilities [4]

allows anyone in the organization to take advantage of these capabilities [4]

offers an end-to-end solution for

event-driven scenarios

⇐ rather than schedule-driven solutions.

streaming data
data logs

{benefit} helps customers accelerate the speed and precision of their business by providing [4]

{goal} operational efficiency

by making it possible to streamline processes and make data-driven decisions with accurate, up-to-date information [4]

{goal} end-to-end visibility

by making it possible to gain a holistic understanding of business health and discover actionable insights for timely action [4]

{goal} competitive advantage

by making it possible to react quickly to shifting market trends, identify opportunities, and mitigate risk in real time [4]

seamlessly connects time-based data from various sources using no-code connectors [1]

enables immediate

visual insights
geospatial analysis
trigger-based reactions
⇐ all are part of an organization-wide data catalog [1]

⇐ time-oriented data is difficult to manage, yet critical for success [4]

{challenge} capture high-throughput data from disparate sources in real time [4]
{challenge} model scenarios using event data [4]
{challenge} choose from an array of bespoke technologies and data formats [4]
{challenge} leverage the power of AI against data in real time [4]
without the ability to leverage time-oriented data, businesses are vulnerable to risks [4]

{risk} poor decision-making
{risk} financial loss
{risk} reduced operational efficiency
{risk} impaired data integrity
{risk} non-compliance
{risk} negative user experience

{capability} single unified SaaS solution

as opposed to a fragmented, fragile tech stack
allows users to ingest & process all event sources, in any data format [4]

one can connect to diverse streaming sources and leverage no-code and low-code experiences to process and route the data quickly [4]

via out-of-the-box connectors for streaming and event data sources [4]

events can be routed to other Fabric and third-party entities [4]
organizational BI reports can be enhanced with enriched data [4]

allows users to analyze and transform event streams using queries and visual exploration to discover insights in real time [4]

one can manage an unlimited amount of data [4]
multiple databases can be monitored and managed at once [4]

allows users to act quickly on data

via triggers and alerts on changing data, to respond automatically and set actions when specific conditions are detected [4]

helps drive actions on a per-instance state that evolves over time [4]
helps to act on data without needing a deep schema and semantic modeling [4]

{capability} accessible data and analytics tools

as opposed to requiring advanced skill sets

{capability} real-time stream processing

as opposed to batch data processing

handles

data ingestion
data transformation
data storage
data analytics
data visualization
data tracking
AI
real-time actions

can be used for

data analysis
immediate visual insights
centralization of data in motion for an organization
actions on data
efficient querying, transformation, and storage of large volumes of structured or unstructured data [1]

helps evaluate data from

IoT systems
system logs
free text
semi-structured data

helps contribute data for consumption by others in the organization [1]

provides a versatile solution

transforms the data into a dynamic, actionable resource that drives value across the entire organization

its components are built on trusted, core Microsoft services [1]

⇐ together they extend the overall Fabric capabilities to provide event-driven rather than schedule-driven solutions [1]

{feature} Real-Time hub

serves as a centralized catalog that makes it easy to access, add, explore, and share data [1]
expands the range of data sources

⇐ it enables broader insights and visual clarity across various domains [1]

ensures that data is accessible to all [1]

promoting quick decision-making and informed action

the sharing of streaming data from diverse sources unlocks the potential to build BI solutions across the organization [1]
use the data consumption tools to explore the data [1]

{feature} Real-Time dashboards

come equipped with out-of-the-box interactions

{benefit} simplify the process of understanding data, making it accessible to anyone who wants to make decisions based on data in motion using visual tools, natural language, and Copilot [1]

query the data in real time as it’s being loaded [6]

every time a query is run, it leverages the latest data available in an Eventhouse or OneLake [6]

behave much like DirectQuery, but without the need to load data into a semantic model [6]

{feature} Activator

{benefit} allows users to turn insights into actions by setting up alerts from various parts of Fabric that react to data patterns or conditions in real time [1]
takes events as they are being processed into Eventstreams or Eventhouses and connects them to downstream systems to make data actionable [6]
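Activator itself is configured in Fabric rather than coded, so the following is only a minimal Python sketch of the idea behind acting on per-instance state: it remembers, for each monitored instance, whether the alert condition currently holds, and calls an action only when an instance crosses into that condition, not on every matching event. The `ThresholdRule` class, the `notify` callback, the device IDs, and the threshold of 30 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ThresholdRule:
    """Hypothetical rule: alert when an instance's value rises above a threshold.

    State is kept per instance (e.g. per device), so the action fires once
    when the condition becomes true, not on every event that satisfies it.
    """
    threshold: float
    notify: Callable[[str, float], None]
    _alerting: Dict[str, bool] = field(default_factory=dict)

    def on_event(self, instance_id: str, value: float) -> None:
        is_high = value > self.threshold
        was_high = self._alerting.get(instance_id, False)
        if is_high and not was_high:
            # Transition from normal to alert state: act once.
            self.notify(instance_id, value)
        # Remember the latest state for this instance only.
        self._alerting[instance_id] = is_high


alerts: List[str] = []
rule = ThresholdRule(threshold=30.0,
                     notify=lambda dev, v: alerts.append(f"{dev} is at {v}"))

events = [("sensor-1", 25.0), ("sensor-1", 31.5), ("sensor-1", 33.0),
          ("sensor-2", 35.0), ("sensor-1", 28.0), ("sensor-1", 32.0)]
for device_id, temperature in events:
    rule.on_event(device_id, temperature)

print(alerts)
```

Keeping the state per instance is what lets one rule watch many devices at once without re-alerting while a device stays above the threshold; `sensor-1` alerts again only after it has dropped back below 30.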

{feature} Real-Time hub events

a catalog of data in motion
contains:

data streams

all data streams that are actively running in Fabric and to which the user has access
once a stream of data is connected, the entire SaaS solution becomes accessible [1]

Microsoft sources:

easily discover streaming sources that the users have and quickly configure ingestion of those sources into Fabric

e.g. Azure Event Hubs, Azure IoT Hub, Azure SQL DB CDC, Azure Cosmos DB CDC, PostgreSQL DB CDC

Fabric events

event-driven capabilities support real-time notifications and data processing

⇒ one can monitor and react to events [1]

e.g. Fabric Workspace Item events, Azure Blob Storage events

⇐ the events can be used to trigger other actions or workflows [1]

e.g. invoking a data pipeline or sending a notification via email.

the events can be sent to other destinations via eventstreams [1]
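Reacting to Fabric events is typically set up through the Real-Time hub (for example, with alerts or eventstreams) rather than written by hand. The minimal Python sketch below only illustrates the underlying publish/subscribe pattern, in which each event type is mapped to the actions it should trigger. The event type strings (`BlobCreated`, `ItemDeleted`), the event fields, and the `run_pipeline` and `send_email` handlers are hypothetical placeholders, not Fabric's event schema.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], str]


class EventRouter:
    """Maps event types to the actions that should run when they occur."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> List[str]:
        # Run every handler registered for this event type; unknown types are ignored.
        return [handler(event) for handler in self._handlers.get(event["type"], [])]


def run_pipeline(event: Event) -> str:
    # Hypothetical action: start a data pipeline for the new file.
    return f"pipeline started for {event['subject']}"


def send_email(event: Event) -> str:
    # Hypothetical action: notify someone by email.
    return f"email sent about {event['subject']}"


router = EventRouter()
router.subscribe("BlobCreated", run_pipeline)
router.subscribe("BlobCreated", send_email)
router.subscribe("ItemDeleted", send_email)

print(router.publish({"type": "BlobCreated", "subject": "sales/2025-03-08.csv"}))
print(router.publish({"type": "ItemDeleted", "subject": "Sales Lakehouse"}))
print(router.publish({"type": "ItemRenamed", "subject": "Sales Report"}))
```

One event can fan out to several actions (the new file both starts a pipeline and sends an email), while events nobody subscribed to are simply dropped.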

{feature} Eventstreams

event processing capabilities

⇐ behave like event listeners that wait for data to be pushed to them [6]

{benefit} allow users to capture, transform, and route high volumes of real-time events to various destinations with a no-code experience [1]
support multiple data sources and data destinations [1]
{benefit} allow users to perform filtering, data cleansing, transformation, windowed aggregations, and duplicate detection, to land the data in the needed shape [1]
one can use the content-based routing capabilities to send data to different destinations based on filters [1]
derived eventstreams allow constructing new streams as a result of transformations and/or aggregations that can be shared with consumers in Real-Time hub [1]
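The eventstream transformations above are configured in Fabric's no-code editor; the following minimal Python sketch only illustrates what filtering, duplicate detection, a windowed aggregation, and content-based routing do to a stream of events. It assumes hypothetical events with `event_id`, `device_id`, `ts` (seconds), and `value` fields, a 60-second tumbling (fixed, non-overlapping) window, and two hypothetical destinations, `alerts` and `archive`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = dict  # hypothetical shape: {"event_id", "device_id", "ts", "value"}
WindowKey = Tuple[str, int]  # (device_id, window start in seconds)

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


def filter_valid(events: Iterable[Event]) -> Iterator[Event]:
    # Filtering / cleansing: drop events whose reading is missing or non-numeric.
    for e in events:
        if isinstance(e.get("value"), (int, float)):
            yield e


def drop_duplicates(events: Iterable[Event]) -> Iterator[Event]:
    # Duplicate detection: keep the first event seen for each event_id.
    # An unbounded set is fine for a sketch; a long-running stream would expire old ids.
    seen = set()
    for e in events:
        if e["event_id"] not in seen:
            seen.add(e["event_id"])
            yield e


def tumbling_average(events: Iterable[Event]) -> Dict[WindowKey, float]:
    # Windowed aggregation: average value per device per fixed, non-overlapping window.
    totals: Dict[WindowKey, float] = defaultdict(float)
    counts: Dict[WindowKey, int] = defaultdict(int)
    for e in events:
        key = (e["device_id"], e["ts"] - e["ts"] % WINDOW_SECONDS)
        totals[key] += e["value"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def route(averages: Dict[WindowKey, float], limit: float) -> Dict[str, List[tuple]]:
    # Content-based routing: each aggregate goes to a destination chosen by a filter.
    destinations: Dict[str, List[tuple]] = {"alerts": [], "archive": []}
    for (device_id, window_start), avg in sorted(averages.items()):
        target = "alerts" if avg > limit else "archive"
        destinations[target].append((device_id, window_start, avg))
    return destinations


raw = [
    {"event_id": "a1", "device_id": "d1", "ts": 0, "value": 20.0},
    {"event_id": "a2", "device_id": "d1", "ts": 30, "value": 40.0},
    {"event_id": "a2", "device_id": "d1", "ts": 30, "value": 40.0},  # duplicate delivery
    {"event_id": "a3", "device_id": "d2", "ts": 45, "value": None},  # malformed reading
    {"event_id": "a4", "device_id": "d2", "ts": 70, "value": 10.0},
    {"event_id": "a5", "device_id": "d1", "ts": 90, "value": 50.0},
]

averages = tumbling_average(drop_duplicates(filter_valid(raw)))
result = route(averages, limit=35.0)
print(result)
```

With a limit of 35, only the window averaging above it (device `d1`, window starting at 60 s) goes to `alerts`; the duplicate delivery of `a2` and the malformed reading `a3` never reach the aggregation, so `d1`'s first window averages 30 rather than being skewed by the repeated 40.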

{feature} Eventhouses

the ideal analytics engine to process data in motion

scalable ingestion engine with the ability to handle up to millions of events per hour [6]

tailored to time-based, streaming events with structured, semi-structured, and unstructured data [1]
data is automatically indexed and partitioned based on ingestion time

⇐ provides fast and complex analytic querying capabilities on high-granularity data [1]

the stored data can be made available in OneLake for consumption by other Fabric experiences [1]

⇐ the data is ready for lightning-fast query using various code, low-code, or no-code options in Fabric [1]

the data can be queried in native KQL or in T-SQL in the KQL queryset [1]
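Eventhouse handles indexing and partitioning internally, so the minimal Python sketch below is not how the engine is implemented; it only illustrates why partitioning by ingestion time pays off for time-based queries: a query restricted to a time range can skip every partition whose time bounds fall outside it. The `TimePartitionedStore` class and the one-hour partition size are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PARTITION_SECONDS = 3600  # hypothetical partition granularity: one hour


class TimePartitionedStore:
    """Toy store that buckets rows by ingestion time and prunes buckets at query time."""

    def __init__(self) -> None:
        self._partitions: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
        self.partitions_scanned = 0

    def ingest(self, ingestion_ts: int, row: str) -> None:
        # Each row lands in the partition covering its ingestion time.
        key = ingestion_ts - ingestion_ts % PARTITION_SECONDS
        self._partitions[key].append((ingestion_ts, row))

    def query(self, start: int, end: int) -> List[str]:
        # Return rows ingested in [start, end): check each partition's time bounds
        # and read rows only from partitions that overlap the range.
        results: List[str] = []
        self.partitions_scanned = 0
        for key, rows in sorted(self._partitions.items()):
            if key + PARTITION_SECONDS <= start or key >= end:
                continue  # partition lies entirely outside the range: skip it
            self.partitions_scanned += 1
            results.extend(row for ts, row in rows if start <= ts < end)
        return results


store = TimePartitionedStore()
for hour in range(24):
    store.ingest(hour * 3600 + 10, f"event-{hour}")

print(store.query(5 * 3600, 7 * 3600))  # rows ingested during hours 5 and 6
print(store.partitions_scanned)
```

Here only 2 of the 24 hourly partitions are read to answer a two-hour query; the same pruning idea is what makes queries over recent, high-granularity data fast as the total volume grows.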


References:

[1] Microsoft Fabric (2024) What is Real-Time Intelligence? [link]
[2] Microsoft Fabric (2024) Real-Time Intelligence documentation in Microsoft Fabric [link]
[3] Microsoft Fabric Updates Blog (2024) Fabric workloads are now generally available! [link]
[4] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]
[5] Microsoft Fabric Community (2024) Benefits of Migrating to Fabric RTI [link]
[6] Microsoft Fabric Updates Blog (2025) Operational Reporting with Microsoft Fabric Real-Time Intelligence [link]
[7] Microsoft Learn (2025) Get started with Real-Time Intelligence in Microsoft Fabric [link]
[8] Microsoft Learn (2025) Implement Real-Time Intelligence with Microsoft Fabric [link]

Resources:

[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Learn (2024) Microsoft Fabric RTI Demo Application [link] [GitHub]
[R3] Microsoft Fabric Updates Blog (2024) Understanding Real-Time Intelligence usage reporting and billing [link]
[R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
AI - Artificial Intelligence
CDC - Change Data Capture
DB - database
IoT - Internet of Things
KQL - Kusto Query Language
MF - Microsoft Fabric
RTI - Real-Time Intelligence
SaaS - Software-as-a-Service
SQL - Structured Query Language

31 December 2018

🔭Data Science: Big Data (Just the Quotes)

"If we gather more and more data and establish more and more associations, however, we will not finally find that we know something. We will simply end up having more and more data and larger sets of correlations." (Kenneth N Waltz, "Theory of International Politics", 1979)

“There are those who try to generalize, synthesize, and build models, and there are those who believe nothing and constantly call for more data. The tension between these two groups is a healthy one; science develops mainly because of the model builders, yet they need the second group to keep them honest.” (Andrew Miall, “Principles of Sedimentary Basin Analysis”, 1984)

"Largeness comes in different forms and has many different effects. Whereas some tasks remain easy, others become obstinately difficult. Largeness is not just an increase in dataset size. [...] Largeness may mean more complexity - more variables, more detail (additional categories, special cases), and more structure (temporal or spatial components, combinations of relational data tables). Again this is not so much of a problem with small datasets, where the complexity will be by definition limited, but becomes a major problem with large datasets. They will often have special features that do not fit the standard case by variable matrix structure well-known to statisticians." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Big data can change the way social science is performed, but will not replace statistical common sense." (Thomas Lansdall-Welfare, "Nowcasting the mood of the nation", Significance 9(4), 2012)

"Big Data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it." (Edd Wilder-James, "What is big data?", 2012) [source]

"The secret to getting the most from Big Data isn’t found in huge server farms or massive parallel computing or in-memory algorithms. Instead, it’s in the almighty pencil." (Matt Ariker, "The One Tool You Need To Make Big Data Work: The Pencil", 2012)

"Big data is the most disruptive force this industry has seen since the introduction of the relational database." (Jeffrey Needham, "Disruptive Possibilities: How Big Data Changes Everything", 2013)

"No subjective metric can escape strategic gaming [...] The possibility of mischief is bottomless. Fighting ratings is fruitless, as they satisfy a very human need. If one scheme is beaten down, another will take its place and wear its flaws. Big Data just deepens the danger. The more complex the rating formulas, the more numerous the opportunities there are to dress up the numbers. The larger the data sets, the harder it is to audit them." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"There is convincing evidence that data-driven decision-making and big data technologies substantially improve business performance. Data science supports data-driven decision-making - and sometimes conducts such decision-making automatically - and depends upon technologies for 'big data' storage and engineering, but its principles are separate." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Our needs going forward will be best served by how we make use of not just this data but all data. We live in an era of Big Data. The world has seen an explosion of information in the past decades, so much so that people and institutions now struggle to keep pace. In fact, one of the reasons for the attachment to the simplicity of our indicators may be an inverse reaction to the sheer and bewildering volume of information most of us are bombarded by on a daily basis. […] The lesson for a world of Big Data is that in an environment with excessive information, people may gravitate toward answers that simplify reality rather than embrace the sheer complexity of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The other buzzword that epitomizes a bias toward substitution is 'big data'. Today’s companies have an insatiable appetite for data, mistakenly believing that more data always creates more value. But big data is usually dumb data. Computers can find patterns that elude humans, but they don’t know how to compare patterns from different sources or how to interpret complex behaviors. Actionable insights can only come from a human analyst (or the kind of generalized artificial intelligence that exists only in science fiction)." (Peter Thiel & Blake Masters, "Zero to One: Notes on Startups, or How to Build the Future", 2014)

"We have let ourselves become enchanted by big data only because we exoticize technology. We’re impressed with small feats accomplished by computers alone, but we ignore big achievements from complementarity because the human contribution makes them less uncanny. Watson, Deep Blue, and ever-better machine learning algorithms are cool. But the most valuable companies in the future won’t ask what problems can be solved with computers alone. Instead, they’ll ask: how can computers help humans solve hard problems?" (Peter Thiel & Blake Masters, "Zero to One: Notes on Startups, or How to Build the Future", 2014)

"As business leaders we need to understand that lack of data is not the issue. Most businesses have more than enough data to use constructively; we just don't know how to use it. The reality is that most businesses are already data rich, but insight poor." (Bernard Marr, "Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance", 2015)

"Big data is based on the feedback economy where the Internet of Things places sensors on more and more equipment. More and more data is being generated as medical records are digitized, more stores have loyalty cards to track consumer purchases, and people are wearing health-tracking devices. Generally, big data is more about looking at behavior, rather than monitoring transactions, which is the domain of traditional relational databases. As the cost of storage is dropping, companies track more and more data to look for patterns and build predictive models." (Neil Dunlop, "Big Data", 2015)

"Big Data often seems like a meaningless buzz phrase to older database professionals who have been experiencing exponential growth in database volumes since time immemorial. There has never been a moment in the history of database management systems when the increasing volume of data has not been remarkable." (Guy Harrison, "Next Generation Databases: NoSQL, NewSQL, and Big Data", 2015)

"Dimensionality reduction is essential for coping with big data - like the data coming in through your senses every second. A picture may be worth a thousand words, but it’s also a million times more costly to process and remember. [...] A common complaint about big data is that the more data you have, the easier it is to find spurious patterns in it. This may be true if the data is just a huge set of disconnected entities, but if they’re interrelated, the picture changes." (Pedro Domingos, "The Master Algorithm", 2015)

"Science’s predictions are more trustworthy, but they are limited to what we can systematically observe and tractably model. Big data and machine learning greatly expand that scope. Some everyday things can be predicted by the unaided mind, from catching a ball to carrying on a conversation. Some things, try as we might, are just unpredictable. For the vast middle ground between the two, there’s machine learning." (Pedro Domingos, "The Master Algorithm", 2015)

"The human side of analytics is the biggest challenge to implementing big data." (Paul Gibbons, "The Science of Successful Organizational Change", 2015)

"To make progress, every field of science needs to have data commensurate with the complexity of the phenomena it studies. [...] With big data and machine learning, you can understand much more complex phenomena than before. In most fields, scientists have traditionally used only very limited kinds of models, like linear regression, where the curve you fit to the data is always a straight line. Unfortunately, most phenomena in the world are nonlinear. [...] Machine learning opens up a vast new world of nonlinear models." (Pedro Domingos, "The Master Algorithm", 2015)

"Underfitting is when a model doesn’t take into account enough information to accurately model real life. For example, if we observed only two points on an exponential curve, we would probably assert that there is a linear relationship there. But there may not be a pattern, because there are only two points to reference. [...] It seems that the best way to mitigate underfitting a model is to give it more information, but this actually can be a problem as well. More data can mean more noise and more problems. Using too much data and too complex of a model will yield something that works for that particular data set and nothing else." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"We are moving slowly into an era where Big Data is the starting point, not the end." (Pearl Zhu, "Digital Master: Debunk the Myths of Enterprise Digital Maturity", 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Practical Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Big data is, in a nutshell, large amounts of data that can be gathered up and analyzed to determine whether any patterns emerge and to make better decisions." (Daniel Covington, "Analytics: Data Science, Data Analysis and Predictive Analytics for Business", 2016)

"Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"While Big Data, when managed wisely, can provide important insights, many of them will be disruptive. After all, it aims to find patterns that are invisible to human eyes. The challenge for data scientists is to understand the ecosystems they are wading into and to present not just the problems but also their possible solutions." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"Big Data allows us to meaningfully zoom in on small segments of a dataset to gain new insights on who we are." (Seth Stephens-Davidowitz, "Everybody Lies: What the Internet Can Tell Us About Who We Really Are", 2017)

"Effects without an understanding of the causes behind them, on the other hand, are just bunches of data points floating in the ether, offering nothing useful by themselves. Big Data is information, equivalent to the patterns of light that fall onto the eye. Big Data is like the history of stimuli that our eyes have responded to. And as we discussed earlier, stimuli are themselves meaningless because they could mean anything. The same is true for Big Data, unless something transformative is brought to all those data sets… understanding." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"The term [Big Data] simply refers to sets of data so immense that they require new methods of mathematical analysis, and numerous servers. Big Data - and, more accurately, the capacity to collect it - has changed the way companies conduct business and governments look at problems, since the belief wildly trumpeted in the media is that this vast repository of information will yield deep insights that were previously out of reach." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Just as they did thirty years ago, machine learning programs (including those with deep neural networks) operate almost entirely in an associational mode. They are driven by a stream of observations to which they attempt to fit a function, in much the same way that a statistician tries to fit a line to a collection of points. Deep neural networks have added many more layers to the complexity of the fitted function, but raw data still drives the fitting process. They continue to improve in accuracy as more data are fitted, but they do not benefit from the 'super-evolutionary speedup'." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"One of the biggest myths is the belief that data science is an autonomous process that we can let loose on our data to find the answers to our problems. In reality, data science requires skilled human oversight throughout the different stages of the process. [...] The second big myth of data science is that every data science project needs big data and needs to use deep learning. In general, having more data helps, but having the right data is the more important requirement. [...] A third data science myth is that modern data science software is easy to use, and so data science is easy to do. [...] The last myth about data science [...] is the belief that data science pays for itself quickly. The truth of this belief depends on the context of the organization. Adopting data science can require significant investment in terms of developing data infrastructure and hiring staff with data science expertise. Furthermore, data science will not give positive results on every project." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Apart from the technical challenge of working with the data itself, visualization in big data is different because showing the individual observations is just not an option. But visualization is essential here: for analysis to work well, we have to be assured that patterns and errors in the data have been spotted and understood. That is only possible by visualization with big data, because nobody can look over the data in a table or spreadsheet." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"With the growing availability of massive data sets and user-friendly analysis software, it might be thought that there is less need for training in statistical methods. This would be naïve in the extreme. Far from freeing us from the need for statistical skills, bigger data and the rise in the number and complexity of scientific studies makes it even more difficult to draw appropriate conclusions. More data means that we need to be even more aware of what the evidence is actually worth." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Big data is revolutionizing the world around us, and it is easy to feel alienated by tales of computers handing down decisions made in ways we don’t understand. I think we’re right to be concerned. Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Making big data work is harder than it seems. Statisticians have spent the past two hundred years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster, and cheaper these days, but we must not pretend that the traps have all been made safe. They have not." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Many people have strong intuitions about whether they would rather have a vital decision about them made by algorithms or humans. Some people are touchingly impressed by the capabilities of the algorithms; others have far too much faith in human judgment. The truth is that sometimes the algorithms will do better than the humans, and sometimes they won’t. If we want to avoid the problems and unlock the promise of big data, we’re going to need to assess the performance of the algorithms on a case-by-case basis. All too often, this is much harder than it should be. […] So the problem is not the algorithms, or the big datasets. The problem is a lack of scrutiny, transparency, and debate." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The problem is the hype, the notion that something magical will emerge if only we can accumulate data on a large enough scale. We just need to be reminded: Big data is not better; it’s just bigger. And it certainly doesn’t speak for itself." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"[...] the focus on Big Data AI seems to be an excuse to put forth a number of vague and hand-waving theories, where the actual details and the ultimate success of neuroscience is handed over to quasi-mythological claims about the powers of large datasets and inductive computation. Where humans fail to illuminate a complicated domain with testable theory, machine learning and big data supposedly can step in and render traditional concerns about finding robust theories. This seems to be the logic of Data Brain efforts today." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"We live on islands surrounded by seas of data. Some call it 'big data'. In these seas live various species of observable phenomena. Ideas, hypotheses, explanations, and graphics also roam in the seas of data and can clarify the waters or allow unsupported species to die. These creatures thrive on visual explanation and scientific proof. Over time new varieties of graphical species arise, prompted by new problems and inner visions of the fishers in the seas of data." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Visualizations can remove the background noise from enormous sets of data so that only the most important points stand out to the intended audience. This is particularly important in the era of big data. The more data there is, the more chance for noise and outliers to interfere with the core concepts of the data set." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

"Visualisation is fundamentally limited by the number of pixels you can pump to a screen. If you have big data, you have way more data than pixels, so you have to summarise your data. Statistics gives you lots of really good tools for this." (Hadley Wickham)
