SQL Troubles: data source


11 March 2025

🏭🎗️🗒️Microsoft Fabric: Real-Time Dashboards (RTD) [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 10-Mar-2025

Real-Time Intelligence architecture [5]

[Microsoft Fabric] Real-Time Dashboard

[def]

a collection of tiles

optionally organized in pages

act as containers of tiles
organize tiles into logical groups

e.g. by data source or by subject area

used to create a dashboard with multiple views

e.g. dashboard with a drillthrough from a summary page to a details page [1]

each tile has

an underlying query
a visual representation

exists within the context of a workspace [1]

always associated with the workspace used to create it [1]

{concept} tile

uses KQL snippets to retrieve data and render visuals [1]
can be added directly from queries written in a KQL queryset [1]

{concept} data source

reusable reference to a specific database in the same workspace as the dashboard [1]

{concept} parameters

significantly improve dashboard rendering performance [1]
enable filter values to be applied as early as possible in the query

filtering is enabled when the parameter is included in the query associated with the tiles [1]
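The effect of applying a filter value early can be sketched outside KQL. The snippet below is only an illustration in Python: the EVENTS list is a hypothetical in-memory stand-in for a table, and count_by_kind is an invented helper, not part of any Fabric API. It filters on the parameter value first, so only the surviving rows reach the aggregation.

```python
from collections import Counter

# Hypothetical in-memory stand-in for rows returned by a KQL database.
EVENTS = [
    {"state": "TX", "kind": "Flood"},
    {"state": "TX", "kind": "Hail"},
    {"state": "FL", "kind": "Flood"},
    {"state": "TX", "kind": "Flood"},
]

def count_by_kind(rows, state):
    """Filter on the parameter value first, then aggregate the remaining rows."""
    filtered = (row for row in rows if row["state"] == state)  # filter early
    return Counter(row["kind"] for row in filtered)            # aggregate late

tx_counts = count_by_kind(EVENTS, "TX")
```

In a tile query the same idea would translate into placing the filter on the parameter before any aggregation step.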

{concept} cloud connection

uses dashboard owner's identity to give access to the underlying data source to other users [2]
when not used for 90 days, it will expire [2]

⇒ a new gateway connection must be set up [2]
via Manage connections >> Gateways page >> Edit credentials and verify the user again

a separate connection is needed for each data source [2]
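The 90-day expiry rule can be illustrated with a small date check. This is a minimal sketch, not Fabric code: the helper name is hypothetical, and it assumes that the period is counted from the last use and that day 90 itself is still valid.

```python
from datetime import date, timedelta

# Expiry window taken from the notes above [2].
EXPIRY_DAYS = 90

def connection_expired(last_used: date, today: date) -> bool:
    """Return True when the connection has been idle for more than EXPIRY_DAYS.

    Hypothetical helper; the exact boundary handling in Fabric is an assumption.
    """
    return (today - last_used) > timedelta(days=EXPIRY_DAYS)
```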

{feature} natively export KQL queries to a dashboard as visuals and later modify their underlying queries and visual formatting as needed [1]

the fully integrated dashboard experience provides improved query and visualization performance [1]

{feature} encrypted at rest

dashboards and dashboard-related metadata about users are encrypted at rest using Microsoft-managed keys [1]

{feature} auto refresh

automatically updates the data on a dashboard, without the user having to reload the page or click a refresh button [1]
can be set by a database editor

both editors and viewers can change the actual rate of auto refresh while viewing a dashboard [1]

database editors can limit the minimum refresh rate that any viewer can set

⇐ reduces the cluster load
when set, database users can't set a refresh rate lower than the minimum [1]
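The interplay between the viewer's choice and the editor's minimum can be sketched as a simple clamp. The function below is an assumption-laden illustration: it treats the refresh rate as an interval in seconds, and the name effective_refresh_seconds is hypothetical.

```python
from typing import Optional

def effective_refresh_seconds(requested: int, minimum: Optional[int]) -> int:
    """Clamp a viewer's requested refresh interval to the editor-defined minimum.

    A minimum of None means that no limit was configured.
    """
    if minimum is None:
        return requested
    return max(requested, minimum)
```

With a minimum of 30 seconds, a viewer asking for 10 seconds would get 30, while a request for 60 seconds would be left unchanged.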

{feature} explore data

enables users to extend the exploration of dashboards beyond the data displayed in the tiles [3]

begins with viewing the data and its corresponding visualization as they appear on the tile [3]
users can add or remove filters and aggregations, and use further visualizations [3]

⇐ no knowledge of KQL is needed [3]

{feature} conditional formatting

allows users to format data points based on their values, utilizing

colors

{rule} color by condition

allows setting one or more logical conditions that must be met for a value to be colored [4]
available for table, stat, and multi stat visuals [4]

{rule} color by value

allows visualizing values on a color gradient [4]
available for table visuals [4]

tags
icons

can be applied either

to a specific set of cells within a designated column [4]
to entire rows [4]

one or more conditional formatting rules can be applied for each visual [4]

when multiple rules conflict, the last rule defined takes precedence over any previous ones [4]
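The precedence rule for conflicting conditions can be sketched as follows. This is an illustrative Python model, not the dashboard's implementation: the rule representation (a condition paired with a color) and the resolve_color helper are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A rule pairs a condition with the color to apply when the condition holds.
Rule = Tuple[Callable[[float], bool], str]

def resolve_color(value: float, rules: List[Rule]) -> Optional[str]:
    """Return the color of the last matching rule, or None if no rule matches."""
    color = None
    for condition, rule_color in rules:
        if condition(value):
            color = rule_color  # later rules overwrite earlier ones
    return color

# Both rules match a value of 95; the rule defined last wins.
rules = [
    (lambda v: v > 50, "yellow"),
    (lambda v: v > 90, "red"),
]
```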

{action} export dashboard

dashboards can be exported to a JSON file
can be useful in several scenarios

{scenario} version control

the file can be used to restore the dashboard to a previous version [1]

{scenario} dashboard template

the file can be used as template for creating new dashboards [1]

{scenario} manual editing
edit the file to modify the dashboard, then import the file back into the dashboard [1]

ADX dashboards can be exported and imported as RT dashboards [6]
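The template scenario can be sketched as a small transformation of the exported JSON. The snippet is a minimal example under a loud assumption: the "title" key is hypothetical and the real export schema should be checked in an actual file before relying on it.

```python
import json

def retitle_dashboard(exported_json: str, new_title: str) -> str:
    """Return a copy of an exported dashboard definition under a new title.

    The "title" key is an assumption about the export format.
    """
    dashboard = json.loads(exported_json)
    dashboard["title"] = new_title  # hypothetical key
    return json.dumps(dashboard, indent=2)

template = retitle_dashboard('{"title": "Sales", "tiles": []}', "Sales (copy)")
```

Keeping each exported file under version control gives a simple way to compare or restore earlier versions.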

{action} share dashboard

one can specify if the user can view, edit, or share [2]
⇐ the permissions are not for the underlying data [2]

permissions are set by defining the identity that the dashboard uses for accessing data from each data source [2]
{type|default} pass-through identity

used when authenticating to access the underlying data source [2]
the user is only able to view the data in the tiles [2]

{type} dashboard editor’s identity

allows the user to use the editor’s identity, and thus the editor’s permissions [2]

the editor defines a cloud connection that the dashboard uses to connect to the relevant data source [2]
only editors can define cloud connections and permissions for a specific real-time dashboard [2]

each editor that modifies the real-time dashboard needs to set up their own cloud connection [2]
if a valid connection doesn't exist, the user is able to view the real-time dashboard but will only see data if they themselves have access to it [2]
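The choice of identity described above can be summarized as a small decision function. This is a sketch of the logic as read from the notes, not Fabric code: the mode names and the identity_used helper are hypothetical.

```python
def identity_used(mode: str, has_valid_connection: bool, viewer: str, editor: str) -> str:
    """Return the identity under which the data source is queried.

    mode is either "pass-through" (the default) or "editor".
    Without a valid cloud connection the viewer's own identity is used,
    so the viewer only sees data they can access themselves.
    """
    if mode == "editor" and has_valid_connection:
        return editor
    return viewer
```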

{action} revoke a user’s access permissions

remove access from the dashboard [2]
remove the cloud connection.

via Settings >> Manage connections and gateways >> Options >> Remove

remove the user from the cloud connection.

via Settings >> Manage connections and gateways >> Options >> Manage users >> {select User} >> Delete

edit the Data source access permissions.

via Data source >> New data source >> edit >> Data source access >> Pass-through identity
⇐ the user uses their own identity to access the data source [2]

{prerequisite} a workspace with a Microsoft Fabric-enabled capacity [1]
{prerequisite} a KQL database with data [1]
{setting} Users can create real-time dashboards [1]


References:

[1] Microsoft Learn (2024) Fabric: Create a Real-Time Dashboard [link]

[2] Microsoft Learn (2024) Fabric: Real-Time Dashboard permissions (preview) [link]

[3] Microsoft Learn (2024) Fabric: Explore data in Real-Time Dashboard tiles [link]
[4] Microsoft Learn (2024) Fabric: Apply conditional formatting in Real-Time Dashboard visuals [link]

[5] Microsoft Learn (2025) Fabric: Real Time Intelligence L200 Pitch Deck [link]
[6] Microsoft Fabric Updates Blog (2024) Easily recreate your ADX dashboards as Real-Time Dashboards in Fabric, by Michal Bar [link]
[7] Microsoft Learn (2025) Create Real-Time Dashboards with Microsoft Fabric [link]

Resources:

[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Fabric Updates Blog (2024) Announcing Real-Time Dashboards generally available [link]

[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
ADX - Azure Data Explorer
KQL - Kusto Query Language
MF - Microsoft Fabric
RT - Real-Time

29 March 2021

Notes: Team Data Science Process (TDSP)

Team Data Science Process (TDSP)

an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently [1]
{goal} help customers fully realize the benefits of their analytics program [1]
{component} data science lifecycle definition
- {description} a framework to structure the development of data science projects [1]
- {goal} designed for data science projects that ship as part of intelligent applications that deploy ML & AI models for predictive analytics [1]
- {benefit} can be used in the context of other DM methodologies as they have common ground [1]
  - e.g. CRISP-DM, KDD
- {benefit} exploratory data science projects or improvised analytics projects can also benefit from using this process [1]
{component} standardized project structure
- {description} a directory structure that includes templates for project documents
  - ⇒makes it easy for team members to find information [1]
  - ⇐templates for the folder structure and required documents are provided in standard locations [1]
  - all code and documents are stored in an agile VCS tracking repository [1]
    - {recommendation} create a separate repository for each project on the VCS for versioning, information security, and collaboration [1]
- {benefit} organizes the code for the various activities [1]
- {benefit} allows tracking the progress [1]
- {benefit} provides checklist with key questions for each project to guarantee process and deliverables’ quality [1]
- {benefit} enables team collaboration [1]
- {benefit} allows closer tracking of the code for individual features [1]
- {benefit} enables teams to obtain better cost estimates [1]
- {benefit} helps build institutional knowledge across the organization [1]
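A standardized directory structure can be provisioned with a short script. The sketch below is only an illustration: the folder names in PROJECT_LAYOUT are hypothetical and are not claimed to match the official TDSP template, and scaffold_project is an invented helper.

```python
import tempfile
from pathlib import Path
from typing import List

# Illustrative folder names; the actual TDSP template may differ.
PROJECT_LAYOUT = [
    "docs/project",
    "docs/data_reports",
    "docs/model",
    "code/data_acquisition",
    "code/modeling",
    "code/deployment",
]

def scaffold_project(root: Path) -> List[Path]:
    """Create the standard folders under root and return the created paths."""
    created = []
    for relative in PROJECT_LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created

# Usage example against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_project(Path(tmp))
    all_exist = all(folder.is_dir() for folder in created)
```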
{component} recommended infrastructure
- {description} a set of recommendations for the infrastructure and resources needed for analytics and storage [1]
- {benefit} addresses cloud and/or on-premises requirements [1]
- {benefit} enables reproducible analysis [1]
- {benefit} avoids infrastructure duplication [1]
  - ⇒minimizes inconsistencies and unnecessary infrastructure costs [1]
- {tools} tools are provided to provision the shared resources, track them, and allow each team member to connect to those resources securely [1]
- {good practice} create a consistent compute environment [1]
  - ⇐allows team members to replicate and validate experiments [1]
{component} recommended tools and utilities
- {description} a set of recommendations for the tools and utilities needed for project’s execution [1]
- {benefit} helps lower the barriers and increase the consistency of their adoption [1]
- {benefit} provides an initial set of tools and scripts to jump-start methodology’s adoption [1]
- {benefit} helps automate some of the common tasks in the data science lifecycle [1]
  - e.g. data exploration and baseline modeling [1]
- {benefit} well-defined structure provided for individuals to contribute shared tools and utilities into their team's shared code repository [1]
  - ⇐ resources can then be leveraged by other projects [1]
{phase} 1: business understanding
- {goal} define and document the business problem, its objectives, the needed attributes, and the metric(s) used to determine project’s success
- {goal} identify and document the relevant data sources
- {step} 1.1: define project’s objectives
  - together with the stakeholders, elicit the requirements, then define and document the problem, its objectives, and the metric(s) used to determine the project’s success
    - requires a good understanding of the business processes, data and further characteristics
- {step} 1.2: identify data sources
  - identify the attributes and the data sources relevant to the problem under study
- {step} 1.3: define project plan and team*
  - develop a high-level milestone plan and identify the resources needed for executing it
- {tool} project charter
  - standard template that documents the business problem, the scope of the project, the business objectives and metric(s) used to determine project’s success
{phase} 2: data acquisition & understanding
- {goal} prepare the base dataset(s) as needed by the modeling phase into the target repository
- {goal} build the data ETL/ELT architecture and processes needed for provisioning the base data
- {step} 2.1: ingest data
  - make the required data available for the team in the repository where the analytics operations take place
- {step} 2.2: explore data
  - understand data’s characteristics by leveraging specific tools (visualization, analysis)
  - prepare the data as needed for further processing
- {step} 2.3: set up pipelines
  - build the pipelines needed for data refresh and quality assessment [3]
  - set up a process to score new data or refresh the data regularly [3]
- {step} 2.4: feasibility analysis*
  - reevaluate the project to determine whether the value expected is sufficient to continue pursuing it
- {tool} data quality report
  - report that includes data summaries, data mappings, variable ranking, data qualitative assessment(s) and further information [3]
- {tool} solution architecture
  - diagram and/or textual-based description of the data pipeline(s), technical assumptions and further aspects
- {tool} data reports
  - document the structure and statistics of the raw data
- {tool} checkpoint decision
  - decision template document that
    - summarizes the findings of the feasibility analysis step
    - includes a set of choices and recommendations for the next steps
    - serves as the basis for deciding whether to continue the project and what the next steps are
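One ingredient of a data quality report, the share of missing values per attribute, can be sketched in a few lines. The example is a minimal illustration with hypothetical sample rows; summarize_columns is an invented helper and treats both None values and absent keys as missing.

```python
def summarize_columns(rows):
    """Return per-column row count, missing count and missing share."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    summary = {}
    for column in columns:
        # A value counts as missing when the key is absent or set to None.
        missing = sum(1 for row in rows if row.get(column) is None)
        summary[column] = {
            "rows": total,
            "missing": missing,
            "missing_share": missing / total,
        }
    return summary

# Hypothetical sample data.
rows = [
    {"age": 34, "city": "Oslo"},
    {"age": None, "city": "Bergen"},
    {"age": 51, "city": None},
    {"age": 29},
]
report = summarize_columns(rows)
```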
{phase} 3: modeling
- {goal} create a machine-learning model that addresses the prediction requirements and that's suitable for production
- {step} 3.1: feature engineering
  - the inclusion, aggregation, and transformation of raw variables to create the features used in the analysis [4]
    - ⇐requires a good understanding of how the features relate to each other and how the ML algorithms use those features [4]
- {step} 3.2: model selection*
  - choose one or more modeling algorithms that best address the problem’s characteristics
- {step} 3.3: model training
  - involves the following steps:
    - split the input data into training and test datasets
    - build the models by using the training dataset
    - evaluate the models on the training and the test datasets
    - determine the optimal setup and methods
- {step} 3.4: model evaluation
  - evaluate the performance of the model(s)
- {step} 3.5: feasibility analysis*
  - evaluate whether the models are ready for use in production and whether they fulfill the project’s objectives
- {tool} feature sets
  - describe the features developed for the modeling and how they were generated
  - contains pointers to the code used to generate the features
- {tool} model report
  - a standard, template-based report that provides details on each experiment’s outcomes
  - created for each model tried
- {tool} checkpoint decision
- {tool} model performance metrics
  - e.g. ROC curves or MSE
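The training steps of 3.3 and the MSE metric can be sketched with the standard library alone. This is a toy illustration, not a TDSP artifact: the data is synthetic (a noisy line with slope 2 and intercept 1), and a simple least-squares line stands in for whichever algorithm a project would actually select.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse(points, a, b):
    """Mean squared error of the line y = a*x + b on the given points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)

# Synthetic data: y = 2x + 1 plus Gaussian noise (seeded for repeatability).
rng = random.Random(42)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in range(50)]

# Split the input data into training and test datasets.
rng.shuffle(data)
train_set, test_set = data[:40], data[40:]

# Build the model on the training data, then evaluate on both datasets.
a, b = fit_line(train_set)
train_mse = mse(train_set, a, b)
test_mse = mse(test_set, a, b)
```

A test error that is much higher than the training error would hint at overfitting and feed into the feasibility analysis of step 3.5.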
{phase} 4: deployment
- {goal} deploy the models and the data pipelines to the environment used for final user acceptance
- {step} 4.1: operationalize architecture
  - prepare the models and data pipelines for use in production
  - {best practice} expose the models over an open API interface
    - enables models’ consumption from various applications
  - {best practice} build telemetry and monitoring into the models and the data pipelines [5]
    - helps in monitoring and troubleshooting [5]
- {step} 4.2: deploy solution*
  - deploy the architecture into production
- {tool} status dashboard
  - displays data on system’s health and key metrics
- {tool} model report
  - the report in its final form with deployment information
- {tool} solution architecture
  - the document in its final form
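Building telemetry into a model can be as simple as wrapping the scoring function. The sketch below assumes a hypothetical in-process metrics dictionary and an invented with_telemetry helper; a real deployment would send these values to a monitoring service instead.

```python
import time
from typing import Callable, Dict, List

def with_telemetry(model: Callable[[List[float]], float], metrics: Dict[str, float]):
    """Wrap a model so that each call updates call, error and latency counters."""
    def scored(features: List[float]) -> float:
        started = time.perf_counter()
        try:
            return model(features)
        except Exception:
            metrics["errors"] = metrics.get("errors", 0) + 1
            raise
        finally:
            # Runs for successful and failed calls alike.
            metrics["calls"] = metrics.get("calls", 0) + 1
            elapsed = time.perf_counter() - started
            metrics["total_seconds"] = metrics.get("total_seconds", 0.0) + elapsed
    return scored

# Usage example with a trivial stand-in model (mean of the features).
metrics = {}
predict = with_telemetry(lambda features: sum(features) / len(features), metrics)
result = predict([1.0, 2.0, 3.0])
```

The collected counters are the kind of data a status dashboard could display.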
{phase} 5: customer acceptance
- {goal} confirm that project’s objectives were fulfilled and get customer’s acceptance
- {step} 5.1: system validation
  - validate system’s performance and outcomes and confirm that it fulfills customer’s needs
- {step} 5.2: project signoff*
  - finalize and review documentation
  - hand over the solution and the related documentation to the customer
  - evaluate the project against the defined objectives and get the customer’s signoff
- {tool} exit report
- {tool} technical report
  - contains all the details of the project that are useful for learning about how to operate the system [6]

Acronyms:

Artificial Intelligence (AI)

Cross-Industry Standard Process for Data Mining (CRISP-DM)

Data Mining (DM)

Knowledge Discovery in Databases (KDD)

Team Data Science Process (TDSP)

Version Control System (VCS)

Visual Studio Team Services (VSTS)

Resources:

[1] Microsoft Azure (2020) What is the Team Data Science Process? [source]

[2] Microsoft Azure (2020) The business understanding stage of the Team Data Science Process lifecycle [source]

[3] Microsoft Azure (2020) Data acquisition and understanding stage of the Team Data Science Process [source]

[4] Microsoft Azure (2020) Modeling stage of the Team Data Science Process lifecycle [source]

[5] Microsoft Azure (2020) Deployment stage of the Team Data Science Process lifecycle [source]

[6] Microsoft Azure (2020) Customer acceptance stage of the Team Data Science Process lifecycle [source]

23 November 2018

🔭Data Science: Missing Data (Just the Quotes)

"Place little faith in an average or a graph or a trend when those important figures are missing." (Darrell Huff, "How to Lie with Statistics", 1954)

"Missing data values pose a particularly sticky problem for symbols. For instance, if the ray corresponding to a missing value is simply left off of a star symbol, the result will be almost indistinguishable from a minimum (i.e., an extreme) value. It may be better either (i) to impute a value, perhaps a median for that variable, or a fitted value from some regression on other variables, (ii) to indicate that the value is missing, possibly with a dashed line, or (iii) not to draw the symbol for a particular observation if any value is missing." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"The progress of science requires more than new data; it needs novel frameworks and contexts. And where do these fundamentally new views of the world arise? They are not simply discovered by pure observation; they require new modes of thought. And where can we find them, if old modes do not even include the right metaphors? The nature of true genius must lie in the elusive capacity to construct these new modes from apparent darkness. The basic chanciness and unpredictability of science must also reside in the inherent difficulty of such a task." (Stephen J Gould, "The Flamingo's Smile: Reflections in Natural History", 1985)

"We often think, naïvely, that missing data are the primary impediments to intellectual progress - just find the right facts and all problems will dissipate. But barriers are often deeper and more abstract in thought. We must have access to the right metaphor, not only to the requisite information. Revolutionary thinkers are not, primarily, gatherers of facts, but weavers of new intellectual structures." (Stephen J Gould, "The Flamingo's Smile: Reflections in Natural History", 1985)

"[...] as the planning process proceeds to a specific financial or marketing state, it is usually discovered that a considerable body of 'numbers' is missing, but needed numbers for which there has been no regular system of collection and reporting; numbers that must be collected outside the firm in some cases. This serendipity usually pays off in a much better management information system in the form of reports which will be collected and reviewed routinely." (William H. Franklin Jr., "Financial Strategies", 1987)

"We have found that some of the hardest errors to detect by traditional methods are unsuspected gaps in the data collection (we usually discovered them serendipitously in the course of graphical checking)." (Peter Huber, "Huge data sets", Compstat ’94: Proceedings, 1994)

"Unfortunately, just collecting the data in one place and making it easily available isn’t enough. When operational data from transactions is loaded into the data warehouse, it often contains missing or inaccurate data. How good or bad the data is a function of the amount of input checking done in the application that generates the transaction. Unfortunately, many deployed applications are less than stellar when it comes to validating the inputs. To overcome this problem, the operational data must go through a 'cleansing' process, which takes care of missing or out-of-range values. If this cleansing step is not done before the data is loaded into the data warehouse, it will have to be performed repeatedly whenever that data is used in a data mining operation." (Joseph P Bigus, "Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"If you have only a small proportion of cases with missing data, you can simply throw out those cases for purposes of estimation; if you want to make predictions for cases with missing inputs, you don’t have the option of throwing those cases out." (Warren S Sarle, "Prediction with missing inputs", 1998)

"Every statistical analysis is an interpretation of the data, and missingness affects the interpretation. The challenge is that when the reasons for the missingness cannot be determined there is basically no way to make appropriate statistical adjustments. Sensitivity analyses are designed to model and explore a reasonable range of explanations in order to assess the robustness of the results." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The best rule is: Don't have any missing data. Unfortunately, that is unrealistic. Therefore, plan for missing data and develop strategies to account for them. Do this before starting the study. The strategy should state explicitly how the type of missingness will be examined, how it will be handled, and how the sensitivity of the results to the missing data will be assessed." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Statistics depend on collecting information. If questions go unasked, or if they are asked in ways that limit responses, or if measures count some cases but exclude others, information goes ungathered, and missing numbers result. Nevertheless, choices regarding which data to collect and how to go about collecting the information are inevitable." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"A sin of omission – leaving something out – is a strong one and not always recognized; itʼs hard to ask for something you donʼt know is missing. When looking into the data, even before it is graphed and charted, there is potential for abuse. Simply not having all the data or the correct data before telling your story can cause problems and unhappy endings." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Quality without science and research is absurd. You can't make inferences that something works when you have 60 percent missing data." (Peter Pronovost, "Safe Patients, Smart Hospitals", 2010)

"The only thing we know for sure about a missing data point is that it is not there, and there is nothing that the magic of statistics can do to change that. The best that can be managed is to estimate the extent to which missing data have influenced the inferences we wish to draw." (Howard Wainer, "14 Conversations About Three Things", Journal of Educational and Behavioral Statistics Vol. 35(1), 2010)

"There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"Missing data is the blind spot of statisticians. If they are not paying full attention, they lose track of these little details. Even when they notice, many unwittingly sway things our way. Most ranking systems ignore missing values." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Accuracy and coherence are related concepts pertaining to data quality. Accuracy refers to the comprehensiveness or extent of missing data, performance of error edits, and other quality assurance strategies. Coherence is the degree to which data - item value and meaning are consistent over time and are comparable to similar variables from other routinely used data sources." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"How good the data quality is can be looked at both subjectively and objectively. The subjective component is based on the experience and needs of the stakeholders and can differ by who is being asked to judge it. For example, the data managers may see the data quality as excellent, but consumers may disagree. One way to assess it is to construct a survey for stakeholders and ask them about their perception of the data via a questionnaire. The other component of data quality is objective. Measuring the percentage of missing data elements, the degree of consistency between records, how quickly data can be retrieved on request, and the percentage of incorrect matches on identifiers (same identifier, different social security number, gender, date of birth) are some examples." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"When we find data quality issues due to valid data during data exploration, we should note these issues in a data quality plan for potential handling later in the project. The most common issues in this regard are missing values and outliers, which are both examples of noise in the data." (John D. Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Unless we’re collecting data ourselves, there’s a limit to how much we can do to combat the problem of missing data. But we can and should remember to ask who or what might be missing from the data we’re being told about. Some missing numbers are obvious […]. Other omissions show up only when we take a close look at the claim in question." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"[Making reasoned macro calls] starts with having the best and longest-time-series data you can find. You may have to take some risks in terms of the quality of data sources, but it amazes me how people are often more willing to act based on little or no data than to use data that is a challenge to assemble." (Robert J Shiller)

11 February 2017

⛏️Data Management: Data Collection (Definitions)

"The gathering of information through focus groups, interviews, surveys, and research as required to develop a strategic plan." (Teri Lund & Susan Barksdale, "10 Steps to Successful Strategic Planning", 2006)

"The process of gathering raw or primary specific data from a single source or from multiple sources." (Adrian Stoica et al, "Field Evaluation of Collaborative Mobile Applications", 2008)

"A combination of human activities and computer processes that get data from sources into files. It gets the file data using empirical methods such as questionnaire, interview, observation, or experiment." (Jens Mende, "Data Flow Diagram Use to Plan Empirical Research Projects", 2009)

"A systematic process of gathering and measuring information about the phenomena of interest." (Kaisa Malinen et al, "Mobile Diary Methods in Studying Daily Family Life", 2015)

"The process of capturing events in a computer system. The result of a data collection operation is a log record. The term logging is often used as a synonym for data collection." (Ulf Larson et al, "Guidance for Selecting Data Collection Mechanisms for Intrusion Detection", 2015)

"This refers to the various approaches used to collect information." (Ken Sylvester, "Negotiating in the Leadership Zone", 2015)

"Set of techniques that allow gathering and measuring information on certain variables of interest." (Sara Eloy et al, "Digital Technologies in Architecture and Engineering: Exploring an Engaged Interaction within Curricula", 2016)

"with respect to research, data collection is the recording of data for the purposes of a study. Data collection for a study may or may not be the original recording of the data." (Meredith Zozus, "The Data Book: Collection and Management of Research Data", 2017)

"The process of retrieving data from different sources and storing them in a unique location for further use." (Deborah Agostino et al, "Social Media Data Into Performance Measurement Systems: Methodologies, Opportunities, and Risks", 2018)

"It is the process of gathering data from a variety of relevant sources in an established systematic fashion for analysis purposes." (Yassine Maleh et al, "Strategic IT Governance and Performance Frameworks in Large Organizations", 2019)

"A process of storing and managing data." (Neha Garg & Kamlesh Sharma, "Machine Learning in Text Analysis", 2020)

"The process and techniques for collecting the information for a research project." (Tiffany J Cresswell-Yeager & Raymond J Bandlow, "Transformation of the Dissertation: From an End-of-Program Destination to a Program-Embedded Process", 2020)

"The method of collecting and evaluating data on selected variables, which helps in analyzing and answering relevant questions is known as data collection." (Hari K Kondaveeti et al, "Deep Learning Applications in Agriculture: The Role of Deep Learning in Smart Agriculture", 2021)

"Datasets are created by collecting data in different ways: from manual or automatic measurements (e.g. weather data), surveys (census data), records of decisions (budget data) or ongoing transactions (spending data), aggregation of many records (crime data), mathematical modelling (population projections), etc." (Open Data Handbook)

14 August 2011

📈Graphical Representation: Data Flow Diagram (Definitions)

"A diagram that shows the data flows in an organization, including sources of data, where data are stored, and processes that transform data." (Jan L Harrington, "Relational Database Design: Clearly Explained" 2nd Ed., 2002)

"A diagram of the data flow from sources through processes and files to users. A source or user is represented by a square; a data file is represented by rectangles with missing righthand edges; a process is represented by a circle or rounded rectangle; and a data flow is represented by an arrow." (Jens Mende, "Data Flow Diagram Use to Plan Empirical Research Projects", 2009)

"A diagram used in functional analysis which specifies the functions of the system, the inputs/outputs from/to external (user) entities, and the data being retrieved from or updating data stores. There are well-defined rules for specifying correct DFDs, as well as for creating hierarchies of interrelated DFDs." (Peretz Shoval & Judith Kabeli, "Functional and Object-Oriented Methodology for Analysis and Design", 2009)

[Control Data Flow Graph (CDFG):] "Represents the control flow and the data dependencies in a program." (Alexander Dreweke et al, "Text Mining in Program Code", 2009)

"A graphic method for documenting the flow of data within an organization." (Jan L Harrington, "Relational Database Design and Implementation: Clearly explained" 3rd Ed., 2009)

"A graphic representation of the interactions between different processes in an organization in terms of data flow communications among them. This may be a physical data flow diagram that describes processes and flows in terms of the mechanisms involved, a logical data flow diagram that is without any representation of the mechanism, or an essential data flow diagram that is a logical data flow diagram organized in terms of the processes that respond to each external event." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Data-flow diagrams (DFDs) are system models that show a functional perspective where each transformation represents a single function or process. DFDs are used to show how data flows through a sequence of processing steps." (Ian Sommerville, "Software Engineering" 9th Ed., 2011)

"A model of the system that shows the system’s processes, the data that flow between them (hence the name), and the data stores used by the processes. The data flow diagram shows the system as a network of processes, and is thought to be the most easily recognized of all the analysis models." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"A picture of the movement of data between external entities and the processes and data stores within a system." (Jeffrey A Hoffer et al, "Modern Systems Analysis and Design" 7th Ed., 2014)

"A schematic indicating the direction of the movement of data" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A Data Flow Diagram (DFD) is a graphical representation of the 'flow' of data through an information system, modeling its process aspects. Often it is a preliminary step used to create an overview of the system that can later be elaborated." (Henrikvon Scheel et al, "Process Concept Evolution", 2015)

"Data flow maps are tools that graphically represent the results of a comprehensive data assessment to illustrate what information comes into an organization, for what purposes that information is used, and who has access to that information." (James R Kalyvas & Michael R Overly, "Big Data: A Business and Legal Guide", 2015)

"A graphical representation of the logical or conceptual movement of data within an existing or planned system." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schmea Definition", 2017)

"a visual depiction using standard symbols and conventions of the sources of, movement of, operations on, and storage of data." (Meredith Zozus, "The Data Book: Collection and Management of Research Data", 2017)

"A data-flow diagram is a way of representing a flow of data through a process or a system (usually an information system). The DFD also provides information about the outputs and inputs of each entity and the process itself." (Wikipedia) [source]

"A graphical representation of the sequence and possible changes of the state of data objects, where the state of an object is any of: creation, usage, or destruction." (IQBBA)

23 April 2011

🪄SSRS (& Paginated Reports): First Steps (Part I: Wizarding a Report)

Introduction

A year ago I started a set of posts on SSIS showing how to create a package with the ‘SQL Server Import and Export Wizard’ using a SQL Server and, respectively, an Oracle database as source, while a third post showed how to use the Data Flow Task. The “wizarding” theme came from the fact that the respective posts showed how to use the wizards built into SQL Server and Visual Studio to create a basic export package. The posts therefore targeted mainly beginners; my intention at that time was to use them as a starting point for showing various objects and techniques. I also intended to start a set of similar posts on SSRS, so here I am, showing in this first post how to use the Report Wizard to create a report in BIDS (SQL Server Business Intelligence Development Studio) using a SQL Server database.

This short tutorial uses a query based on the AdventureWorks2008 sample database and assumes that your Reporting Services instance is installed on the same computer as the database. For simplicity, I focused on the minimal information required to build a simple report, leaving some of the topics to be covered in detail in other posts. The tutorial can be used together with the information provided in MSDN; see the Creating a Report Using Report Wizard and Report Layout How-to Topics sections.

Step 1: Create the Query

Before creating a report, regardless of its type or platform, it’s recommended to create and stabilize the query on which the report is based. For simplicity, let’s use a query based on the Sales.vIndividualCustomer view:

-- Customer Addresses 
SELECT SIC.Title  
, SIC.FirstName  
, SIC.LastName  
, SIC.AddressLine1  
, SIC.City  
, SIC.PostalCode  
, SIC.CountryRegionName  
, SIC.PhoneNumber  
, SIC.EmailAddress  
FROM Sales.vIndividualCustomer SIC
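
One way to stabilize the query is to run a few sanity checks in SSMS before handing it over to the report. The queries below are only a sketch under the assumption that the AdventureWorks2008 sample database is available; the NumberRecords alias is illustrative and not part of the report's query.

```sql
-- Customer Addresses: sanity checks (sketch, assumes AdventureWorks2008)
-- 1. number of records the report is expected to return
SELECT COUNT(*) NumberRecords
FROM Sales.vIndividualCustomer SIC;

-- 2. number of records per country, handy for spotting unexpected or missing values
SELECT SIC.CountryRegionName
, COUNT(*) NumberRecords
FROM Sales.vIndividualCustomer SIC
GROUP BY SIC.CountryRegionName
ORDER BY SIC.CountryRegionName;
```

If the totals of the second query don't add up to the count returned by the first, the logic is worth reviewing before building the report on top of it.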

Step 2: Create the Project

Launch BIDS from the Windows menu, create a new project (File/New/Project) using the “Report Server Project Wizard” template, and give the project an appropriate name together with the other requested information.

This opens the Report Wizard and, unless you previously chose not to show this step, a page appears listing the steps that will be performed:
- Select a data source from which to retrieve data
- Design a query to execute against the data source
- Choose the type of report you want to create
- Specify the basic layout of the report
- Specify the formatting for the report
- Select the report type, choose tabular

Step 3: Create/select the Data Source

Creating a data source that points to the local server isn’t complicated at all: give your data source a meaningful name (e.g. Adventure Works) and then define the connection’s properties by selecting the “Server name” and the database’s name, in this case AdventureWorks2008.

Test the connection, just to be sure that everything works. In the end your data source might look like this:

You can define your data source as shared by selecting the “Make this a shared data source” checkbox, which allows you to reuse the respective data source across several reports.
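
Before relying on the data source, it can help to confirm in SSMS that the server and database it points to are the expected ones. The following is a minimal sketch, assuming that you connect to the same local instance selected under “Server name” and that the sample database is named AdventureWorks2008; the ServerName and DatabaseName aliases are illustrative.

```sql
-- run in SSMS while connected to the server chosen for the data source
-- 1. server and database of the current connection
SELECT @@SERVERNAME ServerName
, DB_NAME() DatabaseName;

-- 2. check that the sample database exists on the server and is online
SELECT name
, state_desc
FROM sys.databases
WHERE name = 'AdventureWorks2008';
```

If the second query returns no rows, the sample database is most probably not available under that name on the chosen server.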

Step 4: Provide the Query

Because the query for our report was created beforehand, it’s enough to copy it into the “Query string” textbox. We could have used the “Query Builder” to build the query, though a tool like SSMS (SQL Server Management Studio) can be considered a better choice for query design. As pointed out above, it’s recommended to build the query and stabilize its logic before starting work on the actual report.

Step 5: Generate the Report

Creating a report involves choosing a Report Type (Tabular vs. Matrix), the level at which the fields will be displayed (page, group or details), and the Table Style. For this tutorial choose the Tabular type and, as no grouping is needed, in the “Design the Table” step select all the Available fields and drag-and-drop them into “Details”, the bottommost list in the “Displayed fields” section.

In the “Choose the Table Style” step go with the first option (e.g. Slate), though it’s up to you which of the styles you prefer.

Step 6: Deploy the Report

You can deploy the report if your report server is already configured; however, you don’t have to go that far in order to test the report, because you can preview it directly in BIDS. So you can go with the current settings:

[Screenshot: Choose the Deployment Location]

In the last step provide a meaningful name for the report (e.g. Customer Addresses), and here is the report in Design mode:

By clicking on the “Preview” tab you can see what the actual report will look like:

I would now recommend checking which objects the wizard has created and which properties are used. Just play with the layout, observe the behavior, and see what the documentation says. There are plenty of tutorials on the web, so don't be afraid to search further.

Happy coding!


20 March 2009

🛢DBMS: Data Source (Definitions)

"The source of data for an object such as a cube or a dimension. Also, the specification of the information necessary to access source data. Sometimes refers to a DataSource object." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A repository for storing data. An ODBC/JDBC term." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A file that contains the connection string that Analysis Services uses to connect to the database that hosts the data as well as any necessary authentication credentials." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"A system or application that generates data for use by another system or by an end user. The data source may also be the system of origin for the data." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"An information store that can be connected to by various SQL Server technologies such as SQL Server Reporting Services for data retrieval." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"An entity or group of entities from which data can be collected. The entities may be people, objects, or processes." (Jens Mende, "Data Flow Diagram Use to Plan Empirical Research Projects", 2009)

"An object containing information about the location of data. The data source leverages a connection string." (Jim Joseph et al, "Microsoft® SQL Server™ 2008 Reporting Services Unleashed", 2009)

"A repository of data to which a federated server can connect and then retrieve data by using wrappers. A data source can contain relational databases, XML files, Excel spreadsheets, table-structured files, or other objects. In a federated system, data sources seem to be a single collective database." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)
