Showing posts with label availability. Show all posts
Showing posts with label availability. Show all posts

07 December 2024

🏭 💠Data Warehousing: Microsoft Fabric (Part IV: SQL Databases for OLTP scenarios) [new feature]

Data Warehousing Series
Data Warehousing Series

One interesting announcements at Ignite is the availability in public preview of SQL databases in Microsoft Fabric, "a versatile and developer-friendly transactional database built on the foundation of Azure SQL database". With this Fabric can address besides OLAP also OLTP scenarios, evolving thus from analytics to a data platform [1]. According to the announcement, besides the AI-optimized architectural aspects, the feature makes the SQL Azure simple, autonomous and secure by design [1], and these latest aspects are considered in this post. 

Simplicity revolves around the deployment and configuration of databases, the creation of a new database requiring giving a name and the database is created in seconds [1]. It’s a considerable improvement compared with the relatively complex setup needed for on-premise configurations, though sometimes more flexibility in configuration is needed upfront or over database’s lifetime. To get a database ready for testing one can import a sample database or get specific data via data flows and/or pipelines [1]. As development tools one can use Visual Studio Code or SSMS [1], and probably more tools will be available in time.

The integration with both GitHub and Azure DevOps allows to configure each database under source control, which is needed for many scenarios especially when multiple resources make changes to the database objects [1]. Frankly, that’s mainly important during the development phase, respectively in scenarios in which multiple people make in parallel changes to the logic. It will be interesting to see how much overhead or challenges the feature adds to development and how smoothly everything works together!

The most important aspect for many solutions is the replication of data in near-real time to the (open-source) delta parquet format in OneLake and thus making the data available for analytics almost immediately [1]. Probably, from this aspect many cloud-based applications can benefit, even if the performance might not be as good as in other well-established architectures. However, there are many other scenarios in which one needs to maintain and use data for OLTP/OLAP purposes. This invites adequate testing and a good weighting of the advantages and disadvantages involved. 

A SQL database is a native item in Fabric, and therefore it utilizes Fabric capacity units like other Fabric workloads [1]. One can use the Fabric SKU estimator (still in private preview) to estimate the costs [2], though it will be interesting to see how cost-effective the solutions are. Probably, especially when the infrastructure is already available outside of Fabric, it will be easier and cost-effective to use the mirroring functionality. One should test and have a better estimator before moving blindly from the existing infrastructure to Fabric. 

SQL databases in Fabric are autonomous by design, while allowing to get the best performance and availability by default [1]. High availability is reached through zone redundancy, while performance is achieved by scaling automatically the storage and compute to accommodate the workloads [1]. The auto-optimization capability is achieved with the help of the latest Intelligent Query Processing (IQP) enhancements, respectively the creation of missing indexes to improve query performance [1]. It will be interesting to see how the whole process works, given that the maintenance of indexes usually involves some challenges (e.g. identifying covering indexes, indexes needed only for temporary workloads, duplicated indexes).

SQL databases in Fabric are automatically configured for high availability with zone redundancy, while storage and compute scale automatically to accommodate the user workload [1]. The database is auto-optimized through the latest IQP enhancements while the system creates any missing indexes to improve query performance. All data is replicated to OneLake by default [1]. Finally, the database always receives the latest security updates with auto-patching, while automatic backups help in disaster recovery scenarios  [1], which can be of real help for database administrators. 

References:

[1] Microsoft Fabric Updates Blog (2024) Announcing SQL database in Microsoft Fabric Public Preview [link

[2] Microsoft Fabric Updates Blog (2024) Announcing New Recruitment for the Private Preview of Microsoft Fabric SKU Estimator [link]


26 August 2019

🛡️Information Security: Denial of Service [DoS] (Definitions)

"A type of attack on a computer system that ties up critical system resources, making the system temporarily unusable." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Any attack that affects the availability of a service. Reliability bugs that cause a service to crash or hang are usually potential denial-of-service problems." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

"This is a technique for overloading an IT system with a malicious workload, effectively preventing its regular service use." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Occurs when a server or Web site receives a flood of traffic - much more traffic or requests for service than it can handle, causing it to crash." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"Causing an information resource to be partially or completely unable to process requests. This is usually accomplished by flooding the resource with more requests than it can handle, thereby rendering it incapable of providing normal levels of service." (Mark Rhodes-Ousley, "Information Security: The Complete Reference, Second Edition" 2nd Ed., 2013)

"Attacks designed to disable a resource such as a server, network, or any other service provided by the company. If the attack is successful, the resource is no longer available to legitimate users." (Darril Gibson, "Effective Help Desk Specialist Skills", 2014)

"An attack from a single attacker designed to disrupt or disable the services provided by an IT system. Compare to distributed denial of service (DDoS)." (Darril Gibson, "Effective Help Desk Specialist Skills", 2014)

"A coordinated attack in which the target website or service is flooded with requests for access, to the point that it is completely overwhelmed." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"An attack that can result in decreased availability of the targeted system." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"An attack that generally floods a network with traffic. A successful DoS attack renders the network unusable and effectively stops the victim organization’s ability to conduct business." (Weiss, "Auditing IT Infrastructures for Compliance" 2nd Ed., 2015)

"A type of cyberattack to degrade the availability of a target system." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"Any action, or series of actions, that prevents a system, or its resources, from functioning in accordance with its intended purpose." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"The prevention of authorized access to resources or the delaying of time-critical operations." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"An attack shutting down running of a service or network in order to render it inaccessible to its users (whether human person or a processing device)." (Wissam Abbass et al, "Internet of Things Application for Intelligent Cities: Security Risk Assessment Challenges", 2021)

"Actions that prevent the NE from functioning in accordance with its intended purpose. A piece of equipment or entity may be rendered inoperable or forced to operate in a degraded state; operations that depend on timeliness may be delayed." (NIST SP 800-13)

"The prevention of authorized access to resources or the delaying of time-critical operations. (Time-critical may be milliseconds or it may be hours, depending upon the service provided)." (NIST SP 800-12 Rev. 1)

"The prevention of authorized access to a system resource or the delaying of system operations and functions." (NIST SP 800-82 Rev. 2)


27 April 2017

⛏️Data Management: Availability (Definitions)

"Corresponds to the information that should be available when necessary and in the appropriate format." (José M Gaivéo, "Security of ICTs Supporting Healthcare Activities", 2013)

"A property by which the data is available all the time during the business hours. In cloud computing domain, the data availability by the cloud service provider holds a crucial importance." (Sumit Jaiswal et al, "Security Challenges in Cloud Computing", 2015) 

"Availability: the ability of the data user to access the data at the desired point in time." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"It is one of the main aspects of the information security. It means data should be available to its legitimate user all the time whenever it is requested by them. To guarantee availability data is replicated at various nodes in the network. Data must be reliably available." (Omkar Badve et al, "Reviewing the Security Features in Contemporary Security Policies and Models for Multiple Platforms", 2016)

"Timely, reliable access to data and information services for authorized users." (Maurice Dawson et al, "Battlefield Cyberspace: Exploitation of Hyperconnectivity and Internet of Things", 2017)

"A set of principles and metrics that assures the reliability and constant access to data for the authorized individuals or groups." (Gordana Gardašević et al, "Cybersecurity of Industrial Internet of Things", 2020)

"Ensuring the conditions necessary for easy retrieval and use of information and system resources, whenever necessary, with strict conditions of confidentiality and integrity." (Alina Stanciu et al, "Cyberaccounting for the Leaders of the Future", 2020)

"The state when data are in the place needed by the user, at the time the user needs them, and in the form needed by the user." (CODATA)

"The state that exists when data can be accessed or a requested service provided within an acceptable period of time." (NISTIR 4734)

"Timely, reliable access to information by authorized entities." (NIST SP 800-57 Part 1)

14 March 2017

⛏️Data Management: Data Protection (Definitions)

"The protecting of data from damage, destruction, and unauthorized alteration." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Deals with issues such as data security, privacy, and availability. Data protection controls are required by regulations and industry mandates such as Sarbanes-Oxley, European Data Protection Law, and others." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"A set of rules that aim to protect the rights, freedoms and interests of individuals when information related to them is being processed." (Maria Tzanou, "Data Protection in EU Law after Lisbon: Challenges, Developments, and Limitations", 2015)

"An umbrella term for various procedures that ensure information is secure and available only to authorized users." (Peter Sasvari & Zoltán Nagymate, "The Empirical Analysis of Cloud Computing Services among the Hungarian Enterprises", 2015)

"Protection of the data against unauthorized access by third parties as well as protection of personal data (such as customer data) in the processing of data according to the applicable legal provisions." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"Legal control over access to, and use of, data in computers." (Lucy Self & Petros Chamakiotis, "Understanding Cloud Computing in a Higher Education Context", 2018)

"Data protection is a task of safeguarding personal or sensitive data which are complex and widely distributed." (M Fevzi Esen & Eda Kocabas, "Personal Data Privacy and Protection in the Meeting, Incentive, Convention, and Exhibition (MICE) Industry", 2019)

"Process of protecting important information from corruption, compromise, or loss." (Patrícia C T Gonçalves, "Medical Social Networks, Epidemiology and Health Systems", 2021)

"The process involving use of laws to protect data of individuals from unauthorized disclosure or access." (Frank Makoza, "Learning From Abroad on SIM Card Registration Policy: The Case of Malawi", 2019)

"Is the process in information and communication technology that deals with the ability an organization or individual to safeguard data and information from corruption, theft, compromise, or loss." (Valerianus Hashiyana et al, "Integrated Big Data E-Healthcare Solutions to a Fragmented Health Information System in Namibia", 2021)

"The mechanisms with which an organization enables individuals to retain control of the personal data they willingly share, where security provides policies, controls, protocols, and technologies necessary to fulfill rules and obligations in accordance with privacy regulations, industry standards, and the organization's ethics and social responsibility." (Forrester)

20 August 2009

🛢DBMS: Clustering (Definitions)

 "Technology that enables you to create a hot spare. That is a server that is actually running and can take over immediately. This technology enables you to mirror an entire server to another computer." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"The use of multiple computers to provide increased reliability, capacity, and management capabilities." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

[federated cluster:] "A grouping of SQL servers used together to achieve scalability by employing a distributed partition view. A federated cluster is not used for availability, only for achieving scalability through scale out." (Allan Hirt et al, "Microsoft SQL Server 2000 High Availability", 2004)

"Any collection of distinct computers that are connected and used as a parallel computer, or to form a redundant system for higher availability. The computers in a cluster are not specialized to cluster computing and could, in principle, be used in isolation as standalone computers. In other words, the components making up the cluster, both the computers and the networks connecting them, are not custom-built for use in the cluster." (Beverly A Sanders, "Patterns for Parallel Programming", 2004)

"Connecting two or more computers in such a way that they behave like a single computer to an application or client. Clustering is used for parallel processing, load balancing, and fault tolerance." (Allan Hirt et al, "Microsoft SQL Server 2000 High Availability", 2004)

"A method of keeping database files physically close to one another on the storage media for improving performance through sequential pre-fetch operations." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"(1) The condition whereby data is physically ordered contiguously by a specified key (usually implemented by means of an index). (2) The use of multiple, 'independent' computing systems working together to form what appears to users as a single highly available system." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed, 2012)

"The tendency of elements to become unevenly distributed in the hash table, with many adjacent locations containing elements" (Nell Dale et al, "Object-Oriented Data Structures Using Java 4th Ed.", 2016)

09 August 2009

🛢DBMS: NoSQL (Definitions)

"An umbrella term for non-relational data stores, hence the name. These stores sacrifice ACID transactions for greater scalability and availability." (Dean Wampler, "Functional Programming for Java Developers", 2011)

"A set of technologies that created a broad array of database management systems that are distinct from relational database systems. One major difference is that SQL is not used as the primary query language. These database management systems are also designed for distributed data stores." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"A class of database management systems that consist of non-relational, distributed data stores. These systems are optimized for supporting the storage and retrieval requirements of massive-scale data-intensive applications." (IBM, "Informix Servers 12.1", 2014)

"A database that doesn’t adhere to relational database structures. Used to organize and query unstructured data." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"Any of a class of database management systems that reject the limitations and drawbacks dictated by, or associated with, the relational model. NoSQL products tend to specialize in a single or limited number of areas, such as high-performance processing, big data (giga-record systems), diverse data types (video, pictures, mathematical models), documents, and so on. Their specialized focus often requires deemphasizing other areas such as data consistency and backup and recovery." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schmea Definition", 2017)

"In general, NoSQL databases provide a mechanism for storage and retrieval of data modeled in means other than the tabular relations used in relational databases." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"NoSQL means 'not only SQL' or 'no SQL at all'. Being a new type of non-relational databases, NoSQL databases are developed for efficient and scalable management of big data." (Zongmin Ma & Li Yan, "Towards Massive RDF Storage in NoSQL Databases: A Survey", 2019)

"A broad term for a set of data access technologies that do not use the SQL language as their primary mechanism for reading and writing data. Some NoSQL technologies act as key-value stores, only accepting single-value reads and writes; some relax the restrictions of the ACID methodology; still others do not require a pre-planned schema." (MySQL, "MySQL 8.0 Reference Manual Glossary")

"A NoSQL database is distinguished mainly by what it is not - it is not a structured relational database format that links multiple separate tables. NoSQL stands for 'not only SQL', meaning that SQL, or structured query language is not needed to extract and organize information. NoSQL databases tend to be more diverse and flatter than relational databases (in a flat database, all data is contained in the same, large table)." (Statistics.com)

"NoSQL is a database management system built for the complexities of working with Big Data. Unlike SQL, NoSQL does not store data in a relational format." (Xplenty) [source]

"No-SQL (aka not only SQL) database systems are distributed, non-relational databases designed for large-scale data storage and for massively-parallel data processing across a large number of commodity servers." (IBM) 

"NoSQL is short for 'not only SQL'. NoSQL databases include mechanisms for storage and retrieval of data based on means other than the tabular relations used in relational databases." (Idera) [source]

"sometimes referred to as ‘Not only SQL’ as it is a database that doesn’t adhere to traditional relational database structures. It is more consistent and can achieve higher availability and horizontal scaling." (Analytics Insight)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.