
03 February 2021

Data Migrations (DM): Conceptualization IV (Data Access)

Data Migration
Data Migrations Series

Once the data sources for a Data Migration (DM) have been identified, the first question is how the data can be accessed. Legacy systems relying on ODBC-based databases are in theory relatively easy to access as long as they allow direct access to their data, which enables a pull strategy. Despite this, there are organizations that don't allow direct access to the data even for read-only operations, preferring instead to push the data directly to the consumers (aka push strategy) or to push the data to a given location from where the consumers can use the data as needed (aka hybrid strategy).

Direct access to the data allows in theory the best flexibility, as the solution can extract the data whenever needed. This is especially important during the initial phases of the project, when the data need to be pulled more frequently until the requirements and logic are stabilized. A push strategy tends to add overhead, as usually somebody else oversees the data exports and the data need to be prepared in the expected format. On the other side, it would make sense to make an exception for a DM and allow direct access to the data.

Hybrid strategies tend to be more complex and require additional resources or overhead, as the data are stored temporarily at a separate location. Unfortunately, in certain scenarios this is the only approach that can be used. Data files that preserve the integrity of the data and facilitate data consumption are preferred; therefore, tabular text files or JSON files are preferred over XML or Excel files. It's preferable to export each data structure individually rather than storing parent-child structures, even if the latter can prove useful in certain scenarios. When there's no other solution, one can also use the standard reports available in the legacy systems.
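For illustration, below is a minimal sketch (in Python, using pyodbc) of how a single data structure could be exported to a tabular text file; the connection string, table and file names are just placeholders:

# Minimal sketch: export one legacy table per file as a tabular text file.
# Server, database, table and file names are placeholders for illustration.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=legacy-srv;DATABASE=LegacyDb;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM dbo.Customers")  # hypothetical master data table

with open("customers.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="|")  # pipe delimiter keeps commas in the data intact
    writer.writerow([col[0] for col in cursor.description])  # header row from column metadata
    for row in cursor.fetchall():
        writer.writerow(row)

conn.close()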

When storing data outside the legacy systems for further processing, it's recommended to follow the organization's best practices, respectively to address the data security and privacy requirements. ETL tools allow accessing data from password-protected areas like FTP, OneDrive or SharePoint. The fewer security layers in between, the lower in theory the overhead. Therefore, given its stability and simplicity, FTP might prove to be a better storage solution than OneDrive, SharePoint or other similar technologies.
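As a small example, a file exported to a password-protected FTP area could be pulled into the staging location with a few lines of standard Python (host, credentials and file name below are placeholders):

# Minimal sketch: retrieve an exported file from a password-protected FTP area.
from ftplib import FTP

with FTP("ftp.example.org") as ftp:
    ftp.login(user="dm_user", passwd="********")  # placeholder credentials
    with open("customers.csv", "wb") as f:
        ftp.retrbinary("RETR customers.csv", f.write)  # download the exported file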

Ideally the extraction/export mechanisms should use the database objects that already encapsulate the logic in the legacy systems, otherwise the team will need to reengineer the logic. For master data this can prove to be easy, though the logic for transactional data like on-hand stock or open invoices can be relatively complex to reengineer. Otherwise, the logic can be implemented directly in the extraction/export mechanisms, or sometimes it is more advisable to create database objects (usually on a different schema) on the legacy systems and just call the respective objects.
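A minimal sketch of the latter approach, assuming a hypothetical view dm.vOpenInvoices created on a dedicated schema of the legacy system, could look as follows:

# Minimal sketch: let a database object encapsulate the extraction logic and just call it.
# The view name and connection details are assumptions for illustration.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=legacy-srv;DATABASE=LegacyDb;Trusted_Connection=yes;"
)
# dm.vOpenInvoices is an assumed view on a dedicated 'dm' schema
open_invoices = pd.read_sql("SELECT * FROM dm.vOpenInvoices", conn)
conn.close()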

When connecting directly to the data source it's advisable to use the data provider that offers the best performance and flexibility, however several tests might be needed to determine the best fit. It is useful to check the limitations of each provider and find a stable driver version. OLEDB and ADO.NET data providers offer in general good performance, though the native drivers of the legacy systems can be a better option, depending on the case.
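One simple way of comparing providers is to time the same extraction query under each ODBC driver installed on the machine, as in the following sketch (driver names and the query are placeholders):

# Minimal sketch: time the same query under different ODBC drivers to compare providers.
import time
import pyodbc

QUERY = "SELECT * FROM dbo.Customers"  # hypothetical extraction query
for driver in ("ODBC Driver 17 for SQL Server", "SQL Server Native Client 11.0"):
    conn = pyodbc.connect(
        f"DRIVER={{{driver}}};SERVER=legacy-srv;DATABASE=LegacyDb;Trusted_Connection=yes;"
    )
    start = time.perf_counter()
    rows = conn.cursor().execute(QUERY).fetchall()
    print(f"{driver}: {len(rows)} rows in {time.perf_counter() - start:.2f}s")
    conn.close()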

Some legacy systems allow access to their data only via service-based technologies like OData. OData tends to have poorer performance for large data exports than standard access methods and is therefore not indicated in such scenarios. In such cases it might be a better idea to export the data directly from the legacy system.
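For the cases in which OData must nevertheless be used, the extraction typically needs to be paged; the sketch below follows the @odata.nextLink annotation of an OData v4 feed (service URL and entity set are assumptions, authentication is omitted):

# Minimal sketch: read an OData v4 feed page by page by following @odata.nextLink.
import requests

url = "https://legacy.example.org/odata/Customers"  # placeholder service URL
records = []
while url:
    payload = requests.get(url, headers={"Accept": "application/json"}).json()
    records.extend(payload.get("value", []))       # current page of records
    url = payload.get("@odata.nextLink")           # None once the last page is reached

print(f"Retrieved {len(records)} records")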


11 June 2020

Business Intelligence: SQL Server Reporting Services (The Good, the Bad and the Ugly)

Business Intelligence

SQL Server Reporting Services (SSRS) is the oldest solution from the modern Microsoft BI stack. Released as an add-on to SQL Server 2000, it covers most of an organization's reporting requirements, whether we talk about tables, matrices or crosstab displays, raw data, aggregations, KPIs or visualizations like charts, gauges, sparklines, tree maps or sunbursts.

The Good: Once you have a SQL query based on any standard data source (SQL Server, Oracle, SharePoint, OData, XML, etc.), you can use it in just a few minutes to create a report with the help of a wizard. Sure, adding the needed formatting, parameters, custom code, drilldown and drill-through functionality might take some effort, though in less than an hour you have a running report. The use of templates and custom branding allows providing a common experience across the enterprise.

The whole service is available once you have a SQL Server license, which makes SSRS a cost-effective tool. The shallow learning curve and the integration with SharePoint facilitate the development and consumption of reports.

With its pixel-accurate display of data, SSRS is ideal for printing business documents. This was probably one of the reasons why, starting with Microsoft Dynamics AX 2009, SSRS also became the main reporting platform for the subsequent versions. One can use an AX 2009 class as source for the report, or directly use the base tables, which can increase reports' performance at the cost of reengineering the logic from AX 2009. With a few exceptions in the finance area, the reporting logic is easy to build.

With SQL Server 2016 it got an HTML5 rendering engine, while SSRS 2017 added support for responsive web design. The integration of the SSRS and Power BI environments has the chance to further extend the value provided by this powerful combination, however it also depends on the direction in which Microsoft will develop this idea.

The Bad: One of the important downsides of SSRS is that it doesn't allow custom authentication. Even if some examples exist on the Web, it's hard to understand Microsoft's stubbornness in not providing this by design.

Because SSRS still uses an older MS Office driver, it allows exporting only 65,536 records to Excel, which makes data consumption more complicated. In addition, the pixel-perfect rendering isn't that perfect: the introduction of empty columns when exporting to Excel adds some unnecessary burden.

In total, the progress made by SSRS between the various releases is small when compared with the changes undergone by SQL Server. Even if the visualization capabilities cover most of the requests, it loses ground when compared with Power BI and similar visualization tools.

The Ugly: SSRS, as the typical BI developer knows it, is different from the architecture frameworks provided when working with Business Central, respectively Dynamics 365 and CRM. Even if there are perhaps justified reasons, Microsoft failed to unite the three architectures into a flexible solution. Almost all the examples available on the Web target CRM, and frankly it's hard to understand that. It feels like Microsoft wants to sabotage its own product?! What's hard to understand is that besides SSRS and Power BI, Microsoft has several other reporting tools for Dynamics 365. Building reports for Business Central or Dynamics 365 requires certain skills, while the development time has increased considerably, SSRS thus losing some of the appeal it previously had and allowing other tools to join the landscape (e.g. electronic documents).

SSRS can't be smoothly integrated with Office 365 Online, remaining mainly a solution for on-premises architectures. This can become a bottleneck when customers move to the cloud, the BI strategy eventually needing to be rethought as well.

24 May 2020

Data Warehousing: SQL Server Integration Services (The Good, the Bad and the Ugly)

Data Warehousing

Microsoft SQL Server Integration Services (SSIS) is a platform for building (enterprise-level) data integration and data transformation solutions by using a rich set of built-in tasks and transformations, graphical tools for building packages, respectively a catalog for storing the packages. Formerly called Data Transformation Services (DTS), it was introduced with SQL Server 2000 and rebranded as SSIS with SQL Server 2005.

The Good: Since its introduction it was adopted by DBAs and (database) programmers because it allowed the import and export of data on the fly from and to SQL Server, flat files and other relational data sources, in fact any resource exposing an ODBC or OLEDB driver. The extract/load functionality was extended by a basic set of transformations, making DTS the ideal ETL tool for data warehousing and integrations. The data from multiple sources and targets could be processed in parallel or sequentially, the ETL logic being encapsulated in one or more packages that could be run manually or scheduled flexibly via the SQL Server Agent.
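For illustration, a file-system package can also be run manually via the dtexec command-line utility, the same package being scheduled later through a SQL Server Agent job; the sketch below wraps such a call in Python, with the package path as a placeholder:

# Minimal sketch: run a file-system SSIS package via the dtexec command-line utility.
import subprocess

result = subprocess.run(
    ["dtexec", "/FILE", r"C:\ETL\LoadCustomers.dtsx"],  # placeholder package path
    capture_output=True, text=True,
)
print(result.returncode)  # 0 indicates the package executed successfully
print(result.stdout)      # dtexec log output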

With SQL Server 2005 and later versions the SSIS framework was extended to support further data sources, including XML, CAML-based SharePoint lists, OData, Hadoop or Azure Blob Storage. It allowed the storage of packages on the file system or within the built-in catalog.

One could thus develop rich ETL functionality without writing a single line of code. In theory the packages could be run and modified also by non-IT users, which can be a plus in certain scenarios. On the other side, one could build custom packages programmatically from the beginning and thus extend the available data processing logic as seen fit, being able to reuse existing code and whole libraries embedded into the packages or called via DLLs.

The Bad: Despite the rich functionality, a data pipeline usually has lower performance and is more difficult to troubleshoot compared with the built-in RDBMS functionality for data processing. Most, if not all, transformations can be handled more efficiently via SQL-based queries as long as the data are available on the same SQL Server instance. In addition, SQL provides better code reuse, maintainability, chances for refactoring and scalability, and the solutions are easier to deploy. Therefore, one common practice is to use SSIS only for import/export, the further logic being encapsulated into stored procedures and other database objects. This isn't necessarily bad, on the contrary, though specific expertise is then needed to modify the code.
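A minimal sketch of this practice: the loader only stages the raw data, after which the transformation logic encapsulated in a (hypothetical) stored procedure is invoked from outside the package:

# Minimal sketch: after the staging load, call the stored procedure that holds the transformation logic.
# Server, database and procedure names are assumptions for illustration.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dwh-srv;DATABASE=Staging;Trusted_Connection=yes;",
    autocommit=True,
)
conn.cursor().execute("EXEC dbo.usp_TransformCustomers")  # hypothetical transformation procedure
conn.close()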

The Ugly: SSIS is in general suitable for data warehousing and integration solutions whose logic is ideally stable and well-defined. Therefore, SSIS is less suitable for ERP data migrations or similar tasks, which at least at the beginning have an exploratory nature and an overwhelming complexity, multiple iterations being needed before the requirements are fully identified and understood. In extremis each iteration can involve a redesign, which can prove to be time-consuming. One could in theory attempt to first understand all the data, though this would mean starting the development late in the process, while the data for testing are required much earlier. One can still use SSIS for specific tasks, though implementing a whole solution could imply certain challenges that otherwise could have been avoided.

SSIS is not suitable for real-time complex data integrations which require the processing of a considerable amount of data, where specific architectures like SOA, RESTful services or other solutions could be more efficient. When not adequately implemented, a data integration can lead to more problems than it solves. The best example is the increase in execution time with the volume of data, which can easily lead to time-outs and locking of data.

