Showing posts with label OLEDB. Show all posts
Showing posts with label OLEDB. Show all posts

03 February 2021

📦Data Migrations (DM): Conceptualization (Part IV: Data Access)

Data Migration
Data Migrations Series

Once the data sources for a Data Migration (DM) were identified the first question is how the data can be accessed. The legacy systems relying on ODBC-based databases are in theory relatively easy to access as long they allow the direct access to their data, which would enable thus a pull strategy. Despite this, there are organizations that don’t allow the direct access to the data even for read-only operations, being preferred to push the data directly to the consumers (aka push strategy) or push the data to a given location from where the consumer can use the data as needed (aka hybrid strategy). 

The direct access to the data allows in theory the best flexibility as the solution can extract the data when needed and this especially important during the initial phases of the project when the data need to be pulled more frequently until the requirements and logic is stabilized. A push strategy tends to add additional overhead as usually somebody else oversees the data exports, respectively the data need to be prepared in the expected format. On the other side, it would make sense to make an exception for a DM and allow the direct access to the data. 

 Hybrid strategies tend to be more complex and require additional resources or overhead as the data are stored temporarily at a separate location. Unfortunately, in certain scenarios this is the only approach can be used. Are preferred data files that allow keeping the integrity of the data and facilitate data consumption. Therefore, tabular text files or JSON files are preferred in the detriment of XML or Excel files. It’s preferable to export one data structure individually then storing parent-child solutions even if the latter can prove to be useful in certain scenarios. When there’s no other solution one can use also the standard reports available in the legacy systems.

When storing data outside the legacy systems for further processing it’s recommended to follow organization’s best practices, respectively to address the data security and privacy requirements. ETL tools allows accessing data from password protected areas like FTP, OneDrive or SharePoint. The fewer security layers in between the lower is in theory the overhead. Therefore, given its stability and simplicity FTP might prove to be a better storage solution than OneDrive, SharePoint or other similar technologies.

Ideally the extraction/export mechanisms should use the database objects that encapsulate already the logic in the legacy systems otherwise the team will need to reengineer the logic – for master data this can prove to be easy, though the logic of transactional data like on-hand or open invoices can be relatively complex to reengineer. Otherwise, the logic can be implemented directly in the extraction/export mechanisms or sometimes is more advisable to create database objects (usually on a different schema) on the legacy systems and just call the respective objects. 

When connecting directly to the data source it’s advisable using the data provider which allows the best performance and flexibility, however several tests might be needed to determine the best fit. It would be useful to check the limitations of each provider and find a stable driver version.  OLEDB and ADO.Net data providers provide in general a good performance, though native drivers of the legacy systems can be a better option upon case. 

Some legacy systems allow the access to their data only via service-based technologies like OData. OData tends to have poor performance for large data exports than standard access methods and therefore not indicated in such scenarios. In such cases might be a good idea to export the data directly from the legacy system. 

Previous Post <<||>> Next Post

20 June 2010

🎡SSIS: Using Oracle as Data Source

Making SSIS work with Oracle doesn’t seem to be a complicated task especially after several years of experience in doing that, though, as usual, something new appears in the landscape – new software versions, new requirements, an environment with its own particularities, etc. In general, when an application needs a connection to an Oracle server, is needed to install on the client computer several components that come with Oracle Client, add/configure the TNS (Transparent Network Substrate) name, eventually set-up some global variables in case the components were not installed in the default location, and depending on the chosen provider might be needed to configure also a DSN (Data Source Name) pointing to the Oracle server. SSIS makes actually no exception from these steps, once the steps performed you should be in theory ready to develop, test, deploy, schedule and run packages using a connection to an Oracle server.

Unfortunately, from my experience, every 2-3 installations, there is a problem with the Oracle Client and its configuration, most of the times the solution being quite simple – removing some declarations from the set-up files, correcting the global variables or the TNS name. As it seems the Oracle Client is quite sensible to the changes in the default installation path, therefore is indicated to install the Client using the default path unless you’d like to gain more experience in troubleshooting such installation issues. In addition, as any respectable company, some of the products come with their own defects, the patching of such issues being not so easy, therefore it’s advisable to check beforehand the known issues coming with the Client version you’d need to install, preferably in case you need to take advantage of the latest Oracle features, it makes sense to install the latest stable Client.  

Because the Oracle Client downloadable package has a few hundred MB, there is a thinner Oracle alternative to Oracle Client, namely Instant Client package, the components could be downloaded from Oracle site (here) and installed individually, as a minimum being necessary to install Instant Client Package – Basic or Basic Lite versions, the ODBC libraries and the SQL *Plus libraries in case you want to test the connection. Of course, after case could be installed also the JDBC, SDK or any other packages Oracle made available.

On 64 bit platforms might be needed to install in parallel the 32 and 64 bit Oracle/Instant Clients (see SSIS, Oracle and X64 post from Business Vision DEV Team), while in order to troubleshoot the various issues could be a good idea to check the differences between 32 bit and 64 bit registry (here). 

There are several drivers that allows you to connect to an Oracle database using SSIS, the most popular ones:
- Microsoft OLE BD Provider for Oracle
- Oracle Provider for OLE DB
- .Net Framework Data Provider for Oracle
- .Net Framework Data Provider for Odbc
- Oracle Data Provider for .NET

The Microsoft drivers come with MDAC (Microsoft Data Access Components), its latest version being MDAC 2.8. Starting with Windows Vista and Windows Server 2008, Microsoft changed MDAC into WDAC (Windows Data Access Components), including it as part of the operating system, removing thus the need to redistribute the components. See Data Access Technologies Road Map, FAQ and Troubleshooting MDAC/WDAC for MDAC/WDAC architecture, components and releases, respectively troubleshooting. Given the various issues that exist with a particular MDAC/WDAC library, see in KB301202 how you could check the current version by using the registers, while for  Windows version up to Windows Server 2003 could be used the Component Checker MDAC Utility.

If .Net Framework Data Provider for Oracle, respectively for Odbc come with the .Net framework, in exchange, Oracle Data Provider for .NET, see also FAQ is Oracle’s implementation for ADO.NET data access,  supposed to take advantage of advanced Oracle database functionality, it comes with ODAC (Oracle Data Access Components). Please note that you might need in install the respective components in addition to the Oracle Client.

There are also third party drivers for Oracle or even particularly for SSIS, for example Microsoft Connectors Version 1.1 for Oracle and Teradata for use with SSIS from Attunity; see also a few tips from SQL Server Performance blog.

Before taking any decision on which driver to use, it might be a good idea to look also at drivers’ limitations and advantages. In the past I often used the Microsoft Oracle ODBC Driver until I run into in an important limitation, namely its inexistent support for unicodes, this residing, according to KB244661, in the fact that “from Microsoft Data Access Components (MDAC) version 2.5 and later versions, both the Microsoft ODBC Driver and OLE DB Provider support ONLY Oracle 7 and Oracle 8i”. Also Oracle Provider for OLE DB seems to have its own limitations.

SQL Server provides linked servers, the powerful functionality of executing commands against OLE DB data sources on remote servers, including Oracle. As linked servers offer the ability to create cross-vendor distributed queries, in certain scenarios they could prove to be a powerful alternative of querying Oracle databases, in such cases not being needed to create in SSIS package an additional connection to Oracle. KB280106 and an article on Oracle Provider for OLE DB describe how to set up and troubleshoot a linked server to an Oracle database in SQL Server.

You might want to check also the SQL Server Integration Services with Oracle Database 10 White Paper coming from Microsoft. “Connectivity and SQL Server 2005 Integration Services”  MSDN article written by Bob Beauchemin, Scott Barrett’s blog or this SSIS 64 bit – Using Oracle Provider post.

07 February 2009

🛢DBMS: Linked Server (Definitions)

"A definition of an OLE DB data source used by SQL Server distributed queries. The linked server definition specifies the OLE DB provider required to access the data, and includes enough addressing information for the OLE DB provider to connect to the data. Any rowsets exposed by the OLE DB data source can then be referenced as tables, called linked tables, in SQL Server distributed queries." (Microsoft Technet, "Glossary", 0)

"An abstraction of an OLE DB data source that looks like another server to the local SQL Server. A linked server has an associated OLE DB provider that manages the data source. Linked servers allow heterogeneous data access as if the data were local SQL Server data." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A database object that represents a particular data source and the attributes, including security and collation attributes, necessary to access the data source." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"An OLE DB data source used by SQL Server distributed queries." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"A definition that specifies an external OLE DB database source, such as another SQL Server, or an Oracle server. Once defined, the linked server can be used for distributed queries using only the four-part name in the query." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"Database objects that provide the connection information for remote data sources housed on another server. The other server could be a SQL Server, an Oracle server, a Microsoft Access database, or one of many other data sources. Linked servers are created when the remote data source will be accessed more than once or twice." (Robert D. Schneider and Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"Database objects that provide the connection information for remote data sources housed on another server. The other server could be a SQL Server, an Oracle server, a Microsoft Access database, or one of many other data sources. Linked servers are created when the remote data source will be accessed more than once or twice. See also ad hoc query." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

"A definition of an OLE DB data source used by SQL Server distributed queries. The data exposed by a linked server is then referenced as tables, called linked tables." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A definition of an OLE DB data source used by SQL Server distributed queries. The linked server definition specifies the OLE DB provider required to access the data, and includes enough addressing information for the OLE DB provider to connect to the data." (Microsoft, "SQL Server 2012 Glossary", 2012)

"This is the remote servers in a distributed query." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.