Showing posts with label data frames. Show all posts
Showing posts with label data frames. Show all posts

22 December 2019

💠SQL Server: Using the R Language in SQL Server 2016 (Hello World & Working with Data Frames) [new feature]

One of the most interesting features coming with SQL Server 2016 is the possibility to run external scripts written in the R language or Python, taking thus advantage of the numerical and statistical packages coming with the respective languages. The next examples are based on the R language.

As the scripts in R are considered as external scripts, is needed first to enable the 'external scripts enabled' configuration option by using the following script (a server restart is required):

-- enable external scripts 
sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

To makes sure that the functionality works as expected, it makes sense to attempt first a "hello world" example:

-- hello world script
EXECUTE sp_execute_external_script 
        @language = N'R',  
        @script = N'print("Hello world")'



The R language come with a few predefined datasets and for the following examples I’ll be using the mtcars dataset. Because the dataset contains several columns, I will use only the first 3. For this one care run a script as the following into the R console:


Usually it’s useful to look first at the structure of the dataset, this by using str(mtcars) command in the R console:


To return the dataset from R one can use the following call to the sp_execute_external_script stored procedure:

  -- returning the first 3 columns 
 EXEC sp_execute_external_script  
       @language = N'R'  
     , @script = N'cars <- mtcars[1:3];'
     , @input_data_1 = N''  
     , @output_data_1_name = N'cars'
     WITH RESULT SETS (("mpg" float not null 
         , "cyl" float not null 
         , "disp" float not null 
       ));  


As can be seen, besides the script is needed to define a variable in which the returning dataset is stored, as well the resulting dataset. Unfortunately, this script doesn’t return rows’ names. To do that I had to use a small trick by concatenating the initial data frame with the one resulting from row’s names. (Please let me know if you have another way of achieving the same.)

The script becomes:

 -- returning the first 3 columns including rows' name
 EXEC   sp_execute_external_script  
       @language = N'R'  
     , @script = N'cars <- data.frame(rownames(mtcars), mtcars[1:3]);'
     , @input_data_1 = N''  
     , @output_data_1_name = N'cars'
     WITH RESULT SETS (("Car" varchar(100)
      , "mpg" float not null 
      , "cyl" float not null 
      , "disp" float not null 
       ));  



To reuse the script, it can be included in a stored procedure, similarly like the examples provided by the Microsoft documentation for the sp_execute_external_script stored procedure.

A few pointers:
1. It’s useful to test your scripts first in the R console.
2. If 'external scripts enabled' was not enabled, then the following error message will appear:
Msg 39023, Level 16, State 1, Procedure sp_execute_external_script, Line 1 [Batch Start Line 22]
'sp_execute_external_script' is disabled on this instance of SQL Server. Use sp_configure 'external scripts enabled' to enable it.

3. It might be needed to start the “SQL Server Launchpad” service manually, a hint in this direction comes from the following error message:
Msg 39011, Level 16, State 1, Line 24
SQL Server was unable to communicate with the LaunchPad service. Please verify the configuration of the service.
 
4. Once the examples tested, it might be recommended to disable the 'external scripts enabled' configuration option as long is not needed anymore.
5. Check the various ways to analyse the mtcars dataset using the R language: https://rpubs.com/BillB/217355
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.