SAS for SQL Developers

I am writing this guide for SQL developers out there that is learning SAS for the first time.

In case you are wondering, SAS is in the top ten data science programming language. In Australia, major corporations uses SAS in many of its departments. The one reason they do this instead of Python or R is because for its Enterprise Support.

The following guide is written for some of the most common SAS functions.

Proc SQL

Let’s get one thing first. The SAS has a Proc SQL feature that allows its dataset to be queried in SQL syntax. This means that you can use join, unions, where, etc to chop and slice your datasets. You can also join the SAS datasets with another dataset from another library such as ODBC, Teradata or Oracle.

For more information on Proc SQL, read here.

Basic SAS syntax

%let put_dir = C:\Users\bigdataguy\Documents\SASPractise;
libname myhome "&dir." ;

data myhome.new_table;  /*this is the desired new table to be created. If no new table is desired, you can put the existing table name here*/
   set myhome.existing_table;  /*this is the existing dataset that you have*/

   /*To update existing column*/
   existing_column = existing_column + 1;
   
   /*To add new column and set the value for all columns*/
   new_column = existing_column * 2;

   /*To remove a column*/
   drop existing_column;

   /*To filter your dataset like a where clause*/
   where existing_column = 'criteria';  

   /*Another way of filtering your dataset, less efficient than using where however*/
   if existing_column = 'criteria'; 
run;

For more information regarding the difference between using where and if to filter datasets, read here.

Create a Dataset

There are several ways to create SAS dataset, and most usual way is by populating it from a file or another Database. The syntax below is useful when creating a dataset by inputting the data manually.

It is important here not to add empty spaces like tabs or spaces before the data.

Data names;
    input firstname $ surname $ age 2.;
    datalines;
Arnoldo Haddix 32
Pennie Cassinelli 33
Ken Thibeaux 54
Avelina Holte 12
Joie Otis 76
Sherlyn Erhardt 43
Dwain Payeur 23
Sharilyn Naylor 54
Tamiko Goldie 65
Jasmin Kneip 24
Corrin Marcantel 87
Dannie Ainsworth 16
;
run ;

Union

/*Union is simply done by using set*/
data new_dataset;
   set old_data1
         old_data2;
run;

Joining

/*Joins in SAS is done with Merge
Be mindful to sort the data using the by column prior to merging
*/

/*inner join*/
data new_data;
   set old_data1 old_data2;  /*name of the tables that you would like to join*/

   by user_id;  /*the join by column.*/
run;

/*For left or right join, you need to use if condition*/
data new_data;
   set old_data1 (in=a)
          old_data2 (in=b);  /*in is used as an alias*/
   by user_id;

   if a; /*this means that the output should only contains rows that exists in dataset a, not b. In other words this is a left join.*/
   if b; /*this means that the output should only contains rows that exists in dataset b, not a. In other words this is a right join.*/
   if a or b; /*this means that the output should contains rows in either datasets. In other words this is a full join.*/
run;

For more information on merge, read here.

Removing Duplicates

/*To remove duplicate IDs
This will keep the very first user_id and remove the subsequent ones
*/
proc sort data=existing_dataset NODUPKEY;
    by user_id;
run;

/*To remove all duplicate rows*/
proc sort data=existing_dataset NODUP;
    by descending user_id;   /*this is the sorting variable. Note that descending is define prior or after.*/
run;

For more information on proc sort, read here.

Qualify Rank/Row Number

Qualify Rank is used to pick the first row for a defined partition of data. To do that in SAS, you can do it in two ways:

/*1st way: to sort the data and remove duplicate keys*/
proc sort NODUP data=users;
   by user_id descending modified_date;
run;
proc sort NODUPKEY;
   by user_id;
run;

/*2nd way: by sorting then keeping the first row only*/
proc sort NODUP data=users;
   by user_id descending modified_date;
run;
data users;
   set users;
   by user_id;

   if first.user_id;
run;

Import/Export to Excel

This is not exactly a SQL guide, but I find myself needing to export dataset to Excel more often than printing it to the log. And here is what I need in case I forget the code.

/*Import*/
PROC IMPORT OUT= myhome.data
     DATAFILE= "&input_dir.\import_file.xlsx"
     DBMS=EXCEL REPLACE;
     sheet="Sheet1"; 
     GETNAMES=YES;
     MIXED=NO;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

/*Export*/
PROC EXPORT data=myhome.data
            OUTFILE= "&input_dir.\export_file.xlsx"
            DBMS=excel REPLACE;
RUN;

Descriptive Statistics function

In SQL, you would have to run a long query with Group By statement. While that still has its merits, if you just want a quick sum of your population, this is an easier way:

proc means data=your_dataset n sum mean max min range std fw=10;
   var score;
   title 'Summary of Your Data';
run;
/*From left to right: n=number of records, sum=total, mean=average, max=maximum, min=minimum, range=max-min, std=standard dev, fw=field width incl decimal*/

That’s all I got for now, I will add a few more things or two in the future. But I believe this is the simplest way to get started on your SAS skills. If you wish to know more, SAS do offer a basic Level 1 training for free on their website. I do recommend this training, and it should take 2 days to complete. Good luck!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s