SAS Code to Solve Common Data Problems

In my line of work, often we are required to ‘prime’ the data before we can actually use it. This is some of the SAS code that I find it useful in my day-to-day work to prime the data.

Time Series Dataset

Databases that I have worked with often have the data setup either with a snap date or the date ranges (from-to dates). Often I need to convert date ranges data to daily snaps so then I can analyse it further. The following code creates a table with a series of calendar dates, and some other useful variables.

data calendar;
input Start_Date date9. day 8.; 
format Start_Date date9. day 8.;  
01JUN2001 0
run ;

Proc timeseries data=calendar out=calendar;
    id start_date interval=day start="01JUN2001"d end="31DEC2020"d setmiss=0;
    var day;

data calendar;
    set calendar (rename=(start_date=calendar_date));
    drop day;
    format mthstart mthend wkstart wkend date9.;

    mthstart = intnx('month',calendar_date,0,"BEGINNING");
    mthend = intnx('month',calendar_date,0,"ENDING");
    month_key =  put(mthstart, yymmn6.);

    wkstart = intnx('week',calendar_date,0,"BEGINNING");
    wkend = intnx('week',calendar_date,0,"ENDING");
    wkday = weekday(calendar_date); /*1=Sunday, 2=Monday, ..., 7=Saturday*/

Flagging Continuous Rows

Often I need this code to flag rows that have specific criteria. For example, I have a set of bills with due dates, and I want to duplicate the due date data for all calendar dates from the day it is due to the date it is paid. This data then can be used to create expired flag. This method can also be used to create a cumulative amount.

proc sort data=bills_data;
   by bill_id;
data bills_data;
    set bills_data;
    by bill_id;

    if first.bill_id then expiry_date = .;
    if due_date >= calendar_date then expiry_date = due_date;

    retain expiry_date;

Macros: Union multiple tables with different names

Macros in SAS allows you to do programming on top of SQL. For example, supposed you have monthly snapshot tables where the monthkey is set as the table name. Supposed you want to combine them into one table for analysis. You could run the same SQL query with each of the names and union them all, but that is not efficient. Or, you could do this:

/*I put the sql query in separate macro to allow a more complex query without messing up your code.*/
%macro exec_sql(mthkey);
   proc sql;
       create table monthly_data_&mthkey. as
       select *
       from dbc.monthly_snap_data_&mthkey. as d

%macro combine_tables(start_date, end_date);
    data _null_;
        loop = intck('month',&start_date., &end_date.);
        call symputx('loop',loop);
    %put &loop.;

    %do i=0 %to &loop.;
        data _null_;
            mth = intnx('month',&start_date.,&i,'E');
            month_key = put(intnx('month', mth, 0), yymmn6.);
            call symputx('mth',month_key);
        %put &mth.;

      proc append base=combined_data data=monthly_data_&mthkey force; run;

%combine_tables('01Sep2010'd, '31Mar2021'd);

That’s it, I hope this helps your work. I will add a few more things in the future as I encounter other common problems.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s