Thursday, July 31, 2014

Top Five Functions in SAS/SQL with examples. Most commonly used SAS functions.

Based on my experience, here are the most commonly used functions in SAS which you would end up using in almost every program. All are equally important so read them all. 

SUBSTR : It’s used to extract a part of a string. It’s used only with character variables.

Syntax: SUBSTR(Variable_name, Argument_1, Argument_2)
Argument_1 = Starting position from where you want SAS to start extracting.
Argument_2 = Number of characters to extract.

Example: A company has assigned four digit product_ID to each product. You are asked to combine all the Product_IDs starting with 35, 36 and 37 into one Product Category “Mens Shoes” and product IDs starting with 22, 23 and 24 as second product category “Kids Shoes”.
Mens_Shoes = SUBSTR(Product_id,1,2) IN (“35”,”36”,”37”);
Kids_Shoes = SUBSTR(Product_ID,1,2) IN (“22”,”23”,”24”);

IN is the SAS statement which means any of these values in parenthesis.
It’s very common that you have to extract a part of the string in normal day to day analysis job so remember SUBSTR.



COMPRESS: Removes ALL the blanks from a string.

Syntax: COMPRESS(Variable_name)

Example: Sometimes your data in not in the correct format and might have some leading or trailing blanks or you may want to remove all the blanks from a variable “Phone_number”. This usually is the case when you combine data from different sources.
Emp_ID = COMPRESS(Phone_Number).
There are two more similar functions in SAS a) Strip – Strips removes the leading blanks and b) Trim: Removes the trailing blanks.


UPCASE: To capitalize all the characters in a String.

Syntax: UPCASE(Variable_name)

Example: It’s usually used in conjunction with other functions. For example: From variable Products, you want to select any products that start with ‘Men’. This column might contain values like Mens_Shoes, Mens_Shirts, MENS_Socks, MEN_Jackets etc. I am sure you observed that few values have different formats like Upper Case, Lower Case, Capitals etc and also, one of them is MEN (Not Mens). We can combine two functions here – SUBSTR and UPCASE.
If SUBSTR(UPCASE(Products),1,3) EQ “MEN” then New_Product = “Men_Clothing”;
SAS will use the inner most function first, in this UPCASE and then then the SUBSTR function to match values.



It’s very common that you have to convert variable type in SAS. So here are two main functions in SAS which are very frequently used.

PUT: To convert a variable from Numeric to Character. Also used for date conversions.

Syntax: PUT(Variable_Name,format)

Format is the character format you want to assign to new variable.
Example: Taking the same Product_ID example. You have Product_ID stored as Numeric. You want to extract first two characters of Product_ID and if its 35, 36 or 37 then product is Mens_Shoes. However, as the product_ID is numeric, you cannot use the SUBSTR function. So you have to first convert the Numeric to Character variable and then perform SUBSTR. Here you go…
If SUBSTR(PUT(Product_ID,$8.),1,2) IN (“35”,”36”,”37”) then Product EQ “Mens_Shoes”;



INPUT: To convert a variable from Character to Numeric. Also used for Date Conversions.

Syntax: INPUT(Variable_Name, format);
Format is the numeric format you want to assign to new variable.

Example: Suppose you want to match two tables based on variable Product_ID. However in one of the tables you have product_id as numeric and in other table its character format. So you have to change format of these variables. In this example, let’s change numeric variable to a character.
New_Product_ID = INPUT(Product_ID,8.);
Then you can rename, Drop or keep this variable (as you wish) and join with other table. 


Last and most powerful
INDEX: This is one of the most commonly used functions in SAS. I said this for almost all the functions above and I meant it for all of them – they are all important. They are used in everyday programming and very often asked in interviews.
INDEX is used to find Character/word or part of string from a Variable.

Syntax: INDEX(Variable_Name,”MEN”).

Example: In SUBTSR, you knew the starting position and number of characters you want to extract. But what if you want to see if the word “MEN” appears anywhere in the Variable then assign it to a Category – Mens_Clothing. Here you do not know the exact position as word MEN as appear anywhere in the variable. Here is the solution:
IF INDEX(UPCASE(Product),”MEN”) then Category = “MENS_CLOTHING”;
I used UPCASE just in case if MEN appear in the lower case in the variable. 


Ya I know, I said five but I gave you Six. Just couldn’t help.

Any questions/doubts/suggestions/complaints, please comment below.

No comments:

Post a Comment