Sunday, August 3, 2014

Summing up in SAS or SQL - SUM function with Group By statement

There can be two ways to sum-up numeric data in a table - horizontally (row sum) or vertically (Column sum). I will try to explain both separately in SAS and SQL, you can choose which ever you find easy. Personally I use SQL for both because code is easy to write and understand...

Example:

Name        Salary      Bonus      Year        Gross_Income
A               100         10            2013
B                200        15            2013
A               150         15            2013
A               80           12            2014


Horizontal summation (Row Sum) - You need to calculate Gross_Income for each row. 


SQL 
Proc SQl;
Create table New as
select *, SUM(Salary,Bonus) as Gross_Income
from Old;
quit;
The above code is self explanatory again. Creating a new variable Gross_Income and using SUM function in Proc SQL to sum up all the variables in parenthesis. (* brings all the existing variables as well in the final table)

SAS 
Data New;
Set Old;
Gross_Income = SUM(Salary,Bonus);
run;
The above code is self explanatory. Creating a new variable Gross_Income and using SUM function to sum up all the variables in parenthesis.



Vertical Summation - Summing by Groups: You need to find Total_salary per person per year. That means you need to add up total for A and B for each year. 


Proc SQL: Group by SUM
Proc SQL;
Create table New_Table as
select Name, Year, SUM(Salary)  as Yearly_Salary
from Old_Table
Group by Name, Year;
Quit;

*Same SUM function that you used for horizontal summation followed by new variable name.
*Add Group By Statement and mention the Category or Group Level at which you want to sum up. In this case, Its per person per Year.


SAS (using class statement):
Proc Means data = Old_table noprint;
VAR Salary;
CLASS Name Year;
output out = New_table
SUM = SUM_Salary;
run;

*You have to use Procedure Means to sum up vertically in SAS.
*In VAR Statement: You need to mention all the variables you want to sum up.
*In CLASS Statement: You need to mention the Category or Group Level at which you want to sum up. In this case, Its per person per Year.
*You have to use "Output Out=" statement to get results in the new_table. Otherwise results will be just print in the output window and no table will be created.
* You have to give new variable a name. In this case its Sum_Salary. If you do not give name to new variable that you are creating, SAS will give it a default name which might be confusing you have too many variables to sum up or find averages.



Hence I find Proc SQL easy for summations or averages.

Any questions, please comment.










No comments:

Post a Comment