úterý 9. září 2014

Comparison of SAS data step and SQL

The default tool for ETL in SAS is data step. However, SAS also offers support for SQL. When to use which?

The main advantages of data step are:

  1. Drop keyword. Let's imagine that you want to remove one column from a table with 2000 columns. In SQL you would have to name all columns you want to keep. But in data step it is enough to just name the column you don't want to include. Awesome.
  2. Wildcards. If you want to select all columns beginning with "pred_", all you have to do in data step is to write "pred_:" (note the column). In SQL you would have to write name of each predictor.
  3. Speed. SQL in SAS is not implemented overly effectively. 
  4. LAG command. In SQL you have to perform a slow and cumbersome join to get the corresponding functionality. 
The main advantages of SQL:
  1. Group by command. Simply because data step doesn't offer such functionality.
  2. Order by command. Again you can't sort directly in data step.  
  3. Metadata. Queries on the metadata are so addictive!

Žádné komentáře: