SAS Tutorial Part 5: Data Input and Output - Mastering Data Movement
Welcome to Part 5 of our comprehensive SAS programming tutorial series! In this installment, we’ll dive deep into one of the most essential skills for any SAS programmer: reading data from external sources and writing data to various formats. Whether you’re importing customer records, processing survey responses, or exporting analysis results, mastering data input/output operations is crucial for effective data management.
Prerequisites
Before proceeding with this tutorial, you should be comfortable with:
- SAS DATA step basics (covered in Part 3)
- Variables, formats, and data types (covered in Part 4)
- Basic SAS programming concepts
- File system navigation and understanding of file paths
Introduction to Data Input and Output
In the real world, data rarely exists solely within SAS. You’ll frequently need to:
- Import data from CSV files, Excel spreadsheets, or text files
- Read fixed-width reports from legacy systems
- Export analysis results for use in other applications
- Create formatted reports for stakeholders
SAS provides multiple methods for handling these scenarios, from the flexible DATA step approaches to convenient PROC procedures. Understanding when and how to use each method will make you a more efficient programmer.
Reading External Data with the DATA Step
The DATA step provides the most control and flexibility when reading external data. The key statements are:
- INFILE: Specifies the external file to read
- INPUT: Defines how to read the data
Basic CSV File Reading
Let’s start with a simple CSV file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
/* Sample CSV file: customers.csv
CustomerID,Name,Age,City,PurchaseAmount
1001,John Smith,35,New York,1250.50
1002,Jane Doe,28,Los Angeles,875.25
1003,Bob Johnson,42,Chicago,2100.00
*/
data customers;
infile '/path/to/customers.csv' dsd firstobs=2;
input CustomerID Name $ Age City $ PurchaseAmount;
run;
proc print data=customers noobs;
run;
Key points:
dsd
option handles comma-separated values and consecutive delimitersfirstobs=2
skips the header row- Character variables are indicated with
$
Handling Different Delimiters
Not all files use commas as delimiters:
1
2
3
4
5
6
7
8
9
10
11
12
/* Tab-delimited file */
data sales_data;
infile '/path/to/sales.txt' dlm='09'x firstobs=2;
input Date :yymmdd10. Product $ Quantity Revenue;
format Date date9.;
run;
/* Pipe-delimited file */
data inventory;
infile '/path/to/inventory.dat' dlm='|' dsd;
input ItemCode $ Description :$50. Stock Price;
run;
Reading Fixed-Width Files
Legacy systems often produce fixed-width files where each field occupies specific columns:
1
2
3
4
5
6
7
8
9
10
11
12
13
/* Fixed-width file example:
Positions: 1-5: ID, 6-25: Name, 26-28: Age, 29-35: Salary
00123John Doe 025 50000
00124Jane Smith 032 65000
*/
data employees;
infile '/path/to/employees.txt';
input @1 ID $5.
@6 Name $20.
@26 Age 3.
@29 Salary 7.;
run;
Advanced INFILE Statement Options
The INFILE statement offers numerous options for handling various file scenarios:
MISSOVER vs. TRUNCOVER
These options control how SAS handles short records:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/* MISSOVER: Sets remaining variables to missing */
data test_missover;
infile datalines missover;
input Name $ Score1 Score2 Score3;
datalines;
John 85 90 95
Jane 88 92
Bob 90
;
run;
/* TRUNCOVER: Reads available data without going to next line */
data test_truncover;
infile datalines truncover;
input Name $10. Description $20.;
datalines;
John Senior Analyst
Jane Manager
Bob Jr Dev
;
run;
Error Handling Options
Control how SAS responds to data errors:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
data robust_import;
infile '/path/to/messy_data.csv' dsd firstobs=2
missover /* Handle short records */
lrecl=32767 /* Maximum record length */
pad /* Pad short records */
termstr=crlf; /* Handle different line endings */
/* Use _ERROR_ and _N_ for debugging */
input CustomerID Name :$50. Age Revenue;
if _ERROR_ then do;
put "Error on line " _N_ ": " _INFILE_;
_ERROR_ = 0; /* Reset error flag */
end;
run;
Advanced INPUT Techniques
The INPUT statement provides multiple ways to read data:
Column Input with Pointers
Use pointer controls for precise positioning:
1
2
3
4
5
6
7
8
9
data complex_read;
infile '/path/to/report.txt' firstobs=5;
input @1 Date mmddyy10.
@12 Region $3.
@16 Sales comma12.2
@30 Manager $20.
@51 Status $1.;
format Date date9. Sales dollar12.2;
run;
Formatted Input
Read data with specific formats:
1
2
3
4
5
6
7
8
9
10
11
12
13
data formatted_data;
infile datalines dsd;
input TransDate :mmddyy10.
Time :time8.
Amount :comma12.2
Category :$15.
Description :$50.;
format TransDate date9. Time time8. Amount dollar12.2;
datalines;
01/15/2024,14:30:00,"1,250.50",Electronics,Laptop purchase
02/20/2024,09:15:30,"350.25",Office,Printer supplies
;
run;
Multiple Records Per Observation
Sometimes one observation spans multiple input lines:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
data customer_addresses;
infile datalines;
input #1 CustomerID 4. Name $25.
#2 Street $30.
#3 City $20. State $2. Zip $5.;
datalines;
1001 John Smith
123 Main Street
New York NY10001
1002 Jane Doe
456 Oak Avenue
Los Angeles CA90001
;
run;
Named Input for Flexibility
Named input allows variables to appear in any order:
1
2
3
4
5
6
7
8
9
data flexible_input;
infile datalines dsd;
input Name=$ Age= City=$ Salary=;
datalines;
Name=John,Age=30,City=Boston,Salary=75000
Age=25,Name=Jane,Salary=65000,City=Seattle
City=Denver,Salary=70000,Age=35,Name=Bob
;
run;
Writing Data to External Files
The FILE and PUT statements allow you to write data:
Basic File Output
1
2
3
4
5
6
7
8
9
10
11
data _null_;
set customers;
file '/path/to/output.csv' dsd;
/* Write header on first observation */
if _N_ = 1 then
put 'CustomerID,Name,Age,City,PurchaseAmount';
/* Write data */
put CustomerID Name Age City PurchaseAmount;
run;
Creating Formatted Reports
Generate professional-looking reports:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
data _null_;
set sales_summary end=last;
file '/path/to/sales_report.txt' print header=header_section;
/* Calculate running total */
retain total 0;
total + Revenue;
/* Write detail lines */
put @5 Region $10. @20 Product $15. @40 Revenue dollar12.2;
/* Write total at end */
if last then do;
put @5 60*'-';
put @5 'Grand Total:' @40 total dollar12.2;
end;
return;
header_section:
put @20 'Monthly Sales Report';
put @20 19*'=';
put;
put @5 'Region' @20 'Product' @40 'Revenue';
put @5 60*'-';
return;
run;
Dynamic File Names
Create multiple output files based on data:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
data _null_;
set sales_by_region;
/* Create separate file for each region */
length filename $50;
filename = cats('/path/to/output/', Region, '_sales.csv');
file dummy filevar=filename dsd mod;
/* Write header if first record for region */
if first.Region then
put 'Date,Product,Quantity,Revenue';
put Date Product Quantity Revenue;
run;
PROC IMPORT: Quick and Easy Importing
PROC IMPORT provides a convenient way to import data without writing INPUT statements:
Basic PROC IMPORT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
/* Import CSV file */
proc import datafile='/path/to/sales_data.csv'
out=sales_imported
dbms=csv
replace;
getnames=yes;
datarow=2;
run;
/* Import Excel file */
proc import datafile='/path/to/quarterly_report.xlsx'
out=quarterly_data
dbms=xlsx
replace;
sheet='Q1_2024';
getnames=yes;
run;
/* Import tab-delimited file */
proc import datafile='/path/to/inventory.txt'
out=inventory
dbms=tab
replace;
getnames=yes;
run;
PROC IMPORT Options
Control the import process:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
proc import datafile='/path/to/customer_data.csv'
out=customers
dbms=csv
replace;
getnames=yes; /* Use first row as variable names */
datarow=2; /* Data starts on row 2 */
guessingrows=100; /* Rows to scan for variable types */
run;
/* Review the generated DATA step code */
proc import datafile='/path/to/sample.csv'
out=sample
dbms=csv
replace;
getnames=yes;
run;
/* Check the log for the generated code */
PROC EXPORT: Efficient Data Export
PROC EXPORT simplifies writing data to external files:
Basic PROC EXPORT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/* Export to CSV */
proc export data=analysis_results
outfile='/path/to/results.csv'
dbms=csv
replace;
putnames=yes;
run;
/* Export to Excel */
proc export data=monthly_summary
outfile='/path/to/summary.xlsx'
dbms=xlsx
replace;
sheet='Summary';
run;
/* Export to tab-delimited */
proc export data=product_list
outfile='/path/to/products.txt'
dbms=tab
replace;
run;
Controlling Export Format
1
2
3
4
5
6
7
8
/* Export with specific delimiter */
proc export data=custom_output
outfile='/path/to/output.txt'
dbms=dlm
replace;
delimiter='|';
putnames=no;
run;
When to Use DATA Step vs. PROC IMPORT/EXPORT
Use DATA Step When:
- You need precise control over data types
- The file has a complex structure
- You need to perform transformations during import
- You’re dealing with fixed-width files
- Error handling is critical
Use PROC IMPORT/EXPORT When:
- The file structure is straightforward
- You want SAS to determine variable types
- You’re working with standard formats (CSV, Excel)
- You need a quick solution
- The file has consistent formatting
Best Practices for Data Input/Output
1. Data Validation During Input
Always validate data as you read it:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
data validated_import;
infile '/path/to/customer_data.csv' dsd firstobs=2;
input CustomerID Name :$50. Age Revenue;
/* Validate age */
if Age < 0 or Age > 120 then do;
put "WARNING: Invalid age " Age " for customer " CustomerID;
Age = .;
end;
/* Validate revenue */
if Revenue < 0 then do;
put "WARNING: Negative revenue " Revenue " for customer " CustomerID;
delete; /* Skip this record */
end;
run;
2. Error Checking and Logging
Implement comprehensive error checking:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
%let infile = /path/to/daily_transactions.csv;
data transactions
errors(keep=LineNumber ErrorDescription);
infile "&infile" dsd firstobs=2 missover;
length ErrorDescription $200;
input TransID Date :mmddyy10. Amount Category $;
/* Track line numbers */
LineNumber = _N_;
/* Check for missing required fields */
if missing(TransID) then do;
ErrorDescription = "Missing Transaction ID";
output errors;
delete;
end;
/* Validate date */
if missing(Date) or Date > today() then do;
ErrorDescription = cats("Invalid date: ", Date);
output errors;
end;
else output transactions;
run;
/* Report errors */
proc print data=errors;
title "Import Errors for &infile";
run;
3. Performance Considerations
Optimize large file operations:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/* Use PROC APPEND for large datasets */
proc append base=master_dataset
data=daily_update;
run;
/* Use views for repetitive operations */
data monthly_import / view=monthly_import;
infile '/path/to/monthly_*.csv' dsd firstobs=2;
input Date :yymmdd10. Store $ Sales;
format Date date9.;
run;
/* Buffer size optimization */
data large_file_import;
infile '/path/to/huge_file.csv' dsd firstobs=2
bufsize=32768 /* Increase buffer size */
lrecl=4096; /* Set appropriate record length */
input /* variables */;
run;
Practical Exercises
Exercise 1: Multi-Format Import
Create a program that reads customer data from three different sources:
- A CSV file with basic customer information
- A fixed-width file with transaction history
- A pipe-delimited file with contact preferences
Merge these into a single comprehensive customer dataset.
Exercise 2: Dynamic Report Generation
Write a program that:
- Reads sales data from a CSV file
- Creates separate output files for each region
- Generates a summary report with totals
- Exports the summary to Excel format
Exercise 3: Error-Resistant Import
Develop a robust import process that:
- Handles missing values appropriately
- Validates data ranges
- Logs all errors to a separate dataset
- Creates a summary report of import statistics
Exercise 4: Complex File Structure
Read a file where:
- The first line contains metadata
- Each customer record spans 3 lines
- Some fields are optional
- Date formats vary by record type
Common Pitfalls and Solutions
Problem 1: Truncated Character Variables
1
2
3
4
5
6
7
8
9
10
11
12
/* Problem: Names getting cut off */
data truncated;
infile 'data.csv' dsd;
input Name $ Age; /* Default length is 8 */
run;
/* Solution: Specify length */
data fixed;
infile 'data.csv' dsd;
length Name $50;
input Name $ Age;
run;
Problem 2: Date Format Mismatches
1
2
3
4
5
6
7
8
9
10
11
12
13
14
/* Handle multiple date formats */
data flexible_dates;
infile 'dates.csv' dsd;
input DateText :$10. @@;
/* Try multiple formats */
if not missing(DateText) then do;
Date = input(DateText, mmddyy10.);
if missing(Date) then Date = input(DateText, date9.);
if missing(Date) then Date = input(DateText, yymmdd10.);
end;
format Date date9.;
run;
Problem 3: Embedded Delimiters
1
2
3
4
5
/* Handle commas within quoted fields */
data quoted_fields;
infile 'complex.csv' dsd; /* DSD handles quotes */
input Company :$50. Revenue :comma12. Description :$100.;
run;
Summary
In this tutorial, we’ve covered:
- Reading data using INFILE and INPUT statements
- Handling various file formats and delimiters
- Advanced INPUT techniques for complex data
- Writing data with FILE and PUT statements
- Using PROC IMPORT and EXPORT for convenience
- Best practices for robust data operations
- Performance optimization strategies
Mastering data input and output operations is fundamental to becoming an effective SAS programmer. These skills enable you to work with data from any source and share your results in any format required by your organization.
What’s Next?
In Part 6, we’ll explore Data Manipulation and Processing, where you’ll learn:
- Advanced data subsetting techniques
- Merging and joining datasets
- Data transformation strategies
- BY-group processing
- Creating complex derived variables
Continue practicing with different file formats and data structures. The more experience you gain with various data sources, the more prepared you’ll be for real-world data challenges!
Additional Resources
- SAS Documentation: INFILE Statement
- SAS Documentation: INPUT Statement
- PROC IMPORT Guide
- Sample datasets for practice (available in course repository)
Happy coding, and see you in Part 6!