Using Zip Codes for Analyzing Binary Outcomes in SAS

 
 
Nives1
 
Reply Mon 1 Jun, 2015 07:41 pm
My outcome variable is whether or not an individual has Parkinson's disease. My covariates are age, sex, ozone exposure (categorized 1 to 4: low, med, med-high, high), particulate matter exposure (categorized 1 to 4: low, med, med-high, high), and Zip code.

The problem: I am unsure what to do with the Zip code variable. I want to run PROC SURVEYLOGISTIC with Zip code in the CLUSTER statement, but there are not enough subjects in each Zip code; some Zip codes have only 1 subject in them. I was then thinking of grouping Zip codes somehow and clustering on the new grouping variable, but I do not know how to go about this.

If there is any advice or examples you can offer, please do.
Thank you!
 
jespah
 
Reply Tue 2 Jun, 2015 05:55 am
@Nives1,
Sure, cluster the Zip codes.

Note - I am not a statistician. But I have worked with Zip codes as a data point. The post office does seem to have some rules about the numbers, so you can make some general inferences. I caution you that those rules are imperfect. E.g., New England Zip codes start with a zero, but so do those in New Jersey.
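
One rough way to act on those numbering rules is to group on the leading digits of the Zip code and fall back to a shorter prefix when a group has too few subjects. The sketch below is in Python rather than SAS, and the function name `group_zips_by_prefix`, the `min_size` threshold, and the sample codes are all assumptions made for illustration. Prefix groups inherit the same imperfections noted above, so treat the result as a starting point only.

```python
from collections import Counter

def group_zips_by_prefix(zips, min_size=5, start_len=3):
    """Return one group label per input Zip code (hypothetical helper).

    Each Zip is labeled with its first `start_len` digits. If fewer than
    `min_size` subjects share that prefix, the Zip falls back to a shorter
    prefix, down to a single leading digit.
    """
    # Restore leading zeros that are lost when Zips are stored as numbers.
    padded = [str(z).zfill(5) for z in zips]
    assignment = {}
    remaining = list(padded)
    for length in range(start_len, 0, -1):
        counts = Counter(z[:length] for z in remaining)
        still_sparse = []
        for z in remaining:
            prefix = z[:length]
            # At length 1 everything left over is assigned, sparse or not.
            if counts[prefix] >= min_size or length == 1:
                assignment[z] = prefix
            else:
                still_sparse.append(z)
        remaining = still_sparse
    return [assignment[z] for z in padded]

# Illustrative input only: one Zip per subject, one stored as a number.
subject_zips = ["02215", "02116", "02134", "07030", "07302", 2139]
print(group_zips_by_prefix(subject_zips, min_size=3))
# prints ['0', '021', '021', '0', '0', '021']
```

The returned labels could then serve as the cluster variable in place of the raw Zip code.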

This guy has decent zip code lists - http://www.quine.org/

Just scroll down; there are lists on the side. I get the strong impression that he developed these lists manually, plus he acknowledges that postal codes do change at times (and newer ones are often added). For a truly comprehensive and up to date list, you'll need to go to the post office itself and pay for it.

As for how to aggregate, I'll tell you how I did it, back in the day. It involved a road atlas and a compass (the kind you draw a circle with, not the kind you use for navigation). I would look at the Zip code for, say, downtown Boston (there are several, but 02215 is as good as any other for this purpose). I would then check Zip codes in all directions to the end of the city limits. My rough estimate was that the numbers in between were all within Boston. For areas not so well-defined, I would use a compass and draw a circle around the downtown Zip code. The size of that circle's radius depended a lot on the area. E.g., Houston has a lot of sprawl, whereas what is called Los Angeles, LA County, and the LA area are rather different things.
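
The compass-and-atlas approach can be approximated in code if you have a latitude and longitude for each Zip code. Here is a minimal Python sketch, assuming a hypothetical `centroids` lookup that you would have to supply yourself (the codes and coordinates below are placeholders, not real Zip centroids). It uses great-circle distance in place of a circle drawn on a map, and the function names are made up for this example.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    earth_radius_miles = 3958.8  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))

def zips_within_radius(center_zip, centroids, radius_miles):
    """Return the sorted Zip codes whose centroid lies inside the circle."""
    center_lat, center_lon = centroids[center_zip]
    return sorted(
        z for z, (lat, lon) in centroids.items()
        if haversine_miles(center_lat, center_lon, lat, lon) <= radius_miles
    )

# Placeholder data: made-up codes and coordinates, for illustration only.
centroids = {
    "00001": (42.35, -71.10),
    "00002": (42.36, -71.06),
    "00003": (42.90, -71.10),
}
print(zips_within_radius("00001", centroids, radius_miles=5))
```

As with the hand-drawn circle, the radius is a judgment call: a sprawling area would need a larger value than a compact one.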

The system was imperfect, but it was a lot faster than trying to determine where the edges of Boston or Houston really were by looking up every single Zip code. If you have the time, you might feel the need to do that lookup. But check with the post office if you need such excruciatingly accurate and minute data. I have to believe that it's out there, for the right price.