Sat 18 Mar, 2023 07:20 am

My work for clients often calls for me to survey relatively small, closed groups, e.g., several hundred participants in a program. We typically use a total population sampling approach, that is, we send the survey to everyone in the group rather than randomly selecting within it, and we try to get as high a response rate as we can. Often we employ a pre-post design, surveying once before a program and again after it. Other times we have only a single cross-sectional survey. If we have relevant external data about all participants, we weight the achieved sample to the total population to adjust for non-response.
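To make the weighting step concrete, here is a minimal sketch of how we adjust the achieved sample to match the full group on one benchmark variable (the "department" variable and all the figures are invented for the example; in practice we do this in SPSS):

```python
# Post-stratification weighting sketch.
# Assumption: we know the true composition of the closed group
# on one benchmark variable (a hypothetical "department").

population_share = {"A": 0.50, "B": 0.30, "C": 0.20}  # known for all group members
sample_counts = {"A": 90, "B": 40, "C": 20}           # achieved responses (n = 150)

n = sum(sample_counts.values())
weights = {
    group: (population_share[group] * n) / count
    for group, count in sample_counts.items()
}
# Every respondent in group g gets weight weights[g]; the weighted
# group shares then match the known population shares exactly.
```

Over-represented groups get weights below 1 and under-represented groups get weights above 1, so the weighted sample mirrors the group's known composition.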

Once our surveys are collected, cleaned and weighted (where possible), our typical analyses consist of difference-of-means or difference-of-proportions tests (sometimes pre-post differences on outcome variables, sometimes differences across groups within a single cross-sectional survey). Occasionally we will run a regression analysis.
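For reference, the kind of test I mean is the classic two-sample z-test for a difference in proportions, sketched here in Python (the pre/post rates are made up; this is the same textbook calculation SPSS performs, under its usual infinite-population assumptions):

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Two-sample z-test for a difference in proportions,
    using the pooled estimate under the null hypothesis."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical outcome rates from a pre wave and a post wave of 150 responses each:
z, p = two_proportion_z(0.42, 150, 0.55, 150)
```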

We use SPSS for our analyses.

My question is this: are the inferential statistical tests and p-values that SPSS produces applicable to our analyses? In other words, is statistical significance relevant to our findings?

I don't think so, because the tests and p-values that allow us to assess statistical significance assume a probability sample drawn from a large target population, whereas we have a (failed) total sampling of a small, closed group (failed in the sense of anything less than a 100% response rate). To put it concretely: if we have a group of 200 people and we get 150 responses, SPSS treats our N of 150 as if it were a probability sample from a much larger target population. As such, it produces relatively large standard errors (larger than if we had, say, 800 or 1,000 respondents), and the tests and p-values fail to reach conventional levels of statistical significance. But an N of 150 that covers three-quarters of a closed group of 200 people, in which we tried to survey everyone, has a very different meaning than an N of 150 from a probability sample of a large target population. In that case, I don't believe the inferential statistical tests are applicable. Am I correct?
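For what it's worth, one standard way to quantify this intuition is the finite population correction (FPC), which shrinks the usual standard error when the sample is a large fraction of a finite group. A sketch using the numbers above (p = 0.5 is just an illustrative proportion):

```python
import math

def fpc_standard_error(p, n, N):
    """Standard error of a proportion, with and without the
    finite population correction sqrt((N - n) / (N - 1))."""
    se_infinite = math.sqrt(p * (1 - p) / n)  # what SPSS reports by default
    fpc = math.sqrt((N - n) / (N - 1))        # shrinkage factor for a finite group
    return se_infinite, se_infinite * fpc

se_inf, se_fpc = fpc_standard_error(p=0.5, n=150, N=200)
# With 150 of 200 surveyed, the FPC is sqrt(50/199), roughly 0.50,
# so the corrected standard error is about half the default one.
```

At a 100% response rate the FPC is zero, which matches the intuition that a true census involves no sampling error at all.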

If so, are there alternative, quantifiable ways to assess the validity and/or reliability of our survey findings? My thinking is that the validity/reliability of the results depends primarily on a "qualitative" assessment of three things: 1) the response rate; 2) the extent to which the achieved sample resembles the total closed group on relevant external benchmarks; and 3) substantive differences that make sense for the particular analyses we are conducting. In other words, validity/reliability increases the more responses we collect from the target group, the more the achieved sample looks like the target group (or can be adjusted to look like it through weighting), and the more we can justify a substantive, meaningful difference or pattern for the particular problem.
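The first two criteria, at least, can be put in numbers. A sketch (the gender benchmark and its shares are invented for illustration):

```python
# Quantifying criteria 1 and 2: the response rate, and how closely the
# achieved sample matches the full group on a known benchmark variable.

group_size = 200
responses = 150
response_rate = responses / group_size                 # criterion 1

group_shares = {"female": 0.60, "male": 0.40}          # known for all 200 members
sample_shares = {"female": 0.66, "male": 0.34}         # among the 150 respondents

# Criterion 2: total absolute deviation between sample and group shares
# (0 means a perfect match; larger values mean a less representative sample).
total_abs_dev = sum(
    abs(sample_shares[k] - group_shares[k]) for k in group_shares
)
```

Reporting those two figures alongside the substantive results would give clients something concrete even without p-values.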

Are there other criteria for assessing findings in this situation? And, how do I respond to clients who just want to know if the results are "statistically significant"?

Thanks in advance for your thoughts.