Group Assortment measure


Hi
I need a formula that returns a value representative of the amount of ‘assortment’ a group shows. The groups are made up of individuals, all of a binary class (e.g. male or female), are of different sizes, and can be from different populations (i.e. different ratios of males to females). I have thought of the logical rules and examples for this, but am having difficulty formalising it properly, despite extensive attempts using binomial probabilities. I think the best way to explain is to give some examples of groups, and which would rank the highest in ‘assortment’:

e.g. In a population with equal ratio of males:females

GROUP-A = 1Male & 1Female
GROUP-B = 2M & 0F
G-C = 0M & 2F
G-A is the most ‘disassorted’, whilst G-B and G-C are equally assorted

G-D = 5M & 0F
G-D is more assorted than both G-B, and G-C, as the probability of getting 5 males in a group of 5 is much lower than getting 2 in a group of 2

Now, consider some groups from a population with 9 males to each female
G-E = 5M & 0F
G-F = 5M & 5F

G-E demonstrates less assortment than G-D, as the chance of getting 5M 0F is much higher when the chance of male occurrence is 0.9 (i.e. 9:1 M:F)
G-G demonstrates much more ‘assortment’ than G-F (or G-B or G-C), as the chance of getting 5F, with a 0.1 chance of getting each female (even in a group of 10 individuals), is very low.

Therefore, I need a measure that gives a value of assortment for any given group, such that the more ‘assorted’ a group is, and the lower the likelihood of getting it, the higher the value.
I have tried lots of things with binomial probabilities, and one of the main problems with my best attempts is that a group with no actual assortment (e.g. 1M & 1F) could score higher than a group which potentially displays assortment (e.g. 2M & 0F) if, for example, the chance of a female occurring is very low.
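
The "likelihood of getting it" part of the examples above can be made concrete with the binomial probability of each group's exact composition under random sampling. This is only a minimal sketch: the function name `group_probability` and the parameter `p_male` are made up for illustration, and it assumes each member is drawn independently with the population's male proportion.

```python
from math import comb

def group_probability(males, females, p_male):
    """Binomial probability of a group with exactly this composition,
    assuming each member is independently male with probability p_male."""
    n = males + females
    return comb(n, males) * p_male**males * (1 - p_male)**females

# Groups from the post: (label, males, females, population male proportion)
examples = [
    ("G-A", 1, 1, 0.5),
    ("G-B", 2, 0, 0.5),
    ("G-C", 0, 2, 0.5),
    ("G-D", 5, 0, 0.5),
    ("G-E", 5, 0, 0.9),
]
for label, m, f, p in examples:
    print(label, round(group_probability(m, f, p), 4))
```

On these numbers G-D (about 0.031) is far less likely than G-B or G-C (0.25 each), and G-E (about 0.59) is far more likely than G-D, which matches the ordering described in the examples. Probability on its own does not distinguish an unlikely balanced group from an unlikely assorted one, which is the difficulty described in the last paragraph.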


@josh1,

Is 1M/1F from a 1:1 population scoring higher than 2M/0F from a 99:1 population a bad thing?

How would you rank these:
A: 1M/1F from a 1:1 population
B: 2M/0F from a 1:1 population
C: 1M/1F from a 9:1 population
D: 2M/0F from a 9:1 population
E: 0M/2F from a 9:1 population

How about these:
a: 1M/1F from a 1:1 population
b: 5M/5F from a 1:1 population
c: 10M/10F from a 1:1 population

If possible, give some sense of scale (e.g. A is much higher than B, but just a little higher than C).

1 Reply

@markr,

Hi Mark
Thanks very much for your reply
Yes, 1M/1F (from 1:1) definitely does not show any assortment (i.e. propensity for 'males to be with males' and 'females to be with females'), whilst 2M/0F (from 99:1) may show some assortment, even though the chance of it occurring by chance alone (i.e. with no actual underlying assortment) is very high. Therefore, 2M/0F (from 99:1) should actually be scored higher for assortment than 1M/1F (from 1:1). Does that make sense?

As for the rankings of your groups, from 'most assorted' to 'most balanced/disassorted', the first set would be scored:
E: 0M/2F from a 9:1 population (high assortment, very low probability that the group would arise by chance alone)
B: 2M/0F from a 1:1 population (medium assortment, could just be chance)
D: 2M/0F from a 9:1 population (low assortment, but the assortment may just be due to chance)
A: 1M/1F from a 1:1 population (no assortment, balanced, could be due to chance)
C: 1M/1F from a 9:1 population (no assortment, actually very high balance, as there is a low probability that this disassortment is due to chance)

And for the second set (again, most assorted to most 'balanced'):
a: 1M/1F from a 1:1 population (no assortment, shows balance)
b: 5M/5F from a 1:1 population (no assortment, shows high 'balance', lower probability this is due to chance)
c: 10M/10F from a 1:1 population (no assortment, showing even higher balance, as there is a very low chance this disassortment is due to chance alone)

Thanks for your interest; it would be great to hear your thoughts. It has been suggested that I look into using entropy measures, but I am still considering how this might work.

Thanks again

1 Reply

@josh1,

I was thinking along those lines myself, but after seeing how you ranked my examples, I've got another question. Would you ever rank a more balanced group above a more assorted group based on probability? For example, how would you rank these groups that are all from an 8:2 population:
A: 0M/10F
B: 1M/9F
C: 2M/8F
D: 3M/7F
E: 4M/6F
F: 5M/5F
G: 6M/4F
H: 7M/3F
I: 8M/2F
J: 9M/1F
K: 10M/0F

1 Reply

@markr,

Good question, and yes, a group which initially appears more 'balanced' could actually be ranked as more assorted than a group which appears 'less balanced', based on probability. Although I struggle to rank the new examples exactly (because I don't have a real measure yet), I hope this 'off the top of my head' example still expresses what I mean:
A, B, C, D, E/K, F/J, G, H, I
Where the '/' marks pairs where I am highly unsure which would come first.

1 Reply

@josh1,

I guess this:
"Therefore, 2M/0F (from 99:1) should actually be scored higher for assortment than 1M/1F (from 1:1), does that make sense?"
doesn't really make sense to me. In the absence of any other factors (propensity for men to be with men and/or women to be with women), the former is almost twice as likely to occur as the latter. Seems to me that the latter speaks more strongly to such propensities.
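
The "almost twice as likely" comparison can be checked directly. This is a minimal sketch assuming pure random sampling with independent draws; the variable names are illustrative only.

```python
# 2M/0F from a 99:1 population: both members male, each with probability 0.99
p_2m0f = 0.99 ** 2

# 1M/1F from a 1:1 population: two orderings (MF or FM), each with probability 0.25
p_1m1f = 2 * 0.5 * 0.5

ratio = p_2m0f / p_1m1f
print(round(p_2m0f, 4), round(p_1m1f, 4), round(ratio, 2))
```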

By the way, how do such propensities even come into play? Why isn't the score strictly based on the distribution of the group? What is the selection process for the group? I guess it's not a random sample from the population.

1 Reply

@markr,

From the questions you are asking I think we are thinking about this in a very similar way, which is very good. Apart from this point, which I hope to address (or at least understand where you are coming from):

"Therefore, 2M/0F (from 99:1) should actually be scored higher for assortment than 1M/1F (from 1:1), does that make sense?"
doesn't really make sense to me. In the absence of any other factors (propensity for men to be with men and/or women to be with women), the former is almost twice as likely to occur as the latter. Seems to me that the latter speaks more strongly to such propensities.

Basically, I am thinking of the 'assortment' score of the group as a way of measuring the "propensity for men to be with men and/or women to be with women". Thus, from a 99:1 population, an 'assorted' group which is very unlikely to occur by chance (e.g. 0M 2F) should have a very high assortment score, as this gives us lots of confidence that such a 'propensity' is in play. Then, a group which shows assortativity yet is extremely likely to occur by chance alone (e.g. 2M 0F) indicates to us that some 'propensity' may be occurring, but our observed group could easily have occurred by chance even in the absence of this propensity, and therefore only scores a low value. However, a group of 1M 1F does not indicate any sort of "propensity for men to be with men and/or women to be with women" to us; therefore, no matter how unlikely it is to occur by chance, it does not score higher than 2M 0F. Does this make sense?

To attempt an answer at your other questions:
1) The mechanism of how such propensities come into play hopefully shouldn't be restrictive (individual preferences, spatial distributions, etc).
2) I cannot think of a suitable method of scoring (to assess such propensity) based strictly on the distribution of the group.
3) I believe the null hypothesis would be that a 'random sampling' process creates the group, whilst the alternative would be that some underlying propensity/assortment preference drives the process.

I understand this is far from a 'straightforward' question, particularly because of the informal definition of 'assortment' at this stage, which I hope can be made more logical once it is mathematically formalised. Thanks for your ongoing interest in solving this problem.

1 Reply

@josh1,

Ah, so you're trying to measure the assortment propensity of the sampling method and/or population - not the group. Correct?

Is it possible for that propensity to differ between men and women in the same population? That would seem to complicate things.

Perhaps it makes sense to come up with a model for how the selection process might work. For instance, one person is selected at random, and the remaining people are selected at random from the set of friends of the originally selected person. Keeping it simple, you could assume that each member of the population has the same ratio of same-sex friends (SSF) to opposite-sex friends (OSF) (making it difficult, you could assume some probability distribution for this ratio, and/or it could differ between men and women). That ratio would be the score you're trying to determine. Then the problem becomes, "Given an M:W population, what ratio of SSF to OSF would most likely yield the group that was selected?"

1 Reply

@markr,

Hi
Initially I wanted the assortment of the actual groups too, to use these values in later models. However, yes, I also need the assortment propensity of the sampled population, in terms of the groups that are appearing.
Also, a final solution should allow the propensity to differ between men and women in the same population, but I think we should stay away from that for now (unless it actually makes things simpler).

Let's change the goal slightly. Here's an analogy (excuse the childishness):
A college canteen has multiple groups come to it throughout year 1. I would like to calculate the amount of assortment these groups show (i.e. know if they show that males are with males and females are with females) and compare this to the amount of assortment shown by groups in year 2, 3, 4 etc. And essentially, in the end, say that the groups in year 1 show more assortment bias than the groups in year t, and therefore the population is more assorted.
Any ideas on how to do this? Perhaps combining binomial probabilities over all the groups in some way? It still needs to be the case that e.g. a group of 0M 10F would represent more assortment than 0M 2F, but perhaps we can assume a 50/50 ratio if that makes things more feasible.

1 Reply

@josh1,

Thanks for the real world example. That helps make sense of it all.

Borrowing from chi squared, how about (M-N/2)^2 / (N/2) where N = M+W.

Others:
N * [1 - min(M,W)/max(M,W)]
[max(M,W) - min(M,W)]^2 / max(M,W)
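
For comparison, the three candidate formulas can be written out as below. This is only a sketch: the function names are invented here, a 50/50 population is assumed (as in the formulas themselves), and groups are assumed to be non-empty.

```python
def score_chi(m, w):
    """(M - N/2)^2 / (N/2): chi-squared-style deviation from an even split."""
    n = m + w
    return (m - n / 2) ** 2 / (n / 2)

def score_ratio(m, w):
    """N * [1 - min(M,W)/max(M,W)]."""
    return (m + w) * (1 - min(m, w) / max(m, w))

def score_diff(m, w):
    """[max(M,W) - min(M,W)]^2 / max(M,W)."""
    return (max(m, w) - min(m, w)) ** 2 / max(m, w)

# A few group compositions (males, women) to compare the three scores
for m, w in [(1, 1), (2, 0), (5, 0), (2, 1), (101, 100)]:
    print(f"{m}M {w}W:",
          round(score_chi(m, w), 3),
          round(score_ratio(m, w), 3),
          round(score_diff(m, w), 3))
```

All three give 0 for an even split and grow as the group becomes more one-sided.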

1 Reply

@markr,

Thanks, these are very usable solutions to the problem when a 50/50 ratio is assumed. I prefer no.1 and no.3, as I think no.2 would have a slight problem with odd-sized groups, whereby the score of the most 'balanced' group possible increases as size increases, e.g. 101m 100f would be rated as more assorted than 2m 1f even though the former actually shows a lot of 'balance'.
A few questions for you
1) In the formula, a maximum 'balanced' or 'disassorted' group is always scored the same (i.e. 0), is there a way that a group of 100m 100f would be classed as more balanced/disassorted than 1m 1f?

2) Do you think taking a mean of the final group scores, and comparing to other data, or even to permutations of the same data, would show 'overall assortment' in the groups? Do you know of any way that a summary statistic could be calculated over all the group compositions, rather than working out balance scores (e.g. multiple binomial tests maybe? or something else?)

3) Finally, an obvious one, would there be a way of expanding this to account for a non-balanced overall population?

Thanks again markr!

1 Reply

@josh1,

Quote:

1) In the formula, a maximum 'balanced' or 'disassorted' group is always scored the same (i.e. 0), is there a way that a group of 100m 100f would be classed as more balanced/disassorted than 1m 1f?

Perhaps something like:
((M-N/2) + 1)^2 / (N/2)

Quote:

2) Do you think taking a mean of the final group scores, and comparing to other data, or even to permutations of the same data, would show 'overall assortment' in the groups? Do you know of any way that a summary statistic could be calculated over all the group compositions, rather than working out balance scores (e.g. multiple binomial tests maybe? or something else?)

I don't think I understand this question:
- comparing to other data?
- overall assortment in the groups?
- permutations of the same data?
- all the group compositions?

Quote:

3) Finally, an obvious one, would there be a way of expanding this to account for a non-balanced overall population?

Well, assuming a measure of randomness does a good job of measuring assortment, and continuing to use the chi squared formula, N/2 becomes M/(M+W). The formula is (observed - expected)^2 / expected.
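
A minimal sketch of that extension follows, with two assumptions that are not spelled out in the post: the expected male count is taken as the group size times the population's male proportion, and the (observed - expected)^2 / expected term is summed over both the male and the female cell, as in a standard chi-squared goodness-of-fit statistic. The names `chi_squared_score`, `pop_males` and `pop_females` are illustrative.

```python
def chi_squared_score(males, females, pop_males, pop_females):
    """Sum of (observed - expected)^2 / expected over the male and female cells.
    Assumes the population contains both sexes, so neither expected count is zero."""
    n = males + females
    p_male = pop_males / (pop_males + pop_females)
    expected = (n * p_male, n * (1 - p_male))
    observed = (males, females)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi_squared_score(5, 0, 1, 1), 3))  # 5M/0F from a 1:1 population
print(round(chi_squared_score(5, 0, 9, 1), 3))  # 5M/0F from a 9:1 population
print(round(chi_squared_score(0, 5, 9, 1), 3))  # 0M/5F from a 9:1 population
```

With these inputs the all-male group scores lower in the 9:1 population than in the 1:1 population, and the all-female group from the 9:1 population scores far higher than either.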

I'm on the road through Saturday, so responses may be slower and less thought out.

1 Reply

@markr,

Hi markr, thanks again.

In response to parts 1 and 3, these still yield some issues with more 'assorted' groups being classed as more 'disassorted', but I'll continue working on it and let you know how it goes.

In regards to part 2, this is taking a slightly different approach, of asking whether, over all the groups, we see assortment. For example, taking all the groups of individuals that turned up to the canteen over the entire year, we can ask whether we see that groups show assortment (i.e. are likely to be biased). So far I have been using the standard proportion test for this (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/prop.test.html - if you are an R user), and it seems to work quite well. We can then compare the chi-squared value to those generated from node permutations of the same data. I think this may be effective, but it might have some problems with non-independence (hopefully mitigated by the comparison to permutations). Would you have any opinions about this approach? Or are you aware of any alternatives?
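
The approach described above could be sketched roughly as follows. This is a hedged illustration, not the R workflow itself: it computes a Pearson chi-squared statistic for "all groups share the same male proportion" and compares it with the same statistic after shuffling the sex labels across individuals while keeping group sizes fixed. Treating "node permutations" as this kind of label shuffle is an assumption, the function names are invented, and the result is not claimed to match `prop.test` output.

```python
import random

def heterogeneity_chi2(groups):
    """Pearson chi-squared statistic for equal male proportion across groups.
    groups is a list of (males, females) tuples; assumes both sexes occur overall."""
    total_m = sum(m for m, _ in groups)
    total_f = sum(f for _, f in groups)
    p = total_m / (total_m + total_f)
    stat = 0.0
    for m, f in groups:
        n = m + f
        stat += (m - n * p) ** 2 / (n * p) + (f - n * (1 - p)) ** 2 / (n * (1 - p))
    return stat

def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = heterogeneity_chi2(groups)
    # 1 = male, 0 = female; one label per individual across all groups
    labels = [1] * sum(m for m, _ in groups) + [0] * sum(f for _, f in groups)
    sizes = [m + f for m, f in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled, start = [], 0
        for n in sizes:
            males = sum(labels[start:start + n])
            shuffled.append((males, n - males))
            start += n
        if heterogeneity_chi2(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

canteen_groups = [(5, 0), (0, 5), (4, 0), (0, 4), (1, 1)]  # made-up data
print(round(heterogeneity_chi2(canteen_groups), 3))
print(permutation_p_value(canteen_groups))
```

A small p-value here would suggest that group compositions vary more than random mixing of the same individuals would produce. The sketch ignores the non-independence issue of the same individuals appearing in several groups.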

Thanks very much

1 Reply

@josh1,

Part 2: I'm not sure what you'd be permuting and why, but you're getting into some statistical stuff that I'm not well versed in, so it probably doesn't matter.

Parts 1,3: Since I can't convey tone, let me say that this is intended to be constructive. It seems to me that any deviation from the expected group represents a bias, and the magnitude of that bias can be somewhat quantified by something like chi squared. However, the results don't always line up with some measure that you have in mind (and these are with groups with the same number of members - it's bound to get more troublesome to come up with a consistent measure when you compare groups of different sizes or groups from different populations). I'm curious what the justification is for wanting to reorder some of the results. It seems to complicate the problem, so if it's valid, we need to figure out a way to capture it in a formula. If it's not valid, then I think things get easier.

1 Reply

@markr,

Part 2: OK, fair enough, maybe ignore the permutation part; that's just added complication. I suppose the real question is simply: when given lots of groups of individuals (just like the canteen example, where some could appear many times), what would you think is the best way to check if, in general, these groups show assortment (i.e. males with males, females with females)?

Parts 1 and 3: Yes, very constructive, and I see your point. The justification for wanting to reorder some of the results is so that it would make sense practically, rather than simply statistically. For example, the main problem with chi-squared is that, by its logic, a 1M 1F group could be scored higher for 'assortment' than 2M 0F once the probability of male is high (I think over 0.66), when in fact a group of 2M 0F is actually showing more assortment (i.e. propensity) than the perfectly balanced group of 1M 1F. Taking the college canteen as an example, I don't think a group of 1M 1F (no matter how rare females are) could ever indicate a propensity for males to be with males and females to be with females. Would you agree?
Similarly, with a high proportion of males, a group of 2M 0F could score lower for assortment than if a female was added to the group (2M 1F), and this also doesn't make sense to me.
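
Both orderings objected to here can be reproduced numerically. The sketch below assumes the two-cell chi-squared form, with expected counts equal to the group size times the population male proportion `p`; the function name `chi2` is illustrative.

```python
def chi2(males, females, p):
    """Two-cell chi-squared score of a group against a population with
    male proportion p (0 < p < 1)."""
    n = males + females
    exp_m, exp_f = n * p, n * (1 - p)
    return (males - exp_m) ** 2 / exp_m + (females - exp_f) ** 2 / exp_f

# Columns: p, score of 1M 1F, score of 2M 0F, score of 2M 1F
for p in (0.5, 0.7, 0.8, 0.9):
    print(p, round(chi2(1, 1, p), 3), round(chi2(2, 0, p), 3), round(chi2(2, 1, p), 3))
```

Under this particular form, 1M 1F overtakes 2M 0F once p exceeds 0.75. The two-thirds mark is where 1M 1F becomes less probable than 2M 0F under the binomial (2p(1-p) < p^2). At p = 0.9 the 2M 1F group also scores higher than 2M 0F.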

1 Reply

@josh1,

I agree.

Here are some thoughts/questions.

1) Perhaps the makeup of the population isn't very relevant unless the proportion is quite extreme. If there is a propensity for people to visit the canteen in groups of opposite sex pairs, should a more balanced score be assigned to a 1:1 population vs. a 2:1 population, or should they be considered equally balanced?

2) If the makeup of the population is relevant, when we talk about population, are we talking about the total college population, or the canteen-visiting population? Seems to me it should be the latter since we don't know how the non-canteen-visiting members of the population fraternize.

3) If the makeup of the population is relevant, then perhaps a signed score is needed so that neither balanced nor assorted are bounded. It could work something like this where ratios are always M:F:
- The expected outcome (group ratio matches population ratio) is assigned zero.
- As the group ratio grows (exceeds the population ratio), the score grows positively (more assorted).
- As the group ratio shrinks (moves toward 1:1), the score grows negatively (more balanced).
- When the group ratio shrinks below 1:1, the score turns back toward zero.
- Now the tricky part. At what point does the score cross zero and become positive to indicate assortedness instead of balance? I see this as an issue whether or not the score is signed.
- For what it's worth, with a 1:1 population, there wouldn't be any negative scores.

4) If the makeup of the population is not relevant, then maybe the score should be something like sqrt(M-W+1)/N. This one isn't quite right at both ends of the range.

2 Replies

@markr,

Something like this might be better:
(max(M,F)-min(M,F)+1)/(N+2)
This will never reach zero or one.
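
A quick sketch of this last formula shows how it behaves across group sizes. The function name `bounded_score` is made up for illustration.

```python
def bounded_score(m, f):
    """(max(M,F) - min(M,F) + 1) / (N + 2), which stays strictly between 0 and 1."""
    n = m + f
    return (max(m, f) - min(m, f) + 1) / (n + 2)

for m, f in [(1, 1), (10, 10), (2, 0), (10, 0)]:
    print(f"{m}M {f}F:", round(bounded_score(m, f), 3))
```

Larger balanced groups score closer to 0 and larger single-sex groups score closer to 1, so group size is reflected at both ends of the range.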

0 Replies

@markr,

OK great
1) I have also questioned this, but now believe it is relevant for this question (i.e. when assortment refers to whether there is an underlying propensity). If none of this propensity existed, and groups were truly just random subsets of a population, it shouldn't be the case that groups from a 1:1 population would show more 'balance' (i.e. propensity for opposite-sex associations) than groups from a 2:1 population. Nor should random groups from a 2:1 population be classed as demonstrating more of a propensity for 'same sex' association than random groups from a 1:1 population.

2) Yes, I agree. In this case, we would be talking about the canteen-visiting population. Ideally I should have thought up a better example where the sampling method is entirely unbiased.

3) Yes, I agree with all this logic, and I am looking into different ways of doing it. One thing I am still undecided about is the final point:
"- For what it's worth, with a 1:1 population, there wouldn't be any negative scores."
Perhaps this should be the case, as a balanced group would always be the most likely (or 'expected') group. However, on the other hand, a perfect group of 20M 20F would still actually be much less likely to occur than 1M 1F by chance alone (i.e. without an underlying propensity for opposite-sex associations driving balance), and therefore perhaps, although most expected, it still shows balance. What would you think?

4) I think the new formula in your post below is well put together and should work well if the population wasn't relevant, or perhaps in a 1:1 population.
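
The comparison in point 3 between 20M 20F and 1M 1F can be checked with the binomial probability of an exactly even split. This is a minimal sketch assuming independent draws from a 1:1 population; `balanced_probability` is an illustrative name.

```python
from math import comb

def balanced_probability(k):
    """Probability of exactly k males and k females in a group of 2k,
    assuming independent draws from a 1:1 population."""
    return comb(2 * k, k) / 2 ** (2 * k)

for k in (1, 5, 10, 20):
    print(f"{k}M {k}F:", round(balanced_probability(k), 4))
```

The even split remains the single most likely composition at every size, yet its probability falls from 0.5 for 1M 1F to roughly 0.125 for 20M 20F.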

Thanks again

1 Reply

@josh1,

3) I don't disagree. It just complicates things...

0 Replies
