Reply
Sun 8 Apr, 2012 07:12 am
I am stuck with a probability problem. The problem that I am facing is equivalent to the following problem:
Given a box containing 4 balls labelled A, B, C & D. We select a ball, record its label and put it back (i.e. selection with replacement). I want to calculate the following:
a) What is p3, the probability of selecting exactly 3 different labels (e.g. ABC, ACD) for n selections.
b) What is p4, the probability of selecting exactly 4 different labels (e.g. ABCD) for n selections.
For example, for n = 7, if the selection is BBBACAA this has 3 labels; hence it is treated the same as ABCCCCCC or AAAAABC in determining p3.
I am trying to find a general formula that will work for any value of n. The problem seems "simple" and it seems that obtaining a general equation should be possible; However, I have not been able to find it!
For p1, p2 the solution is simple:
NS = 4^n
p1 = combine(4,1) / NS
p2 = combine(4,2)(2^n-2) / NS
I have found the equation for p3 and p4 for n = 0 ... 6, but I can't find a general solution.
consequently, I will need to find a more general equation that will work for M labels (instead of just 4).
P.S. I have found the answers numerically by programming so I know what the answers should be
Any help is appreciated
@MrAdnan,
Interesting problem.
It's not a closed form, but I think a general solution is:
m = number of labels to choose from
k = desired number of unique labels
n = number of selections
F(m,k,n) = C(m,k)*[C(k,k)*k^n - SUM(i=1 to k-1, C(k,i)*i^n)] / m^n
@markr,
Ignore the previous post.
@MrAdnan,
P(m,k,n) = C(m,k)*k!*S2{n,k} / m^n
where:
C(m,k) is the number of combinations of m items taken k at a time.
S2{n,k} is the number of ways to partition a set of n objects into k non-empty subsets.
S2 is a Stirling number of the second kind.
See here for Stirling numbers of the second kind:
http://en.wikipedia.org/wiki/Stirling_numbers_of_the_second_kind
@markr,
Yes! Thank you.
I've checked it for the values I have for m = 4 and it agrees with them.
I have not checked it for values m > 4, but values of p(m,m,n) -> 1 as n -> infity which are correct. So I'm pretty sure that's its correct.
... and I thought the formula would be something simpler, for what seems to be a simple problem
Thanks again.