Monday, 3 December 2012

combinatorics - How many strings contain every letter of the alphabet?



Given an alphabet of size $n$, how many strings of length $c$ contain every single letter of the alphabet at least once?



I first attempted to use a recurrence relation to work it out:




$$
T(c) = \left\{ \begin{array}{cr}
0 &\mbox{ if $cn! &\mbox{ if $c = n$} \\
T(c-1) \cdot n \cdot c &\mbox{ if $c > n$}
\end{array} \right.
$$



As there's no strings that contain every letter if c < n, and if c = n then it's just all permutations. When c > n you can take any string of size (c-1) that contains all letters (of which there are $T(c-1)$ to choose from), you choose which letter to add (of which there are $n$ choices) and there are $c$ different positions to put it. However, this gives out results that are larger than $n^c$ (the total number of strings), so it can't be right, and I realised it was because you could count some strings multiple times, as you can make them taking different inserting steps.




Then I thought about being simpler: you choose n positions in the string, put each letter of the alphabet in one of those positions, then let the rest of the string be anything:



$$
{c\choose{n}} \cdot n! \cdot n^{c-n}
$$



But again this counts strings multiple times.



I've also considered using multinomial coefficients, but as we don't know how many times each letter appears in the string it seems unlikely they would be much help. I've also tried several other methods, some complicated and some simple, but none of them seem to work.




How would you go about working out a formula for this? I'm sure there's something simple that I'm missing.


Answer



Let $W(c,n)$ denote the number of words of length $c$ from an alphabet of $n$ letters. Then $W(c,n)=n^c$.



Out of these, the number of words of the same size that do not contain one of the letters is $W(c,n-1)=(n-1)^c$. The number of ways of choosing which letter is missing is $\binom{n}{1}$.



The number of words of the same size that do not contain two letters is $W(c,n-2)=(n-2)^c$. The number of ways of choosing which two letters are missing is $\binom{n}{2}$... and so on ...



Now we use inclusion-exclusion principle: (subtract the number of words missing one of the letters, then add the number missing two of the letters, add the number missing three of the letters,...)




We get:



$$W(c,n)-\binom{n}{1}W(c,n-1)+\binom{n}{2}W(c,n-2)-\binom{n}{3}W(c,n-3)+...+(-1)^{n-1}\binom{n}{n-1}W(c,n-(n-1)).$$



This is



$$n^c-\binom{n}{1}(n-1)^c+\binom{n}{2}(n-2)^c+\binom{n}{3}(n-3)^c+...+(-1)^{n-1}\binom{n}{n-1}1^c.$$



or





$$\sum_{k=0}^{n-1}(-1)^k\binom{n}{k}(n-k)^c.$$




Another way could be: Denote $S_c^n$ the number of ways to partition the word of length $c$ into $n$ pieces. Then we just need to choose which letter goes to each of the $n$ pieces. This number is $n!$. So the number of words we are looking for is




$$n!S_c^n.$$





The numbers $S_c^n$ are called Stirling's numbers of the second kind.


No comments:

Post a Comment

real analysis - How to find $lim_{hrightarrow 0}frac{sin(ha)}{h}$

How to find $\lim_{h\rightarrow 0}\frac{\sin(ha)}{h}$ without lhopital rule? I know when I use lhopital I easy get $$ \lim_{h\rightarrow 0}...