At any point while processing the stream, the in-memory state is a partial histogram with at most $k = 1/\epsilon$ elements (class/count pairs). When processing an element $x_i$, we use the following procedure with three cases:
- $x_i$ is already in the histogram: increment its count;
- we have fewer than $k$ elements in the histogram: add $x_i$ with count $1$;
- the histogram already has $k$ other elements: while the histogram has $k$ elements, decrease the count of each element by $1$ and evict any elements with zero count, and then add $x_i$ with count $1$.
That's the entire algorithm, and it's easy to prove that it achieves the desired approximation as well. The only case in which we "lose" information is case 3, where the count of each of $k$ elements is decreased by $1$. But we can clearly decrease counts at most $n/k = \epsilon n$ times, since that would necessarily reduce all counts to zero. Thus, using the partial histogram and outputting the frequency of any class not in the histogram as zero, we are guaranteed that each count is within $\epsilon n$ of the actual count. In particular, this means that the algorithm will be guaranteed to output a positive count for any class that occurs with frequency $\epsilon n$ or more in the stream.
I've always found this problem to be particularly beautiful because of the simplicity of the result, the algorithm, as well as the proof. Moreover, it's a very practical problem with many applications. To make things even more interesting, a recent paper shows how this concept can be generalized to do matrix sketching (i.e. approximating a matrix with a smaller one). I hope to get around to writing about that at some point as well.
I've always found this problem to be particularly beautiful because of the simplicity of the result, the algorithm, as well as the proof. Moreover, it's a very practical problem with many applications. To make things even more interesting, a recent paper shows how this concept can be generalized to do matrix sketching (i.e. approximating a matrix with a smaller one). I hope to get around to writing about that at some point as well.