In my view, I agree with the explanation above, because [12×12]×3 → [(5×5×1×1)] → [8×8]×3, and…

Source: Deep Learning on Medium

In my view, I agree with the explanation above, because [12×12]×3 → [(5×5×1×1)] → [8×8]×3, and 8×8×3 → (1×1×3×256) → 8×8×256. So, if the two steps are combined into one operation, it becomes the same as above, except that 12×12×256 should be 8×8×256.
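The shape arithmetic above can be checked with a small sketch. It assumes stride 1 and no padding, which is what the 12 → 8 reduction with a 5×5 kernel implies. The helper names (`conv_out_size`, `depthwise_shape`, `pointwise_shape`) are hypothetical and only illustrate the calculation.

```python
def conv_out_size(in_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel_size) // stride + 1


def depthwise_shape(h, w, c, k):
    """Depthwise step: one k x k x 1 filter per input channel, so the channel count stays c."""
    return conv_out_size(h, k), conv_out_size(w, k), c


def pointwise_shape(h, w, c_in, c_out):
    """Pointwise step: c_out filters of size 1 x 1 x c_in.

    c_in only fixes the filter depth; it does not change the output shape.
    """
    return conv_out_size(h, 1), conv_out_size(w, 1), c_out


# Assumed setup from the text: 12x12x3 input, 5x5 depthwise kernel, 256 output channels.
dw = depthwise_shape(12, 12, 3, k=5)
pw = pointwise_shape(*dw, c_out=256)

print(dw)  # (8, 8, 3)
print(pw)  # (8, 8, 256)
```

Under these assumptions the final output is 8×8×256, not 12×12×256. The 12×12 spatial size would only survive if the 5×5 step used padding of 2.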