The CMB can be represented in principle as a map $T:[0,\pi]\times[0,2\pi]\to\mathbb R^+$ that assigns to each point of the sky, specified by angles $(\theta,\varphi)$, a positive real number, the temperature $T$. Here I describe how to represent the temperature field $T(\theta,\varphi)$ as a linear combination of spherical harmonics, which provide a convenient basis to represent arbitrary functions defined on a sphere. To build up some intuition, however, I first introduce the one-dimensional equivalent, in which we represent an arbitrary real function $f(x)$ as a linear combination of sines and cosines or complex exponentials, which provide a convenient basis (the Fourier basis) to represent arbitrary functions defined on an interval of the reals.

1 - Fourier representation of real functions

Physicists are used to vectors and vector spaces. We often write things like a force vector as $\mathbf F=F_x \mathbf e_x+F_y \mathbf e_y+F_z \mathbf e_z$ or, in a more compact form, $\mathbf F=\sum_iF_i\mathbf e_i$. This allows us to represent the same concept (force in this case) in two ways: by considering the full vector $\mathbf F$ or by using its components $(F_x,F_y,F_z)$ in this specific basis $\{\mathbf e_i\}$.

What usually takes more time to get used to is applying the same ideas (vectors, spaces, bases, components) when one deals not with vectors in a space of numbers, like $\mathbb R^3$, but with a space of functions. If one defines a set of functions for which the defining properties of a vector space apply, then the same formalism is applicable.

Just like in plain vector spaces, if one has a vector space $E$ over a field $\mathbb K$ (which will be $\mathbb R$ or $\mathbb C$ for us) one can define a scalar product as a map $$\left\lt\cdot\,,\cdot\right\gt:E\times E\to\mathbb K$$ that respects a series of properties. Moreover, one can define a norm as a map $$||\cdot||:E\to\mathbb R$$ also subject to a series of properties. In particular, it is possible to use the definition of the scalar product to induce a norm, such that $$||f||=\sqrt{\left\lt f,f\right\gt},\quad \forall f\in E.$$

To make this more specific to our current applications, let's choose a space: the space of square-integrable functions in a finite interval of the reals $D=[a,b]$. We call our space $\mathcal L^2_D$, which has a formal definition as $$f\in \mathcal L^2_D\,\iff\int_a^b|f(x)|^2\mathrm dx<\infty.$$

In this space, it is common to introduce a scalar product as $$\left\lt f,g\right\gt=\int_a^b \overline{f(x)}g(x)\,\mathrm dx$$ if, in general, $\mathbb K=\mathbb C$. The norm induced by this scalar product is simply $$||f||=\sqrt{\left\lt f,f\right\gt}=\left(\int_a^b\overline{f(x)} f(x)\,\mathrm dx\right)^{1/2}$$ which, by definition of $\mathcal L^2_D$, is such that $||f||<\infty\ \forall f\in \mathcal L^2_D$.
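
To make the scalar product and its induced norm concrete, here is a minimal numerical sketch in Python. The function names `inner` and `norm`, the midpoint-rule quadrature and the number of sample points are illustrative choices made here, not part of the formalism.

```python
import math

def inner(f, g, a, b, n=10_000):
    """Approximate <f, g> = int_a^b conj(f(x)) g(x) dx with the midpoint rule."""
    h = (b - a) / n
    total = 0j
    for k in range(n):
        x = a + (k + 0.5) * h          # midpoint of the k-th cell
        total += complex(f(x)).conjugate() * g(x)
    return total * h

def norm(f, a, b, n=10_000):
    """Norm induced by the scalar product: ||f|| = sqrt(<f, f>)."""
    return math.sqrt(inner(f, f, a, b, n).real)

# Example on D = [0, 1]: f(x) = x has <f, f> = 1/3, so ||f|| = 1/sqrt(3).
print(norm(lambda x: x, 0.0, 1.0))
```

Note that the first argument is the one that gets conjugated, matching the convention used in the text.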

Now, just like in any other vector space, we can talk about the choice of a basis. As usual, there are multiple bases one may choose. The preference for one or other basis depends only on the application we are interested in, the type of function, what makes things simpler, etc.

The specification of a scalar product allows us to define the concepts of orthogonality and orthonormality. A (finite or infinite) family $S\subset E$ of vectors $s_i\in E$ is orthogonal if $i\neq j\Rightarrow\left\lt s_i,s_j\right\gt=0$. It is orthonormal if, besides that, $||s_i||=1\ \forall s_i\in S$. Orthonormality can be compactly expressed using Kronecker's delta as $$\left\lt s_i,s_j\right\gt=\delta_{ij}.$$ If the set $S$ is orthonormal, it can be proven to be linearly independent. That is, $$\sum_i\alpha_i s_i=0\Rightarrow\alpha_i=0\ \forall i.$$

The linear span of $S$ is the set of all possible linear combinations of the vectors $s_i\in S$. In general, $\mathrm{span}(S)\subseteq E$. If the equality holds (that is, $\mathrm{span}(S)= E$), then every element of $E$ can be written as a linear combination of the elements in $S$. Then, $S$ would be a basis for $E$. There is a series of conditions that $S$, $E$ and $\left\lt\cdot\,,\cdot\right\gt$ must satisfy for this to be true. I will not write out or expand on these conditions here, but just briefly summarize them.

One needs to introduce several notions of convergence that depend on the choice of norm. This allows one to define a Cauchy sequence, a Banach space and a Hilbert space. Then, using the generalized Pythagorean theorem (at least according to my undergrad notes), one can define for any $f\in E$ the series $$\sum_i\left\lt s_i,f\right\gt\,s_i $$ as the generalized Fourier series of $f$, where $\alpha_i=\left\lt s_i,f\right\gt$ are the Fourier coefficients. Finally, $S$ is said to be complete if $$f=\sum_i\left\lt s_i,f\right\gt\, s_i\quad\forall f\in E,$$ that is, if every function $f\in E$ is equal to (in the sense that the series converges to it in the norm) its generalized Fourier expansion.

This whole mess means the following: if you identify an orthonormal set $S$ in $E$, given a definition of a scalar product and a norm, that is complete, then $S$ is a basis for $E$. With this, and the proper definitions of convergence, one can prove that the following set is a basis for the space $\mathcal L_D^2$: $$S=\left\{ \frac{1}{\sqrt{L}}, \sqrt{\frac{2}{L}}\cos\left(\frac{2\pi x}{L}\right), \sqrt{\frac{2}{L}}\sin\left(\frac{2\pi x}{L}\right), \sqrt{\frac{2}{L}}\cos\left(2\frac{2\pi x}{L}\right), \sqrt{\frac{2}{L}}\sin\left(2\frac{2\pi x}{L}\right),\dots \right\},$$ where $L=b-a$. Using Euler's formula, we can obtain a similar complete set, but using exponentials instead. As usual, exponentials are easier to deal with than sines and cosines, but on the other hand, one has to be more careful with real and complex numbers. For instance, with the sine/cosine basis, if a function $f$ is real then all coefficients of its expansion in terms of this basis will be real as well. In the exponential basis, even if the function is real, the coefficients will in general be complex (for real $f$ they satisfy $\alpha_{-n}=\overline{\alpha_n}$, so that the sum is real). All in all, the exponential basis functions are still much simpler to deal with. We therefore consider the complete orthonormal set $$S=\{\varphi_n(x)\},\quad \varphi_n(x)=\frac{1}{\sqrt L}\exp\left(i\frac{2n\pi x}{L}\right),\quad n\in\mathbb Z.$$ With this, one can expand any function $f\in \mathcal L^2_D$ as a linear combination of the elements in $S$ as $$f(x)=\sum_{n=-\infty}^\infty \alpha_n\varphi_n(x),$$ where $$\alpha_n=\left\lt \varphi_n,f\right\gt=\int_a^b \overline{\varphi_n(x)}f(x)\,\mathrm dx.$$ This is called the Fourier exponential series representation of $f$.
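
The expansion above can be tried out numerically. The following is a minimal sketch, assuming a hypothetical test function $f(x)=1+2\cos(2\pi x/L)$ on $[0,2]$ and a midpoint-rule approximation of the integral; the helper names `phi`, `coefficient` and `partial_sum` are introduced here for illustration only.

```python
import cmath
import math

def phi(n, x, L):
    """Exponential basis function phi_n(x) = exp(2 pi i n x / L) / sqrt(L)."""
    return cmath.exp(2j * math.pi * n * x / L) / math.sqrt(L)

def coefficient(f, n, a, b, samples=512):
    """alpha_n = <phi_n, f>, approximated with the midpoint rule on [a, b]."""
    L = b - a
    h = L / samples
    total = 0j
    for k in range(samples):
        x = a + (k + 0.5) * h
        total += phi(n, x, L).conjugate() * f(x)
    return total * h

def partial_sum(alphas, x, L):
    """Evaluate sum_n alpha_n phi_n(x) for a dict {n: alpha_n}."""
    return sum(alpha * phi(n, x, L) for n, alpha in alphas.items())

# Hypothetical test function on [a, b] = [0, 2].
a, b = 0.0, 2.0
L = b - a
f = lambda x: 1.0 + 2.0 * math.cos(2 * math.pi * x / L)

# Only n = 0 and n = +-1 should be non-zero, each equal to sqrt(L).
alphas = {n: coefficient(f, n, a, b) for n in range(-3, 4)}
for n, alpha in alphas.items():
    print(n, round(alpha.real, 6), round(alpha.imag, 6))
```

Because this $f$ is real, the computed coefficients come in conjugate pairs, $\alpha_{-n}=\overline{\alpha_n}$, and `partial_sum(alphas, x, L)` reproduces `f(x)` up to rounding.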

A property of the Fourier series expansion of a function in a finite interval is that the series can be evaluated without any issues outside the initial interval $[a,b]$. Due to the periodicity of the basis functions, the values of the series beyond these limits repeat the values inside the limits in a very particular way. All basis functions $\varphi_n$ are periodic with period $L$, that is, $\varphi_n(x\pm L) =\varphi_n(x)\ \forall x\in\mathbb R$. Then, in the interval $[a+L,b+L]$ there will be an exact copy of the function $f$ in $[a,b]$. This is, in some way, equivalent to having a compact dimension, like a segment that is bent around a circle so that its end (at $b$) is connected to its start (at $a$). Since the CMB shares this compactness property, as it is defined on a sphere, we will make use of it. In what follows, you can think of $x$ as a length along the circumference of a circle.
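
As a small illustration of this periodic copy, consider $f(x)=x$ on $[0,1]$ (so $L=1$), an example chosen here and not taken from the text. Integrating by parts gives the coefficients $\alpha_0=1/2$ and $\alpha_n=i/(2\pi n)$ for $n\neq0$. The sketch below evaluates a truncated series at a point inside the interval and at the same point shifted by $L$; the truncation order `N` is an arbitrary choice.

```python
import cmath
import math

def sawtooth_partial_sum(x, N=50):
    """Truncated exponential Fourier series of f(x) = x on [0, 1] (L = 1).

    Uses the closed-form coefficients alpha_0 = 1/2 and
    alpha_n = i / (2 pi n) for n != 0.
    """
    total = 0.5 + 0j
    for n in range(-N, N + 1):
        if n != 0:
            total += 1j / (2 * math.pi * n) * cmath.exp(2j * math.pi * n * x)
    return total.real   # imaginary parts cancel between +n and -n

inside = sawtooth_partial_sum(0.3)    # a point inside [a, b] = [0, 1]
outside = sawtooth_partial_sum(1.3)   # the same point shifted by L = 1
print(inside, outside)
```

Both values agree with each other and are close to $0.3$, even though $f(1.3)=1.3$ for the original, non-periodic function: outside $[a,b]$ the series represents the periodic copy, not $f$ itself.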

1.2 - The variation scales of a function

The Fourier expansion of $f$ presented in the previous section allows us to decompose a function into individual pieces that can be separately understood. The term with $\varphi_0$ represents the constant part of the function. The rest of the terms represent the varying parts. The value of $n$ measures how quickly each term changes with $x$. For instance, the terms with $n=\pm1$ have one full oscillation between $a$ and $b$. Those with $n=\pm2$ have two. In general, the $\varphi_{\pm n}$ for $n\in\mathbb N$ have $n$ full oscillations.

In order to describe the oscillations, one can also use their length scale (rather than how many of them there are). This can be done with the wavelength, which is given by $\lambda_n=L/n$, or with the wavenumber, which we define here as $q_n=2\pi/\lambda_n=2\pi n/L$. We can refer to the scale of the oscillations by either their wavelength or their wavenumber. Any function $f$ may oscillate at a combination of different scales, and understanding how these different scales contribute to the overall behavior of the function requires specifying all the coefficients $\alpha_n$ as defined above.
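
A short sketch of these definitions, for $n\geq1$ and an arbitrarily chosen interval length; the function names are illustrative. It checks numerically that $\lambda_n$ is the period of the mode $\varphi_n$.

```python
import cmath
import math

def wavelength(n, L):
    """lambda_n = L / n, for n >= 1."""
    return L / n

def wavenumber(n, L):
    """q_n = 2 pi / lambda_n = 2 pi n / L, for n >= 1."""
    return 2 * math.pi * n / L

def phi(n, x, L):
    """phi_n(x) = exp(i q_n x) / sqrt(L)."""
    return cmath.exp(1j * wavenumber(n, L) * x) / math.sqrt(L)

L = 2.0   # arbitrary interval length b - a
for n in (1, 2, 3):
    lam = wavelength(n, L)
    # Shifting x by one wavelength leaves phi_n unchanged.
    shift_error = abs(phi(n, 0.2 + lam, L) - phi(n, 0.2, L))
    print(n, lam, wavenumber(n, L), shift_error)
```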

A simple way to think of this is to take $f$ to represent the elevation of the water surface at a fixed instant of time. The water level has changes at very different scales:

The Earth is not a perfect sphere, and that produces different gravitational forces at different locations on Earth. This may affect the water level over large distances, as the gravitational field will only be significantly different in places that are far apart.
The water level may change due to ordinary waves, those that arrive at the shore. These occur at scales much smaller than the ones described above.
If you look closely at the surface of the water, it is not perfectly smooth. Wind and movement give the surface of the water shapes that are distinguishable over the scale of a few centimeters, which is much smaller than the other two scales.

If we were to represent the water level in terms of the Fourier series, we should observe that the coefficients $\alpha_n$ become larger when the corresponding wavelength $\lambda_n$ is of the order of one of the three scales described above.
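
This behavior can be mimicked with a toy signal. The sketch below builds a hypothetical "water level" from three cosine modes (the mode numbers 1, 8 and 30 and their amplitudes are made up for illustration, not physical values) and checks which coefficients $|\alpha_n|$ stand out.

```python
import cmath
import math

def coefficient(f, n, L, samples=256):
    """alpha_n = <phi_n, f> on [0, L], approximated with the midpoint rule."""
    h = L / samples
    total = 0j
    for k in range(samples):
        x = (k + 0.5) * h
        total += cmath.exp(-2j * math.pi * n * x / L) * f(x)
    return total * h / math.sqrt(L)

L = 1.0
# Hypothetical scales: mode number -> amplitude (large, medium, small scale).
scales = {1: 1.0, 8: 0.3, 30: 0.05}

def water_level(x):
    """Toy signal: a sum of three cosines with very different wavelengths."""
    return sum(amp * math.cos(2 * math.pi * n * x / L) for n, amp in scales.items())

spectrum = {n: abs(coefficient(water_level, n, L)) for n in range(0, 41)}
peaks = sorted(sorted(spectrum, key=spectrum.get, reverse=True)[:3])
print(peaks)   # -> [1, 8, 30]
```

The three largest coefficients sit at the mode numbers put into the signal, that is, at wavelengths $\lambda_n=L/n$ equal to the three built-in scales.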

2 - Two dimensional case: application to the CMB

2.1 - Spherical harmonics representation of real functions

The terminology introduced above in section 1.1 can be directly translated to this section. Our space of functions is still $\mathcal L^2_D$, except that now $D$ is not a finite interval of the reals $[a,b]$, but the two-sphere $\mathcal S^2$. We describe points in $\mathcal S^2$ by two angles $\theta$ and $\phi$, where $\theta\in[0,\pi]$ and $\phi\in[0,2\pi]$. Integrals over the sphere use the integration measure $\mathrm d\Omega=\sin\theta\,\mathrm d\theta\,\mathrm d\phi$. This allows us to define the scalar product between two functions as $$\left\lt f,g\right\gt=\int_{\mathcal S^2} \overline{f(\theta,\phi)}g(\theta,\phi)\,\mathrm d\Omega.$$

Just like before, we now have to choose a complete set of functions in $\mathcal L^2_{\mathcal S^2}$, such that any function in that space can be expanded as a linear combination of those functions. A frequent choice (which can be justified, although I omit that here) is the set of spherical harmonics. These are functions defined in terms of the associated Legendre polynomials, so I proceed to introduce these first. These polynomials are solutions to the general Legendre differential equation. They are labeled by two integers $\ell$ and $m$ and denoted $P_\ell^m(x)$. They do not form a complete and orthonormal set for all values of $\ell$ and $m$, but some subsets of them do. Although we do not bother to define them explicitly, as that is rarely needed, it is important to know that they are real functions.

The spherical harmonics combine Legendre's associated polynomials with additional factors, and a change of variable $x=\cos\theta$, such that $$Y_\ell^m(\theta,\phi)=\sqrt{\frac{2\ell+1}{4\pi}\frac{(\ell-m)!}{(\ell+m)!}}P_\ell^m(\cos\theta)e^{i m\phi}.$$ These functions can be evaluated for any $\ell\in\mathbb N$ (including $\ell=0$) and any $m\in\mathbb Z$ such that $|m|\leq\ell$. Despite the $P_\ell^m$ not being so, the spherical harmonics $Y_\ell^m$ form an orthonormal set of functions in $\mathcal L^2_{\mathcal S^2}$, in the sense that $$\left\lt Y_\ell^m,Y_{\ell'}^{m'}\right\gt=\delta_{\ell\ell'}\delta_{mm'}$$ with the scalar product introduced above. Moreover, this set is complete, in the sense that any function $f\in\mathcal L^2_{\mathcal S^2}$ can be written as $$f(\theta,\phi)=\sum_{\ell=0}^\infty\sum_{m=-\ell}^\ell a_{\ell m}Y_\ell^m(\theta,\phi),$$ and the $a_{\ell m}$ coefficients are given by $$a_{\ell m}=\left\lt Y_\ell^m,f\right\gt.$$
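
The orthonormality relation can be checked numerically for a few low-order cases. The sketch below is one possible implementation under stated assumptions: the associated Legendre polynomials are built with the standard upward recurrence and include the Condon-Shortley phase $(-1)^m$ (a convention the text does not specify), negative $m$ is handled through $Y_\ell^{-m}=(-1)^m\overline{Y_\ell^m}$, and the integral over the sphere uses a plain midpoint rule. The names `assoc_legendre`, `Y` and `overlap` are introduced here.

```python
import cmath
import math

def assoc_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l, with the Condon-Shortley phase (-1)^m."""
    pmm = 1.0
    if m > 0:
        s = math.sqrt(max(0.0, 1.0 - x * x))
        for k in range(1, m + 1):
            pmm *= -(2 * k - 1) * s          # P_m^m = (-1)^m (2m-1)!! (1-x^2)^(m/2)
    if l == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm              # P_{m+1}^m
    for ll in range(m + 2, l + 1):           # upward recurrence in l
        pmm, pm1 = pm1, ((2 * ll - 1) * x * pm1 - (ll + m - 1) * pmm) / (ll - m)
    return pm1

def Y(l, m, theta, phi):
    """Spherical harmonic Y_l^m(theta, phi), with |m| <= l."""
    if m < 0:
        return (-1) ** (-m) * Y(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)

def overlap(l1, m1, l2, m2, n_theta=200, n_phi=16):
    """Midpoint-rule approximation of <Y_l1^m1, Y_l2^m2> over the sphere."""
    dth, dph = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0j
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            total += Y(l1, m1, th, ph).conjugate() * Y(l2, m2, th, ph) * math.sin(th)
    return total * dth * dph

print(abs(overlap(2, 1, 2, 1)))   # close to 1
print(abs(overlap(1, 0, 2, 0)))   # close to 0
```

For real work one would normally rely on a tested library routine rather than a hand-written recurrence; the point here is only to see $\delta_{\ell\ell'}\delta_{mm'}$ emerge from the definition.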

The CMB temperature

Consider now, finally, that the function we want to represent is the CMB temperature as seen from Earth, our observation point. Of course, for this to be possible, there must be such a thing as a temperature. Electromagnetic radiation doesn't, in general, have a temperature. It has a frequency, a direction of propagation, and a direction of polarization. The polarization of the CMB is quite an important aspect of it, but we will ignore it in this first approach. The direction of the light is described by the direction of the sky from which it arrives, specified by $\theta$ and $\phi$. From a given direction of the sky, we don't receive monochromatic light (that is, light of a fixed frequency), but electromagnetic radiation that spans a spectrum of frequencies, with different intensities at different frequencies. When electromagnetic radiation is produced by a system in thermal equilibrium, the radiation spectrum has a very simple form that doesn't depend on the type of object that the radiation comes from. It is called blackbody radiation, and it is described by Planck's law, which gives the energy per unit time, frequency, area and solid angle as $$B(\nu,T)=\frac{2h}{c^2}\frac{\nu^3}{e^{h\nu/kT}-1}.$$
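
Planck's law is straightforward to evaluate. The following sketch uses the SI values of $h$, $c$ and $k$, and assumes a temperature of $2.725\,\mathrm K$ as an illustrative value for the CMB (this number is not given in the text above). It scans a frequency grid, whose range and spacing are arbitrary choices, to locate the maximum of $B(\nu,T)$.

```python
import math

H = 6.62607015e-34    # Planck constant, J s
C = 299792458.0       # speed of light, m / s
K_B = 1.380649e-23    # Boltzmann constant, J / K

def planck(nu, T):
    """B(nu, T) = (2 h / c^2) nu^3 / (exp(h nu / k T) - 1), in W m^-2 Hz^-1 sr^-1."""
    return 2 * H * nu**3 / C**2 / math.expm1(H * nu / (K_B * T))

T_CMB = 2.725   # K, assumed illustrative value

# Scan 1 GHz to 1000 GHz in steps of 1 GHz and pick the brightest frequency.
freqs = [g * 1e9 for g in range(1, 1001)]
nu_peak = max(freqs, key=lambda nu: planck(nu, T_CMB))
print(nu_peak / 1e9, "GHz")
```

With this assumed temperature the maximum falls at roughly 160 GHz, in the microwave range, which is where the "M" in CMB comes from.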

If the spectrum of electromagnetic radiation arriving from a specific direction has the frequency dependence captured by Planck's law, then it can be said to have a temperature. That temperature can be obtained by finding the value of $T$ in Planck's law that best fits the data. Through this process, one can consider the temperature field as a map $T:\mathcal S^2\to\mathbb R^+$. As such, it can be expanded as a linear combination of the spherical harmonics as $$T(\theta,\phi)=\sum_{\ell=0}^\infty\sum_{m=-\ell}^\ell a_{\ell m}Y_\ell^m(\theta,\phi).$$
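
The idea of finding the $T$ that best fits a measured spectrum can be sketched as a one-parameter least-squares fit. Everything specific below is an assumption made for illustration: the data are synthetic and noiseless, generated from Planck's law itself at made-up frequencies, and the minimization uses a simple ternary search, which relies on the misfit having a single minimum (true for noiseless data, since $B$ grows monotonically with $T$ at every frequency). Real analyses are considerably more involved.

```python
import math

H = 6.62607015e-34    # Planck constant, J s
C = 299792458.0       # speed of light, m / s
K_B = 1.380649e-23    # Boltzmann constant, J / K

def planck(nu, T):
    """Planck's law B(nu, T) in SI units."""
    return 2 * H * nu**3 / C**2 / math.expm1(H * nu / (K_B * T))

def fit_temperature(freqs, data, t_lo=1.0, t_hi=5.0, iters=200):
    """Least-squares estimate of T by ternary search on the relative misfit."""
    def misfit(T):
        return sum((planck(nu, T) / d - 1.0) ** 2 for nu, d in zip(freqs, data))
    for _ in range(iters):
        m1 = t_lo + (t_hi - t_lo) / 3
        m2 = t_hi - (t_hi - t_lo) / 3
        if misfit(m1) < misfit(m2):
            t_hi = m2      # minimum lies to the left of m2
        else:
            t_lo = m1      # minimum lies to the right of m1
    return 0.5 * (t_lo + t_hi)

# Synthetic, noiseless "measurements" at hypothetical frequencies.
T_TRUE = 2.725                                          # K, assumed value
freqs = [50e9, 100e9, 150e9, 200e9, 300e9]              # Hz
data = [planck(nu, T_TRUE) for nu in freqs]

T_fit = fit_temperature(freqs, data)
print(T_fit)
```

Repeating such a fit for every direction $(\theta,\phi)$ is what turns a set of measured spectra into a temperature map.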

Consider the first term of the series, with $(\ell,m)=(0,0)$. The corresponding spherical harmonic is a constant, $Y_0^0=1/\sqrt{4\pi}$. Therefore, it corresponds to a constant value of the temperature across the sky. It is easy to see that it is related to the average temperature across the sky, since $$T_0=\left\lt T\right\gt=\frac{1}{4\pi}\int_{\mathcal S^2}T(\theta,\phi)\,\mathrm d\Omega =a_{00}Y_0^0=\frac{a_{00}}{\sqrt{4\pi}},$$ because all other spherical harmonics have an average of zero across the sky. Here $\left\lt T\right\gt$ denotes the sky average, not a scalar product. This allows us to define the temperature anisotropy as $$\Delta T(\theta,\phi)=T(\theta,\phi)-T_0=\sum_{\ell=1}^\infty\sum_{m=-\ell}^\ell a_{\ell m} Y_\ell^m(\theta,\phi).$$
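
The link between $a_{00}$ and the sky average, $\frac{1}{4\pi}\int T\,\mathrm d\Omega=a_{00}Y_0^0$, can be verified on a toy map. The sketch below assumes a made-up temperature field consisting of a constant plus a small $\cos\theta$ term; the numbers are placeholders, not measurements. Only the real harmonics $Y_0^0$ and $Y_1^0=\sqrt{3/4\pi}\cos\theta$ are needed, so no complex conjugation appears.

```python
import math

def sky_integral(f, n_theta=400, n_phi=8):
    """Midpoint-rule integral of f(theta, phi) over the sphere, with dOmega."""
    dth, dph = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            total += f(th, ph) * math.sin(th)
    return total * dth * dph

T_MEAN, DIPOLE = 2.725, 0.003   # K, toy numbers chosen for illustration

def temperature(theta, phi):
    """Toy sky: a constant plus a term proportional to cos(theta)."""
    return T_MEAN + DIPOLE * math.cos(theta)

Y00 = 1 / math.sqrt(4 * math.pi)
Y10 = lambda th, ph: math.sqrt(3 / (4 * math.pi)) * math.cos(th)

a00 = sky_integral(lambda th, ph: Y00 * temperature(th, ph))
a10 = sky_integral(lambda th, ph: Y10(th, ph) * temperature(th, ph))
sky_average = sky_integral(temperature) / (4 * math.pi)

print(a00 * Y00, sky_average)               # both close to T_MEAN
print(a10, DIPOLE * math.sqrt(4 * math.pi / 3))
```

Subtracting $a_{00}Y_0^0$ from this toy map leaves only the $\cos\theta$ term, which is carried entirely by $a_{10}$: that remainder is the anisotropy $\Delta T$ of the toy sky.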