# Part 4

As a helpful review, here are a variety of problems and solutions to exercises from an introductory functional analysis class.  This is the fourth installment in a series of functional analysis exercises (Part 1, Part 2, Part 3, Part 4)

Problem 1

If $\mathscr{X}$ is a reflexive Banach space, show that $\textrm{Ball}(\mathscr{X})$ is weakly compact.

Solution

We will prove a stronger version of the above statement: A Banach space $\mathscr{X}$ is reflexive if and only if $\textrm{Ball}(\mathscr{X})$ is weakly compact.

Forward Direction

Let us first recall the general statement of the Banach-Alaoglu theorem:

Theorem (Banach-Alaoglu)

Let $\mathscr{X}$ be a normed space. Then $\textrm{Ball}(\mathscr{X}^*)$ is weak-* compact.

Let $\mathscr{X}$ be a reflexive Banach space. Observe that $\mathscr{X}^*$ carries the operator norm induced by its predual and is thus itself a normed space. Applying the Banach-Alaoglu theorem to $\mathscr{X}^*$, it follows that $\textrm{Ball}(\mathscr{X}^{**})$ is weak-* compact.

However, note that this weak-* topology lives on $\mathscr{X}^{**}$ rather than the usual $\mathscr{X}^*$, and is correspondingly generated by the seminorms $p_\lambda(\tau) := |\tau(\lambda)|$ for $\lambda \in \mathscr{X}^*$. But, observe that $\mathscr{X}$ is reflexive, which is to say that every $\tau \in \mathscr{X}^{**}$ has the form $\tau = \hat{x}$, where $\hat{x}$ is the canonical embedding of some $x \in \mathscr{X}$ into its second dual. On such elements the seminorms read $p_\lambda(\hat{x}) = |\hat{x}(\lambda)| = |\lambda(x)|$ for $\lambda \in \mathscr{X}^*$, and these are precisely the seminorms generating the weak topology on $\mathscr{X}$. Thus the canonical embedding is a homeomorphism from $(\mathscr{X}, \textrm{wk})$ onto $(\mathscr{X}^{**}, \textrm{wk-}*)$, and the weak-* compactness of $\textrm{Ball}(\mathscr{X}^{**})$ gives that $\textrm{Ball}(\mathscr{X})$ is weakly compact as well.

Reverse Direction

Let $\mathscr{X}$ be a Banach space and $\textrm{Ball}(\mathscr{X})$ be weakly compact. Let us first recall a corollary of the Banach-Alaoglu theorem:

Corollary (Banach-Alaoglu)

If $X$ is a normed space, then the canonical image of $\textrm{Ball}(X)$ is weak-* dense in $\textrm{Ball}(X^{**})$.

Thus, the weak-* closure of the canonical image of $\textrm{Ball}(\mathscr{X})$ in $\mathscr{X}^{**}$ is all of $\textrm{Ball}(\mathscr{X}^{**})$. However, the canonical embedding is continuous from the weak topology of $\mathscr{X}$ to the weak-* topology of $\mathscr{X}^{**}$, so the image of the weakly compact set $\textrm{Ball}(\mathscr{X})$ is weak-* compact and, in particular, weak-* closed. Being both weak-* closed and weak-* dense in $\textrm{Ball}(\mathscr{X}^{**})$, the canonical image of the ball of $\mathscr{X}$ is precisely the ball of $\mathscr{X}^{**}$, which is to say that $\mathscr{X}$ is reflexive. $\blacksquare$

Problem 2

If $\mathscr{X}$ is a reflexive Banach space, show that every bounded linear operator from $\mathscr{X}$ into $\mathscr{X}$ is weakly compact.

Solution

Let $\mathscr{X}$ be a reflexive Banach space. We seek to show that every bounded linear operator from $\mathscr{X}$ into $\mathscr{X}$ is weakly compact. To do so, let us first note the theorem proved in problem $1$:

Theorem

A Banach space $\mathscr{X}$ is reflexive if and only if $\textrm{Ball}(\mathscr{X})$ is weakly compact.

Thus, we need only recall that a bounded linear operator between Banach spaces is also continuous from the weak topology to the weak topology (as $\lambda \circ T \in \mathscr{X}^*$ for every $\lambda \in \mathscr{X}^*$), and that the image of a compact set under a continuous function is, itself, compact. Hence $T(\textrm{Ball}(\mathscr{X}))$ is weakly compact whenever $\mathscr{X}$ is a reflexive Banach space, and every $T \in \mathscr{B}(\mathscr{X})$ is correspondingly weakly compact. $\blacksquare$

Problem 3

Let $\left \{u_1, u_2, \ldots \right \}$ be a sequence of pairwise orthogonal unit vectors in a Hilbert space $\mathscr{H}$. Let $K$ consist of the vectors $0$ and $n^{-1}u_n$ for $n \geq 1$. Show that:

1. $K$ is compact,
2. $co(K)$ is bounded,
3. $co(K)$ is not closed,
4. Find all the extreme points of $\overline{co}(K)$.

Solution

Let $\left \{u_1, u_2, \ldots \right \}$ be a sequence of pairwise orthogonal unit vectors in a Hilbert space $\mathscr{H}$. Let $K$ consist of the vectors $0$ and $n^{-1}u_n$ for $n \geq 1$.

Solution (1)

We first seek to show that $K$ is compact. To do so, observe first that $\| n^{-1}u_n \| = n^{-1} \rightarrow 0$, so the sequence $(n^{-1}u_n) \rightarrow 0$. Now let $(y_j)$ be an arbitrary sequence in $K$. Either some element of $K$ is repeated infinitely often, yielding a constant (hence convergent) subsequence, or the $y_j$ pass through infinitely many distinct points $n_k^{-1}u_{n_k}$ with $n_k \rightarrow \infty$, yielding a subsequence converging to $0 \in K$. Thus every sequence in $K$ has a subsequence converging in $K$, and as sequential compactness is equivalent to compactness in a metric space, $K$ is indeed compact. $\blacksquare$
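As a quick numerical sanity check, the two quantities the argument uses can be computed exactly (a sketch under the model assumption that the $u_n$ are orthonormal, so distances follow from the Pythagorean theorem):

```python
import math

# Model assumption: the u_n are orthonormal, so by the Pythagorean theorem
# ||n^{-1}u_n - m^{-1}u_m|| = sqrt(1/n^2 + 1/m^2) for n != m, and ||n^{-1}u_n|| = 1/n.

def dist(n, m):
    """Exact distance between the points n^{-1}u_n and m^{-1}u_m of K."""
    if n == m:
        return 0.0
    return math.sqrt(1.0 / n**2 + 1.0 / m**2)

def tail_radius(N):
    """Radius of a ball about 0 containing every point n^{-1}u_n with N <= n < 2N."""
    return max(1.0 / n for n in range(N, 2 * N))

# Distinct points of K stay separated, but every tail of the sequence collapses
# into a shrinking ball about 0 -- the only accumulation point, which lies in K.
print(tail_radius(10), tail_radius(100))  # 0.1 0.01
assert dist(1, 2) == math.sqrt(1.25)
```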

Solution (2)

We now seek to show that $co(K)$ is bounded. That is, we seek to show that for any neighborhood $V$ of zero, there exists $t > 0$ such that $co(K) \subset tV$. Observe that, by the linearity of our space, it suffices to show that $co(K)$ is contained within a dilation of the open unit ball, denoted here by $\mathbb{B}$. Observe

$\sup \left \{ \| x \| : x \in co(K) \right \} = \sup \left \{ \|n^{-1}u_n \|: n\geq 1 \right \} = 1.$

The first equality holds since a convex combination of elements of $K$ has norm at most the largest norm appearing in $K$, while the supremum is attained at $u_1 \in K \subseteq co(K)$. Thus, $co(K) \subset t \mathbb{B}$ for any $t > 1$, and $co(K)$ is correspondingly bounded. $\blacksquare$

Solution (3)

Next, we seek to show that $co(K)$ is not closed. To do so, for each $n \geq 1$ set $x_n = \sum_{i=1}^{n} \frac{1}{i 2^i} u_i$. Each $x_n$ lies in $co(K)$: it is the convex combination assigning weight $2^{-i}$ to the element $i^{-1}u_i$ for $1 \leq i \leq n$ and the remaining weight $2^{-n}$ to $0$. However, note that $x_n \rightarrow x=\sum_{i=1}^\infty \frac{1}{i 2^i} u_i$ (the tail norms $\|x - x_n\|^2 = \sum_{i > n} \frac{1}{i^2 4^i}$ vanish), while $x \notin co(K)$, as $x$ has infinitely many nonzero coordinates and so is not a finite convex combination of elements of $K$. Thus, we have constructed a sequence in $co(K)$ which does not converge to an element of $co(K)$, and $co(K)$ is correspondingly not closed. $\blacksquare$
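The convergence claimed here can be sketched numerically (again modeling the $u_i$ as orthonormal, so norms are computed coordinate-wise):

```python
import math

# Model assumption: with the u_i orthonormal, the distance from the partial sum
# sum_{i<=N} u_i/(i 2^i) to the limit x = sum_i u_i/(i 2^i) is the square root of
# the tail sum_{i>N} 1/(i^2 4^i), which vanishes as N grows -- so the finite
# convex combinations really do converge to a point outside co(K).

def tail_distance(N, terms=200):
    """Distance from the N-th partial sum to the limit x (truncated tail)."""
    return math.sqrt(sum(1.0 / (i**2 * 4**i) for i in range(N + 1, N + 1 + terms)))

# The convex weights 2^{-i} sum to less than 1, with the leftover weight on 0 in K:
assert sum(2.0 ** -i for i in range(1, 20)) < 1.0
assert tail_distance(10) < tail_distance(5) < tail_distance(1) < 1.0
```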

Solution (4)

Finally, we seek to calculate $Ext(\overline{co}(K))$. To do so, let us first recall Milman’s theorem:

Theorem (Milman)

Let $X$ be a locally convex space, $D$ a compact, convex set in $X$, and $F \subseteq D$ such that $\overline{co}(F)=D$. Then, $Ext(D) \subseteq \overline{F}$.

To use this result, we first need a lemma:

Lemma

Given $K$ as above, $\overline{co}(K)$ is compact.

Proof

Observe that the operator $T: \mathscr{H} \rightarrow \mathscr{H}$ defined by

$T = \sum_{n\geq 1} \frac{1}{n} \, P_{\textrm{span}\left \{u_n \right \}}$

is bounded (and thus continuous), with $\|T\| = \sup_{n \geq 1} n^{-1} = 1$. By problem 5 in A variety of Functional Analysis Exercises: Part 2, this operator is also compact, as $\|T(u_n)\| \rightarrow 0$. Moreover, observe $\overline{co}(K) \subseteq \overline{T(\textrm{Ball}(\mathscr{H}))}$, as $K \subseteq T(\textrm{Ball}(\mathscr{H}))$ and the latter set is convex. As $\overline{co}(K)$ is a closed subset of a compact set, it is correspondingly compact as well. $\blacksquare$
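The compactness mechanism invoked here can be seen concretely in a finite-dimensional truncation (an illustrative model, not the operator on all of $\mathscr{H}$): $T$ acts diagonally as $Tu_n = u_n/n$, and its distance to the rank-$N$ truncations is $1/(N+1) \rightarrow 0$, so $T$ is a norm limit of finite-rank operators.

```python
import numpy as np

# Finite-dimensional sketch: truncate T to an M x M diagonal matrix
# diag(1, 1/2, ..., 1/M).  The rank-N truncation T_N keeps only the first N
# diagonal entries; the operator (spectral) norm of T - T_N is the largest
# discarded entry, 1/(N+1), which tends to 0 as N grows.
M = 50
T = np.diag([1.0 / n for n in range(1, M + 1)])

def op_norm_gap(N):
    T_N = T.copy()
    T_N[N:, N:] = 0.0   # discard everything past the first N coordinates
    return np.linalg.norm(T - T_N, 2)

assert abs(op_norm_gap(4) - 1.0 / 5) < 1e-9
assert abs(op_norm_gap(24) - 1.0 / 25) < 1e-9
```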

Now, note that $\mathscr{H}$ is a locally convex space, $\overline{co}(K)$ is a compact, convex set in $\mathscr{H}$, and $K \subseteq \overline{co}(K)$. Taking $F = K$ and $D = \overline{co}(K)$ in Milman's theorem, the hypothesis $\overline{co}(F)=D$ holds trivially, so it follows that $Ext(\overline{co}(K)) \subseteq \overline{K} = K$ (recalling that $K$ is compact, hence closed).

For the reverse inclusion, first note that $\overline{co}(K) = \left \{ \sum_{i \geq 1} t_i \, i^{-1}u_i : t_i \geq 0, \ \sum_{i \geq 1} t_i \leq 1 \right \}$. Suppose $n^{-1}u_n = \frac{1}{2}(y+z)$ with $y = \sum_i s_i \, i^{-1}u_i$ and $z = \sum_i r_i \, i^{-1}u_i$ in $\overline{co}(K)$. Comparing the $u_n$ coefficients gives $\frac{1}{2}(s_n + r_n) = 1$ with $s_n, r_n \leq 1$, forcing $s_n = r_n = 1$; the constraint that the coefficients sum to at most $1$ then forces every other coefficient to vanish, so $y = z = n^{-1}u_n$ and $n^{-1}u_n$ is extreme. Similarly, if $0 = \frac{1}{2}(y+z)$, then $s_i + r_i = 0$ with all coefficients non-negative, so $y = z = 0$ and $0$ is extreme as well. Thus, $K \subseteq Ext(\overline{co}(K))$, and we may conclude that $Ext(\overline{co}(K)) = K$. $\blacksquare$

Problem 4

If $(X, \Omega, \mu)$ is a $\sigma$-finite measure space, show that the set of extreme points of $\textrm{Ball}(L^\infty(\mu))$ is $\left \{f \in L^\infty(\mu) : |f(x)| = 1 \textrm{ a.e. } [\mu] \right \}$.

Solution

Let $(X, \Omega, \mu)$ be a $\sigma$-finite measure space. We seek to show:

$Ext(\textrm{Ball}(L^\infty(\mu))) = \left \{f \in L^\infty(\mu) : |f(x)| = 1 \textrm{ a.e. } [\mu] \right \}.$

To this end, let us first suppose there exists $f \in Ext(\textrm{Ball}(L^\infty(\mu)))$ such that $\| f \|_\infty < 1$. Let $\epsilon >0$ be a real constant such that $\| f \|_\infty + \epsilon < 1$. Then, viewing $\epsilon$ as the constant function of that value, observe that $f+\epsilon \in \textrm{Ball}(L^\infty(\mu))$ and $f-\epsilon \in \textrm{Ball}(L^\infty(\mu))$, and that these two functions are distinct. But then,

$\frac{1}{2} \left ( (f+\epsilon) + (f - \epsilon) \right ) = \frac{1}{2}(2f) = f,$

exhibiting $f$ as a proper midpoint of two elements of the ball and contradicting extremality. So, we may see that all elements of $Ext(\textrm{Ball}(L^\infty(\mu)))$ must have norm one.

Now, let us suppose $f \in \textrm{Ball}(L^\infty(\mu))$ with $\|f \|_\infty =1$, but $|f(x)| \neq 1$ a.e. $[\mu]$. As $\left \{ x \in X : |f(x)| < 1 \right \} = \bigcup_{k \geq 1} \left \{ x \in X : |f(x)| \leq 1-\frac{1}{k} \right \}$ has positive measure, there exists $\epsilon > 0$ such that $B = \left \{ x \in X : |f(x)| \leq 1-\epsilon \right \}$ has positive measure. Then, observe that $f+\epsilon\chi_B \in \textrm{Ball}(L^\infty(\mu))$ and $f-\epsilon\chi_B \in \textrm{Ball}(L^\infty(\mu))$, and that these functions are distinct since $\mu(B)>0$. But then,

$\frac{1}{2} \left ( (f+\epsilon\chi_B)+(f-\epsilon\chi_B) \right ) = \frac{1}{2}(2f) = f.$

So, no such $f$ is extreme, and we may see that $Ext(\textrm{Ball}(L^\infty(\mu))) \subseteq \left \{f \in L^\infty(\mu) : |f(x)| = 1 \textrm{ a.e. } [\mu] \right \}.$

To conclude our proof, let us demonstrate the reverse inclusion. To this end, let $f \in L^\infty(\mu)$ with $|f(x)| = 1$ a.e. $[\mu]$, and suppose there exist $g,h \in \textrm{Ball}(L^\infty(\mu))$ such that $f=\frac{1}{2}(g+h)$. Then, for almost every $x$, we have $f(x) = \frac{1}{2}(g(x)+h(x))$ with $|f(x)| = 1$ and $|g(x)|, |h(x)| \leq 1$. As the points of modulus one are precisely the extreme points of the closed unit disc in $\mathbb{C}$, it must be the case that $g(x)=h(x)=f(x)$ for almost every $x$, so $g = h = f$ in $L^\infty(\mu)$ and $f \in Ext(\textrm{Ball}(L^\infty(\mu)))$. Thus, $\left \{f \in L^\infty(\mu) : |f(x)| = 1 \textrm{ a.e. } [\mu] \right \} \subseteq Ext(\textrm{Ball}(L^\infty(\mu)))$, and we may conclude $Ext(\textrm{Ball}(L^\infty(\mu))) = \left \{f \in L^\infty(\mu) : |f(x)| = 1 \textrm{ a.e. } [\mu] \right \}$. $\blacksquare$
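The midpoint perturbation at the heart of this proof can be sketched on a discrete model (modeling $L^\infty(\mu)$ on a five-point measure space by $\mathbb{R}^5$ with the sup norm; the vector `f` below is an arbitrary illustrative choice):

```python
import numpy as np

# Discrete sketch: if |f| <= 1 - eps on a coordinate set B of positive measure,
# then f is a proper midpoint of two distinct elements of the unit ball of
# ell^infty, hence not extreme.
f = np.array([1.0, -1.0, 0.4, 1.0, -0.2])
eps = 0.5
B = np.abs(f) <= 1.0 - eps          # coordinates where |f| is bounded away from 1

g = f + eps * B                     # perturb upward on B only
h = f - eps * B                     # perturb downward on B only

assert np.max(np.abs(g)) <= 1.0 and np.max(np.abs(h)) <= 1.0  # both in the ball
assert np.allclose((g + h) / 2, f) and not np.allclose(g, h)  # f is a proper midpoint
```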

Problem 5

Show that closed subspaces of reflexive Banach spaces are reflexive. Also, if $\mathscr{X}$ is reflexive and $\mathcal{M}$ is a closed subspace of $\mathscr{X}$, then $\mathscr{X}/\mathcal{M}$ is reflexive.

Solution

Let $\mathscr{X}$ be a reflexive Banach space and $\mathcal{M}$ be a closed subspace of $\mathscr{X}$. We seek to show first that $\mathcal{M}$ is reflexive as well.

To do so, let us first observe the theorem proved in problem $1$:

Theorem

A Banach space $\mathscr{X}$ is reflexive if and only if $\textrm{Ball}(\mathscr{X})$ is weakly compact.

Observe then that $\textrm{Ball}(\mathcal{M}) = \mathcal{M} \cap \textrm{Ball}(\mathscr{X})$ is a norm-closed, convex subset of $\textrm{Ball}(\mathscr{X})$, and is thus weakly closed (as the closure and weak closure coincide for convex sets in locally convex spaces). As $\textrm{Ball}(\mathscr{X})$ is weakly compact by the above result, and a weakly closed subset of a weakly compact set is weakly compact, $\textrm{Ball}(\mathcal{M})$ is weakly compact. Finally, by the Hahn-Banach theorem, the weak topology of $\mathcal{M}$ is the restriction of the weak topology of $\mathscr{X}$, so $\textrm{Ball}(\mathcal{M})$ is weakly compact in $\mathcal{M}$ and our desired result then follows.

We now seek to show that the quotient $\mathscr{X}/\mathcal{M}$ is reflexive. To do so, recall that $\mathscr{X}/\mathcal{M}$ is indeed a Banach space, and that the quotient map $q$ is a linear operator with norm $1$; being bounded and linear, it is also continuous from the weak topology of $\mathscr{X}$ to the weak topology of $\mathscr{X}/\mathcal{M}$. Whenever $\mathscr{X}$ is reflexive, $\textrm{Ball}(\mathscr{X})$ is weakly compact (via the previously cited result), so its image $q(\textrm{Ball}(\mathscr{X}))$ is weakly compact. As $q(\textrm{Ball}(\mathscr{X}))$ is a convex, weakly compact (hence norm-closed) set containing the open unit ball of $\mathscr{X}/\mathcal{M}$, it contains all of $\textrm{Ball}(\mathscr{X}/\mathcal{M})$, and is of course contained in it. Correspondingly, the unit ball of $\mathscr{X}/\mathcal{M}$ is weakly compact, and $\mathscr{X}/\mathcal{M}$ is thus reflexive. $\blacksquare$

Problem 6

Theorems characterizing the linear isometries of a space onto itself rely heavily on the Krein-Milman theorem. Here is the statement of the famous Banach-Stone theorem:

Theorem (Banach-Stone)

Let $X$ be a compact Hausdorff space and $A$ a complex linear subalgebra of $C(X)$, the algebra of continuous complex-valued functions on $X$. Assume that $A$ contains the constant function $1$, and suppose that $T$ is a $1-1$ linear map from $A$ onto $A$ which is isometric, i.e. $\|Tf\|_\infty = \|f\|_\infty$. Then $T$ has the form $Tf = \alpha \, \phi f$ where $\alpha \in A$, $\| \alpha \| = 1$, $\frac{1}{\alpha} \in A$, and $\phi$ is an algebra automorphism.

Give a proof of the Banach-Stone theorem for $C([0, 1])$.

Solution

Recall that $[0,1]$ is a compact Hausdorff space, and let $A$ be a complex linear subalgebra of $C([0,1])$. Assume that $A$ contains the constant function $1$, and suppose that $T$ is a $1-1$ linear map from $A$ onto $A$ which is isometric, i.e. $\|Tf\|_\infty = \|f\|_\infty$. We seek to show that $T$ has the form $Tf = \alpha \, \phi f$ where $\alpha \in A$, $\| \alpha \| = 1$, $\frac{1}{\alpha} \in A$, and $\phi$ is an algebra automorphism.

First, let us note that $A$ is an abelian $C^*$-algebra when equipped with complex conjugation as an involution. By a proof given in Conway, it then follows that the Gelfand transform $A \rightarrow C(\Sigma)$ is an isometric isomorphism, where $\Sigma$ denotes the maximal ideal space of $A$. Similarly, recall that $(\Sigma, \textrm{wk-}*)$ is compact and Hausdorff, and note that by theorem III.5.7 of Conway, there exists an isometric isomorphism of $M(\Sigma)$ onto $C(\Sigma)^*$, where $M(\Sigma)$ is the space of all $\mathbb{C}$-valued regular Borel measures on $\Sigma$ with the total variation norm.

As such, let us consider the adjoint $T^*:M(\Sigma) \rightarrow M(\Sigma)$. Note that $T$ is an isometric isomorphism, and by the properties of the adjoint, $\|T^* \| =\|T\| =1$. Moreover, as $T$ is invertible, $T^*$ must also be invertible, and is correspondingly an isometric isomorphism as well. It then follows that $T^*$ is a weak-* homeomorphism of $\textrm{Ball}(M(\Sigma))$ onto $\textrm{Ball}(M(\Sigma))$. Moreover, as $T^*$ is linear, it preserves convex combinations, and so carries proper midpoints to proper midpoints. Hence, it is clear that:

$T^*(Ext(\textrm{Ball}(M(\Sigma))))=Ext(\textrm{Ball}(M(\Sigma))).$

Now, recall that, by Conway theorem V.8.4, $Ext(\textrm{Ball}(M(\Sigma)))=\left \{\alpha \delta_h : |\alpha|=1, h \in \Sigma \right \}$. So, for each $h \in \Sigma$, there exist a unique $\tau(h) \in \Sigma$ and a unique scalar $\gamma(h) \in \mathbb{C}$ such that $|\gamma(h)|=1$ and $T^*(\delta_h)=\gamma(h)\delta_{\tau(h)}$.

We now show that $\gamma:\Sigma \rightarrow \mathbb{C}$ is continuous. Let $\left \{h_i \right \}$ be a net in $\Sigma$ with $h_i \rightarrow h$. Then, $\delta_{h_i} \rightarrow \delta_h$ in the weak-* topology of $M(\Sigma)$. Hence, $\gamma(h_i)\delta_{\tau(h_i)} = T^*(\delta_{h_i}) \rightarrow T^*(\delta_h)=\gamma(h)\delta_{\tau(h)}$ in the weak-* topology of $M(\Sigma)$. In particular, $\gamma(h_i) = \langle 1, T^*(\delta_{h_i}) \rangle \rightarrow \langle 1, T^*(\delta_h) \rangle = \gamma(h)$. Thus, $\gamma$ is continuous.

Continuing on, we now demonstrate that $\tau: \Sigma \rightarrow \Sigma$ is a homeomorphism. To do so, observe that we need only show that $\tau$ is continuous, as a continuous bijection between compact Hausdorff spaces is a homeomorphism. But, as $T^*$ and $\gamma$ are continuous and $T^*$ is invertible, the continuity of $\tau$ follows immediately. Thus, $\tau$ is a homeomorphism.

Now, observe that if $f \in C(\Sigma)$ and $h \in \Sigma$, then

$(Tf)(h)=\langle Tf,\delta_h \rangle = \langle f,T^*\delta_h \rangle = \langle f, \gamma(h)\delta_{\tau(h)} \rangle = \gamma(h)f(\tau(h)).$

Thus, $T$ acts on Gelfand transforms as multiplication by the unimodular continuous function $\gamma$ composed with the algebra automorphism $f \mapsto f \circ \tau$. As the Gelfand transform $A \rightarrow C(\Sigma)$ is an isometric isomorphism, our desired result then follows. $\blacksquare$
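On a finite model space, the Banach-Stone form $Tf = \alpha \, (f \circ \tau)$ can be checked directly to be a sup-norm isometry (a sketch; the unimodular weights and the bijection below are illustrative choices, not canonical):

```python
import cmath

# Sketch on C(X) for X = {0,1,2,3} with the sup norm: a weighted composition
# operator Tf = alpha * (f o tau), with |alpha| = 1 pointwise and tau a
# bijection of X, permutes the moduli |f(x)| and so preserves the sup norm.
tau = [2, 0, 3, 1]                                          # a bijection of {0,1,2,3}
alpha = [cmath.exp(1j * t) for t in (0.3, 1.1, 2.0, 0.7)]   # unimodular weights

def T(f):
    return [alpha[x] * f[tau[x]] for x in range(4)]

def sup_norm(f):
    return max(abs(v) for v in f)

f = [1 + 2j, -0.5, 3j, 0.25]
assert abs(sup_norm(T(f)) - sup_norm(f)) < 1e-12  # isometry, up to float error
```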

# Part 3

As a helpful review, here are a variety of problems and solutions to exercises from an introductory functional analysis class.  This is the third installment in a series of functional analysis exercises (Part 1, Part 2, Part 3, Part 4)

Problem 1

If $\mathscr{B}$ is a Banach space and $\mathscr{M}$ is a proper closed subspace in $\mathscr{B}$, define the quotient space $\mathscr{B}/\mathscr{M}$ by the set of all cosets as usual, normed as

$\|x + \mathscr{M}\| = \inf \left \{ \|x + m\| : m \in \mathscr{M} \right \}.$

Show that $\mathscr{B}/\mathscr{M}$ is a Banach space.

Solution

Let $\mathscr{B}$ be a Banach space and let $\mathscr{B}/\mathscr{M}$ be the canonical quotient with infimum norm ranging over the elements of the coset. To show that $\mathscr{B}/\mathscr{M}$ is also a Banach space, we must demonstrate that $\mathscr{B}/\mathscr{M}$ is complete with respect to the norm. To this end, we restate a portion of a proof of Conway. But first, we need a lemma:

Lemma

A normed space $\mathscr{X}$ is a Banach space if and only if every absolutely convergent series in $\mathscr{X}$ converges in $\mathscr{X}$.

Proof

Forward Direction

Let $\mathscr{X}$ be a Banach space and $\left \{x_n \right \}$ an arbitrary sequence in $\mathscr{X}$ such that $\sum_{n=1}^\infty\| x_n \|$ converges. It then follows (by the triangle inequality) that the partial sums of $\sum_{n=1}^\infty x_n$ form a Cauchy sequence, and by the completeness of $\mathscr{X}$, the series $\sum_{n=1}^\infty x_n$ converges to an element of $\mathscr{X}$.

Reverse Direction

Let $\mathscr{X}$ be a normed space and suppose that every absolutely convergent series in $\mathscr{X}$ converges in $\mathscr{X}$. We must now show that every Cauchy sequence in $\mathscr{X}$ converges. To that end, let $\left \{x_n \right \}$ be an arbitrary Cauchy sequence in $\mathscr{X}$, and let $\left \{x_{n_k}\right \}$ be a subsequence of $\left \{x_n\right \}$ such that $\| x_{n_{k+1}} - x_{n_k} \| < 2^{-k}$. It then follows that $\sum_{k=1}^\infty \| x_{n_{k+1}} - x_{n_k} \|$ converges, and by our assumption, that $\sum_{k=1}^\infty \left ( x_{n_{k+1}} - x_{n_k} \right )$ converges as well to some $x \in \mathscr{X}$. Observe that

$\sum_{k=1}^N \left ( x_{n_{k+1}} - x_{n_k} \right ) = x_{n_{N+1}} - x_{n_1}.$

As this series converges, it then follows that $x_{n_{N+1}} - x_{n_1} \rightarrow x$, which is to say $x_{n_{N+1}} \rightarrow x + x_{n_1}$. Thus, we have shown that $\left \{x_{n_k} \right \}$ is a convergent subsequence of the Cauchy sequence $\left \{x_n \right \}$, and as a Cauchy sequence with a convergent subsequence converges, we may conclude that $\left \{x_n \right \}$ converges in $\mathscr{X}$. Being that $\left \{x_n \right \}$ was arbitrary, it then follows that $\mathscr{X}$ is complete and, thus, a Banach space.
$\blacksquare$
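The subsequence extraction used in the reverse direction can be sketched concretely for a scalar Cauchy sequence (the alternating harmonic partial sums, an illustrative choice whose classical tail bound $|x(m)-x(n)| \leq 1/(n+1)$ for $m \geq n$ guarantees the gaps fall under $2^{-k}$):

```python
def dyadic_indices(K=10):
    """Indices n_k = 2^k: for the alternating harmonic partial sums below,
    |x(m) - x(n)| <= 1/(n+1) for m >= n, so successive gaps fall under 2^{-k}."""
    return [2 ** k for k in range(1, K + 1)]

# Partial sums of the alternating harmonic series: a Cauchy sequence of reals.
x = lambda n: sum((-1) ** j / j for j in range(1, n + 1))

idx = dyadic_indices()
diffs = [abs(x(idx[k + 1]) - x(idx[k])) for k in range(len(idx) - 1)]

# The extracted subsequence has absolutely summable increments, as required
# by the lemma's hypothesis:
assert all(d < 2 ** -k for k, d in enumerate(diffs, start=1))
```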

Now, let $\left \{x_n+\mathscr{M} \right \}$ be a Cauchy sequence in $\mathscr{B}/\mathscr{M}$. There is then a subsequence $\left \{x_{n_k}+\mathscr{M} \right \}$ such that

$\| (x_{n_k}+\mathscr{M}) - (x_{n_{k+1}}+\mathscr{M}) \| = \| x_{n_k} - x_{n_{k+1}} + \mathscr{M} \| < 2^{-k}.$

Following this, let $y_1 =0$. Having chosen $y_k \in \mathscr{M}$, note that

$\inf \left \{ \| (x_{n_k}+y_k) - (x_{n_{k+1}}+m) \| : m \in \mathscr{M} \right \} = \| x_{n_k} - x_{n_{k+1}} + \mathscr{M} \| < 2^{-k},$

so we may choose $y_{k+1} \in \mathscr{M}$ such that

$\| (x_{n_k} + y_k) - (x_{n_{k+1}} + y_{k+1} )\| < 2 \cdot 2^{-k}.$

Continuing this, we obtain a sequence $\left \{y_k \right \} \subseteq \mathscr{M}$ satisfying the above bound for every $k$.

Thus, it follows that the series

$\sum_{k=1}^\infty \| (x_{n_k} + y_k) - (x_{n_{k+1}} + y_{k+1} ) \|$

converges (being a series of non-negative reals dominated by $\sum_k 2 \cdot 2^{-k}$). Moreover, as $\mathscr{B}$ is a Banach space, our previous lemma then gives that

$\sum_{k=1}^\infty (x_{n_k} + y_k) - (x_{n_{k+1}} + y_{k+1} )$

also converges in $\mathscr{B}$. As the $N$th partial sum of this series is simply $(x_{n_1} + y_1) - (x_{n_{N+1}} + y_{N+1})$, its convergence yields that $\left \{x_{n_k}+y_k \right \}$ converges in $\mathscr{B}$.

With this in hand, let us recall that the quotient map is a linear operator with norm $1$ (and is thus continuous). Hence, the convergent sequence $\left \{x_{n_k}+y_k\right \}$ is mapped onto a convergent sequence in the quotient; as each $y_k \in \mathscr{M}$, that image is precisely $\left \{x_{n_k}+\mathscr{M}\right \}$. But this then gives that the Cauchy sequence $\left \{x_n + \mathscr{M}\right \}$ has a convergent subsequence, and correspondingly converges. As the sequence $\left \{x_n + \mathscr{M} \right \}$ was arbitrary, we may then conclude that $\mathscr{B}/\mathscr{M}$ is complete and, therefore, a Banach space as well.  $\blacksquare$

Problem 2

Prove that if $T$ is a bounded linear operator on a Hilbert space $\mathscr{H}$ and $T$ commutes with every compact operator on $\mathscr{H}$, then $T$ is a multiple of the identity.

Solution

Let $T$ be a bounded linear operator on a Hilbert space $\mathscr{H}$ and suppose that $T$ commutes with every compact operator on $\mathscr{H}$.

By way of contradiction, suppose there exists $x \in \mathscr{H}$ such that $Tx \notin \textrm{span} \left \{x \right \}$, and write $Tx = y$. Note that as $T$ commutes with every compact operator, $T$ must certainly commute with the finite rank orthogonal projections. Let $P_{x}$ be the orthogonal projection of $\mathscr{H}$ onto $\textrm{span} \left \{x \right \}$. Then:

$P_x T x = P_x y \in \textrm{span} \left \{x \right \}.$

But, observe as well that:

$T P_x x = T x = y.$

As $T$ and $P_x$ commute, it then follows that $y = P_x y \in \textrm{span}\left \{x \right \}$, a contradiction. Thus, for each $x \in \mathscr{H}$ there must exist $\lambda_x \in\mathbb{F}$ such that $T x = \lambda_x x$.

Now, let $x,y \in \mathscr{H}$ be nonzero with $y \notin \textrm{span}\left \{x \right \}$, and suppose that $Tx = \lambda_x x$ and $T y = \lambda_y y$ with $\lambda_x \neq \lambda_y$.
However, observe that $x+y \in \mathscr{H}$ as well, so it must be the case that there exists $\lambda_{(x+y)} \in \mathbb{F}$ such that:

$T (x+y) = \lambda_{(x+y)} (x+y) = \lambda_{(x+y)}x + \lambda_{(x+y)}y$

But, by the linearity of $T$, this then yields the following contradiction:

$T(x+y)= T(x)+T(y) = \lambda_x x + \lambda_y y \neq \lambda_{(x+y)}x + \lambda_{(x+y)}y = T(x+y).$

Thus, there must exist a single $\lambda \in \mathbb{F}$ such that $T x = \lambda x$ for all $x \in \mathscr{H}$. Correspondingly, as $T$ scales every element of $\mathscr{H}$ by a uniform constant, it then follows that $T = \lambda I$.   $\blacksquare$
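The two facts the argument rests on can be checked in a finite-dimensional sketch (a $2 \times 2$ model, with illustrative matrices): a non-scalar operator fails to commute with some rank-one orthogonal projection, while a scalar multiple of the identity commutes with all of them.

```python
import numpy as np

def proj(v):
    """Orthogonal projection onto span{v}."""
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

P = proj(np.array([1.0, 1.0]))                 # projection onto span{(1,1)}

T_bad = np.array([[1.0, 0.0], [0.0, 2.0]])     # diagonal but not scalar
T_good = 3.0 * np.eye(2)                       # a scalar multiple of the identity

assert not np.allclose(T_bad @ P, P @ T_bad)   # commutation fails for T_bad
assert np.allclose(T_good @ P, P @ T_good)     # commutation holds for lambda*I
```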

Problem 3

Let $\mathscr{X}$ be a normed linear space and let $\mathscr{X}^*$ be its dual with the usual norm

$\| f\| = \sup \left \{ |f(x)| : \| x\| \leq 1 \right \}.$

1.  Prove that the mapping $f \mapsto f(x)$ is, for each $x \in \mathscr{X}$, a bounded linear functional on $\mathscr{X}^*$ having norm $\|x\|$.
2. Prove that $\left \{ \| x_n \| \right \}$ is bounded if $\left \{ x_n \right \}$ is a sequence in $\mathscr{X}$ such that $\left \{f(x_n) \right \}$ is bounded for every $f \in \mathscr{X}^*$.

Solution (1)

Let $\mathscr{X}$ be a normed linear space and let $\mathscr{X}^*$ be its dual with the usual norm. For each $x \in \mathscr{X}$, define a corresponding element of the second dual of $\mathscr{X}$ by the map $\hat{x}: \mathscr{X}^* \rightarrow \mathbb{F}$, where $\hat{x}(f) = f(x)$. We will now verify that each $\hat{x}$ is indeed a bounded linear functional on $\mathscr{X}^*$ having norm $\| x \|$.

To do so, let us observe first that, for arbitrary $f,g \in \mathscr{X}^*$ and $\alpha, \beta \in \mathbb{F}$, pointwise evaluation gives:

$\hat{x}(\alpha f + \beta g) = (\alpha f + \beta g)(x) = \alpha f(x) + \beta g(x) = \alpha \hat{x}(f) + \beta \hat{x}(g).$

Thus, $\hat{x}$ is indeed a linear map on $\mathscr{X}^*$.

Now, let us utilize the norm of $\mathscr{X}^*$ in the following fashion:

$|\hat{x}(f)| = |f(x)| \leq \|f\| \, \| x\|.$

Thus, $|\hat{x}(f)| \leq \|f\| \, \| x\|$ for all $f \in \mathscr{X}^*$, and $\hat{x}$ is correspondingly bounded with $\|\hat{x}\| \leq \|x\|$ for all $x \in \mathscr{X}$. For the reverse inequality, recall the corollary of the Hahn-Banach theorem stating that for each $x \in \mathscr{X}$ there exists $f \in \mathscr{X}^*$ with $\|f\| = 1$ and $f(x) = \|x\|$; for this $f$ we have $|\hat{x}(f)| = \|x\|$, so $\|\hat{x}\| = \|x\|$.

By the above argument, it then follows that $\hat{x} \in \mathscr{B}(\mathscr{X}^*, \mathbb{F}) = \mathscr{X}^{**}$ for all $x \in \mathscr{X}$, with $\|\hat{x}\| = \|x\|$. $\blacksquare$

Solution (2)

Let $\mathscr{X}$ be a normed linear space and let $\mathscr{X}^*$ be its dual with the usual norm. Further, let $\left \{ x_n \right \}$ be a sequence in $\mathscr{X}$ such that $\left \{f(x_n)\right\}$ is bounded for every $f\in \mathscr{X}^*$. We seek to show that $\left \{ \| x_n \| \right \}$ is also bounded.

To do so, let us first recall a corollary of the Hahn-Banach theorem:

Corollary

Let $\mathscr{X}$ be a normed space and let $x \in \mathscr{X}$. Then, $\| x\| = \sup \left \{|f(x)|:f \in \mathscr{X}^*, \|f\| \leq 1 \right \}$.

Taken together with part (1), this gives $\|\hat{x}_n\| = \|x_n\|$ for the functionals $\hat{x}_n \in \mathscr{B}(\mathscr{X}^*,\mathbb{F})$ defined by $\hat{x}_n(f) = f(x_n)$. A pointwise bound alone does not yield a uniform bound, however, so we now appeal to the Principle of Uniform Boundedness. Observe that $\mathscr{X}^*$ is a Banach space (regardless of whether $\mathscr{X}$ is complete), and that, by hypothesis, $\sup_n |\hat{x}_n(f)| = \sup_n |f(x_n)| < \infty$ for every $f \in \mathscr{X}^*$. Applying the Principle of Uniform Boundedness to $\left \{\hat{x}_n \right \} \subseteq \mathscr{B}(\mathscr{X}^*, \mathbb{F})$, it then follows that $\sup_n \|x_n\| = \sup_n \|\hat{x}_n\| < \infty$.

Thus, $\left\{\|x_n\|\right\}$ is bounded whenever $\left\{x_n\right\}$ is a sequence in $\mathscr{X}$ such that $\left\{f(x_n)\right\}$ is bounded for every $f \in \mathscr{X}^*$. $\blacksquare$

Problem 4

Suppose $L^1$ and $L^2$ are the usual Lebesgue spaces on $[0,1]$.

1. Show that $A_n := \left \{ f : \int |f|^2 \leq n \right \}$ is closed in $L^1$.
2. Show that $L^2$ is a set of first category in $L^1$.

Clearly, the same proof works for $L^p$ and $L^q$.

Solution (1)

Let $L^1$ be the usual Lebesgue space on $[0,1]$ and define $A_n := \left \{ f : \int |f|^2 \leq n \right \}$. We seek to show that $A_n$ is closed in $L^1$ for all $n$.

To do so, let $\left \{f_k \right \} \subseteq A_n$ be a sequence such that $f_k \rightarrow f$ in $L^1$. Recall that we may then select a subsequence $\left \{f_{k_j} \right \}\subseteq \left \{f_k \right \}$ which converges to $f$ point-wise almost everywhere. It then follows that $\left \{|f_{k_j}|^2\right \}$ is a sequence of non-negative measurable functions such that $|f_{k_j}|^2 \rightarrow |f|^2$ point-wise almost everywhere, and by Fatou's lemma, it must hold that

$\int |f|^2 \leq \underset{j \rightarrow \infty}{\underline{\lim}} \int |f_{k_j}|^2 \leq n.$

Thus, it follows that $f \in A_n$, and $A_n$ is correspondingly closed in $L^1$.   $\blacksquare$

Solution (2)

Let $L^1$, $L^2$ be the usual Lebesgue spaces on $[0,1]$. We seek to show that $L^2$ is a set of first category in $L^1$. We provide two proofs of this fact:
For the first proof, observe that $L^2 = \bigcup_{n=1}^\infty A_n$. As each $A_n$ has been shown to be closed, we must only demonstrate that each $A_n$ has empty interior in order to show that $L^2$ is a countable union of nowhere dense sets in $L^1$ (that is, of first category). To do so, observe that $f(x)=\frac{1}{\sqrt{x}}$ is an element of $L^1$, but $f \notin L^2$. Let $g \in L^2$ be arbitrary, and observe that the sequence $\left \{g+\frac{1}{n}f \right \}$ converges to $g$ in the norm of $L^1$, but no element of the sequence is an element of $L^2$ (as otherwise $\frac{1}{n}f = (g+\frac{1}{n}f) - g$ would lie in $L^2$). Thus, $g$ is not in the interior of $L^2$ and, as $g$ was arbitrary, $L^2$ has empty interior in $L^1$; in particular, each $A_n \subseteq L^2$ has empty interior. Therefore, $L^2$ is a set of first category in $L^1$.   $\blacksquare$
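The behavior of the function $f(x) = x^{-1/2}$ driving this argument can be checked numerically (a sketch using midpoint Riemann sums on $[\epsilon, 1]$; the cutoffs are illustrative):

```python
# As eps -> 0, the L^1 mass of x^{-1/2} stays bounded (the integral tends to 2),
# while the L^2 mass -- the integral of |f|^2 = 1/x -- grows like -log(eps).

def riemann(g, eps, n=100_000):
    """Midpoint Riemann sum of g over [eps, 1]."""
    h = (1.0 - eps) / n
    return sum(g(eps + (i + 0.5) * h) for i in range(n)) * h

for eps in (1e-2, 1e-4):
    l1 = riemann(lambda x: x ** -0.5, eps)   # stays below 2: f is in L^1
    l2 = riemann(lambda x: 1.0 / x, eps)     # blows up like -log(eps): f is not in L^2
    print(eps, round(l1, 3), round(l2, 3))
```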

For the second proof, recall that $L^2$ is a complete metric space and $L^1$ is certainly a topological vector space.

To begin, let us recall the open mapping theorem:

Theorem (Open Mapping)

Let $X$ be a complete metric space, $Y$ a topological vector space, and $\Lambda: X \rightarrow Y$ a continuous linear map such that $\Lambda(X)$ is of second category in $Y$. Then, $\Lambda(X) = Y$ and $\Lambda$ is an open mapping.

Observe that the contrapositive of the open mapping theorem may be stated as follows:

Theorem (Open Mapping – Contrapositive)

Let $X$ be a complete metric space, $Y$ a topological vector space, and $\Lambda: X \rightarrow Y$ a continuous linear map. Then, if $\Lambda(X) \neq Y$, it follows that $\Lambda(X)$ is of first category in $Y$.

Let $\Lambda: L^2 \rightarrow L^1$ be the inclusion map, which is well-defined and continuous since $\|f\|_1 \leq \|f\|_2$ on $[0,1]$ by the Cauchy-Schwarz inequality; linearity is immediate.

However, observe that $f(x)=\frac{1}{\sqrt{x}}$ is an element of $L^1$, but $f \notin L^2$. From this, we may see that $\Lambda(L^2) \neq L^1$. But, by the contrapositive of the open mapping theorem, this then gives that $L^2$ is of first category in $L^1$. $\blacksquare$

Problem 5

Prove that, if $\mathscr{X}$ is a normed space and $\mathscr{M}$ is a linear manifold in $\mathscr{X}$, then $\mathscr{M}$ is dense in $\mathscr{X}$ if and only if the only bounded linear functional on $\mathscr{X}$ that annihilates $\mathscr{M}$ is the zero functional.

Solution

Forward Direction

Let $\mathscr{X}$ be a normed space and $\mathscr{M}$ a dense linear manifold in $\mathscr{X}$. We seek to show that the only bounded linear functional on $\mathscr{X}$ that annihilates $\mathscr{M}$ is the zero functional.

To this aim, allow us to recall a corollary of the Hahn-Banach theorem. The proof is restated below for clarity:

Corollary (Hahn-Banach)

If $\mathscr{X}$ is a normed space and $\mathscr{M}$ is a linear manifold in $\mathscr{X}$, then

$\overline{\mathscr{M}} =\bigcap \left \{\ker f: f \in \mathscr{X}^*, \mathscr{M} \subseteq \ker f\right\}.$

Proof

Let $N = \bigcap \left \{\ker f: f \in \mathscr{X}^*, \mathscr{M} \subseteq \ker f \right \}$. Observe that $\overline{\mathscr{M}} \subseteq N$, as $\overline{\mathscr{M}} \subseteq \ker f$ for all $f \in \mathscr{X}^*$ such that $\mathscr{M} \subseteq \ker f$ (the kernel of a bounded linear functional being closed). To show that $\overline{\mathscr{M}} = N$, let us suppose not. Then, there exists $x_0 \in N \setminus \overline{\mathscr{M}}$. But, by a separate corollary of the Hahn-Banach theorem, there exists $f \in \mathscr{X}^*$ such that $f|_\mathscr{M} = 0$ and $f(x_0)=1$. Then, $x_0 \notin \ker f$, so it must be the case that $x_0 \notin N$, a contradiction. Thus, $N \subseteq \overline{\mathscr{M}}$ and $N = \overline{\mathscr{M}}$. $\blacksquare$

With this in hand, let us begin by observing that, as $\mathscr{M}$ is dense in $\mathscr{X}$, it then follows that $\overline{\mathscr{M}}=\mathscr{X}$. Thus, we have $\mathscr{X}=\bigcap \left \{\ker f: f \in \mathscr{X}^*, \mathscr{M} \subseteq \ker f\right \}$. Now, if $f \in \mathscr{X}^*$ annihilates $\mathscr{M}$, then $\mathscr{X} \subseteq \ker f$ by the above, which is to say that $f = 0$.

Thus, the only bounded linear functional which annihilates $\mathscr{M}$ is the zero functional whenever $\mathscr{M}$ is dense in $\mathscr{X}$. $\blacksquare$

Reverse Direction

Let $\mathscr{X}$ be a normed space and $\mathscr{M}$ a linear manifold in $\mathscr{X}$. Further, suppose the only bounded linear functional on $\mathscr{X}$ which annihilates $\mathscr{M}$ is the zero functional. By the previous corollary, it then follows that $\overline{\mathscr{M}}=\bigcap \left \{\ker f: f \in \mathscr{X}^*, \mathscr{M} \subseteq \ker f \right \} = \ker 0 = \mathscr{X}$. Thus, $\mathscr{M}$ is dense in $\mathscr{X}$.   $\blacksquare$

Challenge Exercises

The following are challenge exercises left to the reader.  They are, respectively, corollaries of:

1. The Hahn-Banach Theorem
2. The Banach-Steinhaus Theorem
3. The Open Mapping Theorem

Challenge Exercise 1

A Banach-Mazur game, denoted $MB(Y,X,\mathcal{W})$, is a topological combinatorial game defined as follows: Let $Y$ be a topological space and $X$ a fixed subset of $Y$. Define a family of subsets $\mathcal{W}$ of $Y$ such that:

1. $\overset{\circ}{W_i} \neq \emptyset$ for all $W_i \in \mathcal{W}$
2. For all nonempty, open $A \subset Y$, there exists $W_i \in \mathcal{W}$ such that $W_i \subset A$.

Players $P_1$ and $P_2$ alternate moves, each selecting a single element of $\mathcal{W}$ contained in the previous selection, to form a decreasing sequence $W_0 \supset W_1 \supset \cdots$. The game is won by $P_1$ if and only if $X \cap \big ( \bigcap_{n<\omega} W_n \big ) \neq \emptyset$. If this condition is not fulfilled, $P_2$ wins.

Let $\mathscr{Y}$ be a Banach space and $\mathscr{X}$ a linear manifold in $\mathscr{Y}$. Let $\mathcal{W}$ be any collection of subsets of $\mathscr{Y}$ satisfying the properties defined above, and consider the game $MB(\mathscr{Y},\overline{\mathscr{X}},\mathcal{W})$. Given that it is known $P_2$ has a winning strategy if $X$ is of first category in $Y$, prove the following:

1. $P_2$ has a winning strategy if a nonzero linear functional annihilates $\mathscr{X}$.
2. $P_1$ has a winning strategy if the only linear functional which annihilates $\mathscr{X}$ is the zero functional.

Challenge Exercise 2

Let $X$ and $Y$ be topological vector spaces and $\Gamma$ a family of continuous linear maps from $X$ into $Y$. Let $B$ be as above, define $\mathcal{T} := \left \{V \subset X : V \textrm{ is open} \right \}$, and fix $N \subset Y$. Show that we may construct a valid Banach-Mazur game $MB(Y,N,\Lambda(\mathcal{T}))$ for each $\Lambda \in \Gamma$ if $B$ is of second category.

Challenge Exercise 3

Let $\mathscr{X}$ be a Banach space and define for each $x \in \mathscr{X}$ the bounded linear functional $\hat{x}: \mathscr{X}^* \rightarrow \mathbb{F}$ given by $\hat{x}(f)=f(x)$. Define the map $\Lambda: \mathscr{X} \rightarrow \mathscr{X}^{**}$ by $\Lambda(x)=\hat{x}$. Show that, for any suitable $\mathscr{M}$, $P_2$ has a winning strategy in the Banach-Mazur game $MB(\mathscr{X}^{**},\Lambda(\mathscr{X}),\mathcal{M})$ if and only if $\Lambda$ is not an open map.

# Part 2

As a helpful review, here are a variety of problems and solutions to exercises from an introductory functional analysis class.  This is the second installment in a series of functional analysis exercises (Part 1, Part 2, Part 3, Part 4)

Problem 1

Let $\mathscr{H}=L^2[0,1]$. Show that the multiplication operator by $m$ is an idempotent if and only if $m$ is the characteristic function of a measurable set on $[0,1]$.

Solution

Forward Direction

Let $\mathscr{H}=L^2[0,1]$ and let $m \in L^\infty[0,1]$. Define the map $M_m:L^2[0,1] \rightarrow L^2[0,1]$ by $M_m : f \mapsto mf$. Suppose $M_m$ is an idempotent. That is, suppose $M_m^2=M_m$. Then, for all $f \in \mathscr{H}$, we have that $M_m^2(f)=M_m(f)$, which is to say that $m^2f-mf=0$. But, if this is the case, then (by the definiteness of the norm) it must hold that $\left \|m^2f-mf \right \|=0$. Observing this, we may see:

$0 = \left \| m^2f-mf \right \|^2 = \int_0^1 \left | m(x)\left ( m(x)-1 \right ) \right |^2 \left | f(x) \right |^2 dx$

As this must hold for any $f \in \mathscr{H}$, let $f(x)=c$ be a constant function on the interval $[0,1]$. Correspondingly, $m$ must be such that the following is satisfied:

$0 = c^2 \int_0^1 \left | m(x)\left ( m(x)-1 \right ) \right |^2 dx$

As an immediate consequence, $m$ may only take on the values $0$ and $1$ outside a set of measure zero on the interval $[0,1]$. Thus, there is a measurable set $A \subseteq [0,1]$ such that $m(x)=\chi_A(x)$ for almost every $x$, and we may conclude that $m$ must be the characteristic function of a measurable set almost everywhere.   $\blacksquare$

Reverse Direction

Let $\mathscr{H}=L^2[0,1]$ and let $m \in L^\infty[0,1]$. Define the map $M_m:L^2[0,1] \rightarrow L^2[0,1]$ by $M_m : f \mapsto mf$. Suppose $m$ is the characteristic function of a measurable set on $[0,1]$. As $\chi_A^2 = \chi_A$ for any measurable set $A$, note that $M_m^2(f) = m^2f = mf = M_m(f)$. Thus, $M_m^2=M_m$ and $M_m$ is indeed an idempotent.   $\blacksquare$
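As a quick sanity check, both directions can be illustrated in finite dimensions, where a discretized multiplication operator becomes a diagonal matrix. This is only a hedged numerical sketch (the grid size, the particular symbols, and the use of numpy are assumptions of the illustration, not part of the problem):

```python
import numpy as np

# Discretize L^2[0,1] on an n-point grid; M_m then acts as diag(m(x_i)).
n = 200
x = np.linspace(0, 1, n, endpoint=False)

chi = (x < 0.5).astype(float)       # characteristic function of [0, 1/2)
M_chi = np.diag(chi)
assert np.allclose(M_chi @ M_chi, M_chi)    # indicator symbol: idempotent

m = x                               # m(x) = x is not an indicator function
M_m = np.diag(m)
assert not np.allclose(M_m @ M_m, M_m)      # fails m^2 = m, so not idempotent
```

The diagonal entries square to themselves exactly when each is $0$ or $1$, mirroring the $m^2 = m$ almost-everywhere condition above.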

Problem 2

Prove that an idempotent $E$ is a projection if and only if $\left \| E \right \|=1$.

Note: This solution is a rephrasing of a proof found in Conway.

Solution

Forward Direction

Let $E$ be an idempotent and suppose that $E$ is a projection. To proceed, we first paraphrase a theorem and proof of Conway:

Theorem

Let $\mathscr{M}$ be a closed linear subspace of $\mathscr{H}$ and $P:\mathscr{H}\rightarrow \mathscr{M}$ the orthogonal projection of $\mathscr{H}$ onto $\mathscr{M}$. Then, for every $h \in \mathscr{H}$, $\left \|Ph \right \| \leq \left \|h \right \|$.

Proof
Given $h \in \mathscr{H}$, we may write $h = (h-Ph)+Ph$. But, recall that $Ph \in \mathscr{M}$ and $h-Ph \in \mathscr{M}^\perp$. So, by the pythagorean theorem, $\left \| h \right \|^2 = \left \| h-Ph \right \|^2+\left \|Ph \right \|^2 \geq \left \| Ph\right \|^2$. Thus, we may conclude that $\left \| Ph \right \| \leq \left \| h \right \|$ as desired.  $\blacksquare$

Now, let us recall that $\left \| E \right \| = \mathrm{sup}\left \{ \left \| E(h) \right \| : \left \| h \right \|=1 \right \}$. By the cited theorem, this gives us that $\left \| E \right \| \leq 1$. However, note that $Eh=h$ for all $h \in \mathrm{range}(E)$; as a nonzero projection has a unit vector in its range, the supremum is attained and equality in norm holds. Thus, $\left \| E \right \| =1$.   $\blacksquare$

Reverse Direction

Let $E$ be an idempotent such that $\left \| E \right \|=1$. Let $h \in \mathrm{ker}(E)^\perp$, and observe that $\mathrm{range}(I-E) = \mathrm{ker}(E)$. It follows that $h-Eh \in \mathrm{ker}(E)$. Thus, by orthogonality, it must be the case that $\langle h-Eh,h \rangle =0$. As $\langle h-Eh,h \rangle = \left \| h \right \|^2-\langle Eh, h \rangle$, we may utilize the CBS inequality and the assumption that $\left \| E\right \|=1$ to see the following holds:

$\left \| h \right \|^2 = \langle Eh, h \rangle \leq \left \| Eh \right \| \left \| h \right \| \leq \left \| h \right \|^2$

Consequently, for any $h \in \mathrm{ker}(E)^\perp$, we have that $\left \| Eh \right \| = \left \| h \right \| = \langle Eh,h \rangle ^{1/2}$. Using this, however, we may see that expanding the square of the norm yields:

$\left \|h - Eh \right \|^2 = \left \| h\right \|^2 -2 \, \mathrm{Re} \langle Eh, h \rangle + \left \| Eh \right \|^2 =0$

That is, $h = Eh$ for every such $h$, which is to say that $\mathrm{ker}(E)^\perp \subseteq \mathrm{range}(E)$.

Similarly, for $g \in \mathrm{range}(E)$, we may write $g = g_1 + g_2$, where $g_1 \in \mathrm{ker}(E)$ and $g_2 \in \mathrm{ker}(E)^\perp$. Corresponding to this, we may observe that $E(g)=E(g_1+g_2) = E(g_1)+E(g_2) = E(g_2) = g_2$. But, as $g \in \mathrm{range}(E)$ and $E$ is idempotent, $E(g) = g$, so $g = g_2 \in \mathrm{ker}(E)^\perp$, which is to say that $\mathrm{range}(E) \subseteq \mathrm{ker}(E)^\perp$. Therefore, we may conclude that $\mathrm{range}(E) = \mathrm{ker}(E)^\perp$, giving that $E$ is a projection.  $\blacksquare$
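To see why the norm condition matters, consider a hedged finite dimensional sketch (the specific matrices are illustrative assumptions): a non-orthogonal idempotent can have norm strictly greater than $1$, while an orthogonal projection has norm exactly $1$.

```python
import numpy as np

# E is idempotent but not self-adjoint: it projects onto span{(1,0)}
# along the direction (1,-1), obliquely rather than orthogonally.
E = np.array([[1.0, 1.0],
              [0.0, 0.0]])
assert np.allclose(E @ E, E)
norm_E = np.linalg.norm(E, 2)       # operator norm = sqrt(2) > 1

# P is the orthogonal projection onto span{(1,0)}: idempotent and self-adjoint.
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])
assert np.allclose(P @ P, P) and np.allclose(P, P.T)
norm_P = np.linalg.norm(P, 2)       # operator norm = 1
```

The oblique idempotent stretches vectors near its kernel's orthogonal complement, which is exactly the behavior ruled out by $\left\|E\right\| = 1$.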

Problem 3

If $T$ is a compact operator and the range of $T$ is closed, prove that the range of $T$ is finite dimensional.

Solution

Suppose $T: \mathscr{H} \rightarrow \mathscr{K}$ is a compact operator with closed range. Observe that, as $T$ is compact, $\mathrm{range}(T)$ is locally compact following from the linearity of $T$, as the existence of a precompact set about zero guarantees a precompact set about any point in $\mathrm{range}(T)$.

Note as well that, as $T$ has closed range, it follows that $\mathrm{range}(T) \leq \mathscr{K}$ and is correspondingly a topological vector space. But, by Theorem I.2.2 of Rudin’s Functional Analysis text, we know that any locally compact topological vector space must be of finite dimension. Thus, $\mathrm{range}(T)$ is finite dimensional.   $\blacksquare$

Problem 4

Prove that an idempotent is compact if and only if it has finite rank.

Solution

Forward Direction

Let $E$ be a compact idempotent on $\mathscr{H}$. That is, let $E \in \mathscr{B}_0(\mathscr{H})$ be such that $E^2=E$. We first cite part $(b)$ of proposition II.3.2 of Conway:

Theorem (Proposition II.3.2)

$E$ is an idempotent if and only if $\mathrm{range}(E)=\mathrm{ker}(I-E)$, $\mathrm{ker}(E)=\mathrm{range}(I-E)$, and both $\mathrm{range}(E)$ and $\mathrm{ker}(E)$ are closed linear subspaces of $\mathscr{H}$.

Thus, $E$ is a compact operator with closed range and, by the result proven in problem 3 above, we may conclude that $\mathrm{dim}(\mathrm{range}(E)) < \infty$, or that $E$ has finite rank.  $\blacksquare$

Reverse Direction

Let $E$ be an idempotent on $\mathscr{H}$ with finite rank. That is, let $E \in \mathscr{B}(\mathscr{H})$ be such that $E^2 = E$ and $\mathrm{dim}(\mathrm{range}(E)) < \infty$.

Note that, by the previously cited proposition of Conway, we have that $\mathrm{range}(E)$ is a closed linear subspace of $\mathscr{H}$ (and, thus, itself a Hilbert space). Correspondingly, $\overline{E (\mathrm{Ball} \enskip \mathscr{H})} \subseteq \mathrm{range}(E)$. Thus, the closure of the image of the unit ball of $\mathscr{H}$ is a subset of a finite dimensional Hilbert space. By the Heine-Borel theorem, it then follows that $\overline{E (\mathrm{Ball} \enskip \mathscr{H})}$ is compact, and we may conclude $E$ is a compact operator.   $\blacksquare$

Problem 5

Let $\mathscr{H}$ be a Hilbert space and $T$ a compact operator on $\mathscr{H}$. If $(e_n)$ is an orthonormal set on $\mathscr{H}$, prove that $\left \| Te_n \right \| \rightarrow 0$. Is the converse of this true?

Solution

Let $\mathscr{H}$ be a Hilbert space and $T: \mathscr{H} \rightarrow \mathscr{K}$ a compact operator on $\mathscr{H}$. Further, suppose $(e_n)$ is an orthonormal set on $\mathscr{H}$. Let $I$ be an arbitrary countable subset of the index set of $(e_n)$, and let us consider the countable subset $(e_i)_{i \in I} \subseteq (e_n)$.

We seek to show that $\left \| Te_i \right \| \rightarrow 0$. By way of contradiction, assume that the image of $(e_i)$ under $T$ does not converge to zero in the norm of $\mathscr{K}$. Then there exist $\epsilon > 0$ and a subsequence along which $\left \| Te_i \right \|^2 > 2\epsilon$. As $T$ is compact and $(e_i)$ a bounded sequence, we may pass to a further subsequence $(Te_{i_k})$ and corresponding $q \in \mathscr{K}$ such that $Te_{i_k} \rightarrow q$ and $\left \| q \right \|^2 \geq 2\epsilon > \epsilon$.

From this, let us note that $\left | \langle Te_{i_k},q \rangle \right | \rightarrow \left | \langle q,q \rangle \right | = \left \|q \right \|^2 > \epsilon$. Following this, also note that we may utilize the properties of the adjoint as follows:

$\langle Te_{i_k}, q \rangle = \langle e_{i_k}, T^*q \rangle$

As $(e_{i_k})$ is an orthonormal set, recall that Bessel’s inequality gives the following bound for all $x \in \mathscr{H}$.

$\sum_{k=1}^\infty \left | \langle x, e_{i_k} \rangle \right |^2 \leq \left \| x \right \|^2$

With this in mind, let us make use of the fact that $T^*q \in \mathscr{H}$, yielding:

$\sum_{k=1}^\infty \left | \langle T^*q, e_{i_k} \rangle \right |^2 \leq \left \| T^*q \right \|^2$

As $T$ is a compact operator and is thus bounded, and as $\left \| T^* \right \| = \left \| T \right \|$, it follows that $\left \| T^*q \right \|^2 \leq \left \| T \right \|^2 \enskip \left \| q \right \|^2$.

However, as the sum given in the above inequality converges to a finite value, it must be the case that $\left | \langle T^*q, e_{i_k} \rangle \right |^2 \rightarrow 0$. But, we may then recall that $\left | \langle Te_{i_k},q \rangle \right | \rightarrow \left \|q \right \|^2$, so it follows that $\left \| q \right \| = 0$. By the definiteness of the norm, it must indeed be the case that $q = 0$, which yields a contradiction to our hypothesis that $\left \| q \right \|^2 > \epsilon$. Thus, all convergent subsequences of $(Te_i)$ must converge to $0$ in norm, and correspondingly $(Te_i)$ must converge to zero in norm as well. Moreover, as $(e_i)$ was an arbitrary countable subset of $(e_n)$, this also yields that $(Te_n)$ must respectively converge to $0$ in norm. $\blacksquare$
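A hedged finite dimensional illustration of this result (the diagonal operator and truncation size are assumptions made for the sketch): truncations of the compact operator $Te_n = e_n/n$ send the orthonormal basis vectors to vectors of vanishing norm.

```python
import numpy as np

# Truncate T e_k = e_k / k to an N x N matrix; T e_k is the k-th column.
N = 1000
T = np.diag(1.0 / np.arange(1, N + 1))
norms = [np.linalg.norm(T[:, k]) for k in range(N)]
assert np.allclose(norms, 1.0 / np.arange(1, N + 1))   # ||T e_k|| = 1/k -> 0
```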

Problem 6

Prove that if $T$ is compact and invertible on $\mathscr{H}$, then $\mathscr{H}$ has finite dimension.

Solution

Suppose $T \in \mathscr{B}_0(\mathscr{H})$ is invertible. Then, it must be the case that $\mathrm{ker}(T)=\left \{ 0 \right \}$ and $\mathrm{range}(T)=\mathscr{H}$, with $T^{-1}$ bounded.

By way of contradiction, let us suppose $\mathscr{H}$ is not finite dimensional. Following this, let $(e_n)$ be an orthonormal basis of $\mathscr{H}$. Then, by the result of problem 5 above, we have that $\left \| Te_n \right \| \rightarrow 0$, which is to say that for all $\epsilon >0$, there exists $N$ such that $m > N$ implies that $\left \| Te_m \right \| < \epsilon$.

However, now observe that $\left \|T^{-1}\left ( Te_m \right ) \right \|=1$ as $e_m$ is a unit vector. Following from the homogeneity of the norm and the definition of the operator norm, this then gives that:

$\frac{1}{\epsilon} < \frac{1}{\left \| Te_m \right \|}=\left \|T^{-1}\left (\frac{1}{\left \| Te_m \right \|} Te_m \right ) \right \| \leq \left \|T^{-1} \right \|$

But, as $\epsilon$ can be made arbitrarily close to zero, it then follows from the above inequality that $\left \| T^{-1} \right \|$ can be made arbitrarily large. Thus, $T^{-1} \notin \mathscr{B}( \mathscr{H} )$, and we may conclude that a compact operator is invertible only when the dimension of its domain is finite.  $\blacksquare$
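The blow-up of $\left\|T^{-1}\right\|$ can be watched numerically in a hedged finite dimensional sketch (the diagonal model is an assumption of the illustration): truncations of $\mathrm{diag}(1/k)$ are invertible, but the inverse norms grow without bound, as the proof predicts.

```python
import numpy as np

inv_norms = []
for n in (10, 100, 1000):
    T = np.diag(1.0 / np.arange(1, n + 1))      # truncated compact operator
    inv_norms.append(np.linalg.norm(np.linalg.inv(T), 2))
# ||T^{-1}|| equals n for the n x n truncation: no bounded inverse in the limit
assert np.allclose(inv_norms, [10.0, 100.0, 1000.0])
```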

Problem 7

Prove that if $T$ is compact on $\mathscr{H}$, $\mathcal{M}$ is a subspace of $\mathscr{H}$, and $T|_{\mathcal{M}}: \mathcal{M} \rightarrow \mathcal{M}$, then the restriction of $T$ to $\mathcal{M}$ is compact.

Solution

Let $T$ be compact on $\mathscr{H}$ and let $\mathcal{M} \leq \mathscr{H}$. Further, let $\mathcal{M}$ be an invariant subspace of $\mathscr{H}$ under $T$. That is, let $T(\mathcal{M}) \subseteq \mathcal{M}$.

As $T$ is compact on $\mathscr{H}$, the set $\overline{T( \mathrm{Ball} \, \mathscr{H})}$ is compact. Moreover, as $\mathrm{Ball} \, \mathcal{M} \subseteq \mathrm{Ball} \, \mathscr{H}$, we have $\overline{T( \mathrm{Ball} \, \mathcal{M})} \subseteq \overline{T( \mathrm{Ball} \, \mathscr{H})}$.

A closed subset of a compact set is compact, so $\overline{T( \mathrm{Ball} \, \mathcal{M})}$ is compact in $\mathscr{H}$. Further, as $T(\mathcal{M}) \subseteq \mathcal{M}$ and $\mathcal{M} \leq \mathscr{H}$ is closed, the closure $\overline{T( \mathrm{Ball} \, \mathcal{M})}$ lies in $\mathcal{M}$, and compactness is preserved upon passing to the subspace topology on $\mathcal{M}$. Therefore, whenever $T$ is compact on $\mathscr{H}$, it must follow that $T|_{\mathcal{M}}$ is compact on $\mathcal{M}$.   $\blacksquare$

Problem 8

Prove that the multiplication operator $M_\phi : L^2[0,1] \rightarrow L^2[0,1]$ is compact if and only if its symbol is zero.

Solution

Forward Direction

Let $M_\phi : L^2[0,1] \rightarrow L^2[0,1]$ be the multiplication operator defined by $M_\phi : f \mapsto \phi f$ for a symbol $\phi \in L^\infty[0,1]$, and let $M_\phi$ be compact.

By way of contradiction, suppose $\phi \neq 0$. Then there must be $\epsilon > 0$ and a set $A \subseteq [0,1]$ of nonzero measure on which $|\phi(a)| \geq \epsilon$ for all $a \in A$. Identifying $L^2(A)$ with the subspace of functions in $L^2[0,1]$ vanishing off of $A$, we obtain an infinite dimensional subspace of $L^2[0,1]$ (as Lebesgue measure is nonatomic), and we may select an orthonormal sequence $(e_n)$ within it. But then, for every $n$,

$\left \| M_\phi e_n \right \|^2 = \int_A \left | \phi(x) \right |^2 \left | e_n(x) \right |^2 dx \geq \epsilon^2$

which contradicts the result of problem 5 above, as the image of an orthonormal sequence under a compact operator must converge to zero in norm. Thus, $M_\phi$ cannot be compact when $\phi \neq 0$.

Now, let us consider the case where $\phi =0$. Trivially, as the range of $M_\phi$ is the singleton $\left \{ 0 \right \}$, we have $\overline{M_\phi( \mathrm{Ball} \, L^2[0,1] )} = \left \{ 0 \right \}$, which is compact. Thus, if $M_\phi$ is compact, its symbol must then be zero.   $\blacksquare$

Reverse Direction

Note that the argument in the final paragraph of the above proof implies that, if $M_\phi$ has a zero symbol, it must then be compact. Therefore, the multiplication operator on $L^2[0,1]$ is compact if and only if its symbol is zero. $\blacksquare$
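In a hedged finite dimensional sketch (the discretization grid and the particular symbol are assumptions of the illustration), the singular values of a discretized $M_\phi$ with $\phi$ bounded away from zero do not decay at all, while compactness would require singular values tending to zero:

```python
import numpy as np

# Discretize M_phi on an n-point grid: a diagonal matrix diag(phi(x_i)).
n = 500
x = np.linspace(0, 1, n, endpoint=False)
phi = 1.0 + 0.5 * np.sin(2 * np.pi * x)          # |phi| >= 1/2 everywhere
s = np.linalg.svd(np.diag(phi), compute_uv=False)
assert s.min() >= 0.49                            # singular values never decay
```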

Problem 9

Calculate the spectrum $\Sigma(S)$, where $S: l^2 \rightarrow l^2$ is the unilateral shift operator.

Solution

Let $S: l^2 \rightarrow l^2$ be the unilateral shift operator defined by $S(\alpha_1,\alpha_2,\hdots) \mapsto (0,\alpha_1,\alpha_2,\hdots)$. We now calculate the spectrum $\Sigma(S)$.

First, let us consider $\sigma_p(S)$, the point spectrum of $S$. Recall that $\lambda \in \sigma_p(S)$ if and only if $\mathrm{ker}(S-\lambda I) \neq \left \{ 0 \right \}$. In our case, $Sx = \lambda x$ would entail that:

$(0, \alpha_1, \alpha_2, \hdots) = (\lambda \alpha_1, \lambda \alpha_2, \lambda \alpha_3, \hdots)$

If $\lambda = 0$, this forces $(\alpha_1, \alpha_2, \hdots) = (0,0,\hdots)$ directly, as $S$ is injective. If $\lambda \neq 0$, then the first coordinate gives $0=\lambda \alpha_1$, so $\alpha_1 = 0$; substituting into $\alpha_1 = \lambda \alpha_2$ gives $\alpha_2 = 0$, and continuing on in this fashion, we see that $(\alpha_1, \alpha_2, \hdots) = (0,0,\hdots)$. Thus, $\sigma_p(S) = \emptyset$.

Now, let us consider $\sigma_c(S)$, the continuous spectrum of $S$. Recall that if $\lambda \in \sigma_c(S)$, then there exists $(x_n)\subset l^2$, with $\left \| x_n \right \| = 1$ for all $n \in \mathbb{N}$, such that $\left \|(S-\lambda I)x_n \right \| \rightarrow 0$. In our case, let us note that $\left \| S t \right \| = \left \|t \right \|$ for all $t \in l^2$ (as $S$ is an isometry), and observe that, by the reverse triangle inequality:

$\left \|(S-\lambda I)x_n \right \| \geq \left | \enskip \left \| Sx_n \right \| - \left | \lambda \right | \left \| x_n \right \| \enskip \right | = \left | \enskip 1 - \left | \lambda \right | \enskip \right |$

As our norm is bounded below by $\left | \enskip 1 - \left | \lambda \right | \enskip \right |$ for all $n$, we may conclude that $\left \|(S-\lambda I)x_n \right \| \rightarrow 0$ is possible only if $|\lambda|=1$. Thus, $\sigma_c(S) = \left \{ \lambda : |\lambda|=1 \right \}$.

Finally, let us consider $\sigma_r(S)$, the compression spectrum of $S$. Recall that if $\lambda \in \sigma_r(S)$, then $\mathrm{range}(S - \lambda I)$ is not dense in $l^2$. In our case, for $|\lambda| < 1$, observe the following bound for all $t \in l^2$:

$\left \| (S - \lambda I)t \right \| \geq \left \| St \right \| - \left | \lambda \right | \left \| t \right \| = \left ( 1 - \left | \lambda \right | \right ) \left \| t \right \|$

Hence, if $| \lambda | < 1$, the operator $S - \lambda I$ is bounded below, and its range is correspondingly closed: if $(S-\lambda I)t_n \rightarrow y$, the bound above forces $(t_n)$ to be Cauchy, so $t_n \rightarrow t$ and $y = (S-\lambda I)t$ lies in the range. Moreover, the range is proper: the vector $x_\lambda = (1, \bar{\lambda}, \bar{\lambda}^2, \hdots) \in l^2$ satisfies $S^*x_\lambda = \bar{\lambda}x_\lambda$, so $x_\lambda$ is orthogonal to $\mathrm{range}(S - \lambda I)$. Thus, $\sigma_r(S)= \left \{ \lambda : | \lambda | < 1 \right \}$.

It then follows that $\Sigma(S) = \left \{ \lambda : |\lambda| \leq 1 \right \}$.   $\blacksquare$
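These pieces of the spectrum can be probed numerically. Below is a hedged sketch using truncations of $S$ (the truncation length and the sample values of $\lambda$ are assumptions of the illustration): for $|\lambda| < 1$ the vector $x_\lambda = (1, \bar{\lambda}, \bar{\lambda}^2, \hdots)$ is an eigenvector of the backward shift $S^*$, and for $|\lambda| = 1$ the normalized truncations of $x_\lambda$ are approximate eigenvectors of $S$ itself.

```python
import numpy as np

n = 4000

# |lam| < 1: x = (1, conj(lam), conj(lam)^2, ...) lies in l^2 and S* x = conj(lam) x,
# witnessing that lam belongs to the compression spectrum of S.
lam_in = 0.6 * np.exp(1j * 0.7)
x = np.conj(lam_in) ** np.arange(n)
assert np.allclose(x[1:], np.conj(lam_in) * x[:-1])   # backward shift eigenvector

# |lam| = 1: v = (1, conj(lam), ..., conj(lam)^{n-1}) / sqrt(n) is a unit vector
# with ||(S - lam I) v|| = 1/sqrt(n) -> 0, an approximate eigenvector of S.
lam_on = np.exp(1j * 0.7)
v = np.conj(lam_on) ** np.arange(n) / np.sqrt(n)
Sv = np.concatenate(([0.0], v[:-1]))                  # forward (unilateral) shift
resid = np.linalg.norm(Sv - lam_on * v)
assert resid < 0.05
```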

# The Open Mapping Theorem

The open mapping theorem (or the Banach-Schauder theorem, if you prefer) is an incredibly important, relatively straightforward and digestible result in functional analysis which plays a crucial role in a large variety of other interesting theorems.  As an exercise, we’ll prove the open mapping theorem here in the standard fashion.  This will hopefully serve as a useful reference for later posts about metric regularity and linear openness, which serve as a measuring device to quantify the degree to which a map is open.

Stating the Open Mapping Theorem

The open mapping theorem may be stated as follows:

Theorem (Open Mapping)

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be Banach spaces and $T : \mathfrak{X} \rightarrow \mathfrak{Y}$ a bounded linear operator. Then, if $T$ is surjective, $T$ is an open map.

Restating this another way (with $\mathbb{B}$ denoting the open unit ball), the theorem reads:

Theorem (Open Mapping)

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be Banach spaces and $T : \mathfrak{X} \rightarrow \mathfrak{Y}$ a bounded linear operator. Then, if $T$ is surjective, $0 \in \mathrm{int} \, T(\mathbb{B}_\mathfrak{X})$.

At the heart of the open mapping theorem is the notion that there is a link between precision and isomorphism among complete normed spaces (and versions of the theorem hold in the more general setting of complete metrizable topological vector spaces as well): if the equation $Tx=y$ has at least one solution for any $y \in \mathfrak{Y}$, then either $T$ is an isomorphism and $\mathfrak{Y}$ is isomorphic to $\mathfrak{X}$, or $T$ is a quotient map and $\mathfrak{Y}$ is isomorphic to a quotient of $\mathfrak{X}$. Taken another way, the open mapping theorem links precision and estimation, at least in the sense that if $Tx=y$ has a solution for any $y \in \mathfrak{Y}$, then there exists a constant $k>0$ such that a solution may be chosen with $\|x\|_\mathfrak{X} \leq k\|y\|_\mathfrak{Y}$. More specifically (and this keys us in to precisely how open mappings relate to metric regularity), $k^{-1} = \sup \left \{ r > 0 : r\mathbb{B}_\mathfrak{Y} \subseteq T(\mathbb{B}_\mathfrak{X}) \right \}.$ We say that the value $k^{-1}$, as defined here, is the Banach constant of $T$; it will prove useful in the future.
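In finite dimensions with Euclidean norms, the Banach constant admits a concrete description: for a surjective matrix $T$, $\sup\left\{ r > 0 : r\mathbb{B}_\mathfrak{Y} \subseteq T(\mathbb{B}_\mathfrak{X}) \right\}$ is the smallest singular value of $T$. A hedged numerical sketch (the particular matrix is an illustrative assumption):

```python
import numpy as np

T = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0]])                  # rank 2, hence surjective onto R^2
banach_const = np.linalg.svd(T, compute_uv=False).min()

# Any y with ||y|| <= banach_const has a preimage of norm at most 1,
# i.e. ||x|| <= k ||y|| with k = 1 / banach_const.
y = banach_const * np.array([0.6, 0.8])          # ||y|| = banach_const
x, *_ = np.linalg.lstsq(T, y, rcond=None)        # minimum-norm solution x = T^+ y
assert np.allclose(T @ x, y)
assert np.linalg.norm(x) <= 1.0
```

The minimum-norm solution is $x = T^+ y$, and $\left\|T^+\right\| = 1/\sigma_{\min}$, which is exactly the constant $k$ above in this Euclidean setting.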

We will prove the open mapping theorem in detail to highlight precisely the epsilon-delta argument used, which will hopefully allow us to see how weakening of linear openness may be better understood.

Proving the Open Mapping Theorem

Our proof of the open mapping theorem will rely on the Baire category theorem. Because of this, we will also prove the Baire Category theorem. However, let’s first define a Baire space.

The modern definition of a Baire space is often given in one of the four following equivalent forms:

Definition (Baire Space)

Let $\mathcal{X}$ be a topological space. We say $\mathcal{X}$ is a Baire space if every intersection of a countable collection of dense open subsets of $\mathcal{X}$ is also dense.

That is, Baire spaces preserve density under countable intersections of open sets.

Definition (Baire Space)

Let $\mathcal{X}$ be a topological space. We say $\mathcal{X}$ is a Baire space if every union of a countable collection of closed subsets of $\mathcal{X}$ with empty interior has empty interior.

That is, in a Baire space, a countable union of closed sets with empty interior again has empty interior.

Definition (Baire Space)

Let $\mathcal{X}$ be a topological space. We say $\mathcal{X}$ is a Baire space if the interior of every union of a countable collection of closed, nowhere dense subsets of $\mathcal{X}$ is empty.

Definition (Baire Space)

Let $\mathcal{X}$ be a topological space. We say $\mathcal{X}$ is a Baire space if, whenever any union of countably many closed subsets of $\mathcal{X}$ has an interior point, then one of the closed subsets has an interior point.

The historical definition of a Baire space involves Baire’s notion of categories (not to be confused with the categories of category theory), but is also equivalent.

Definition (Sets of First and Second Category)

Let $\mathcal{X}$ be a topological space. We say a subset $\mathcal{W}$ of $\mathcal{X}$ is:

1.  of first category (or, often, meagre) in $\mathcal{X}$ if there exist a sequence $\left \{N_i\right \}_{i=1}^\infty$ of nowhere dense subsets of $\mathcal{X}$ such that $\mathcal{W}=\bigcup_{i=1}^\infty N_i$;
2. of second category in $\mathcal{X}$ if $\mathcal{W}$ is not of first category in $\mathcal{X}$.

This leads to Baire’s original definition, which is as follows:

Definition (Baire Space)

We say $\mathcal{X}$ is a Baire space if every non-empty open set in $\mathcal{X}$ is of second category in $\mathcal{X}$.

That is to say, open sets in Baire spaces are, in a sense, suitably ‘substantial’ or ‘large’ — at least, insofar as Baire spaces have no open sets which are meager.

Correspondingly, we also can find one further historical definition of a Baire space using this notion:

Definition (Comeagre set)

Let $\mathcal{X}$ be a topological space. We say that a subset $\mathcal{W}$ of $\mathcal{X}$ is comeagre if its complement $\mathcal{W}^C$ is meagre.

Definition (Baire Space)

We say $\mathcal{X}$ is a Baire space if every comeagre subset of $\mathcal{X}$ is dense in $\mathcal{X}$.

Seeing the straight-up crazy number of different definitions of a Baire space, one might wonder why these spaces deserve so much fuss. To this end, let us observe that Baire spaces enjoy a combinatorial property akin to the pigeonhole principle, which is (very obviously) equivalent to their definition.

Theorem (Interior Pigeonhole Property)

Let $E_1,E_2,\hdots$ be an arbitrary, at most countable, sequence of closed subsets of $\mathcal{X}$. We say $\mathcal{X}$ has the interior pigeonhole property if, whenever $\bigcup_i E_i$ has nonempty interior, then at least one $E_i$ has nonempty interior.

Let $\mathcal{X}$ be a topological space. If $\mathcal{X}$ has the interior pigeonhole property, then $\mathcal{X}$ is a Baire space.

So, Baire spaces are useful in some contexts because they allow us to — in a somewhat combinatorial fashion — extract data about the existence of a subset satisfying a certain property among a collection of subsets by looking at the properties of a larger set which contains them. Determining if a topological space is a Baire space allows us to utilize arguments of this form, so we frequently are interested in conditions dictating whether a topological space does indeed have this desirable property. In particular, we could consider this question in the context of a complete metric space, which leads us to the so-called Baire Category Theorem.

Theorem (Baire Category)

Every complete metric space is a Baire space.

Proof

Let $\mathcal{X}$ be a complete metric space. Using the first definition of a Baire space, we seek to show that a countable intersection of dense open subsets is dense. To that end, let $\left \{E_n \right \}_{n=1}^\infty$ be a countable collection of dense open subsets of $\mathcal{X}$. As a subset is dense if and only if every nonempty open subset intersects it, it is then sufficient to show that any nonempty open subset $W \subset \mathcal{X}$ contains a point $x$ which lies in the intersection of $W$ with each $E_n$.

Proceeding in this fashion, observe that since $E_1$ is dense, $E_1\bigcap W \neq \emptyset$. Thus, as $E_1 \bigcap W$ is open, there exist $x_1 \in E_1\bigcap W$ and a real constant $0 < r_1 < 1$ such that $\overline{\mathbb{B}(x_1,r_1)} \subseteq E_1\bigcap W$.

Now, observe that, as $E_2$ is dense, $E_2 \bigcap \mathbb{B}(x_1,r_1) \neq \emptyset$, and we may find a point $x_2$ in the intersection and positive radius $0 < r_2<\frac{1}{2}$ such that $\overline{\mathbb{B}(x_2,r_2)} \subset \mathbb{B}(x_1,r_1)$. Continuing recursively, we find a pair of sequences $\left \{x_n\right \}_{n=1}^\infty$ and $\left \{r_n\right \}_{n=1}^\infty$ such that $0 < r_n < \frac{1}{n}$ and $\overline{\mathbb{B}(x_n,r_n)} \subset \mathbb{B}(x_{n-1},r_{n-1})\bigcap E_{n}$. Thus, we have a nested sequence of closed balls whose radii tend to zero.

Moreover, for $m > n$ we have $x_m \in \mathbb{B}(x_n,r_n)$, so the sequence $\left \{x_n\right \}_{n=1}^\infty$ is Cauchy and, as $\mathcal{X}$ is complete, $x_n \rightarrow x \in \mathcal{X}$. As $x_m \in \overline{\mathbb{B}(x_n,r_n)}$ for all $m \geq n$ and each ball is closed, it follows that $x \in \overline{\mathbb{B}(x_n,r_n)}$ for every $n$, whence $x \in W \bigcap \left ( \bigcap_{n=1}^\infty E_n \right )$.

Therefore, we may conclude that the intersection of a countable number of dense open subsets of a complete metric space is dense, and that every complete metric space is correspondingly a Baire space.  $\blacksquare$

We will use this result in a central way to prove the open mapping theorem. However, we first need three lemmas.

Lemma
A normed space $\mathfrak{X}$ is a Banach space if and only if every absolutely convergent series in $\mathfrak{X}$ converges in $\mathfrak{X}$.

Proof

$\mathbf{[\Rightarrow]}$

Let $\mathfrak{X}$ be a Banach space and $\left \{x_n \right \}$ an arbitrary sequence in $\mathfrak{X}$ such that $\sum_{n=1}^\infty\| x_n \|$ converges. It then follows that the partial sums of the series are a Cauchy sequence, and by the completeness of $\mathfrak{X}$, the series $\sum_{n=1}^\infty x_n$ converges to an element of $\mathfrak{X}$.

$\mathbf{[\Leftarrow]}$

Let $\mathfrak{X}$ be a normed space and suppose that every absolutely convergent series in $\mathfrak{X}$ converges in $\mathfrak{X}$. We must now show that every Cauchy sequence in $\mathfrak{X}$ converges. To that end, let $\left \{x_n \right \}$ be an arbitrary Cauchy sequence in $\mathfrak{X}$, and let $\left \{x_{n_k} \right \}$ be a subsequence of $\left \{ x_n \right \}$ such that $\| x_{n_{k+1}} - x_{n_k} \| < 2^{-k}$. It then follows that $\sum_{k=1}^\infty \| x_{n_{k+1}} - x_{n_k} \|$ converges, and by our assumption, that $\sum_{k=1}^\infty \left ( x_{n_{k+1}} - x_{n_k} \right )$ converges as well to some $x \in \mathfrak{X}$. Observe that

$\sum_{k=1}^{K} \left ( x_{n_{k+1}} - x_{n_k} \right ) = x_{n_{K+1}} - x_{n_1}$

As this series converges, it then follows that $x_{n_{K+1}} - x_{n_1} \rightarrow x$ as $K \rightarrow \infty$, which is to say that $x_{n_k} \rightarrow x + x_{n_1} \in \mathfrak{X}$. Thus, we have shown that $\left \{ x_{n_k} \right \}$ is a convergent subsequence of the Cauchy sequence $\left \{ x_n \right \}$ and, correspondingly, we may conclude that $\left \{ x_n \right \}$ converges in $\mathfrak{X}$. Being that $\left \{ x_n \right \}$ was arbitrary, it then follows that $\mathfrak{X}$ is complete and, thus, a Banach space.  $\blacksquare$
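As an aside, the completeness hypothesis in the reverse direction cannot be dropped. A standard counterexample (stated here as a hedged aside, with $c_{00}$ denoting the space of finitely supported sequences under the supremum norm):

```latex
% Absolutely convergent but not convergent in the incomplete space c_{00}:
\sum_{n=1}^{\infty} \left\| 2^{-n} e_n \right\|_\infty
  = \sum_{n=1}^{\infty} 2^{-n} = 1 < \infty,
\qquad \text{while} \qquad
\sum_{n=1}^{\infty} 2^{-n} e_n
  = \left( 2^{-1}, 2^{-2}, 2^{-3}, \dots \right) \notin c_{00}.
```

The partial sums are finitely supported, but their pointwise limit has infinitely many nonzero coordinates, so the series has no sum within $c_{00}$.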

Lemma

Let $\mathfrak{X}$ be a Banach space, $\mathcal{Y}$ a normed space, and $T \in \mathscr{B}(\mathfrak{X},\mathcal{Y})$. If $r,s > 0$ are constants such that $s\mathbb{B}_\mathcal{Y} \subset \overline{T(r\mathbb{B}_\mathfrak{X})}^o$, then $s\mathbb{B}_\mathcal{Y}^o \subset T(r\mathbb{B}_\mathfrak{X})^o$.

Proof

Let $\mathfrak{X}$ be a Banach space, $\mathcal{Y}$ a normed space, and $T \in \mathscr{B}(\mathfrak{X},\mathcal{Y})$. Further, suppose we have constants $r,s > 0$ such that $s\mathbb{B}_\mathcal{Y} \subset \overline{T(r\mathbb{B}_\mathfrak{X})}^o$. Noting that scaling is a homeomorphism, without loss of generality we take $r=s=1$ by alternatively considering $\frac{r}{s}T$.

Having done this, let us first choose an arbitrary $z \in \mathbb{B}_\mathcal{Y}$, and choose $\delta>0$ such that $\|z\|_\mathcal{Y} < 1-\delta< 1$ (that is, $z \in (1-\delta)\mathbb{B}_\mathcal{Y} \subseteq \mathbb{B}_\mathcal{Y}^o$). Then, define $y \in \mathcal{Y}$ by $y = (1-\delta)^{-1}z$, observing that $\|y\|_\mathcal{Y} = (1-\delta)^{-1}\|z\|_\mathcal{Y} < 1$. We will demonstrate that $y \in (1-\delta)^{-1}T(\mathbb{B}_\mathfrak{X})^o$, which correspondingly gives that $z \in T(\mathbb{B}_\mathfrak{X})^o$.

To do so, we find a sequence $\left \{ y_n \right \}_{n=0}^\infty \subset \mathcal{Y}$ such that

$y_n \rightarrow y \quad \textrm{and} \quad y_n - y_{n-1} \in \delta^{n-1}T(\mathbb{B}_\mathfrak{X})^o \enskip \textrm{for all} \enskip n \geq 1$

That is, a sequence which converges to $y$ and has the difference of successive terms in successively smaller contractions of $T(\mathbb{B}_\mathfrak{X})^o$.

As $z \in \overline{T(\mathbb{B}_\mathfrak{X})}^o$, $z$ is the limit of a sequence $\left \{ T(w_n) \right \}_{n=1}^\infty \subset T(\mathbb{B}_\mathfrak{X})^o$ where $w_n \in \mathbb{B}_\mathfrak{X}$. Then, it follows that for all $\epsilon > 0$, there exists $N(\epsilon) \in \mathbb{N}$ such that $\|z-T(w_m)\|_\mathcal{Y} < \epsilon$ for all $m \geq N(\epsilon)$. Following this, set $\epsilon_n = (1-\delta)\delta^n$. Now, let $y_n = (1-\delta)^{-1}T(w_{N(\epsilon_n)})$ for $n \geq 1$ and $y_0=0$ (which does indeed work, as $\|z\|_\mathcal{Y}<(1-\delta)$). Notice that this allows us to yield the following observations:

$\left \| y - y_n \right \|_\mathcal{Y} = (1-\delta)^{-1}\left \| z - T(w_{N(\epsilon_n)}) \right \|_\mathcal{Y} < (1-\delta)^{-1}\epsilon_n = \delta^n$

Thus, as $y_n \in (1-\delta)^{-1}T(\mathbb{B}_\mathfrak{X})^o$, we then have that $y_n-y_{n-1} \in \delta^{n-1}T(\mathbb{B}_\mathfrak{X})^o$, and our desired properties have been fulfilled.

Following this, we find a convergent sequence in $\mathfrak{X}$ which also converges to $y$ under $T$ to demonstrate that $y \in (1-\delta)^{-1}T(\mathbb{B}_\mathfrak{X})^o$. That is, we seek a sequence $\left \{ x_n \right \}_{n=1}^\infty \subset \mathfrak{X}$ such that $\|x_n\|_\mathfrak{X} < \delta^{n-1}$ and $T(x_n)=y_n-y_{n-1}$. To do this, let us first notice that, for $x \notin \mathrm{ker}(T)$, we have $1 \leq \|T(\frac{x}{\|x\|_\mathfrak{X}})\|_\mathcal{Y}$, as $\mathbb{B}_\mathcal{Y}^o \subseteq \overline{T(\mathbb{B}_\mathfrak{X})}^o$. Setting $x_n = (1-\delta)^{-1}\left (w_{N(\epsilon_n)}-w_{N(\epsilon_{n-1})} \right )$, clearly $T(x_n) = y_n - y_{n-1}$ by linearity, and the following then holds for $n \geq 1$:

Thus, $\|x_n\|_\mathfrak{X} < \delta^{n-1}$. Following this, we may now also notice that

$\displaystyle \sum_{n=1}^\infty \|x_n\|_\mathfrak{X} < \sum_{n=1}^\infty \delta^{n-1} = \frac{1}{1-\delta} < \infty,$

so the series $\sum_{n=1}^\infty x_n$ is absolutely convergent, and correspondingly, by the previous lemma and the completeness of the space, we then have that $\sum_{n=1}^\infty x_n = x^* \in \mathfrak{X}$.

Moreover, note $\|x^*\|_\mathfrak{X} < (1-\delta)^{-1}$, so $x^* \in (1-\delta)^{-1}\mathbb{B}_\mathfrak{X}$. Thus, by the linearity and continuity of $T$, we then have that

$\displaystyle T(x^*) = \sum_{n=1}^\infty T(x_n) = \sum_{n=1}^\infty \left ( y_n - y_{n-1} \right ) = \lim_{n \rightarrow \infty} y_n = y,$

and, correspondingly, $y \in (1-\delta)^{-1}T(\mathbb{B}_\mathfrak{X})^o$. Hence, it follows that

$z = (1-\delta)y \in (1-\delta)(1-\delta)^{-1}T(\mathbb{B}_\mathfrak{X})^o = T(\mathbb{B}_\mathfrak{X})^o.$

As such, we have demonstrated that, if $z \in \mathbb{B}_\mathcal{Y}^o \subset \overline{T(\mathbb{B}_\mathfrak{X})}^o$, then $z \in T(\mathbb{B}_\mathfrak{X})^o$ as well. As $z$ was chosen arbitrarily, this yields that $\mathbb{B}_\mathcal{Y}^o \subset T(\mathbb{B}_\mathfrak{X})^o$ as desired.  $\blacksquare$

Lemma

Let $V$ and $W$ be $\mathbb{R}$-vector spaces and $T: V \rightarrow W$ a bounded linear map. If $C \subset V$ is convex in $V$, then $T(C)$ is convex in $W$.

Proof

Let $V$ and $W$ be $\mathbb{R}$-vector spaces, $T: V \rightarrow W$ be a bounded linear map, and $C \subset V$ be convex in $V$. We seek to show that, for all $x,y \in T(C)$, $(1-t)x+ty \in T(C)$ for all $t \in [0,1]$.

To that end, let $x,y \in T(C)$ and $t \in [0,1]$ be arbitrary. Then, $x = T(v)$ and $y = T(u)$ for some $v,u \in C$. Moreover, by the convexity of $C$, we then have that $(1-t)v+tu \in C$. But, by the linearity of $T$, we yield the following inclusion:

$(1-t)x + ty = (1-t)T(v) + tT(u) = T\left ( (1-t)v + tu \right ) \in T(C).$

Thus, $T(C)$ is convex as well.  $\blacksquare$

Now, we use these results to prove the open mapping theorem, which we will recall is stated as follows:

Theorem (Open Mapping)

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be Banach spaces and $T : \mathfrak{X} \rightarrow \mathfrak{Y}$ a bounded linear operator. Then, if $T$ is surjective, $0 \in \textrm{int} \, T(\mathbb{B}_\mathfrak{X})$.

Proof

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be Banach spaces and $T : \mathfrak{X} \rightarrow \mathfrak{Y}$ a surjective bounded linear operator. If $\mathfrak{Y}$ is the trivial space, then we are done. Suppose $\mathfrak{Y}$ is not the trivial space. By the linearity of the spaces, it is sufficient to show that $T$ maps $\mathbb{B}_\mathfrak{X}$ to a neighborhood of the origin of $\mathfrak{Y}$.

First, let us note that $\mathfrak{X} = \bigcup_{n=1}^\infty n\mathbb{B}_\mathfrak{X}$. Correspondingly, by the surjectivity and linearity of $T$, we then have that $\mathfrak{Y} = T(\mathfrak{X}) = \bigcup_{n=1}^\infty nT(\mathbb{B}_\mathfrak{X}).$

As $\mathfrak{X}$ and $\mathfrak{Y}$ are Banach spaces, they are also Baire spaces. Correspondingly, as the complete space $\mathfrak{Y}$ cannot be a countable union of nowhere dense sets, there exists $n \in \mathbb{N}$ such that $\overline{nT(\mathbb{B}_\mathfrak{X})}^o \neq \emptyset$. Thus, there must then exist $y_0 \in \mathfrak{Y}$ and $r > 0$ such that $(y_0+r\mathbb{B}_\mathfrak{Y})^o \subset \overline{nT(\mathbb{B}_\mathfrak{X})}^o$.

Moreover, observe that if $y \in (y_0 + r\mathbb{B}_\mathfrak{Y})^o \subset \overline{nT(\mathbb{B}_\mathfrak{X})}^o$, then $-y \in \overline{nT(\mathbb{B}_\mathfrak{X})}^o$ as well by the linearity of $T$. Thus, $(-y_0+r\mathbb{B}_\mathfrak{Y})^o \subseteq \overline{nT(\mathbb{B}_\mathfrak{X})}^o$, and as the image of a convex set under a bounded linear map is convex by the third lemma above, we then have that

$\displaystyle r\mathbb{B}_\mathfrak{Y}^o = \frac{1}{2}(y_0+r\mathbb{B}_\mathfrak{Y})^o + \frac{1}{2}(-y_0+r\mathbb{B}_\mathfrak{Y})^o \subseteq \overline{nT(\mathbb{B}_\mathfrak{X})}^o.$

By the second lemma, we then may conclude that $r\mathbb{B}_\mathfrak{Y}^o \subset nT(\mathbb{B}_\mathfrak{X})^o$. Therefore, as we have shown that $T$ maps $\mathbb{B}_\mathfrak{X}$ to a neighborhood of the origin of $\mathfrak{Y}$, it follows that $T$ is then an open map.  $\blacksquare$

NOTE

This post draws on a large number of resources that I’ve encountered at various points over the last few years, most of which I didn’t write down at the time.  If you recognize any of the proofs given above, I’d love to know where you’ve seen them so I can properly cite the source.

# A Control Theory Perspective

In this post, we’ll talk about the trace parameterization of nonnegative Hermitian trigonometric polynomials, providing a proof which depends on the Kalman-Yakubovich-Popov lemma.  This follows very closely along the lines of Chapter 2, Section 2.5 of Dumitrescu’s book “Positive Trigonometric Polynomials and Signal Processing Applications” (Springer, 2007).

Abstract

The trace parameterization of a Hermitian trigonometric polynomial which is nonnegative on the unit circle is an analog to the Riesz-Fejér theorem, which connects the convex set of trigonometric polynomials which are nonnegative on the unit circle to the set of positive semidefinite matrices by means of causal polynomials. For this reason, the trace parameterization is often useful in computational settings, as it allows one to bring the methods of semidefinite programming to bear on a problem involving these polynomials. We sketch an equivalent formulation of the trace parameterization, derived instead from the control-theoretic Kalman-Yakubovich-Popov lemma. This reveals a surprising connection between Hermitian trigonometric polynomials which are nonnegative on the unit circle and positive real transfer functions of linear time-invariant control systems with linear time-invariant feedback.

Basic Harmonic Analysis

Let us first establish some of the relevant Harmonic analysis terminology, notation, and definitions before proceeding. First, let us define the general object with which we will concern ourselves.

Definition (Trigonometric Polynomial)

Let $R(x) \in \mathbb{C}[x]$. We say that $R$ is a trigonometric polynomial if

$\displaystyle R(x) = a_0 + \sum_{k=1}^n a_k \cos(kx) + i\sum_{k=1}^n b_k \sin(kx), \enskip x \in \mathbb{R}.$

Using Euler’s formula, this can equivalently be seen as a polynomial $R(z) \in \mathbb{C}_n[z]$ given by

$\displaystyle R(z) = \sum_{k=-n}^n r_kz^{-k}, \enskip z \in \mathbb{C}.$

A specific subset of trigonometric polynomials will be the primary item on which we focus in this post.

Definition (Hermitian Trigonometric Polynomial)

If $R(z) \in \mathbb{C}_n[z]$, we say that $R$ is a Hermitian trigonometric polynomial if

$\displaystyle R(z) = \sum_{k=-n}^n r_kz^{-k}$

and $r_{-k}=r^*_k$.

Letting $r_k = \rho_ke^{i\theta_k}$, the final condition above then says that $r_{-k} = \rho_ke^{-i\theta_k}$. This allows us to see the symmetry which makes these polynomials particularly interesting, namely:

$R(z) = \rho_0 e^{i\theta_0} + \sum_{k=1}^n \left ( \rho_ke^{i\theta_k} z^{-k}+\rho_ke^{-i\theta_k} z^{k} \right ).$

It can be shown that Hermitian trigonometric polynomials are real-valued as well. For $u_k,v_k \in \mathbb{R}$ with $u_{-k} = u_k$ and $v_{-k} = -v_k$, we may write the complex coefficients of $R(z)$ as $r_k = u_k+iv_k$. This then allows us to see that the following holds for values on the unit circle $z = e^{i\theta}$:

$R(e^{i\theta}) = \sum_{k=-n}^n (u_k+iv_k)e^{-ik\theta} = u_0+2\sum_{k=1}^n u_k \cos(k\theta) + 2 \sum_{k=1}^n v_k\sin(k\theta).$

So, it is indeed the case that Hermitian trigonometric polynomials are real-valued.

A portion of a Hermitian trigonometric polynomial is itself a member of another important class of polynomials. In fact, this class of polynomials allows such polynomials as we’ve seen above to be written in a very nice way.

Definition (Causal Polynomial)

Let $\displaystyle H(z) \in \mathbb{C}_n[z]$. We say that $H$ is a causal polynomial if

$H(z) = \sum_{k=0}^n h_k z^{-k}.$

That is, causal polynomials are simply polynomials with no $z^k$ terms for $k > 0$.

Notice then that the causal part of $R(z)$ as given above (which we denote by $R_+(z)$) is simply

$R_+(z) = \frac{r_0}{2} + \sum_{k=1}^n r_kz^{-k}.$

As another notational item, for a polynomial $p(z) = \sum_{k=0}^n p_k z^{k} \in \mathbb{C}[z]$, let us write $p^*(z) = \sum_{k=0}^n p^*_k z^{k}$.

Now, with this notation in mind, note that — as $r_0 \in \mathbb{R}$ for $R(z)$ a Hermitian trigonometric polynomial — we may observe

$R(z) = R_+(z) + R_+^*(z^{-1}).$

So, Hermitian trigonometric polynomials are necessarily the sum of two causal polynomials. Later, we will see that they are also the product of causal polynomials, demonstrating that these two classes of polynomials are indeed intimately linked.
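As a quick sanity check, the symmetry above can be verified numerically. The coefficients below are arbitrary choices for illustration (not taken from the text); the assertions confirm that $R$ is real-valued on the unit circle and that $R(z) = R_+(z) + R_+^*(z^{-1})$ there.

```python
import cmath

# Coefficients r_0..r_n; the Hermitian condition r_{-k} = conj(r_k) is
# enforced inside R below. (Arbitrary example values; r_0 must be real.)
r = [2.0, 1.0 + 0.5j, -0.3 + 0.2j]
n = len(r) - 1

def R(z):
    """Evaluate R(z) = sum_{k=-n}^{n} r_k z^{-k} with r_{-k} = conj(r_k)."""
    total = complex(r[0])
    for k in range(1, n + 1):
        total += r[k] * z**(-k) + r[k].conjugate() * z**k
    return total

def R_plus(z):
    """Causal part: R_+(z) = r_0/2 + sum_{k=1}^{n} r_k z^{-k}."""
    return r[0] / 2 + sum(r[k] * z**(-k) for k in range(1, n + 1))

# On |z| = 1, R_+^*(z^{-1}) equals the complex conjugate of R_+(z), so
# R is real and splits as R_+(z) + conj(R_+(z)).
for theta in [0.0, 0.7, 2.1, 3.9]:
    z = cmath.exp(1j * theta)
    assert abs(R(z).imag) < 1e-12
    assert abs(R(z) - (R_plus(z) + R_plus(z).conjugate())) < 1e-12
```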

Basic Control Theory

Control theory broadly concerns itself with control-parameterized dynamical systems of ODE’s of the form

$\dot{x}=f(x(t),u(t))$

where $\dot{x}$ denotes $\frac{dx(t)}{dt}$ and $f$ is a sufficiently smooth vector field (which generally depends on the context of the problem). We say that $x(t)$ is the state vector, which is point-wise in $\mathbb{R}^n$, and $u(t)$ is the control vector, which is point-wise in $\mathbb{R}^m$. Both the state vector and control vector can be thought of as being bounded, continuous vector fields as well (for our purposes, at least), and we generally consider dynamics which are time-invariant (i.e. autonomous) and finite dimensional.

Considering a fixed initial control $u$ and some initial value problem $\dot{x}=f(x,u), \quad x(0)=x_0$, if the correct assumptions are made (i.e. satisfying Picard-Lindelöf or the like), we can (in theory) find a solution $\Phi_{x_0}(t)$. This is essentially the main problem addressed by the theory of ODE’s. Control theory, very generally, is concerned with the reverse problem: Given a desired trajectory $\Phi_{x_0}(t)$, we seek to find a control vector field $u$ such that $\Phi_{x_0}(t)$ is a solution to the IVP.

Frequently, we will also concern ourselves with systems that have feedback and controls which may depend on that feedback. Intuitively, feedback can be thought of as a measurement of part of a state of the system — think of the cruise control in a car, which accelerates or decelerates in response to the measured speed of the vehicle and the present control being applied. More precisely, let us make the following definitions:

Definition (Open Loop and Closed Loop Systems)

Consider the autonomous dynamical control system governed by ODE’s with feedback

$\dot{x}=f(x(t),u(t,y)), \enskip y=h(x(t)) \quad \textrm{with state vector} \enskip x(t)\in \mathbb{R}^n, \enskip \textrm{control} \enskip u(t,y) \in \mathbb{R}^m, \enskip \textrm{and feedback} \enskip y \in \mathbb{R}^p$

We say such a controller $u(t,y)$ is a state-feedback control, and say that such a system is a closed loop system. A system in which the controller $u(t)$ does not depend on the feedback is called an open loop system.

We will consider a particularly nice class of control systems called linear, time invariant systems, which we will now define.

Definition (Linear Time-Invariant Systems)

An autonomous dynamical control system governed by ODE’s with feedback of the form

$\dot{x} = Ax(t) + Bu(t,y), \qquad y=Cx(t)+Du(t)$

where $A,B,C,D$ are appropriately sized matrices, is said to be a linear time-invariant system.

For these systems, we will be particularly concerned with two concepts that are essentially dual (and are dual in general, not just in the linear case): controllability and observability. Before we see this link, let us first build up to each term’s respective definitions. First, we will need the notion of the reachable set.

Definition (Reachable Set)

Consider the autonomous dynamical control system governed by ODE’s

$\dot{x}=f(x(t),u(t)) \quad \textrm{with state vector} \enskip x(t)\in \mathbb{R}^n, \enskip \textrm{and control} \enskip u(t) \in \mathbb{R}^m.$

Given a fixed $x_0 \in \mathbb{R}^n$ and control $u(t)$ defined for $t \geq 0$, denote by $\Phi_t(x_0,u)$ the solution to the IVP

$\dot{x}=f(x(t),u(t)), \quad x(0)=x_0.$

We define the reachable set at $x_0$ to be

$\mathscr{R}(x_0) = \left \{ x \in \mathbb{R}^n: \exists T \geq 0 \enskip \textrm{and input trajectory} \enskip u(t) \enskip s.t. \enskip \Phi_T(x_0,u)=x \right \}.$

That is, the reachable set at $x_0$ is everywhere you can steer the system to beginning at $x_0$ in a finite amount of time. Of particular interest is the case when we can steer the system anywhere we would like, which leads to the concept of controllability.

Definition (Controllability)

Consider the autonomous dynamical control system governed by ODE’s

$\dot{x}=f(x(t),u(t,y)), \enskip \textrm{with state vector} \enskip x(t)\in \mathbb{R}^n, \enskip \textrm{control} \enskip u(t,y) \in \mathbb{R}^m$

We say the system is controllable at $x_0$ if $\mathscr{R}(x_0)=\mathbb{R}^n$. If the system is controllable at any $x_0 \in \mathbb{R}^n$, we say the system is controllable.

Essentially, the internal state of a controllable system can be moved from any initial state to any final state in finite time — it is both an intuitive notion and a rather strong condition!

The notion of observability can be seen in a relatively concrete fashion as well. Broadly, observability is a measure of how much of the internal state of a system can be accurately inferred from the output. Think of a nice thermostat — if you set the temperature to a certain level, the system will keep the room at that level by turning on and off the heat in response to its internal thermometer, which (barring any strange lacunas in the air flow of the room) precisely yields a current measurement for the internal state of the system. More precisely, let us make the following definition for a simple class of systems with which we will concern ourselves:

Definition (Observability)

Consider the autonomous dynamical control system governed by ODE’s

$\dot{x}=f(x(t),u(t,y)), \enskip y=h(x(t)) \quad \textrm{with state vector} \enskip x(t)\in \mathbb{R}^n, \enskip \textrm{control} \enskip u(t,y) \in \mathbb{R}^m, \enskip \textrm{and feedback} \enskip y \in \mathbb{R}^p$

We say the system is observable if, at any time $T$ and under any sequence of valid state and control vectors $\left \{u_i(t,y) \right \}_{i=0}^r$ and $\left \{ x_k(t) \right \}_{k=0}^l$ applied to the system during the interval $[0,T]$, the internal state of the system can be recovered in finite time from only the data yielded by $y$.

In more colloquial terms, this says that the behavior of the entire system — no matter what fashion it has been steered in previously — can be determined at any time, and in finite time, by the output of the system alone. The link between observability and controllability can be seen rather naturally in this context: Controllability allows one to steer the internal state of a system anywhere one wishes in a finite amount of time irrespective of the starting point, while observability allows one to determine the internal state of a system in a finite amount of time irrespective of the previous steering.

For the case of linear, time-invariant systems, controllability and observability have particularly nice characterizations which make this duality quite clear. The results can be summarized by the following theorems:

Theorem (Kalman Controllability Criteria)

Consider the $n^{th}$ order linear time-invariant control system with feedback of the form

$\dot{x} = Ax(t) + Bu(t,y), \qquad y=Cx(t)+Du(t).$

Then, the system is controllable if and only if the Kalman matrix (or controllability matrix) given by

$K(A,B) = \begin{bmatrix} B & AB &A^2B &\hdots & A^{n-1}B \end{bmatrix}$

has full row rank (i.e. is of rank $n$). If this holds, we say that $(A,B)$ is a controllable pair.

(Note that feedback was not needed in this definition. We, however, opt to include the feedback to stress that this result does indeed also apply to systems with feedback, which will be our primary interest here.)

This result follows from the fact that the range of the Kalman matrix is the reachable set from $0$, i.e. $\mathcal{R}_T(0) = \textrm{range}(K(A,B))$. While it can be shown without much work that $\mathcal{R}_T(0) \subseteq \textrm{range}(K(A,B))$, we must construct the so-called controllability Gramian of $(A,B)$ to achieve our desired equality. This is outside the scope of our interests here, so we will omit the proof at present.
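As a concrete illustration, the rank test above is easy to carry out numerically. The double-integrator system below is an assumed example, not one from the text; it is a minimal sketch of how one checks a controllable pair.

```python
import numpy as np

def kalman_matrix(A, B):
    """Build K(A,B) = [B, AB, A^2 B, ..., A^{n-1} B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

# Double integrator: x1' = x2, x2' = u (illustrative assumption).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

K = kalman_matrix(A, B)
# K = [[0, 1], [1, 0]] has rank 2 = n, so (A, B) is a controllable pair.
assert np.linalg.matrix_rank(K) == A.shape[0]
```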

Now, we will see that a similar condition exists for observability.

Theorem (Observability Criteria)

Consider the $n^{th}$ order linear time-invariant control system with feedback of the form

$\dot{x} = Ax(t) + Bu(t,y), \qquad y=Cx(t)+Du(t).$

Then, the system is observable if and only if the observability matrix given by

$O(A,C) = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix}$

has full column rank (i.e. is of rank $n$). If this holds, we say that $(A,C)$ is an observable pair.

Why this condition is necessary is perhaps less clear than the controllability criteria, and has to do with placing the eigenvalues of $A-LC$ in the left half of the complex plane by a suitable choice of $L$ in order to make the estimation error of a so-called Luenberger observer go to zero as $t \rightarrow \infty$. We will not delve into this here, as it is somewhat lengthy. The basic gist can be summarized as saying that, if $n$ rows of the observability matrix are linearly independent, then each of the $n$ states of the system can be obtained through linear combinations of the output variable $y$.
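A matching numerical sketch for observability, again with an assumed example system; it also shows how observability can fail when the measured output is changed.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, CA^2, ..., CA^{n-1}."""
    n = A.shape[0]
    rows = [C]
    for _ in range(n - 1):
        rows.append(rows[-1] @ A)
    return np.vstack(rows)

# Illustrative system: x1' = x2, x2' = 0, measuring only the first state.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
C = np.array([[1.0, 0.0]])

O = observability_matrix(A, C)
# O = [[1, 0], [0, 1]] has rank 2 = n, so (A, C) is an observable pair.
assert np.linalg.matrix_rank(O) == A.shape[0]

# Measuring only the second state instead loses observability for this A:
O2 = observability_matrix(A, np.array([[0.0, 1.0]]))
assert np.linalg.matrix_rank(O2) == 1
```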

Finally, we must define the notion of a transfer function. In our context, a transfer function is simply a representation of the relation between the input and the output of a linear time-invariant system in terms of the temporal frequency. More precisely, in our case, the transfer function $\mathbf{G}(s)$ is the linear mapping of the Laplace transform of the input to the Laplace transform of the output. Let us now demonstrate what we mean.
Let $\mathbf{V}(s) = \mathcal{L}\left \{v(t)\right \} = \int_{0}^\infty v(t)e^{-st}dt$ denote the usual Laplace transform of a function $v(t)$ (where $s$ is used here to represent a complex variable instead of the standard $z$ purely by historical convention).
Taking the Laplace transform of $\dot{x} = Ax(t)+Bu(t)$ yields

$s\mathbf{X}(s) - x(0) = A\mathbf{X}(s)+B\mathbf{U}(s).$

Thus,

$\mathbf{X}(s) = (sI-A)^{-1}x(0)+(sI-A)^{-1}B\mathbf{U}(s).$

Substituting this into the output, we yield

$\mathbf{Y}(s) = C\left ( (sI-A)^{-1}x(0)+(sI-A)^{-1}B\mathbf{U}(s) \right ) + D\mathbf{U}(s).$

Thus, if $\mathbf{G}(s)$ is the transfer function, taking $x(0) = 0$, it is the linear mapping of $\mathbf{U}(s)$ to $\mathbf{Y}(s)$, and we arrive at the following definition:

Definition (Transfer Function of a Linear, Time-Invariant System)

Consider the $n^{th}$ order linear time-invariant control system with feedback of the form

$\dot{x} = Ax(t) + Bu(t,y), \qquad y=Cx(t)+Du(t).$

Then, the transfer function of the system is the linear mapping of $\mathbf{U}(s)$ to $\mathbf{Y}(s)$ given by

$\mathbf{G}(s) = C(sI-A)^{-1}B+D.$

This, combined with the rest of the terminology given here, turns out (surprisingly) to be enough to give an alternate derivation of the trace parameterization of Hermitian trigonometric polynomials.
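The formula $\mathbf{G}(s) = C(sI-A)^{-1}B+D$ is easy to check numerically. The state-space matrices below are assumptions chosen so that the transfer function can also be worked out by hand.

```python
import numpy as np

# Illustrative two-state system (an assumption, not from the text).
A = np.array([[-1.0, 0.0],
              [1.0, -2.0]])
B = np.array([[1.0],
              [0.0]])
C = np.array([[0.0, 1.0]])
D = np.array([[0.5]])

def G(s):
    """Evaluate the transfer function G(s) = C (sI - A)^{-1} B + D."""
    n = A.shape[0]
    return C @ np.linalg.inv(s * np.eye(n) - A) @ B + D

# For this A, inverting (sI - A) by hand gives G(s) = 1/((s+1)(s+2)) + 0.5.
for s in [0.0, 1.0, 3.0 + 2.0j]:
    expected = 1.0 / ((s + 1.0) * (s + 2.0)) + 0.5
    assert abs(G(s)[0, 0] - expected) < 1e-12
```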

Trace Parameterization:  A Harmonic Analysis Perspective

Let us begin with some notation. For shorthand, we will denote the vector containing the canonical basis elements for a polynomial of degree $n$ by

$\psi_n(z) = \begin{bmatrix} 1 & z & z^2 & \hdots & z^n \end{bmatrix}^T.$

Noting that $\psi_n(z) \in \mathbb{C}[z]^{(n+1) \times 1}$, a familiarity with basic linear algebra allows us to see that the action of a vector of dimension $1 \times (n+1)$ (i.e. a linear functional) on $\psi_n(z)$ generates a polynomial. That is, if $b_n(z) = \begin{bmatrix} b_0q_0(z) & b_1q_1(z) & \hdots & b_nq_n(z) \end{bmatrix} \in \mathbb{C}[z]^{1\times (n+1)}$ where $q_k(z) \in \mathbb{C}[z]$ for each $k$, then a new polynomial $p(z)$ is generated by the following action:

$\displaystyle b_n(z)\psi_n(z) = \sum_{i=0}^n b_iq_i(z)z^i = p(z). \qquad\qquad(*)$

The standard proof of the Riesz-Fejér theorem uses this observation precisely to characterize nonnegative Hermitian trigonometric polynomials. The theorem is stated as follows:

Theorem (Riesz-Fejér)

Let $R(z) \in \mathbb{C}[z]$ be a Hermitian trigonometric polynomial. Then, $R$ is nonnegative on the unit circle if and only if there exists a causal polynomial

$\displaystyle H(z) = \sum_{k=0}^n h_kz^{-k}$

such that

$R(z) = H(z)H^*(z^{-1}).$

Then, for a nonnegative Hermitian trigonometric polynomial, it is useful to note here that if $z = e^{i\theta}$ (that is, if $z$ is restricted to the unit circle), the following holds:

$R(e^{i\theta}) = H(e^{i\theta})H^*(e^{-i\theta}) = H(e^{i\theta})\overline{H(e^{i\theta})} = \left | H(e^{i\theta}) \right |^2 \geq 0.$

That is, all nonnegative Hermitian trigonometric polynomials are the squared moduli of causal polynomials.

We may also note that, as $r_{-k} = r_k^*$, we then have that

$\displaystyle r_k = \sum_{i=k}^n h_ih_{i-k}^*.$
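The coefficient formula can be sanity-checked numerically; the causal coefficients $h_k$ below are arbitrary assumptions chosen for illustration.

```python
import cmath

# Arbitrary causal polynomial H(z) = sum_k h_k z^{-k}.
h = [1.0, -0.4 + 0.3j, 0.2j]
n = len(h) - 1

# r_k = sum_{i=k}^{n} h_i conj(h_{i-k}) for k = 0..n (r_{-k} = conj(r_k)).
r = [sum(h[i] * h[i - k].conjugate() for i in range(k, n + 1))
     for k in range(n + 1)]

def H(z):
    return sum(h[k] * z**(-k) for k in range(n + 1))

def R(z):
    total = r[0]
    for k in range(1, n + 1):
        total += r[k] * z**(-k) + r[k].conjugate() * z**k
    return total

# On the unit circle, R(z) should equal |H(z)|^2, hence be nonnegative.
for theta in [0.0, 1.3, 2.6, 5.0]:
    z = cmath.exp(1j * theta)
    assert abs(R(z) - abs(H(z))**2) < 1e-12
    assert R(z).real >= 0
```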

So, a characterization of all nonnegative Hermitian trigonometric polynomials is given by a choice of their respective causal polynomials. However, in some important computational circumstances, this characterization may not encode the coefficients $r_k$ in a particularly optimal way. For those circumstances in which the characterization of $r_k$ is indeed non-optimal, we may wish to consider a parameterization which relies on a particularly nice case of the observation $(*)$ — namely, when $p(z) = \psi_n^T(z^{-1})Q\psi_n(z)$. This leads to a definition:

Definition (Gram Matrix of a Hermitian Trigonometric polynomial)

A Hermitian matrix $Q \in \mathbb{C}^{(n+1) \times (n+1)}$ is called a Gram matrix associated with a Hermitian trigonometric polynomial $R \in \mathbb{C}_n[z]$ if

$R(z) = \psi_n^T(z^{-1})Q\psi_n(z).$

Moreover, we denote the set of all Gram matrices associated with $R(z)$ by $\mathcal{G}(R)$.

As suggested by the notation $\mathcal{G}(R)$, a Gram matrix associated with a given Hermitian trigonometric polynomial $R$ need not be unique.

Following the definition of a Gram matrix, we may now provide a theorem yielding the so-called trace parameterization of a Hermitian trigonometric polynomial $R(z)$, which allows for a different method of encoding the coefficients $r_k$.

Theorem (Trace Parameterization)

Let $R(z) \in \mathbb{C}_n[z]$ be a Hermitian trigonometric polynomial, and let $Q \in \mathcal{G}(R)$ be an associated Gram matrix of $R$. Then, the trace parameterization of $R(z)$ is given by

$\displaystyle r_k = tr[\Theta_k Q] = \sum_{i=max(0,k)}^{min(n+k,n)} q_{i,(i-k)} , \quad k \in \left \{-n, -(n-1), \hdots, n\right \}$

where $\Theta_k$ is the elementary Toeplitz matrix with $1$‘s along the $k^{th}$ diagonal and $0$‘s elsewhere.

Proof

First, let us observe that the following holds:

$R(z) = \psi_n^T(z^{-1})Q\psi_n(z) = tr[\psi_n(z)\psi_n^T(z^{-1})Q].$

Note now that

$\psi_n(z)\psi_n^T(z^{-1}) = \begin{bmatrix} 1 &z^{-1} &\hdots &z^{-n} \\ z & 1 &\ddots &z^{-n+1} \\ \vdots &\ddots &\ddots &\vdots \\ z^n &z^{n-1} &\hdots &1 \end{bmatrix} = \sum_{k=-n}^n \Theta_k z^{-k}.$

Combining, this yields:

$\displaystyle R(z) = tr\left [ \sum_{k=-n}^n \Theta_k z^{-k} Q \right ] = \sum_{k=-n}^n tr[\Theta_k Q] z^{-k},$

which does indeed yield the desired form of the coefficients $r_k$.  $\blacksquare$
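The trace parameterization can be verified numerically for a small Hermitian matrix $Q$ (the entries below are arbitrary assumptions): the coefficients $tr[\Theta_k Q]$ reproduce the direct evaluation of $\psi_n^T(z^{-1})Q\psi_n(z)$.

```python
import numpy as np

n = 2
# Arbitrary Hermitian Q in C^{(n+1) x (n+1)}.
Q = np.array([[2.0,        0.5 + 1j,   0.1],
              [0.5 - 1j,   3.0,        0.2 + 0.3j],
              [0.1,        0.2 - 0.3j, 1.0]])

def theta(k):
    """Elementary Toeplitz matrix: ones along the k-th diagonal."""
    return np.eye(n + 1, k=k)

# Trace parameterization: r_k = tr[Theta_k Q] for k = -n..n.
r = {k: np.trace(theta(k) @ Q) for k in range(-n, n + 1)}

def R_direct(z):
    """Direct evaluation of psi_n^T(z^{-1}) Q psi_n(z)."""
    psi = np.array([z**j for j in range(n + 1)])
    psi_inv = np.array([z**(-j) for j in range(n + 1)])
    return psi_inv @ Q @ psi

def R_param(z):
    return sum(r[k] * z**(-k) for k in range(-n, n + 1))

for th in [0.4, 1.9, 4.2]:
    z = np.exp(1j * th)
    assert abs(R_direct(z) - R_param(z)) < 1e-12
```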

A keen observer might now conjecture that there may be a connection between the non-negativity of $R(z)$ and the properties of a given Gram matrix $Q$ associated with $R(z)$. This is indeed the case, and can be summarized by the following theorem (whose proof we omit, as this is not our central focus):

Theorem (Trace Parameterization of Nonnegative Trigonometric Polynomials)

Let $R(z) \in \mathbb{C}_n[z]$ be a Hermitian trigonometric polynomial. Then, $R$ is nonnegative on the unit circle if and only if there exists a positive semidefinite matrix $Q \in \mathcal{G}(R)$.

In essence, the characterizations of trigonometric polynomials which are nonnegative on the unit circle given by the trace parameterization and the Riesz-Fejér theorem yield a link between those polynomials and positive semidefinite matrices. This is particularly useful in a computational setting, as both of these sets are convex (a fact which can be easily checked, but which we will not demonstrate here). As convexity is a key requirement for the optimization methods in question, this sheds light on the comment that trace parameterization may be a more useful method to encode these polynomials in some contexts.

Trace Parameterization:  A Control Theory Perspective

Now, we will show that the trace parameterization of Hermitian trigonometric polynomials can, in fact, be derived from a control theory perspective. This relies on the so-called Kalman-Yakubovich-Popov lemma (or positive real lemma, as it is sometimes called). A specific, nicer case of the lemma — which is important enough that we will think of it as a theorem here — can be stated as follows:

Theorem (Kalman-Yakubovich-Popov Lemma for Controllable and Observable Systems)

Consider the $n^{th}$ order linear time-invariant control system with feedback of the form

$\dot{x} = Ax(t) + Bu(t), \qquad y=Cx(t)+Du(t).$

Then, if $(A,B)$ are a controllable pair and $(A,C)$ are an observable pair, the transfer function $\mathbf{G}(s) = C(sI-A)^{-1}B+D$ is positive real (i.e. $Re\left ( G(s) \right ) \geq 0$) if and only if there exists a positive semidefinite matrix $P$ such that the matrix

$Q = \begin{bmatrix} P-A^TPA & C^T-A^TPB \\ C-B^TPA &(D+D^T)-B^TPB \end{bmatrix}$

is positive semidefinite.
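A minimal numerical sketch of the lemma's matrix $Q$, using an assumed toy system ($A=0$, $B=C=1$, $D=1$, so $\mathbf{G}(s) = 1/s + 1$) and a hand-picked $P$; none of these values come from the text.

```python
import numpy as np

# Toy first-order system (illustrative assumption).
A = np.array([[0.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
D = np.array([[1.0]])
P = np.array([[1.0]])  # hand-picked candidate

# Assemble the KYP block matrix from the statement of the lemma.
Q = np.block([
    [P - A.T @ P @ A,  C.T - A.T @ P @ B],
    [C - B.T @ P @ A,  (D + D.T) - B.T @ P @ B],
])

# Here Q = [[1, 1], [1, 1]], with eigenvalues 0 and 2, so Q is PSD.
eigvals = np.linalg.eigvalsh(Q)
assert np.all(eigvals >= -1e-12)
```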

To use this, let us recall that the causal part of a Hermitian trigonometric polynomial which is nonnegative on the unit circle is positive real. Let $R(z)\in \mathbb{C}_n[z]$ be such a polynomial. We now will place the causal part of $R(z)$ into a linear time-invariant control system with $(A,B)$ a controllable pair and $(A,C)$ an observable pair. That is, we realize a controllable and observable linear time-invariant system such that $\mathbf{G}(s)=R_+(z)$. How this is done is slightly involved (we utilize the so-called Rosenbrock system matrix to realize our desired transfer function), but for our purposes may be achieved by setting

$A = \Theta_1 = \begin{bmatrix} 0 & 1 & \hdots & 0 \\ \vdots & \ddots &\ddots &\vdots \\ \vdots & \ddots &\ddots &1 \\ 0 & \hdots & \hdots &0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},$

$C= \begin{bmatrix} r_n &\hdots &r_2 &r_1 \end{bmatrix}, \quad D = \frac{r_0}{2}.$

Then, for some positive semidefinite matrix $P$, the matrix $Q$ in the Kalman-Yakubovich-Popov lemma for controllable and observable systems is positive semidefinite and becomes

In fact, this turns out to be a Gram matrix of $R(z)$, which can be easily verified by simply computing $\psi^T(z^{-1})Q\psi(z)$.

Now, we may also note that

for $k \in \left \{0, 1, \hdots, n\right \}$. But this then yields that

which is precisely the trace parameterization of Hermitian trigonometric polynomials we had previously derived!

References

1. B. Dumitrescu, Positive Trigonometric Polynomials and Signal Processing Applications. Springer (2007).
2. R. Lozano, B. Brogliato, O. Egeland, B. Maschke, Dissipative Systems Analysis and Control: Theory and Applications. Springer (2000).
3. F. Jafari, Topics in Harmonic Analysis. University of Wyoming, Lecture Notes, Spring 2017 (currently unpublished).
4. D. Meyer, Nonlinear Trajectory Generation and Control: Course Notes from an Introduction. University of Wyoming, Lecture notes Spring 2017 (currently unpublished).

# Part 1

As a helpful review, here are a variety of problems and solutions to exercises from an introductory functional analysis class. This is the first installment in a series of functional analysis exercises (Part 1, Part 2, Part 3, Part 4).

Problem 1

Show that for any real inner product space

$\langle u, v \rangle + \langle v, u \rangle = \frac{1}{2} \left ( \left \| u + v \right \|^2 - \left \| u - v \right \|^2 \right ),$

and for a complex inner product space

$\langle u, v \rangle - \langle v, u \rangle = \frac{i}{2} \left (\left \| u + iv \right \|^2 - \left \| u - iv \right \|^2 \right ).$

Solution

Recall that if $\langle \cdot, \cdot \rangle$ is an inner product on a vector space $\mathscr{X}$, the following hold:

1. $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$;
2. $\langle x, \alpha y + \beta z \rangle = \overline{\alpha} \langle x, y \rangle + \overline{\beta} \langle x, z \rangle$;
3. $\langle x, x \rangle \geq 0$ and $\langle x, x \rangle =0 \Leftrightarrow x=0$;
4. $\langle x, y \rangle = \overline{\langle y, x \rangle}$.

Moreover, let us recall that the norm induced by the inner product is given by $\left \| x \right \|= \langle x, x \rangle^{1/2}$ for all $x \in \mathscr{X}$. With this in mind, let us proceed in demonstrating the desired identities:

Let $\mathscr{X}$ be a real inner product space. Recall that $\overline{\alpha} = \alpha$ for all $\alpha \in \mathbb{R}$, and observe that the following holds for all $u,v \in \mathscr{X}$:

$\left \| u + v \right \|^2 - \left \| u - v \right \|^2 = \langle u+v, u+v \rangle - \langle u-v, u-v \rangle = 2\langle u, v \rangle + 2\langle v, u \rangle,$

which yields the first identity upon dividing by $2$.

Now, let $\mathscr{X}$ be a complex inner product space. Then, observe that the following holds for all $u,v \in \mathscr{X}$:

$\left \| u + iv \right \|^2 - \left \| u - iv \right \|^2 = \left ( \left \| u \right \|^2 - i\langle u, v \rangle + i\langle v, u \rangle + \left \| v \right \|^2 \right ) - \left ( \left \| u \right \|^2 + i\langle u, v \rangle - i\langle v, u \rangle + \left \| v \right \|^2 \right ) = -2i\langle u, v \rangle + 2i\langle v, u \rangle.$

Multiplying by $\frac{i}{2}$ then gives $\frac{i}{2}\left ( \left \| u + iv \right \|^2 - \left \| u - iv \right \|^2 \right ) = \langle u, v \rangle - \langle v, u \rangle$, as desired.  $\blacksquare$
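Both identities can also be sanity-checked numerically in $\mathbb{C}^3$ with the standard inner product $\langle u, v \rangle = \sum_i u_i \overline{v_i}$ (conjugate-linear in the second argument, matching the conventions above); the vectors are randomly chosen.

```python
import random

def inner(u, v):
    """<u, v> = sum_i u_i * conj(v_i): conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm_sq(u):
    return inner(u, u).real

random.seed(0)
u = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
v = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]

add = lambda x, y: [a + b for a, b in zip(x, y)]
scale = lambda c, x: [c * a for a in x]

# Complex identity: <u,v> - <v,u> = (i/2)(||u+iv||^2 - ||u-iv||^2)
lhs = inner(u, v) - inner(v, u)
rhs = 0.5j * (norm_sq(add(u, scale(1j, v))) - norm_sq(add(u, scale(-1j, v))))
assert abs(lhs - rhs) < 1e-12

# Real identity, using the real parts of u and v as real vectors:
ur = [complex(a.real) for a in u]
vr = [complex(b.real) for b in v]
lhs_r = inner(ur, vr) + inner(vr, ur)
rhs_r = 0.5 * (norm_sq(add(ur, vr)) - norm_sq(add(ur, scale(-1, vr))))
assert abs(lhs_r - rhs_r) < 1e-12
```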

Problem 2

Let $X = C[0,1]$ with supremum norm and define $E \subset X$ to be the set of functions such that

$\int_0^{\frac{1}{2}} f - \int_{\frac{1}{2}}^1 f = 1.$

Prove that $E$ is a closed, convex subset of $X$, but that $E$ has no element of minimal norm. This gives an alternative proof of the fact that $C[0,1]$ is not a Hilbert space.

Solution

Let $X = C[0,1]$ with supremum norm and define $E:= \left \{ f \in X : \int_0^{\frac{1}{2}} f - \int_{\frac{1}{2}}^1 f = 1 \right \}$. Let us consider the continuous linear functional $G: C[0,1] \rightarrow \mathbb{R}$ defined by $G(g) = \int_0^{\frac{1}{2}} g - \int_{\frac{1}{2}}^1 g$. As the preimage of a closed set under a continuous function is closed, the preimage of the closed set $\left \{1 \right \}$ must also be closed under the continuous function $G$. Following from the fact that $G^{-1} \left ( \left \{ 1 \right \} \right ) = E$ by definition, we may see that $E$ is closed.

Now, let $g, h \in E$ be arbitrary and fix $t \in [0,1]$. Then, we may observe:

$G(tg+(1-t)h) = \int_0^{\frac{1}{2}} \left ( tg+(1-t)h \right ) - \int_{\frac{1}{2}}^1 \left ( tg+(1-t)h \right ) = tG(g) + (1-t)G(h) = t + (1-t) = 1.$

It follows that, for all $g, h \in E$, it must be the case that $tg+(1-t)h \in E$ whenever $t \in [0,1]$. Thus, $E$ is convex.

Further, for all $f \in E$, let us observe that the following holds:

$1 = \left | \int_0^{\frac{1}{2}} f - \int_{\frac{1}{2}}^1 f \right | \leq \int_0^1 \left | f \right | \leq \left \| f \right \|.$

That is, $\left \|f \right \| \geq 1$ for all $f \in E$.

Observe that if we desire to show that $C[0,1]$ is not a Hilbert space, it suffices to show that there exists no element $f_0 \in E$ such that $d(0,E) = \left \| 0-f_0 \right \|$.

To do so, we construct a sequence in $E$ which approaches the minimum of the norm, but which does not converge to an element of $C[0,1]$. Setting $n= \frac{1}{t}$ for convenience, let us then consider the sequence $\left \{ f_t \right \}_{t=1}^\infty$ defined by:

While rather undesirably messy when expressed as such, this is simply the following function:

Observe by inspection (for convenience, as this is already cumbersome enough) that $f_t \in E$ for all $t > 0$, and $\left \| f_l \right \| > \left \| f_p \right \|$ whenever $l > p$.

Moreover, note that $\left \| f_t\right \| \rightarrow 1$ as $t \rightarrow \infty$. But, observe as well that $f_t(x) \rightarrow \chi_{[0,1/2)} (x) - \chi_{[1/2,1]} (x)$ point-wise as $t \rightarrow \infty$, which is not a continuous function. Thus, we have constructed a sequence of functions in $E$ which come arbitrarily close to the minimum possible norm, but which do not converge to an element of $C[0,1]$. Correspondingly, there cannot exist an element of minimal norm in $E$ with regard to $0$, and thus $C[0,1]$ is not a Hilbert space.    $\blacksquare$

Problem 3

Compute

$\underset{a,b,c}{min}\int_{-1}^1 \left | x^3 - a -bx -cx^2 \right |^2 dx.$

Solution

We seek to compute $\underset{a,b,c}{min}\int_{-1}^1 \left | x^3 - a -bx -cx^2 \right |^2 dx$.

Let us begin by observing that the set $\mathscr{X} := \left \{ \alpha_1x^2+\alpha_2x +\alpha_3 \mid \alpha_i \in \mathbb{R} \right \}$ is a closed, linear subspace of the Hilbert space $L^2[-1,1]$ with inner product $\langle f, g \rangle = \int_{-1}^1 (fg)(x) dx$. Further, note that it may be easily checked via routine computation that $\mathscr{E} = \left \{ \frac{\sqrt{2}}{2}, \frac{\sqrt{6}}{2}x, \frac{3\sqrt{10}}{4}(x^2-\frac{1}{3}) \right \}$ is an orthonormal basis for $\mathscr{X}$.

Observing that $x^3 \in L^2[-1,1]$, let $f_0 \in \mathscr{X}$ be the unique element of $\mathscr{X}$ closest to $x^3$. That is, let $f_0$ be such that

$\left \| x^3 - f_0 \right \| = \inf_{f \in \mathscr{X}} \left \| x^3 - f \right \|.$

Then, we may compute $f_0$ by taking the projection of $x^3$ onto $\mathscr{X}$. As $\mathscr{E}$ is an orthonormal basis for $\mathscr{X}$, it follows that

$f_0 = \sum_{e \in \mathscr{E}} \langle x^3, e \rangle e = 0 \cdot \frac{\sqrt{2}}{2} + \frac{\sqrt{6}}{5} \cdot \frac{\sqrt{6}}{2}x + 0 \cdot \frac{3\sqrt{10}}{4}\left(x^2 - \frac{1}{3}\right) = \frac{3}{5}x.$

As a Hilbert space is necessarily complete, $\mathscr{X}$ is a closed subspace and the infimum is attained at the projection. The following then holds:

$\min_{a,b,c}\int_{-1}^1 \left | x^3 - a -bx -cx^2 \right |^2 dx = \left \| x^3 - f_0 \right \|^2 = \int_{-1}^1 \left( x^3 - \frac{3}{5}x \right)^2 dx = \frac{8}{175}.$

Thus, the given integral attains its minimum value of $\frac{8}{175}$ when $a =0$, $b= \frac{3}{5}$, and $c=0$.   $\blacksquare$
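The minimizer can be corroborated numerically. Below is a sketch using a midpoint rule; the function `objective` and the particular perturbations tested are our own illustration.

```python
def objective(a, b, c, n=20000):
    # midpoint-rule approximation of the integral of
    # (x^3 - a - b*x - c*x^2)^2 over [-1, 1]
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        total += (x ** 3 - a - b * x - c * x ** 2) ** 2 * h
    return total

best = objective(0.0, 3.0 / 5.0, 0.0)   # the claimed minimizer a=0, b=3/5, c=0
# best is approximately 8/175, and any perturbation of the
# coefficients strictly increases the value of the integral
print(best, 8.0 / 175.0)
```

Since $\mathscr{X}$ is a subspace, any competing quadratic differs from $f_0$ by an element of $\mathscr{X}$ orthogonal to the residual, so perturbing $a$, $b$, or $c$ can only increase the integral; the numerics reflect this.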

Problem 4

If the closed unit ball of $\mathscr{H}$ is compact, show that $\textrm{dim} \mathscr{H} < \infty$.

Solution

Suppose $\mathscr{H}$ is a Hilbert space, and suppose the closed unit ball of $\mathscr{H}$, denoted by $\mathcal{B}_\mathscr{H}$, is compact. For some index set $I$, let $\mathscr{E} = \left \{e_i \right \}_{i \in I}$ be an orthonormal basis for $\mathscr{H}$. Then, it follows that $e_i \in \mathcal{B}_\mathscr{H}$ for all $i \in I$. Moreover, as $\langle e_i, e_j \rangle =0$ for any two $e_i, e_j \in \mathscr{E}$ with $i\neq j$, observe that we may deduce:

$\left \| e_i - e_j \right \|^2 = \langle e_i, e_i \rangle - \langle e_i, e_j \rangle - \langle e_j, e_i \rangle + \langle e_j, e_j \rangle = 2, \quad \textrm{so} \quad \left \| e_i - e_j \right \| = \sqrt{2}.$

Let us denote the open ball of radius $\delta$ about $x \in \mathscr{H}$ by $\mathcal{B}(x; \delta)$ and construct the set:

$C := \left \{ \mathcal{B}\!\left(e_i; \tfrac{\sqrt{2}}{2}\right) \right \}_{i \in I} \cup \left \{ \mathscr{H} \setminus \mathscr{E} \right \}.$

Observe that each open ball in $C$ contains precisely one element of $\mathscr{E}$: the ball $\mathcal{B}(e_i; \frac{\sqrt{2}}{2})$ contains $e_i$ and no other basis vector, as distinct basis vectors lie at distance $\sqrt{2}$ from one another. Further, since any two points of $\mathscr{E}$ are $\sqrt{2}$ apart, $\mathscr{E}$ has no limit points and is therefore closed, so $\mathscr{H} \setminus \mathscr{E}$ is open.

Each $x \in \mathcal{B}_\mathscr{H}$ either belongs to $\mathscr{E}$, in which case it lies in one of the open balls, or belongs to $\mathscr{H} \setminus \mathscr{E}$. Thus, it follows that $C$ is indeed an open cover of $\mathcal{B}_\mathscr{H}$.

But, as $\mathcal{B}_\mathscr{H}$ is compact, every open cover admits a finite subcover. As each $e_i$ lies in exactly one member of $C$, no open ball in $C$ may be omitted without leaving some element of the basis $\mathscr{E}$ uncovered, so it must be the case that $C$ is finite. Correspondingly, as $C$ contains one ball per element of $\mathscr{E}$, our orthonormal basis is finite as well. Thus, $\dim \mathscr{H} < \infty$.   $\blacksquare$
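The $\sqrt{2}$-separation of orthonormal vectors is easy to see concretely. Here is a small sketch using the standard basis of $\mathbb{R}^{10}$ as a stand-in for finitely many of the $e_i$; the helpers `e` and `dist` are our own. It is exactly this uniform separation that prevents the basis vectors from clustering, so in infinite dimensions they form a sequence with no convergent subsequence.

```python
def e(i, dim=10):
    # the i-th standard basis vector of R^dim (an orthonormal family)
    return [1.0 if j == i else 0.0 for j in range(dim)]

def dist(u, v):
    # Euclidean distance, i.e. the Hilbert-space norm of u - v
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# every pair of distinct orthonormal vectors is exactly sqrt(2) apart,
# so no subsequence of basis vectors can be Cauchy
gaps = [dist(e(i), e(j)) for i in range(10) for j in range(10) if i != j]
```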

# Frattini’s Argument

As it makes for a useful review exercise, let’s talk about Frattini’s argument.  We’ll approach the problem first from a standpoint that (hopefully) makes things intuitively clear, then we’ll look at the classical argument advanced by Frattini.

The topic that Frattini’s argument addresses begins with a relatively straightforward question:

Given a finite group $G$ with a nontrivial normal subgroup $K$, what is the relationship of $K$ to $G$?

Let’s begin at the top and observe that, as $K$ is nontrivial by assumption, it must have a Sylow-$p$ subgroup for some prime $p$.  Without fussing over what exactly $p$ is (in fact, this will hold for any choice of $p$ dividing the order of $K$), let’s just look at an arbitrary Sylow-$p$ subgroup $P$ of $K$.

Now, as $K$ is normal in $G$, let us observe that the second isomorphism theorem (frequently also called the ‘diamond’ isomorphism theorem) gives us a variety of nice things.  In general, for any subgroup $H$ of $G$, we have that:

• $HK = KH \leq G$;
• $H \cap K \unlhd H$;
• $(HK) / K \cong H / (H \cap K)$.
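These facts are easy to test on a concrete example. A minimal pure-Python sketch follows (the helper functions and the choices $G = S_4$, $K = A_4$, $H = \langle (0\,1) \rangle$ are our own illustration), checking the first and third items by counting.

```python
from itertools import permutations

def compose(p, q):
    # (p o q)(i) = p[q[i]]; permutations of {0, 1, 2, 3} stored as tuples
    return tuple(p[q[i]] for i in range(len(p)))

def sign(p):
    # +1 for even permutations, -1 for odd, by counting inversions
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

G = set(permutations(range(4)))       # the symmetric group S4
K = {p for p in G if sign(p) == 1}    # the alternating group A4, normal in S4
H = {(0, 1, 2, 3), (1, 0, 2, 3)}      # the order-2 subgroup generated by (0 1)

HK = {compose(h, k) for h in H for k in K}
KH = {compose(k, h) for k in K for h in H}
# HK = KH, and |HK| / |K| = |H| / |H n K| as the diamond theorem predicts
print(HK == KH, len(HK) // len(K) == len(H) // len(H & K))
```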

Looking at the first result, recall that $HK=KH$ is a necessary and sufficient condition for $HK$ to be a subgroup of $G$ (a fact which is occasionally referred to as the ‘product theorem’).  Taking some generous liberties with notation in order to make a point, note that we can ‘think of’ $HK = KH$ as $K = H^{-1}KH$ — at least, in the sense that, for some $h_1,h_2 \in H$ and $k \in K$, assuming $HK = KH$ gives that $h_1k = kh_2$, or, equivalently, that $k=h_1^{-1}kh_2$.  So, the manner in which this fact follows from $K$’s normality isn’t terribly surprising.

However, the second and third results are where the real “meat” of the concept lies.  As a general rule, if one is given a Sylow-$p$ subgroup and asked to discern something about the group structure, it is often useful to recall that $\textrm{Syl}_p(K)$ constitutes a single conjugacy class under conjugation by elements of $K$ (this is Sylow’s second theorem).  Frequently, we may wish to consider the number of Sylow-$p$ subgroups, but for our purposes, let’s focus only on that basic fact.

With conjugacy classes in mind, we might wish to ask which elements of $K$ or $G$ fix $P$ under conjugation.  Of course, these are respectively given by $N_K(P)$ and $N_G(P)$.  However, note that $N_G(P) \bigcap K = N_K(P)$.  This is key — we have discerned potentially interesting subgroups of $G$, and their form makes them particularly well suited to the second isomorphism theorem.

Applying the second isomorphism theorem, we may see $N_G(P) \bigcap K=N_K(P) \unlhd N_G(P)$, and that $(N_G(P)K) / K \cong N_G(P) / N_K(P)$.  This seems useful: we have a subgroup in which $P$ is normal which is, itself, normal in a larger subgroup in which $P$ is also normal.  What may we do with this?

Well, for starters, let’s notice that $N_K(P)$ being normal in $N_G(P)$ isn’t terribly surprising.  Since $K$ is normal in $G$, and since $P \leq K$, we know that $P$ remains in $K$ after conjugation by any element of $G$.  But, this allows us to reach something very strong:  every $G$-conjugate of $P$ is a Sylow-$p$ subgroup of $K$, and hence a $K$-conjugate of $P$, so the set of $G$-conjugates of $P$ coincides with the set of $K$-conjugates of $P$.  The number of $G$-conjugates of $P$ is given by $[G:N_G(P)]$, while the number of $K$-conjugates of $P$ is precisely $n_p(K)$, which is given by $[K:N_K(P)]$.  Thus, $[G:N_G(P)]=[K:N_K(P)]$.

Now, let’s apply Lagrange’s theorem to this, yielding

$[G:N_G(P)] = \frac{|G|}{|N_G(P)|} = \frac{|K|}{|N_K(P)|} = [K:N_K(P)]$

But, notice that this then gives $\frac{|N_G(P)|}{|N_K(P)|}=\frac{|G|}{|K|}$.  As we know $(N_G(P)K) / K \cong N_G(P) / N_K(P)$, the subgroup $(N_G(P)K) / K$ of $G / K$ has order $\frac{|G|}{|K|} = |G/K|$, and it must follow that $N_G(P)K = G$.

So, we now have at least a reasonable answer to our initial question:  If $G$ is a finite group with nontrivial normal subgroup $K$, then $N_G(P)K=G$ for any $P \in \textrm{Syl}_p(K)$.

Diverging from the previous line of thinking briefly, let’s consider a few calculations which properly constitute Frattini’s argument.  These show the same fact that we discovered above, but from a more blunt and direct angle.

Applying the fact that $K$ is normal in $G$, let us note that, for any $g \in G$, we have that $P^g \leq K$ and, by the fact that $\textrm{Syl}_p(K)$ forms a single conjugacy class, $P^g \in \textrm{Syl}_p(K)$.   Applying the conjugation argument yet again, there must then be $k \in K$ such that $(P^g)^k = P$.  But note that this says $k^{-1}g^{-1}Pgk = P$, so $gk \in N_G(P)$.  Moreover, writing $g = (gk)k^{-1}$ with $k^{-1} \in K$, we have $g \in N_G(P)K$ and, as $g$ was arbitrary, it follows that $N_G(P)K = G$.
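The conclusion can be sanity-checked on a small concrete group. Here is a pure-Python sketch (the helper functions and the choices $G = S_4$, $K = A_4$, $P = \langle (0\,1\,2) \rangle$ are our own illustration, not part of the classical argument) verifying that $N_G(P)K = G$.

```python
from itertools import permutations

def compose(p, q):
    # (p o q)(i) = p[q[i]]; permutations of {0, 1, 2, 3} stored as tuples
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def sign(p):
    # +1 for even permutations, -1 for odd, by counting inversions
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

G = set(permutations(range(4)))        # the symmetric group S4
K = {p for p in G if sign(p) == 1}     # the alternating group A4, normal in S4

# P: the Sylow-3 subgroup of K generated by the 3-cycle (0 1 2)
c = (1, 2, 0, 3)
P = {(0, 1, 2, 3), c, compose(c, c)}

# N_G(P) = { g in G : g P g^{-1} = P }, computed by brute force
NG_P = {g for g in G
        if {compose(compose(g, p), inverse(g)) for p in P} == P}

# Frattini's argument: N_G(P) K = G
product = {compose(n, k) for n in NG_P for k in K}
print(product == G)  # True
```

The same data also confirms the index identity used earlier: $[G:N_G(P)] = 24/6 = 4 = 12/3 = [K:N_K(P)]$.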

With this in mind, let’s conclude by reflecting briefly on what all of the above might give us.  For one, one might notice that we may apply this to any Sylow-$p$ subgroup of $K$, giving a whole host of different quotients of subgroups of $G$ isomorphic to the canonical quotient $G / K$.  Or, perhaps we might apply this to a subgroup of $G$, replacing $K$ with a subgroup now normal in the new, smaller group.  Among other things, we could apply this argument to show that $N_G(N_G(P))=N_G(P)$ for any $P \in \textrm{Syl}_p(G)$.

Most importantly though (in my mind, at least), it reveals a small part of just how much structure normality gives:  modulo a normal subgroup $K$, all of $G$ is accounted for by the elements that leave a fixed Sylow-$p$ subgroup of $K$ invariant.

(Note:  The image included was first posted by the stackexchange user p Groups in an excellent answer to this question regarding the second isomorphism theorem.)