The Open Mapping Theorem

The open mapping theorem (or the Banach-Schauder theorem, if you prefer) is an incredibly important, relatively straightforward and digestible result in functional analysis which plays a crucial role in a large variety of other interesting theorems.  As an exercise, we’ll prove the open mapping theorem here in the standard fashion.  This will hopefully serve as a useful reference for later posts about metric regularity and linear openness, which serve as a measuring device to quantify the degree to which a map is open.

Stating the Open Mapping Theorem

The open mapping theorem may be stated as follows:

Theorem (Open Mapping)

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be Banach spaces and $T : \mathfrak{X} \rightarrow \mathfrak{Y}$ a bounded linear operator. Then, if $T$ is surjective, $T$ is an open map.

Restating this another way (with $\mathbb{B}$ denoting the open unit ball), the theorem reads:

Theorem (Open Mapping)

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be Banach spaces and $T : \mathfrak{X} \rightarrow \mathfrak{Y}$ a bounded linear operator. Then, if $T$ is surjective, $0 \in int A(\mathbb{B}_\mathfrak{X})$.

At the heart of the open mapping theorem is the notion that there is a link between precision and isomorphism among complete normed spaces (though the theorem holds under the even weaker assumption of local convexity as well): if the equation $Tx=y$ has at least one solution for any $y \in \mathfrak{Y}$, then either $T$ is an isomorphism and $\mathfrak{Y}$ is isomorphic to $\mathfrak{X}$, or $T$ is a quotient map and $\mathfrak{Y}$ is isomorphic to a quotient of $\mathfrak{X}$. Taken another way, the open mapping theorem links precision and estimation — at least in the sense that if $Tx=y$ has a solution for any $y \in \mathfrak{Y}$, then there exists a constant $k>0$ such that $\|x\|_\mathfrak{X} \leq k\|y\|_\mathfrak{Y}$. More specifically (and this keys us in to precisely how open mappings relate to metric regularity), $k^{-1} = \sup{ r > 0 : r\mathbb{B}_\mathfrak{Y} \subseteq T(\mathbb{B}_\mathfrak{X} }.$ We say that the value $k^{-1}$, as defined here, is the Banach constant of $T$, and will prove useful in the future.

We will prove the open mapping theorem in detail to highlight precisely the epsilon-delta argument used, which will hopefully allow us to see how weakening of linear openness may be better understood.

Proving the Open Mapping Theorem

Our proof of the open mapping theorem will rely on the Baire category theorem. Because of this, we will also prove the Baire Category theorem. However, let’s first define a Baire space.

The modern definition of a Baire space is often given in one of the four following equivalent forms:

Definition (Baire Space)

Let $\mathcal{X}$ be a topological space. We say $\mathcal{X}$ is a Baire space if every intersection of a countable collection of dense open subsets of $\mathcal{X}$ is also dense.

That is, Baire spaces preserve density under countable intersections (notice that open need not be specified, as a dense closed set is necessarily the entire space).

Definition (Baire Space)

Let $\mathcal{X}$ be a topological space. We say $\mathcal{X}$ is a Baire space if every union of a countable collection of closed subsets of $\mathcal{X}$ with empty interior has empty interior.

That is, Baire spaces preserve boundary sets under countable unions (notice that closed need not be specified, as an open set with empty interior is necessarily the empty set).

Definition (Baire Space)

Let $\mathcal{X}$ be a topological space. We say $\mathcal{X}$ is a Baire space if the interior of every union of a countable collection of closed, nowhere dense subsets of $\mathcal{X}$ is empty.

Definition (Baire Space)

Let $\mathcal{X}$ be a topological space. We say $\mathcal{X}$ is a Baire space if, whenever any union of countably many closed subsets of $\mathcal{X}$ has an interior point, then one of the closed subsets has an interior point.

The historical definition of a Baire space involves Baire’s notion of categories (not to be confused with the categories of category theory), but is also equivalent.

Definition (Sets of First and Second Category)

Let $\mathcal{X}$ be a topological space. We say a subset $\mathcal{W}$ of $\mathcal{X}$ is:

1.  of first category (or, often, meagre) in $\mathcal{X}$ if there exist a sequence $\left \{N_i\right \}_{i=1}^\infty$ of nowhere dense subsets of $\mathcal{X}$ such that $\mathcal{W}=\bigcup_{i=1}^\infty N_i$;
2. of second category in $\mathcal{X}$ if $\mathcal{W}$ is not of first category in $\mathcal{X}$.

This leads to Baire’s original definition, which is as follows:

Definition (Baire Space)

We say $\mathcal{X}$ is a Baire space if every non-empty open set in $\mathcal{X}$ is of second category in $\mathcal{X}$.

That is to say, open sets in Baire spaces are, in a sense, suitably ‘substantial’ or ‘large’ — at least, insofar as Baire spaces have no open sets which are meager.

Correspondingly, we also can find one further historical definition of a Baire space using this notion:

Definition (Comeagre set)

Let $\mathcal{X}$ be a topological space. We say that a subset $\mathcal{W}$ of $\mathcal{X}$ is comeagre if its compliment $\mathcal{W}^C$ is meager.

Definition (Baire Space)

We say $\mathcal{X}$ is a Baire space if every comeagre subset of $\mathcal{X}$ is dense in $\mathcal{X}$.

Seeing the straight-up crazy number of different definitions of a Baire space, one might wonder why these spaces deserve so much fuss. To this end, let us observe that Baire spaces enjoy a combinatorial property akin to the pigeonhole principle, which is (very obviously) equivalent to their definition.

Theorem (Interior Pigeonhole Property)

Let $E_1,E_2,\hdots$ be an arbitrary, at most countable, sequence of closed subsets of $\mathcal{X}$. We say $\mathcal{X}$ has the interior pigeonhole property if, whenever $\bigcup_n E_i$ has nonempty interior, then at least one $E_i$ has nonempty interior.

Let $\mathcal{X}$ be a topological space. If $\mathcal{X}$ has the interior pigeonhole property, then $\mathcal{X}$ is a Baire space.

So, Baire spaces are useful in some contexts because they allow us to — in a somewhat combinatorial fashion — extract data about the existence of a subset satisfying a certain property among a collection of subsets by looking at the properties of a larger set which contains them. Determining if a topological space is a Baire space allows us to utilize arguments of this form, so we frequently are interested in conditions dictating whether a topological space does indeed have this desirable property. In particular, we could consider this question in the context of a complete metric space, which leads us to the so-called Baire Category Theorem.

Theorem (Baire Category)

Every complete metric space is a Baire space.

Proof

Let $\mathcal{X}$ be a complete metric space. Using definition the first definition of a Baire space, we seek to show that a countable intersection of dense subsets is dense. To that end, let $\left \{E_n \right \}_{n=1}^\infty$ be a countable collection of dense subsets of $\mathcal{X}$. As a subset is dense if and only if every nonempty open subset intersects it, it is then sufficient to show that any nonempty open subset $W \subset \mathcal{X}$ has a point $x$ which lies in the intersection of $W$ with each $E_n$.

Proceeding in this fashion, observe that since $E_1$ is dense, $E_1\bigcap W \neq \emptyset$. Thus, there exists $x_1 \in E_1\bigcap W$ and real constant $0 such that $\overline{\mathbb{B}(x_1,r_1)} \subseteq E_1\bigcap W$.

Now, observe that, as $E_2$ is dense, $E_2 \bigcap \mathbb{B}(x_1,r_1) \neq \emptyset$, and we may find a point $x_2$ in the intersection and positive radius $0 < r_2<\frac{1}{2}$ such that $\overline{\mathbb{B}(x_2,r_2)} \subset \mathbb{B}(x_1,r_1)$. Continuing recursively, we find a pair of sequences $\left \{x_n\right \}_{n=1}^\infty$ and $\left \{r_n\right \}_{n=1}^\infty$ such that $0 < r_n < \frac{1}{n}$ and $\overline{\mathbb{B}(x_n,r_n} \subset \mathbb{B}(x_{n-1},r_{n-1})\bigcap E_{n}$. Thus, we have a nested sequence of closed and bounded subsets $\left \{\overline{\mathbb{B}(x_n,r_n)}\right \}_{n=1}^\infty$, which are correspondingly compact by the Heine-Borel theorem. Applying Cantor’s intersection theorem, this then yields a fixed point

Moreover, as the sequence $\left \{x_n\right \}_{n=1}^\infty$ is Cauchy and $\mathcal{X}$ is complete, $x_n \rightarrow x \in \mathcal{X}$.

Therefore, we may conclude that the intersection of a countable number of dense open subsets of a complete metric space is dense, and that every complete metric space is correspondingly a Baire space.  $\blacksquare$

We will use this result in a central way to prove the open mapping theorem. However, we first need three lemmas.

Lemma
A normed space $\mathfrak{X}$ is a Banach space if and only if every absolutely convergent series in $\mathfrak{X}$ converges in $\mathfrak{X}$.

Proof

$\mathbf{[\Rightarrow]}$

Let $\mathfrak{X}$ be a Banach space and $\left \{x_n \right \}$ an arbitrary sequence in $\mathfrak{X}$ such that $\sum_{k=1}^\infty\| x_n \|$ converges. It then follows that the partial sums of the series are a Cauchy sequence, and by the completeness of $\mathfrak{X}$, the series $\sum_{k=1}^\infty x_n$ converges to an element of $\mathfrak{X}$.

$\mathbf{[\Leftarrow]}$

Let $\mathfrak{X}$ be a normed space and suppose that every absolutely convergent series in $\mathfrak{X}$ converges in $\mathfrak{X}$. We must now show that every Cauchy sequence in $\mathfrak{X}$ converges. To that end, let $\left \{x_n \right \}$ be an arbitrary Cauchy sequence in $\mathfrak{X}$, and let $\left \{x_{n_k} \right \}$ be a subsequence of $\left \{ x_n \right \}$ such that $\| x_{n_{k+1}} - x_{n_k} \| < 2^k$. It then follows that $\sum_{k=1}^\infty \| x_{n_{k+1}} - x_{n_k} \|$ converges, and by our assumption, that $\sum_{k=1}^\infty x_{n_{k+1}} - x_{n_k}$ converges as well to some $x \in \mathfrak{X}$. Observe that

As this series converges, it then follows that $x_{N_{k+1}} - x_{n_1} \rightarrow x - x_{n_1}$ for some $x \in \mathfrak{X}$. Thus, we have shown that $\left \{ x_{n_k} \right \}$ is a convergent subsequence of the Cauchy sequence $\left \{ x_n \right \}$ and, correspondingly, we may conclude that $\left \{ x_n \right \}$ converges in $\mathfrak{X}$. Being that $\left \{ x_n \right \}$ was arbitrary, it then follows that $\mathfrak{X}$ is complete and, thus, a Banach space.  $\blacksquare$

Lemma

Let $\mathfrak{X}$ be a Banach space, $\mathcal{Y}$ a normed space, and $T \in \mathscr{B}(\mathfrak{X},\mathcal{Y})$. If $r,s > 0$ is are constants such that $s\mathbb{B}_\mathcal{Y} \subset \overline{T(r\mathbb{B}_\mathfrak{X})}^o$, then $s\mathbb{B}_\mathcal{Y}^o \subset T(r\mathbb{B}_\mathfrak{X})^o$.

Proof

Let $\mathfrak{X}$ be a Banach space, $\mathcal{Y}$ a normed space, and $T \in \mathscr{B}(\mathfrak{X},\mathcal{Y})$. Further, suppose we have a constant $r > 0$ such that $r\mathbb{B}_\mathcal{Y} \subset \overline{T(\mathbb{B}_\mathfrak{X})}^o$. Noting that scaling is a homeomorphism, without loss of generality we take $r=s=1$ by alternatively considering $\frac{r}{s}T$.

Having done this, let us first choose an arbitrary $z \in \mathbb{B}_\mathcal{Y}$, and choose $\delta>0$ such that $\|z\|_\mathcal{Y} < 1-\delta< 1$ (that is, $z \in (1-\delta)\mathbb{B}_\mathcal{Y} \subseteq \mathbb{B}_\mathcal{Y}^o$). Then, choose $y \in Y$ by $y = (1-\delta)^{-1}z$, observing that $\|y\|_\mathcal{Y} = \|(1-\delta)^{-1}z\|_\mathcal{Y} < 1$. We will demonstrate that $y \in (1-\delta)^{-1}T(\mathbb{B}_\mathfrak{X})^o$, which correspondingly gives that $z \in T(\mathbb{B}_\mathfrak{X})^o$.

To do so, we find a sequence $\left \{ y_n \right \}_{n=0}^\infty \subset \mathcal{Y}$ such that

That is, a sequence which converges to $y$ and has the difference of successive terms in successively smaller contractions of $T(\mathbb{B}_\mathfrak{X})^o$.

As $z \in \overline{T(\mathbb{B}_\mathfrak{X})}^o$, $z$ is the limit of a sequence $\left \{ T(w_n) \right \}_{n=1}^\infty \subset T(\mathbb{B}_\mathfrak{X})^o$ where $w_n \in \mathbb{B}_\mathfrak{X}$. Then, it follows that for all $\epsilon > 0$, there exists $N(\epsilon) \in \mathbb{N}$ such that $\|z-T(w_m)\|_\mathcal{Y} < \epsilon$ for all $m \geq N(\epsilon)$. Following this, set $\epsilon_n = (1-\delta)\delta^n$. Now, let $y_n = (1-\delta)^{-1}T(w_{N(\epsilon_n)})$ for $n \geq 1$ and $y_0=0$ (which does indeed work, as $\|z\|_\mathcal{Y}<(1-\delta)$). Notice that this allows us to yield the following observations:

Thus, as $y_n \in (1-\delta)^{-1}T(\mathbb{B}_\mathfrak{X})^o$, we then have that $y_n-y_{n-1} \in \delta^{n-1}T(\mathbb{B}_\mathfrak{X})^o$, and our desired properties have been fulfilled.

Following this, we find a convergent sequence in $\mathfrak{X}$ which also converges to $y$ under $T$ to demonstrate that $y \in (1-\delta)^{-1}\mathbb{B}_\mathfrak{X}$. That is, we seek a sequence $\left \{ x_n \right \}_{n=1}^\infty \subset \mathfrak{X}$ such that $\|x_n\|_\mathfrak{X} < \delta^{n-1}$ and $T(x_n)=y_n-y_{n-1}$. To do this, let us first notice that, for $x \notin ker(T)$, we have $1 \leq \|T(\frac{x}{\|x\|_\mathfrak{X}})\|_\mathcal{Y}$, as $\mathbb{B}_\mathcal{Y}^o \subseteq \overline{T(\mathbb{B}_\mathfrak{X})}^o$. Setting $x_n = (1-\delta)^{-1}\left (w_{N(\epsilon_n)}-w_{N(\epsilon_{n-1})} \right )$, clearly $T(x_n) = y_n - y_{n-1}$ by linearity, and the following then holds for $n \geq 1$:

Thus, $\|x_n\|_\mathfrak{X} < \delta^{n-1}$. Following this, we may now also notice that

so the series $\sum_{n=1}^\infty x_n$ is absolutely convergent, and correspondingly by the previous lemma, we then have that $\sum_{n=1}^\infty x_n = x^* \in \mathfrak{X}$ by the completeness of the space.

Moreover, note $\|x^*\|_\mathfrak{X} < (1-\delta)^{-1}$, so $x^* \in (1-\delta)^{-1}\mathbb{B}_\mathfrak{X}$. Thus, by the linearity of $T$, we then have that

and, correspondingly, $y \in (1-\delta)^{-1}T(\mathbb{B}_\mathfrak{X})^o$. Hence, it follows that

As such, we have demonstrated that, if $z \in \mathbb{B}_\mathcal{Y}^o \subset \overline{T(\mathbb{B}_\mathfrak{X})}^o$, then $z \in T(\mathbb{B}_\mathfrak{X})^o$ as well. As $z$ was chosen arbitrarily, this yields that $\mathbb{B}_\mathcal{Y}^o \subset T(\mathbb{B}_\mathfrak{X})^o$ as desired.  $\blacksquare$

Lemma

Let $V$ and $W$ be $\mathbb{R}$-vector spaces and $T: V \rightarrow W$ a bounded linear map. If $C \subset V$ is convex in $V$, then $T(V)$ is convex in $W$.

Proof

Let $V$ and $W$ be $\mathbb{R}$-vector spaces, $T: V \rightarrow W$ be a bounded linear map, and $C \subset V$ be convex in $V$. We seek to show that, for all $x,y \in T(C)$, $(1-t)x+ty \in C$ for all $t \in [0,1]$.

To that end, let $x,y \in T(C)$ and $t \in [0,1]$ be arbitrary. Then, $x = T(v)$ and $y = Tu$ for some $v,u \in C$. Moreover, by the convexity of $C$, we then have that $(1-t)v+tu \in C$. But, by the linearity of $T$, we yield the following inclusion

Thus, $T(C)$ is convex as well.  $\blacksquare$

Now, we use these results to prove the open mapping theorem, which we will recall is stated as follows:

Theorem (Open Mapping)

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be Banach spaces and $T : \mathfrak{X} \rightarrow \mathfrak{Y}$ a bounded linear operator. Then, if $T$ is surjective, $0 \in int A(\mathbb{B}_\mathfrak{X})$.

Proof

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be Banach spaces and $T : \mathfrak{X} \rightarrow \mathfrak{Y}$ a surjective bounded linear operator. If $\mathfrak{Y}$ is the trivial space, then we are done. Suppose $\mathfrak{Y}$ is not the trivial space. By the linearity of the spaces, it is sufficient to show that $T$ maps $\mathbb{B}_\mathfrak{X}$ to a neighborhood of the origin of $\mathfrak{Y}$.

First, let us note that $\mathfrak{X} = \bigcup_{n=1}^\infty n\mathbb{B}_\mathfrak{X}$. Correspondingly, by the surjectivity and linearity of $T$, we then have that $\mathfrak{Y} = T(\mathfrak{X}) = \bigcup_{n=1}^\infty nT(\mathbb{B}_\mathfrak{X}).$

As $\mathfrak{X}$ and $\mathfrak{Y}$ are Banach spaces, they are also Baire spaces. Correspondingly, as the whole space $\mathfrak{Y}$ has nonempty interior, there exists $n \in \mathbb{N}$ such that $\overline{nT(\mathbb{B}_\mathfrak{X})}^o \neq \emptyset$. Thus, there then must exist $y_0 \in \mathfrak{Y}$ and $r > 0$ such that $(y_0+r\mathbb{B}_\mathfrak{Y})^o \subset \overline{nT(\mathbb{B}_\mathfrak{X})}^o$.

Moreover, observe that if $y \in (y_0 + \mathbb{B}_\mathfrak{Y})^o \subset \overline{nT(\mathbb{B}_\mathfrak{X})}^o$, then $-y \in \overline{nT(\mathbb{B}_\mathfrak{Y})}^o$ as well by the linearity of $T$. Thus, $(-y_0+r\mathbb{B}_\mathfrak{Y})^o \subseteq \overline{nT(\mathbb{B}_\mathfrak{X})}^o$, and as the image of a convex set under a bounded linear map is convex by the third lemma above, we then have that

By the second lemma, we then may conclude that $r\mathbb{B}_\mathfrak{Y}^o \subset nT(\mathbb{B}_\mathfrak{X})^o$. Therefore, as we have shown that $T$ maps $\mathbb{B}_\mathfrak{X}$ to a neighborhood of the origin of $\mathfrak{Y}$, it follows that $T$ is then an open map.  $\blacksquare$

NOTE

This post draws on a large number of resources that I’ve encountered at various points over the last few years, most of which I didn’t write down at the time.  If you recognize any of the proofs given above, I’d love to know where you’ve seen them so I can properly cite the source.

A Control Theory Perspective

In this post, we’ll talk about the trace parameterization of nonnegative Hermitian trigonometric polynomials, providing a proof which depends on the Kalman-Yakubovich-Popov lemma.  This follows very closely along the lines of Chapter 2, Section 2.5 of Dumitrescu’s book Positive Trigonometric Polynomials and Signal Processing Applications” (Springer, 2007).

Abstract

The trace parameterization of a Hermitian trigonometric polynomial which is nonnegative on the unit circle is an analog to the Riesz-Fejér theorem, which connects the convex set of trigonometric polynomials which are nonnegative on the unit circle to the set of positive definite matrices by means of causal polynomials. For this reason, the trace parameterization is often useful in computational settings, as it allows one bring the methods of semidefinite programming to bear on a problem involving these polynomials. We sketch an equivalent formulation of the trace parameterization, derived instead from the control-theoretic Kalman-Yakubovich-Popov lemma. This reveals a surprising connection between Hermitian trigonometric polynomials which are nonnegative on the unit circle and positive real transfer functions of linear time-invariant control systems with linear time-invariant feedback.

Basic Harmonic Analysis

Let us first establish some of the relevant Harmonic analysis terminology, notation, and definitions before proceeding. First, let us define the general object with which we will concern ourselves.

Definition (Trigonometric Polynomial)

Let $R(x) \in \mathbb{C}[x]$. We say that $R$ is a trigonometric polynomial if

$\displaystyle R(x) = a_0 + \sum_{k=1}^n a_n cos(nx) + i\sum_{k=1}^n b_n sin(nx), \enskip x \in \mathbb{R}.$

Using Euler’s formula, this can equivalently be seen as a polynomial $R(z) \in \mathbb{C}_n[z]$ given by

$\displaystyle R(z) = \sum_{k=-n}^n r_kz^{-k}, \enskip z \in \mathbb{C}.$

A specific subset of trigonometric polynomials will be the primary item on which we focus in this post.

Definition (Hermitian Trigonometric Polynomial)

If $R(z) \in \mathbb{C}_n[z]$, we say that $R$ is a Hermitian trigonometric polynomial if

$\displaystyle R(z) = \sum_{k=-n}^n r_kz^{-k}$

and $r_{-k}=r^*_k$.

Letting $r_k = \rho_ke^{i\theta_k}$, the final condition above then says that $r_{-k} = \rho_ke^{-i\theta_k}$. This allows us to see the symmetry which makes these polynomials particularly interesting, namely:

$R(z) = \rho_0 e^{i\theta_0} + \sum_{k=1}^n \left ( \rho_ke^{i\theta_k} z^{-k}+\rho_ke^{-e\theta_k} z^{k} \right ).$

It can be shown that Hermitian trigonometric polynomials are real-valued as well. For $u_k,v_k \in \mathbb{R}$ with $-v_k = v_{-k}$, we may write the complex coefficients of $R(z)$ as $r_k = u_k+iv_k$. This then allows us to see the following holds for values on the unit circle:

$R(z) = \sum_{k=-n}^n (u_k+iv_k)z^{-k} = u_0+2\sum_{k=1}^n u_k cos(k\theta) + 2 \sum_{k=1}^n v_ksin(k\theta).$

So, it is indeed the case that Hermitian trigonometric polynomials are real-valued.

A portion of a Hermitian trigonometric polynomial is, itself another important class of polynomials. In fact, this class of polynomials allows such polynomials as we’ve seen above to be written in a very nice way.

Definition (Causal Polynomial)

Let $\displaystyle H(z) \in \mathbb{C}_n[z]$. We say that $H$ is a causal polynomial if

$H(z) = \sum_{k=0}^n h_k z^{-k}.$

That is, Causal polynomials are simply polynomials with no $z^k$ terms for $k > 0$.

Notice then that the causal part of $R(z)$ as given above (which we denote by $R_+(z)$) is simply

$R_+(z) = \frac{r_0}{2} + \sum_{k=1}^n r_kz^{-k}.$

As another notational item, for a polynomial $p(z) = \sum_{k=0}^n p_k z^{k} \in \mathbb{C}[z]$, let us write $p^*(z) = \sum_{k=0}^n p^*_k z^{k}$.

Now, with this notation in mind, note that — as $r_0 \in \mathbb{R}$ for $R(z)$ a Hermitian trigonometric polynomial — we may observe

$R(z) = R_+(z) + R_+^*(z^{-1}).$

So, Hermitian trigonometric polynomials are necessary the sum of two causal polynomials. Later, we will see that they are also the product of causal polynomials, demonstrating that these two classes of polynomials are indeed intimately linked.

Basic Control Theory

Control theory broadly concerns itself with control-parameterized dynamical systems of ODE’s of the form

$\dot{x}=f(x(t),u(t))$

where $\dot{x}$ denotes $\frac{dx(t)}{dt}$ and $f$ is a sufficiently smooth vector field (which generally depends on the context of the problem). We say that $x(t)$ is the state vector, which is point-wise in $\mathbb{R}^n$, and $u(t)$ is the and control vector, which is point-wise in $\mathbb{R}^m$. Both the state vector and control vector can be thought of as being bounded, continuous vector fields as well (for our purposes, as least), and we generally consider dynamics which are time-invariant (i.e. autonomous) and finite dimensional.

Considering a fixed initial control $u$ and some initial value problem $\dot{x}=f(x,u), \quad x(0)=x_0$, if the correct assumptions are made (i.e. satisfying Pickard-Lindelöff or the like), we can (in theory) find a solution $\Phi_{x_0}(t)$. This essentially is the main problem in addressed by the theory of ODE’s. Control theory, very generally, is concerned with the reverse problem: Given a desired trajectory $\Phi_{x_0}(t)$, we seek to find a control vector field $u$ such that $\Phi_{x_0}(t)$ is a solution to the IVP.

Frequently, we will also concern ourselves with systems that have feedback and controls which may depend on that feedback. Intuitively, feedback can be thought of as a measurement of part of a state of the system — think of the cruise control in a car, which accelerates or decelerates based in response to the measured speed of the vehicle and the present control being applied. More precisely, let us make the following definitions:

Definition (Open Loop and Closed Loop Systems)

Consider the autonomous dynamical control system governed by ODE’s with feedback

$\dot{x}=f(x(t),u(t,y)), \enskip y=h(x(t)) \quad \textrm{with state vector} \enskip x(t)\in \mathbb{R}^n, \enskip \textrm{control} \enskip u(t,y) \in \mathbb{R}^m, \enskip \textrm{and feedback} \enskip y \in \mathbb{R}^p$

We say such a controller $u(t,y)$ is a state-feedback control, and say that such a system is a closed loop system. A system in which the controller $u(t)$ does not depend on the feedback is called an open loop system.

We will consider a particularly nice class of control systems called linear, time invariant systems, which we will now define.

Definition (Linear Time-Invariant Systems)

An autonomous dynamical control system govened by ODE’s with feedback of the form

$\dot{x} = Ax(t) = Bu(t,y), \qquad y=Cx(t)+Du(t)$

where $A,B,C,D$ are appropriately sized matrices, is said to be a linear time-invariant system.

For these systems, we will particularly concerned with two concepts that are essentially dual (and are dual general, not just in the linear case): controllability and observability. Before we see this link, let us first build up to each term’s respective definitions. First, we will need the notion of the reachable set.

Definition (Reachable Set)

Consider the autonomous dynamical control system governed by ODE’s

$\dot{x}=f(x(t),u(t)) \quad \textrm{with state vector} \enskip x(t)\in \mathbb{R}^n, \enskip \textrm{and control} \enskip u(t) \in \mathbb{R}^m.$

Given a fixed $x_0 \in \mathbb{R}^n$ and control $u(t)$ defined for $t \geq 0$, denote by $\Phi_t(x_0,u)$ the solution to the IVP

$\dot{x}=f(x(t),u(t)), \quad x(0)=x_0.$

We define the reachable set at $x_0$ to be

$\mathscr{R}(x_0) = \left \{ x \in \mathbb{R}^n: \exists T \geq 0 \enskip \textrm{and input trajectory} \enskip u(t) \enskip s.t. \enskip \Phi_T(x_0,u)=x \right \}.$

That is, the reachable set at $x_0$ is everywhere you can steer the system to beginning at $x_0$ in a finite amount of time. Of particular interest is the case when we can steer the system anywhere we would like, which leads to the concept of controllability.

Definition (Controllability)

Consider the autonomous dynamical control system governed by ODE’s

$\dot{x}=f(x(t),u(t,y)), \enskip \textrm{with state vector} \enskip x(t)\in \mathbb{R}^n, \enskip \textrm{control} \enskip u(t,y) \in \mathbb{R}^m$

We say the system is controllable at $x_0$ if $\mathscr{R}(x_0)=\mathbb{R}^n$. If the system is controllable at any $x_0 \in \mathbb{R}^n$, we say the system is controllable.

Essentially, the internal state of a controllable system can be moved from any initial state to any final state in finite time — it is both an intuitive notion and a rather strong condition!

The notion of observability can be seen in a relatively concrete fashion as well. Broadly, observability is is a measure for how much of the internal state of a system can be accurately inferred by the output. Think of a nice thermostat — if you set the tempurature to a certain level, the system will keep the the room at that level by turning on and off the heat in response to its internal thermometer, which (barring any strange lacunas in the air flow of the room) precisely yield a current measurement for the internal state of the system. More precisely, let us make the following definition for a simple class of systems with which we will concern ourselves:

Definition (Observability)

Consider the autonomous dynamical control system governed by ODE’s

$\dot{x}=f(x(t),u(t,y)), \enskip y=h(x(t)) \quad \textrm{with state vector} \enskip x(t)\in \mathbb{R}^n, \enskip \textrm{control} \enskip u(t,y) \in \mathbb{R}^m, \enskip \textrm{and feedback} \enskip y \in \mathbb{R}^p$

We say the system is observable if, at any time $T$ and under any sequence of valid state and control vectors $\left \{u_i(t,y) \right \}_{i=0}^r$ and $x_k(t)_{k=0}^l$ applied to the system during the interval $[0,T]$, the internal state of the system can be recovered in finite time from only the data yielded by $y$.

In more colloquial terms, this says that the behavior of the entire system — no matter what fashion it has been steered in previously — can be determined at any time, and in finite time, by the output of the system alone. The link between observability and controllability can be seen rather naturally in this context: Controllability allows one to steer the internal state of a system anywhere one wishes in a finite amount of time irrespective of the starting point, while observability allows one to determine the internal state of a system in a finite amount of time irrespective of the previous steering.

For the case of linear, time-invariant systems, controllability and observability have particularly nice characterizations which make this duality quite clear. The results can be summarized by the following theorems:

Theorem (Kalman Controllability Criteria)

Consider the $n^{th}$ order linear time-invariant control system with feedback of the form

$\dot{x} = Ax(t) = Bu(t,y), \qquad y=Cx(t)+Du(t).$

The, the system is controllable if and only if the Kalman matrix (or controllability matrix) given by

$K(A,B) = \begin{bmatrix} B & AB &A^2B &\hdots & A^{n-1}B \end{bmatrix}$

has full row rank (i.e. is of rank $n$). If this holds, we say that $(A,B)$ is a controllable pair.

(Note that feedback was not needed in this definition. We, however, opt to include the feedback to stress that this result does indeed also apply to systems with feedback, which will be our primary interest here.)

This result follows from the fact that the range of the Kalman matrix is the reachable set from $0$, i.e. $\mathcal{R}_T(0) = \textrm{range}(K(A,B))$. While it can be shown without much work that $\mathcal{R}_T(0) \subseteq \textrm{range}(K(A,B))$, we must construct the so-called controllability Gramian of $(A,B)$ to achieve our desired equality. This is outside the scope of our interests here, so we will omit the proof at present.

Now, we will see that a similar condition exists for observability.

Theorem (Observability Criteria)

Consider the $n^{th}$ order linear time-invariant control system with feedback of the form

$\dot{x} = Ax(t) = Bu(t,y), \qquad y=Cx(t)+Du(t).$

The, the system is observable if and only if the Observability matrix given by

$O(A,B) = \begin{bmatrix} C & CA &C^2A &\hdots & C^{n-1}A \end{bmatrix}$

has full row rank (i.e. is of rank $n$). If this holds, we say that $(A,C)$ is an observable pair.

Why this condition is necessary is perhaps less clear than the controllability criteria, and has to do with placing the eigenvalues of $A-LC$ in the left of the complex plane by a suitable choice of $L$ in order to make the estimation error of a so-called Luenberger observer go to zero as $t \rightarrow \infty$. We will not delve in to this here, as it is somewhat lengthy. The basic gist can be summarized as saying that, if $n$ rows of the observability matrix are linearly independent, then each of the $n$ states of the system can be obtained through linear combinations of the output variable $y$.

Finally, we must define the notion of a transfer function. In our context, a transfer function is simply a representation of the relation between the input and the output of a linear time-invariant system in terms of the temporal frequency. More precisely, in our case, the transfer function $\mathbf{G}(s)$ is the linear mapping of the Laplace transform of the input to the Laplace transform of the output. Let us now demonstrate what we mean.
Let $\mathbf{V}(s) = \mathcal{L}\left \{v(t)\right \} = \int_{0}^\infty v(s)e^{-st}dt$ denote the usual Laplace transform of a function $v(t)$ (where $s$ is used here to represent a complex variable instead of the standard $z$ purely by historical convention).
Taking the Laplace transform of $\dot{x} = Ax(t)+Bu(t)$ yields

$s\mathbf{X}(s) - x(0) = A\mathbf{X}(s)+B\mathbf{U}(s).$

Thus,

$\mathbf{X}(s) = (sI-A)^{-1}x(0)+(sI-A)^{-1}B\mathbf{U}(s).$

Substituting this into the output, we yield

$\mathbf{Y}(s) = C\left ( (sI-A)^{-1}x(0)+(sI-A)^{-1}B\mathbf{U}(s) \right ) + D\mathbf{U}(s).$

Thus, if $\mathbf{G}(s)$ is the transfer function, it is a linear mapping of $\mathbf{X}(s)$ to $\mathbf{Y}(s)$, and we result in the following definition:

Definition (Transfer Function of a Linear, Time-Invariant System)

Consider the $n^{th}$ order linear time-invariant control system with feedback of the form

$\dot{x} = Ax(t) = Bu(t,y), \qquad y=Cx(t)+Du(t).$

Then, the transfer function of the system is the linear mapping of $\mathbf{X}(s)$ to $\mathbf{Y}(s)$ given by

$\mathbf{G}(s) = C(sI-A)^{-1}B+D.$

This, combined with the rest of the terminology given here, turns out (surprisingly) to be enough to give an alternate derivation of the trace parameterization of Hermitian trigonometric polynomials.

Trace Parameterization:  A Harmonic Analysis Perspective

Let us begin with some notation. For shorthand, we will denote the vector containing the canonical basis elements for a polynomial of degree $n$ by

Noting that $\psi_n(z) \in \mathbb{C}[z]^{(n+1) \times 1}$, a familiarity with basic linear algebra allows us to see that the action of a vector of dimension $1 \times (n+1)$ (i.e. a linear functional) on $\psi_n(z)$ generates a polynomial. That is, if $b_n(z) = \sum_{k=0}^n b_k q_k(z) \in \mathbb{C}[z]^{1\times (n+1)}$ where $q_k(z) \in \mathbb{C}[z]$ for each $k$, then a new polynomial $p(z)$ is generated by the following action:

$\displaystyle b_n(z)\psi_n(z) = \sum_{i=0}^n b_iq_i(z)z^i = p(z). \qquad\qquad(*)$

The standard proof of the Riesz-Fejèr theorem uses this observation precisely to characterize nonnegative Hermitian trigonometric polynomials. The theorem is stated as follows:

Theorem (Riesz-Fejèr)

Let $R(z) \in \mathbb{C}[z]$ be a Hermitian trigonometric polynomial. Then, $R$ is nonnegative on the unit circle if and only if there exists a causal polynomial

$\displaystyle H(z) = \sum_{k=0}^n h_kz^{-k}$

such that

$R(z) = H(z)H^*(z^{-1}).$

Then, for a nonnegative Hermitian trigonometric polynomial, it is useful to note here that if $z = e^{i\theta}$ (that is, if $z$ is restricted to the unit circle), the following holds:

That is, all nonnegative Hermitian trigonometric polynomials are the squared moduli of causal polynomials.

We may also note that, as $r_{-k} = r_k^*$, we then have that

$\displaystyle r_k = \sum_{i=k}^n h_ih_{i-k}^*.$

So, a characterization of all Hermitian trigonometric polynomials nonnegative trigonometric polynomials is given by a choice of their respective causal polynomials. However, in some important computational circumstances, this characterization may not encode the coefficients $r_k$ in a particularly optimal way. For those circumstances in which the characterization of $r_k$ is indeed non-optimal, we may wish to consider a parameterization which relies on a particularly nice case of the observation $(*)$ — namely, when $p(z) = \psi_n^T(z^{-1})Q\psi_n(z)$. This leads to a definition:

Definition (Gram Matrix of a Hermitian Trigonometric polynomial)

A Hermitian matrix $Q \in \mathbb{C}^{(n+1) \times (n+1)}$ is called a Gram matrix associated with a Hermitian trigonometric polynomial $R \in \mathbb{C}_n[z]$ if

$R(z) = \psi_n^T(z^{-1})Q\psi_n(z).$

Moreover, we denote the set of all Gram matrices associated with $R(z)$ by $\mathcal{G}(R)$.

As suggested by the notation $\mathcal{G}(R)$, a Gram matrix associated with a given Hermitian trigonometric polynomial $R$ need not be unique.

Following the definition of a Gram matrix, we may now provide a theorem yielding the so-called trace parameterization of a Hermitian trigonometric polynomial $R(z)$, which allows for a different method of encoding the coefficients $r_k$.

Theorem (Trace Parameterization)

Let $R(z) \in \mathbb{C}_n[z]$ be a Hermitian trigonometric polynomial, and let $Q \in \mathcal{G}(R)$ be an associated Gram matrix of $R$. Then, the trace parameterization of $R(z)$ is given by

$\displaystyle r_k = tr[\Theta_k Q] = \sum_{i=max(0,k)}^{min(n+k,n)} q_{i,(i-k)} , \quad k \in \left \{-n, -(n-1) \hdots, n\right \}$

where $\Theta_k$ is the elementary Toeplitz matrix with $1$‘s along the $k^{th}$ diagonal and $0$‘s elsewhere.

Proof

First, let us observe that the following holds:

$R(z) = \psi_n^T(z^{-1})Q\psi_n(z) = tr[\psi_n^T(z^{-1})\psi_n(z)Q].$

Note now that

$\psi_n^T(z)\psi_n(z) = \begin{bmatrix} 1 &z^{-1} &\hdots &z^{-n} \\ z & 1 &\ddots &z^{-n+1} \\ \vdots &\ddots &\ddots &\vdots \\ z^n &z^{n-1} &\hdots &1 \end{bmatrix} = \sum_{k=-n}^n \Theta_k z^{-k}$

Combining, this yields:

which does indeed yield the desired form of the coefficients $r_k$$\blacksquare$

A keen observer might now conjecture that there may be a connection between the non-negativity of $R(z)$ and the properties of a given Gram matrix $Q$ associated with $R(z)$. This is indeed the case, and can be summarized by the following theorem (whose proof we omit, as this is not our central focus):

Theorem (Trace Parameterization of Nonnegative Trigonometric Polynomials)

Let $R(z) \in \mathbb{C}_n[z]$ be Hermitian trigonometric polynomial. Then, $R$ is nonnegative on the unit circle if and only if there exists a positive semidefinite matrix $Q \in \mathcal{G}(R)$.

In essence, the characterizations of trigonometric polynomials which are nonnegative on the unit circle given by trace parameterization and the Reisz-Fejèr theorem yields a link between those polynomials and positive definite matrices. This is particularly useful in a computation setting as both of these sets are convex (a fact which can be easily checked, but which we will not demonstrate here). As convexity is a required condition for optimization, this sheds light on the comment that trace parameterization may be a more useful method to encode these polynomials in some contexts.

Trace Parameterization:  A Control Theory Perspective

Now, we will show that the trace parameterization of Hermitian trigonometric polynomials can, in fact, be derived in a control theory perspective. This relies on the so-called Kalman-Yakubovich-Popov lemma (or positive real lemma, as it is sometimes referred to). A specific, nicer case of the lemma — which is important enough that we will think of it as a theorem here — can be stated as follows:

Theorem (Kalman-Yakubovich-Popov Lemma for Controllable and Observable Systems)

Consider the $n^{th}$ order linear time-invariant control system with feedback of the form

$\dot{x} = Ax(t) + Bu(t), \qquad y=Cx(t)+Du(t).$

Then, if $(A,B)$ are a controllable pair and $(A,C)$ are an observable pair, the transfer function $\mathbf{G}(s) = C(sI-A)^{-1}B+D$ is positive real (i.e. $Re\left ( G(s) \right ) \geq 0$) if and only if there exists a positive semidefinite matrix $P$ such that the matrix

$Q = \begin{bmatrix} P-A^TPA & C^T-A^TPB \\ C-B^TPA &(D+D^T)-B^TPB \end{bmatrix}$

is positive semidefinite.

To use this, let us recall that the causal part of a Hermitian trigonometric polynomial which is nonnegative on the unit circle is positive real. Let $R(z)\in \mathbb{C}_n[z]$ be such a polynomial. We now will place this causal part of $R(z)$ into a linear time-invariant control system with $(A,B)$ a controllable pair and $(A,C)$ an observable pair. That is, we realize a controllable and observable linear time-invariant system such that $G(s)=R_+(z)$. How this is done is slightly involved (we utilize the so called Rosenbrock system matrix to realize our desired transfer function), but for our purposes may be achieved by setting

$A = \Theta_1 = \begin{bmatrix} 0 & 1 & \hdots & 0 \\ \vdots & \ddots &\ddots &\vdots \\ \vdots & \ddots &\ddots &1 \\ 0 & \hdots & \hdots &0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ \ddots \\ 0 \\ 1 \end{bmatrix},$

$C= \begin{bmatrix} &r_n &\hdots &r_2 &r_1 \end{bmatrix}, \quad D = \frac{r_0}{2}.$

Then, for some positive definite matrix $P$, the matrix $Q$ in the Kalman-Yakubovich-Popov Lemma for Observable Systems is positive definite and becomes

In fact, this turns out to be a Gram matrix of $R(z)$, which can be easily verified by simply computing $\psi^T(z^{-1})Q\psi(z)$.

Now, we may also note that

for $k \in \left \{0, 1, \hdots, n\right \}$. But this then yields that

Which is precisely the trace parameterization of Hermitian trigonometric polynomials we had previously derived!

References

1. B. Dumitrescu, Positive Trigonometric Polynomials and Signal Processing Applications. Springer (2007).
2. R. Lozano, B. Brogliato, O. Egeland, B. Maschke, Dissipative Systems Analysis and Control: Theory and Applications. Springer (2000)
3. F. Jafari, Topics in Harmonic Analysis. University of Wyoming, Lecture Notes, Spring 2017 (currently unpublished).
4. D. Meyer, Nonlinear Trajectory Generation and Control: Course Notes from an Introduction. University of Wyoming, Lecture notes Spring 2017 (currently unpublished).

Part 1

As a helpful review, here are a variety of problem and solutions to exercises from an introductory functional analysis class. This is the second installment in a series of functional analysis exercises (Part 1, Part 2, Part 3, Part 4).

Problem 1

Show that for any real inner product space

$\langle u, v \rangle + \langle v, u \rangle = \frac{1}{2} \left ( \left \| u + v \right \|^2 - \left \| u - v \right \|^2 \right ),$

and for a complex inner product space

$\langle u, v \rangle - \langle v, u \rangle = \frac{i}{2} \left (\left \| u + iv \right \|^2 - \left \| u - iv \right \|^2 \right ).$

Solution

Recall that if $\langle \cdot, \cdot \rangle$ is an inner product on a vector space $\mathscr{X}$, the following hold:

1. $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$;
2. $\langle x, \alpha y + \beta z \rangle = \overline{\alpha} \langle x, y \rangle + \overline{\beta} \langle x, z \rangle$;
3. $\langle x, x \rangle \geq 0$ and $\langle x, x \rangle =0 \Leftrightarrow x=0$;
4. $\langle x, y \rangle = \overline{\langle y, x \rangle}$.

Moreover, let us recall that the norm induced by the inner product is given by $\left \| x \right \|= \langle x, x \rangle^{1/2}$ for all $x \in \mathscr{X}$. With this in mind, let us proceed in demonstrating the desired identities:

Let $\mathscr{X}$ be a real inner product space. Recall that $\overline{\alpha} = \alpha$ for all $\alpha \in \mathbb{R}$, and observe that the following holds for all $u,v \in \mathscr{X}$:

Now, let $\mathscr{X}$ be a complex inner product space. Then, observe that the following holds for all $u,v \in \mathscr{X}$:

Problem 2

Let $X = C[0,1]$ with supremum norm and define $E \subset X$ to be the set of functions such that

$\int_0^{\frac{1}{2}} f - \int_{\frac{1}{2}}^1 f = 1.$

Prove that $E$ is a closed, convex subset of $X$, but that $E$ has no element of minimal norm. This gives an alternative proof of the fact that $C[0,1]$ is not a Hilbert space.

Solution

Let $X = C[0,1]$ with supremum norm and define $E:= \left \{ f \in X : \int_0^{\frac{1}{2}} f - \int_{\frac{1}{2}}^1 f = 1 \right \}$. Let us consider the linear continuous function $G: C[0,1] \rightarrow \mathbb{R}$ defined by $G(g) = \int_0^{\frac{1}{2}} g - \int_{\frac{1}{2}}^1 g$. As the preimage of a closed set under a continuous function is closed, the preimage of the closed set $\left \{1 \right \}$ must also be closed under the continuous function $G$. Following from the fact that $G^{-1} \left ( \left \{ 1 \right \} \right ) = E$ by definition, we may see that $E$ is closed.

Now, let $g, h \in E$ be arbitrary and fix $t \in [0,1]$. Then, we may observe:

It follows that, for all $g, h \in E$, it must be the case that $tg+(1-t)h \in X$ whenever $t \in [0,1]$. Thus, $E$ is convex.

Further, for all $f \in E$, let us observe that the following holds:

That is, $\left \|f \right \| \leq 1$ for all $f \in E$.

Observe that if we desire to show that $C[0,1]$ is not a Hilbert space, it suffices to show that there exists no unique element $f_0$ in $E$ such that $d(0,E) = \left \| 0-f_0 \right \|$.

To do so, we construct a sequence in $E$ which approaches the minimum of the norm, but which does not converge to an element of $C[0,1]$. Setting $n= \frac{1}{t}$ for convenience, let us then consider the sequence $\left \{ f_t \right \}_{t=1}^\infty$ defined by:

While rather undesirably messy when expressed as such, this is simply the following function:

Observe by inspection (for convenience, as this is already cumbersome enough) that $f_t \in E$ for all $t > 0$, and $\left \| f_l \right \| > \left \| f_p \right \|$ whenever $l > p$.

Moreover, note that $\left \| f_t\right \| = 1 + \frac{2n}{1-2n}$ and $\left \| f_t\right \| \rightarrow 1$ as $t \rightarrow \infty$. But, observe as well that $f_t(X) \rightarrow \chi_{[0,1/2)} (x) - \chi_{[1/2,1]} (x)$ point-wise as $t \rightarrow \infty$, which is not a continuous function. Thus, we have constructed a sequence of functions in $E$ which grow arbitrarily close to the minimum of the norm, but which do not converge point-wise to an element of $C[0,1]$. Correspondingly, there cannot exist an element of minimal norm in $E$ with regard to $0$, and thus $C[0,1]$ is not a Hilbert space.    $\blacksquare$

Problem 3

Compute

$\underset{a,b,c}{min}\int_{-1}^1 \left | x^3 - a -bx -cx^2 \right |^2 dx.$

Solution

We seek to compute $\underset{a,b,c}{min}\int_{-1}^1 \left | x^3 - a -bx -cx^2 \right |^2 dx$.

Let us begin by observing that the set $\mathscr{X} := \left \{ \alpha_1x^2+\alpha_2x +\alpha_3 \mid \alpha_i \in \mathbb{R} \right \}$ is a closed, linear subspace of the Hilbert space $L^2[-1,1]$ with inner product $\langle f, g \rangle = \int_{-1}^1 (fg)(x) dx$. Further, note that it may be easily checked via routine computation that $\mathscr{E} = \left \{ \frac{\sqrt{2}}{2}, \frac{\sqrt{6}}{2}x, \frac{3\sqrt{10}}{4}(x^2-\frac{1}{3}) \right \}$ is an orthonormal basis for $\mathscr{X}$.

Observing that $x^3 \in L^2[-1,1]$, let $f_0 \in \mathscr{X}$ be the unique element of $\mathscr{X}$ with minimal norm in regard to $x^3$. That is, let $f_0$ be such that

Then, we may compute $f_0$ by taking the projection of $x^3$ onto $\mathscr{X}$. As $\mathscr{E}$ is an orthonormal basis for $\mathscr{X}$, it follows that

As a Hilbert space is necessarily complete, the following then holds

Thus, the given integral is minimized when $a =0$, $b= \frac{3}{5}$, and $c=0$.   $\blacksquare$

Problem 4

If the closed unit ball of $\mathscr{H}$ is compact, show that $\textrm{dim} \mathscr{H} < \infty$.

Solution

Suppose $\mathscr{H}$ is a Hilbert space, and suppose the closed unit ball of $\mathscr{H}$, denoted by $\mathcal{B}_\mathscr{H}$, is compact. For some index set $I$, let $\mathscr{E} = \left \{e_i \right \}_{i \in I}$ be an orthonormal basis for $\mathscr{H}$. Then, it follows that $e_i \in \mathcal{B}$ for all $i \in I$. Moreover, as $\langle e_i, e_j \rangle =0$ for any two $e_i, e_j \in \mathscr{E}$ with $i\neq j$, observe that we may deduce:

Let us denote the open ball of radius $\delta$ about $x \in \mathscr{H}$ by $\mathcal{B}(x; \delta)$ and construct the set:

Observe that each open ball in $C$ contains precisely one element of $\mathscr{E}$. Further, by the polar identity, let us note that, for all $x \in \mathcal{B}_\mathscr{H}$, the following holds:

From this, we may see that each $x \in \mathcal{B}_\mathscr{H}$ lies within at least one open ball in the set $C$. Thus, $\mathcal{B}_\mathscr{H} \subseteq \bigcup_{i \in I}\mathcal{B}(e_i ; 1)$, and it follows that $C$ is indeed an open cover of $\mathcal{B}_\mathscr{H}$.

But, as $\mathcal{B}_\mathscr{H}$ is compact, every open cover admits a finite subcover. As no open ball in $C$ may be omitted without also omitting an element of the basis $\mathscr{E}$, it must then be the case that $C$ is finite. Correspondingly, as the cardinality of $C$ is the same as the cardinality of $\mathscr{E}$, it must be the case that our orthonormal basis is finite as well. Thus, dim$\mathscr{H} < \infty$.   $\blacksquare$