# Iterative Solutions of Linear Systems
## Matrix Norms
Given a vector norm $\| \mathbf{x} \|$, the **subordinate matrix norm** is defined as follows:
$
\| \mathbf{A} \| \equiv \sup \ \{\| \mathbf{Ax} \| \ : \ \mathbf{x} \in \mathbb{R}^n \ \text{and} \ \| \mathbf{x} \| = 1\}
$
We can then define the $\ell_1, \ell_2, \ell_\infty$ matrix norms, much like the taxicab and Euclidean norms for vectors:
$
\begin{aligned}
\| \mathbf{A} \|_1 &= \max_{1 \leq j \leq n} \sum^n_{i=1} |a_{ij}| && \text{$\ell_1$-matrix norm} \\
\| \mathbf{A} \|_2 &= \sqrt{\sigma_{\text{max}}} && \text{$\ell_2$-matrix norm} \\
\| \mathbf{A} \|_\infty &= \max_{1 \leq i \leq n} \sum^n_{j=1} |a_{ij}| && \text{$\ell_\infty$-matrix norm} \\
\end{aligned}
$
where $\sigma_{\text{max}}$ denotes the largest eigenvalue of $\mathbf{A}^T \mathbf{A}$ (whose eigenvalues are all real and non-negative).
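As a quick sanity check, the following NumPy sketch (the matrix `A` is an arbitrary example, not taken from the text) evaluates the three norms directly from these definitions and compares them against `numpy.linalg.norm`:
```python
import numpy as np

A = np.array([[4.0, -1.0],
              [2.0,  3.0]])

# l1 norm: maximum absolute column sum
l1 = np.abs(A).sum(axis=0).max()

# l2 norm: square root of the largest eigenvalue of A^T A
l2 = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())

# l_inf norm: maximum absolute row sum
linf = np.abs(A).sum(axis=1).max()

print(l1, np.linalg.norm(A, 1))         # both should agree
print(l2, np.linalg.norm(A, 2))         # both should agree
print(linf, np.linalg.norm(A, np.inf))  # both should agree
```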
## Condition Numbers
In the context of a system of linear equations $Ax = b$, the condition number $\kappa(A)$ can be interpreted as the maximum ratio of the relative error in $x$ to the relative error in $b$. In other words, it measures how strongly a relative change in $b$ can be amplified into a relative change in $x$. Therefore, a large condition number means that even a small error in $b$ can lead to large errors in the computed $x$.
When $\kappa(A)$ is small, we say $\mathbf{A}$ is *well-conditioned*, meaning that the computed solution $x$ is relatively insensitive to perturbations in $b$. Similarly, we say $\mathbf{A}$ is *ill-conditioned* when $\kappa(A)$ is large.
Condition numbers provide an upper bound on the relative error, which can be used as a diagnostic to judge whether the results can be trusted.
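As an illustration, the sketch below (the nearly singular matrix and the size of the perturbation are arbitrary choices) computes $\kappa(A)$ with `numpy.linalg.cond` and checks that the observed relative error in $x$ stays within $\kappa(A)$ times the relative error in $b$:
```python
import numpy as np

# A nearly singular matrix, used purely as an illustration of ill-conditioning.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])

kappa = np.linalg.cond(A)  # condition number in the l2 norm
x = np.linalg.solve(A, b)

# Perturb b slightly and observe how much the solution moves.
b_pert = b + np.array([0.0, 1e-4])
x_pert = np.linalg.solve(A, b_pert)

rel_err_b = np.linalg.norm(b_pert - b) / np.linalg.norm(b)
rel_err_x = np.linalg.norm(x_pert - x) / np.linalg.norm(x)

print(f"kappa(A)                = {kappa:.2e}")
print(f"relative error in b     = {rel_err_b:.2e}")
print(f"relative error in x     = {rel_err_x:.2e}")
print(f"bound kappa * rel_err_b = {kappa * rel_err_b:.2e}")  # upper bound on the error in x
```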
## Iterative Methods for Solving Systems of Linear Equations
Normally, when we wish to solve a system of linear equations, we employ direct methods such as Gauss-Jordan elimination or LU decomposition. These methods compute the *exact solution*, up to roundoff error (the error induced by floating-point representations).
Because direct methods compute the exact solution, they are all-or-nothing: you either run the algorithm to the end and obtain the solution, or terminate prematurely and get nothing.
LU decomposition has $O(N^3)$ time complexity and $O(N^2)$ space complexity. For some classes of matrices, we can exploit their structure to come up with a faster algorithm, and this is what motivated iterative methods.
Iterative methods start with an initial solution and iteratively update it so that the error between the current solution and the true solution diminishes to zero. This *may* be faster for certain matrix types, especially sparse matrices, and for large matrices you can generally expect iterative methods to perform faster, given LU's grim cubic time complexity.
Iterative methods create a sequence of approximate solutions to $Ax = b$, $x^{(0)}, x^{(1)}, x^{(2)}, \dots$, such that $x^{(k)}$ converges to the true solution $x$.
### Iterative Method Definition
An iterative method recursively refines the solution at the current iteration:
$
x^{(k + 1)} = \Psi(x^{(k)})
$
The sequence converges if the difference between successive iterates goes to zero:
$
\lim_{k \rightarrow \infty} \| x^{(k + 1)} - x^{(k)} \| = 0
$
Let $A = M - N$ for some *invertible* matrix $M$. If the sequence converges to a limit $x^{(\infty)}$, then $Ax^{(\infty)} = (M - N)x^{(\infty)} = b$, which rearranges to:
$
Mx^{(\infty)} = Nx^{(\infty)} + b
$
Since $N = M - A$:
$
Mx^{(\infty)} = (M - A)x^{(\infty)} + b
$
Because $x^{(\infty)}$ is the limit of the convergent sequence $x^{(k)}$, and under the assumption that $x^{(k + 1)} = \Psi(x^{(k)})$ indeed produces a convergent sequence, we can turn this fixed-point relation into an iteration:
$
Mx^{(k + 1)} = (M - A)x^{(k)} + b
$
Since the RHS is known, we can solve for $x^{(k + 1)}$. Ideally $M$ should be a matrix that makes the system easy to solve, such as $\mathbf{I}$ or a triangular matrix. In addition, we must ensure that the assumption of $x^{(k + 1)} = \Psi(x^{(k)})$ producing a convergent sequence actually holds, meaning that $M$ must be chosen carefully.
If we set $M = \mathbf{I}$, then this iteration method is called the Richardson iteration.
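Below is a minimal sketch of this splitting iteration with $M = \mathbf{I}$, i.e. the Richardson iteration. The example matrix, tolerance, and iteration cap are arbitrary choices; the matrix is picked so that $\rho(I - A) < 1$ and the iteration actually converges:
```python
import numpy as np

def richardson(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Richardson iteration: x_{k+1} = (I - A) x_k + b, i.e. the splitting with M = I."""
    x = np.zeros_like(b) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_new = x + (b - A @ x)  # equivalent to (I - A) x + b
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Eigenvalues of A are 0.8 and 1.1, so rho(I - A) = 0.2 < 1 and the method converges.
A = np.array([[1.0, 0.2],
              [0.1, 0.9]])
b = np.array([1.0, 2.0])
print(richardson(A, b))       # iterative solution
print(np.linalg.solve(A, b))  # direct solution for comparison
```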
### Element-wise Formulation: Jacobi and Gauss-Seidel Method
In $Ax = b$, each element of $b$ is just the dot product of the corresponding row of $A$ with $x$:
$
b_i = \sum^n_{j = 1} A_{ij} \ x_j
$
where $n$ is the number of variables.
Let's try writing the terms out:
$
\begin{aligned}
b_i &= \sum^n_{j = 1} A_{ij} \ x_j \\
&= A_{i1}x_1 + A_{i2}x_2 + A_{i3}x_3 + \dotsb + A_{in}x_n
\end{aligned}
$
Since we are interested in finding $x$, let's rearrange the terms. Suppose we want to find $x_i$; moving every term except $A_{ii}x_i$ to the other side gives:
$
A_{ii}x_i = -\Big\{ A_{i1}x_1 + \dotsb + A_{i,i-1}x_{i-1} + A_{i,i+1}x_{i+1} + \dotsb + A_{in}x_n \Big\} + b_i
$
Therefore,
$
x_i = -\sum^n_{j = 1, j \neq i}( \frac{A_{ij}}{A_{ii}} x_j ) + \frac{b_i}{A_{ii}}
$
Again, under the assumption that $x^{(k + 1)} = \Psi(x^{(k)})$ produces a convergent sequence:
$
x^{(k + 1)}_i = -\sum^n_{j = 1, j \neq i}( \frac{A_{ij}}{A_{ii}} x^{(k)}_j ) + \frac{b_i}{A_{ii}}
$
This is the **Jacobi Iterative Method** for solving a system of linear equations.
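A direct transcription of this element-wise update might look like the sketch below; the diagonally dominant example matrix, tolerance, and iteration cap are arbitrary illustrative choices (strict diagonal dominance guarantees convergence here):
```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Jacobi iteration: every x_i^(k+1) is computed from the previous iterate only."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_new = np.empty(n)
        for i in range(n):
            s = sum(A[i, j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Strictly diagonally dominant example, so the Jacobi iteration converges.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [1.0, 2.0, 6.0]])
b = np.array([6.0, 8.0, 9.0])
print(jacobi(A, b))
print(np.linalg.solve(A, b))  # direct solution for comparison
```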
Now imagine you're running this method in practice. You would start from $x_1$, and then calculate $x_2, x_3$ and so on. This means that by the time you're calculating $x^{(k+1)}_i$, you already know the values of $x^{(k+1)}_1, x^{(k+1)}_2, ..., x^{(k+1)}_{i - 1}$. So instead of using $x^{(k)}_1, x^{(k)}_2, ..., x^{(k)}_{i - 1}$, let's use the freshly calculated $(k+1)$-th values instead:
$
x^{(k + 1)}_i = -\sum^n_{j = 1, j < i}( \frac{A_{ij}}{A_{ii}} x^{(k + 1)}_j ) -\sum^n_{j = 1, j > i}( \frac{A_{ij}}{A_{ii}} x^{(k)}_j ) + \frac{b_i}{A_{ii}}
$
This is the **Gauss-Seidel Method**.
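The corresponding sketch differs from the Jacobi one only in that $x$ is updated in place, so the entries $x_1, ..., x_{i-1}$ are already the $(k+1)$-th values when $x_i$ is computed (same illustrative example as above):
```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Gauss-Seidel: reuse the already-updated entries x_1..x_{i-1} within the same sweep."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s_new = sum(A[i, j] * x[j] for j in range(i))         # fresh (k+1)-th values
            s_old = sum(A[i, j] * x[j] for j in range(i + 1, n))  # previous (k)-th values
            x[i] = (b[i] - s_new - s_old) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            return x
    return x

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [1.0, 2.0, 6.0]])
b = np.array([6.0, 8.0, 9.0])
print(gauss_seidel(A, b))
```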
### Relaxation
(TODO: more explanation on why relaxation accelerates convergence)
We can boost the convergence rate of the Gauss-Seidel method through relaxation, which blends each Gauss-Seidel update with the value from the previous iteration.
Given some relaxation factor $\omega$, the **successive overrelaxation method** (SOR) is defined as follows:
$
x^{(k + 1)}_i = \omega \bigg\{ -\sum^n_{j = 1, j < i}( \frac{A_{ij}}{A_{ii}} x^{(k + 1)}_j ) -\sum^n_{j = 1, j > i}( \frac{A_{ij}}{A_{ii}} x^{(k)}_j ) + \frac{b_i}{A_{ii}} \bigg\} + (1 - \omega)x^{(k)}_i
$
We can see that we take the result of the Gauss-Seidel update with weight $\omega$, and combine it with the previous iteration's value, $x^{(k)}_i$, with weight $1 - \omega$.
All in all, SOR performs a kind of smoothing along the solution sequence. If $\omega > 1$, it puts stronger emphasis on the new approximation, potentially speeding up convergence; if $\omega < 1$, it dampens the update, which can help when the iteration overshoots. Selecting the optimal $\omega$ is non-trivial and highly dependent on the properties of $A$.
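Folding the relaxation factor into the previous sketch gives something like the following ($\omega = 1.2$ is an arbitrary illustrative choice; $\omega = 1$ recovers plain Gauss-Seidel):
```python
import numpy as np

def sor(A, b, omega=1.2, x0=None, tol=1e-10, max_iter=1000):
    """SOR: blend the Gauss-Seidel update (weight omega) with the previous iterate (weight 1 - omega)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s_new = sum(A[i, j] * x[j] for j in range(i))         # already-updated entries
            s_old = sum(A[i, j] * x[j] for j in range(i + 1, n))  # entries from the previous sweep
            gs = (b[i] - s_new - s_old) / A[i, i]                 # plain Gauss-Seidel value
            x[i] = omega * gs + (1 - omega) * x_old[i]
        if np.linalg.norm(x - x_old) < tol:
            return x
    return x

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [1.0, 2.0, 6.0]])
b = np.array([6.0, 8.0, 9.0])
print(sor(A, b, omega=1.2))
```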
### Convergence Analysis
The *error vector*, $e^k$, is the difference between the $k$-th approximate solution $x^{(k)}$ and the true solution $x$:
$
e^k = x^{(k)} - x
$
From the definition of the iteration method $Mx^{(k + 1)} = (M - A)x^{(k)} + b$, we have:
$
x^{(k + 1)} = M^{-1}\{(M - A)x^{(k)} + b\}
$
Then, applying the same formula with $k$ in place of $k + 1$:
$
\begin{aligned}
e^k &= x^{(k)} - x \\
&= M^{-1}\{(M - A)x^{(k - 1)} + b\} - x \\
&= (I - M^{-1}A)x^{(k - 1)} + M^{-1}b - x \\
&= (I - M^{-1}A)x^{(k - 1)} + M^{-1}Ax - x && (\text{since } Ax = b) \\
&= (I - M^{-1}A)x^{(k - 1)} - (I - M^{-1}A)x \\
&= (I - M^{-1}A)(x^{(k - 1)} - x) \\
&= (I - M^{-1}A)e^{k - 1}
\end{aligned}
$
Applying this relation repeatedly, all the way down to the initial error $e^0$, gives:
$
e^k = (I - M^{-1}A)^{k} e^{0}
$
In order to ensure that the error vanishes for *any* initial error $e^0$, the following must hold:
$
\lim_{k \to \infty} \| (I - M^{-1}A)^{k} \| = 0
$
The spectral radius theorem shows that:
> The spectral radius of a matrix $A$, $\rho(A)$, is less than 1 if and only if $\lim_{k \to \infty} A^k = 0$
(You may read the full proof here https://en.wikipedia.org/wiki/Spectral_radius#Power_sequence)
So the condition for convergence of an iterative method of this form is:
$
\rho(I - M^{-1}A) < 1
$
Note that this condition is independent of the initial approximation, $x^{(0)}$!
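As a quick numerical check of this criterion, the sketch below evaluates $\rho(I - M^{-1}A)$ for the Jacobi splitting ($M$ = diagonal of $A$) and the Gauss-Seidel splitting ($M$ = lower-triangular part of $A$, including the diagonal) on an arbitrary example matrix:
```python
import numpy as np

def spectral_radius(B):
    """Largest absolute eigenvalue of B."""
    return np.max(np.abs(np.linalg.eigvals(B)))

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [1.0, 2.0, 6.0]])

M_jacobi = np.diag(np.diag(A))  # Jacobi: M is the diagonal of A
M_gs = np.tril(A)               # Gauss-Seidel: M is the lower-triangular part of A

for name, M in [("Jacobi", M_jacobi), ("Gauss-Seidel", M_gs)]:
    rho = spectral_radius(np.eye(len(A)) - np.linalg.inv(M) @ A)
    print(f"{name}: rho(I - M^-1 A) = {rho:.3f} -> converges: {rho < 1}")
```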
#### Jacobi and Gauss-Seidel Convergence
#### SOR Convergence