
Overview
Definitions
Addition
Norms
Matrix vector product
Naïve matrix product
Alternative implementations
Blocking
Matrix
A rectangular array of m rows and n columns with entries from a field
If m = n, it is called a square matrix
Otherwise it is rectangular

Submatrices
A vector can be considered as an m×1 matrix
Matlab does this (everything is a matrix in Matlab)
Scalar values are 1×1 matrices
However, vectors are special cases of matrices only as far as the representation is concerned
Mathematically they usually have a different meaning
Submatrices are obtained by deleting rows and columns of a matrix
Partitioning
Splitting a matrix into submatrices
A partitioned matrix is sometimes called a block matrix
In most cases, the submatrices of a block matrix have the same size
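For illustration (the block sizes here are just an example), a 4×4 matrix can be partitioned into four 2×2 blocks:

    A = [ A_{11}  A_{12} ]
        [ A_{21}  A_{22} ]

where each A_{ij} is a 2×2 submatrix of A.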

Special Matrices
An n×n matrix is called
Diagonal if a_{ij} = 0 for all i ≠ j
Upper triangular if a_{ij} = 0 for all i > j
Lower triangular if a_{ij} = 0 for all i < j
A partitioned matrix is called
Upper block triangular if A_{ij} = 0 for all i > j
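The 3×3 patterns illustrate the definitions (∗ marks entries that may be nonzero):

    diagonal        upper triangular    lower triangular
    [ ∗ 0 0 ]       [ ∗ ∗ ∗ ]           [ ∗ 0 0 ]
    [ 0 ∗ 0 ]       [ 0 ∗ ∗ ]           [ ∗ ∗ 0 ]
    [ 0 0 ∗ ]       [ 0 0 ∗ ]           [ ∗ ∗ ∗ ]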

More Special Matrices
The transpose A^{T} of an m×n matrix is the n×m matrix where a_{ij} and a_{ji} are swapped
Correspondingly, the transposed conjugate matrix A^{H} additionally replaces each entry by its complex conjugate
A is called symmetric if A = A^{T}
A is called Hermitian if A = A^{H}
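Written entrywise, the two definitions read:

    (A^{T})_{ij} = a_{ji}
    (A^{H})_{ij} = conj(a_{ji})     (the complex conjugate of a_{ji})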
Matrix Addition
Matrices are added elementwise
The dimensions must be equal
Is this operation faster or slower than vector addition?
Is it better to iterate first over the rows or first over the columns? (a sketch in C follows below)
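A minimal sketch in C, assuming row-major m×n matrices stored as flat arrays (the function name is made up for illustration):

    /* Elementwise matrix addition C = A + B for row-major m×n matrices.
     * Rows form the outer loop, so the inner loop walks memory with
     * stride 1 (why this matters is discussed under Loop Nesting). */
    void matrix_add(int m, int n, const double *A, const double *B, double *C)
    {
        for (int i = 0; i < m; ++i)          /* rows: outer loop */
            for (int j = 0; j < n; ++j)      /* columns: stride-1 inner loop */
                C[i * n + j] = A[i * n + j] + B[i * n + j];
    }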
Run Time of Matrix Addition
Adding matrices of dimensions m×n takes
2 m×n loads
m×n stores
m×n additions
Recall that the additions take 1 or 2 processor cycles
Loading and storing is more expensive, especially from main memory
But also from L2 (or L3, if it exists)
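A rough estimate makes the imbalance concrete (assuming 8-byte doubles and no reuse):

    memory traffic: (2 m×n loads + m×n stores) × 8 bytes = 24 m×n bytes
    arithmetic:     m×n additions

That is 24 bytes of traffic per addition, so the operation is bound by memory, not by arithmetic.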
Is Matrix or Vector Addition Faster?
Adding vectors of length l = m×n takes
2 l loads
l stores
l additions
Thus the same arithmetic and memory operations are performed
Both should run at the same speed, unless …
See the next slides
Row Major and Column Major
In C and C++, 2D arrays are stored row major
All elements of one row are stored together
In Fortran, 2D arrays are stored column major
All elements of one column are stored together
If you define your own matrix type with your own indexing, you can use either one
What is done in some libraries, like ???
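The two layouts correspond to different index mappings into the flat array (0-based indices, m×n matrix):

    row major:    a_{ij} is stored at A[i * n + j]
    column major: a_{ij} is stored at A[i + j * m]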
Loop Nesting
The loop nesting should fit the storage scheme
For row-major matrices, iterate first over the rows
For column-major matrices, iterate first over the columns
This gives stride-1 memory access
Why is this important? (compare the two variants sketched below)
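The contrast for a row-major matrix, sketched in C (the function names are invented for this example; the arithmetic in both is identical):

    /* Stride-1: the inner loop over j touches consecutive memory. */
    double sum_row_first(int m, int n, const double *A)
    {
        double s = 0.0;
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j)
                s += A[i * n + j];
        return s;
    }

    /* Stride-n: the inner loop over i jumps n doubles per step, so for
     * large n nearly every access starts a new cache line. */
    double sum_col_first(int m, int n, const double *A)
    {
        double s = 0.0;
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i)
                s += A[i * n + j];
        return s;
    }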
Matrix Norms
Matrices are operators on vectors
The matrix vector product Ax can be considered as applying the operator A to x
Therefore most matrix norms are operator norms
The definitions of the vector norms are applied to the image of the operator (see below)
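Concretely, the operator norm induced by a vector norm ‖·‖_{p} is

    ‖A‖_{p} = max_{x ≠ 0} ‖Ax‖_{p} / ‖x‖_{p}

i.e. the largest factor by which A can stretch a vector, measured in that norm.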
Special Matrix Norms
‖·‖_{p} for p = 1, 2, and ∞ are the most interesting
p = 2 is difficult to compute
p = 1 and p = ∞ are much simpler
The first is the maximum of the absolute column sums
The latter is the maximal absolute row sum
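Written out:

    ‖A‖_{1} = max_{j} Σ_{i} |a_{ij}|     (maximum absolute column sum)
    ‖A‖_{∞} = max_{i} Σ_{j} |a_{ij}|     (maximum absolute row sum)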
Matrix Vector Product
The number of columns must be equal to the vector dimension
The result is a vector
Its dimension is the number of rows
The ith element of the result is the dot product of the ith row and the multiplied vector (a sketch in C follows below)
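A minimal sketch of this dot product formulation in C, again assuming a row-major flat array (the function name is made up):

    /* Matrix-vector product y = A x, computed row by row:
     * y[i] is the dot product of row i of A with x.
     * For a row-major matrix the inner loop has stride 1. */
    void matvec_dot(int m, int n, const double *A, const double *x, double *y)
    {
        for (int i = 0; i < m; ++i) {
            double s = 0.0;
            for (int j = 0; j < n; ++j)
                s += A[i * n + j] * x[j];
            y[i] = s;
        }
    }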
Alternative Computation
The dot product formulation is a loop over the matrix rows
What corresponds to a loop over the columns of the matrix?
Scaled vector addition
Is this faster or slower than the dot product? (compare the sketch below)
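The column-oriented variant as a C sketch (again a hypothetical function name; the scaled vector addition in the inner loop is the pattern BLAS calls axpy):

    /* Matrix-vector product y = A x, computed column by column:
     * column j contributes the scaled vector x[j] * A(:,j) to y.
     * For a row-major matrix the inner loop has stride n. */
    void matvec_axpy(int m, int n, const double *A, const double *x, double *y)
    {
        for (int i = 0; i < m; ++i)
            y[i] = 0.0;                       /* result must start at zero */
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i)
                y[i] += A[i * n + j] * x[j];
    }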
Run Time Analysis
Operand vector: n loads
Result vector: m stores
Matrix: mn loads
Multiplications: mn
Additions: m(n−1)
Which is the most expensive?
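Collecting the terms (with mn much larger than m and n):

    loads:      mn + n ≈ mn
    stores:     m
    arithmetic: mn + m(n−1) = 2mn − m ≈ 2mn

so there is about one matrix load for every two arithmetic operations, and the matrix loads dominate the run time.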
Run Time Analysis (ii)
The numbers of additions, multiplications, and loads for the matrix are almost equal
Loading from memory is much slower than basic arithmetic
Assume we have a row-major matrix and a cache line of s_{c} doubles
The loads are then composed of:
Vector addition variant: mn memory loads
Dot product variant: mn/s_{c} memory loads and mn(s_{c}−1)/s_{c} loads from cache
It is safe to assume that loads from cache are far cheaper than loads from main memory
For column-major matrices it is the opposite
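For a concrete feel, assume a 64-byte cache line, i.e. s_{c} = 8 doubles: the dot product variant then fetches only mn/8 cache lines from memory, roughly an 8× reduction in memory traffic compared with the stride-n vector addition variant.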