Linear Algebra Data Structures and Operations

Source: Deep Learning on Medium

Linear Algebra Data Structures and Operations


Linear algebra is incredibly useful for deep learning. If probability and statistics is the theory behind machine learning, then linear algebra is what makes it possible and computationally efficient.

This article assumes you have a pretty good computer science and math background. The article is on the visual side, meaning I tried to illustrate most concepts graphically.

Data Structures


  • Scalars are single numbers, they can be represented by an italic smaller case letter. It doesn’t matter whether they’re integers or doubles, they’re all scalars
n is an element of the Natural Numbers, r is an element of the Real Numbers


  • Array of numbers, mathematically speaking they can be thought of as directions from the origin, or points in space.
  • The individual elements of the vector are scalars
  • Also the space of the vector is denoted by:
this vector is made up of Real Numbers and there are m of them
  • Vectors and Sets, sets are pretty important, so to establish the proper notation for understanding vectors with sets:
one thing that’s important here, is that in a set order doesn’t matter, whereas in a vector it does


  • 2D array of numbers
  • To identify an element you gotta use 2 indexes, whereas in a vector you would use 1 index
  • The space of the matrix is denoted by:
where for an mxn matrix that is m sections high and n sections wide
  • To denote the ith row, we say A(i, 🙂
  • To denote the jth column, we say A(:, j)
  • The notation for a transpose is, if you’re not sure, I’ll explain these later:


  • Tensor A, is just a grid of matrices, in other words a 3D array of numbers
  • Therefore to access an element you need i, j, k indexes
this tensor is not actually indexed properly, its just copied and layered


Going back to matrices. Let’s try to find a motivation to study these.

Like I mentioned it’s just a 2D array of numbers but what can they represent? In the most traditional sense, they represent a system of linear equations. They are not polynomial equations. They are linear equations.

this is a polynomial equation, its got powers on the variables

This is a system of linear equations.

this is a 2×3 matrix, 2 rows and 3 columns

From this form you can do an algorithm called RREF. To solve this system of equations. See how this works. But there are better ways to do things.

in this representation you have A as a 2×2 matrix, x as a 2×1 vector and finally b as a 2×1 vector

You might be wondering how can you just do that? So to verify that this works, you have to know matrix multiplication.

Matrix Operations

In matrix multiplication where the order matters, you always wanna make sure the column of the first matrix, matches the rows of the second. Thereby leading to a rows of the first times column size of the second matrix as a result.


But let’s back it up a bit. You can do a lot of things with matrices. You can add them.

very straightforward

Multiplication or Cross Product

You can multiply them too, assuming certain conditions are met first.

Let’s say you have an mxn matrix A. You also have an pxq matrix B. You can multiple these matrices as AB, where A is on the left, if and only if n = p.

For a simple matrix that’s 2 by 2, and another matrix that’s 2 by 3. You can multiply them together only because the number of columns of the first matrix is the same as number of rows of the first.

This is because the formula for getting the resulting matrix is this:

this formula makes it clear that for each iteration A’s column and B’s row must be in sync and they are the same size

You should definitely look at this video to see how this works on paper. But soon we will be doing this where you’re familiar, in Python and Swift.

This is great. Because now we can add and multiply matrices, we can also use these matrix operation properties to utilize them for machine learning uses.

this is the distributive property
this is the associative property


Another important operation is the transpose, the formula for the transpose is:

where the row index becomes the column index, and the column index becomes the row index for every element

The transpose can be thought to be a mirror reflection along the main diagonal. The main diagonal elements are the elements that have the same index for row and column.

notice that the (row, column) indexes are always the same (i, i)

Main diagonals are important because they come up a lot. I always thought of them like this. Whenever you either Sum or Product all the elements of the main diagonal, no matter how stretched or compressed the matrix is, those values will remain the same. So the sum or the product of all the elements of a matrix, also called the determinant or the trace are few of the constant properties of a matrix.


We also allow for matrix and vector addition, where the resulting object is a matrix, in which every element of the vector’s row gets distributed to every row of the matrix:

notice that both matrices are the same size and that A and b must have same number of rows

Motivation To Understand Matrix Multiplication

Couple of things lead us needing to understand matrix multiplication on a deeper level.

  • The cost functions is defined as a summation(summation(long calculation)).
  • It’s possible to vectorize our math and code so that we are representing our cost functions as a matrix or a set of vectors. Therefore saving us from having nested for loops, which would take an eternity to compute. Because nested loops are a Big O no-noes!

Dot Product

The dot product, or the Hadamard product, is the element wise multiplication of two sets, usually vectors. It gives you a scalar value as opposed to a matrix value in the case of a Cross Product.

given these two vectors, the dot product is

Sometimes we might also use a times the transpose of b, to say the same thing. The dot product and the a times the transpose of b are equal. Another thing the dot product is equal to is the norm of a times the norm of b times cos(angle between them). It can be thought of as the projection of one onto the other.


I just mentioned norm of a, what is that? Norms generally describe the size of a vector.

Norm is really a type of function. A function that maps a vector into a non-negative scalar. Geometrically it’s like the distance between origin and the vector.

Norm, Euclidean Norm, Max Norm, Lp Norm, L1 Norm, Frobenius Norm and Norm of a Norm brought to you by Norm

So why so many norms? Well they’re for different uses.


For example, the L1 Norm is mathematically very simple, and it grows at the same rate everywhere. Can you think of a situation where that might be useful?

Let’s say that you wanted to see if a vector was actually a zero or just a very small vector pretending to be zero. Applying L1 Norm just does the trick.

L Max

The max norm is the absolute value of the value of the biggest element in the vector.

Formal Definition

The formal definition of a norm is given by:

where p denotes the subscript of L, and p must be greater than or equal to 1, and it must also be a Real number
L2 is also known as the Euclidean Norm

If we don’t do the root part of 1/2, the Euclidean Norm of a vector can be represented as x transpose times x. This is computationally and mathematically convenient, and this value still represents the size of the vector.

If the L2 norm is for vectors, is there a norm for matrices? YES, YES THERE IS.

It’s called a Frobenius Norm. It’s the sum of the squared of each element square rooted. Did you get that? No?

since we’re squaring and square rooting, its closer to the Euclidean Norm in spirit than the other norms

So the formula I was referring to earlier with the dot product is this:

also the dot product between x and y

Up Next…

In the next article, I’ll talk about matrix invesion, linear dependence and different types of decompositions of a matrix. If you would like me to write another article explaining a topic in depth, please leave a comment.