Overview of Matrix Factorisation Techniques using Python

Source: Deep Learning on Medium

Overview of Matrix Factorization Techniques using Python

Different types of Matrix Factorization Techniques and Scaling mechanisms for online Recommendation Engines


Low-rank approximations of data matrices have become an important tool in Machine Learning in the field of bio-informatics, computer vision, text processing, recommender systems, and others. They allow for embedding high dimensional data in lower dimensional spaces which mitigate effects due to noise, uncover latent relations, or facilitate further processing.

In general, MF is a process to find two factor matrices, P ∈ R, k×m and Q ∈ R, k×n, to describe a given m-by-n training matrix R in which some entries may be missing. MF can be found in many applications, but we only use collaborative filtering in recommender systems as examples. It is based on the assumption that the entries of R are the historical users’ preferences for merchandises, and the task on hand is to predict unobserved user behavior (i.e., missing entries in R) to make a suitable recommendation.

In this blog, I discuss about different types of matrix factorization techniques for real-time recommendation engines and their corresponding Python libraries. In the next blog, let’s get familiar with some Python code samples.

PyMF — Python Matrix Factorization Module

Python Matrix Factorization (PyMF) is a Python open-source module for several constrained/unconstrained matrix factorization (and related) methods for both sparse and dense matrices. It requires cvxopt, numpy and scipy. Applications of factorizing large matrices are: gene expression data and microarrays, term by document matrices, graph adjacency matrices, etc.

$ pip install pyMF # for python 2.67/2.7$ pip install pymf3 # for python 3

Convex Non-negative Matrix Factorization (CNMF):

NMF factorizes a non-negative input matrix V into two non-negative matrix factors V = WH such that W describes “clusters” of the datasets. Analyzing genotypes, social networks or images, it can be beneficial to ensure V to contain meaningful “cluster centroids”, i.e., to restrict W to be convex combinations of data points. Each data point is a convex combination of vertices of the data convex hull; the aim is to restrict W further to be vertices of the convex hull. Characteristics of convex-hull NMF approach are:

  • The expected size of the convex hull typically grows considerably slower than that of the dataset, where for n random Gaussian points in the plane, the expected number of vertices of the convex hull is Ω(√log n), i.e., the candidate set typically grows much slower than the data set.
  • Distance preserving, low-dimensional projections of the data allow efficient sampling of the convex hull and evaluate the candidate vertices.
  • Convex-NMF enforces notion of cluster centroids and is naturally sparse.

Convex-NMF applies to both nonnegative and mixed-sign data matrices and the factors W and G both tend to be very sparse.

Semi Non-negative Matrix Factorization (SNMF)

For an input data matrix X = (x1, . . . , xn) contain a collection of n data vectors as columns. Considering factorizations in the form of SVD as : X ≈ F G^T, where X ∈ R p×n , F ∈ R p×k and G ∈ R n×k . In the case of the SVD, there are no restrictions on the signs of F and G; moreover, the data matrix X is also unconstrained. NMF can also be written in this form, where the data matrix X is assumed to be nonnegative, as are the factors F and G. For non-negative matrices, the following holds true:

When the data matrix is unconstrained (i.e., it may have mixed signs), the factorization technique can be referred to as Semi-NMF, in which we restrict G to be nonnegative while placing no restriction on the signs of F. Further, Semi-NMF can be motivated from the perspective of clustering. On applying K-means clustering on X with cluster centroids F = (f1, . . . ,fk), G denotes the cluster indicators: i.e., gik = 1, if xi belongs to cluster ck; gik = 0, otherwise.

K-means clustering objective can be alternatively viewed as an objective function for matrix approximation. This approximation will generally be tighter on relaxing the optimization by allowing gij to range over values in (0, 1), or values in (0,∞). This yields the Semi-NMF matrix factorization.

Four random datasets, each with 3 clusters. “⊳” are Fsemi factors and “[]” are Fcnvx factors.

Tight Semi Non-Negative Matrix Factorization

In close relation to methods SNMF and CMF, the tight semi non-negative matrix decomposition is the result of a multi-objective optimization problem. However, for V = WH, it does not require that the templates, i.e., the columns of W, be convex combinations of the original data in V. There will be two objectives:

  • Minimizing the error in the approximation of the original data (model fidelity).
  • Minimizing the maximum angle between any two columns of W (model tightness).

NMF-like algorithms (Semi-NMF, Convex-NMF, Kernel-NMF, Tri-NMF) always outperform K- means clustering.

Other matrix factorization techniques supported by PyMF include: Archetypal analysis (AA), Simplex volume maximization (SiVM), Convex-hull non-negative matrix factorization (CHNMF), Binary matrix factorization (BNMF), Singular value decomposition (SVD), Principal component analysis (PCA), K-means clustering (Kmeans), C-means clustering (Cmeans), CUR decomposition (CUR), Compaxt matrix decomposition (CMD).

Fast Parallel Stochastic Gradient Method for Matrix Factorization

LIBMF is an open source large-scale sparse matrix factorization tool used primarily for approximating an incomplete matrix using the product of two matrices in a latent space.

  • LIBMF supports binary matrix factorization (BMF), and one-class matrix factorization. Some packages (e.g., scikit-learn,2 mlpack,3 and nimfa4) support a variety of MF problems, but they do not implement parallel MF algorithms published in recent five years.
  • LIBMF framework is extended to cover BMF, OCMF, and L1 regularization.
  • LIBMF allows users to use their disk as a buffer to factorize a huge matrix that may not fit into their memory. For example, the training matrix can be cached, thereby helping to reduce the memory usage.
  • LIBMF supports parallel computation in a multi-core machine.
  • LIBPMF (parallel version of LIBMF) can only solve real-valued matrix factorization (RVMF) with a squared loss and L2-norm regularization.
  • RVMF aims to find P and Q to approximate the values of entries as accurate as possible, while BMF focuses on learning the correct signs of entries. Both RVMF and BMF can be formulated as a non-convex optimization problem.
  • LIBMF uses CPU instructions (e.g., SSE) to accelerate vector operations, taking less than 20 minutes to converge to a reasonable level on a dataset of 1.7B ratings
  • Supports cross-validation for parameter selection and allows selection of suitable evaluation criteria such as mean percentile rank and area under the curve for different MF problems.
pip install libmf

Collective Matrix Factorization (pyCMF and cmfrec)

Collective Matrix Factorization is a Machine Learning method that decomposes two matrices X, Y, into three matrices U, V and Z, such that

where f is either the identity or sigmoid function. This is a collaborative filtering model for recommender systems that takes as input explicit item ratings and side information about users and/or items. The overall idea was extended here to also be able to do cold-start recommendations (of users and items that were not in the training data).


CMF decomposes complex and multiple relationships into a small number of components, and can provide valuable insights into your data.

$ pip install git+https://github.com/smn-ailab/PyCMF


cmfrec fits a collective matrix factorization model to ratings data along with item and/or user side information, by factorizing all matrices with common (shared) factors.

$ pip install cmfrec

Topic modeling and text classification

CMF can be used for topic modeling to explore the data to extract topics like toxicity, sentiments, classify relevant texts and documents clustering.

Movie rating prediction

Prediction tasks involving relations between entities like movies, users, ratings, actors use CMF to the model relations and predict unobserved edges.

Singular Value Decomposition

Singular Value Decomposition (SVD) is a matrix decomposition method for reducing a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler.

A = U . Sigma . V^T

where A is the real m x n matrix that is to be decomposed, U is an m x m matrix, Sigma is an m x n diagonal matrix, and V^T is the transpose of an n x n matrix where T is a superscript.

pip install numpypip install scipy


Nimfa is a Python library for nonnegative matrix factorization including both dense and sparse matrix, and supports methods like nonnegative double singular value decomposition, initialization approaches, quality scoring.

NMF’s distinguishing feature is imposition of non-negativity constraints, where only non-subtractive combinations of vectors in original space are allowed. Specific knowledge of the problem domain can be modelled by further imposing discriminative constraints, locality preservation, network-regularization or constraint on sparsity.

In a standard model of NMF, a data matrix V is factorized to V W H by solving a related optimization problem. Non-negative matrices W and H are commonly referred to as basis and mixture matrix, respectively.

Nimfa implements an originally proposed optimization with Euclidean or Kullback-Leibler cost function, along with Frobenius, divergence or connectivity costs. It also supports alternative optimization algorithms including Bayesian NMF Gibbs sampler, iterated conditional modes NMF, probabilistic NMF and alternating least squares NMF using projected gradient method for sub-problems. Sparse matrix factorization is provided either through probabilistic or alternating non-negativity-constrained least squares factorization. Fisher local factorization may be used when dependency of a new feature is constrained to a given small number of original features.

Nimfa also implements several non-standard models. These comprise non-smooth factorization V W S(θ) H and multiple model factorization for simultaneous treatment of several input matrices and their factorization with the same basis matrix W.

All mentioned optimizations are incremental, starting with initial approximation of matrices W and H. The initialization choice can speed up the convergence and improve the overall quality of the factorization results.


$ pip install surprise

Probabilistic Matrix Factorization

PMF models user preference matrix as a product of two lower-rank user and movie matrix. It scales linearly with the number of observations and performs well on the large, sparse, and very imbalanced datasets. Further, PMF model can include learnable adaptive priors over the movie and user feature vectors to tune and control the model complexity automatically. Further, dimensionality reduction and regularization help to determine suitable parameters for training the best model, thereby reducing the computational demands of traditional (deterministic) algorithms.

The constrained version of the PMF model is designed on the basis of assumption that users who have rated similar sets of movies are likely to have similar preferences, helping better generalization for users with very few ratings. It achieves this by constraining user-specific feature vectors that have a strong effect on infrequent users. Nimfa supports PMF along with others:

$ git clone https://github.com/Songbo729066989/recsys.gitor$ git clone https://github.com/meowoodie/Probabilistic-Matrix-Factorization.git