What is NMF and what can you do with it?
What is NMF?
NMF stands for non-negative matrix factorization, a technique for obtaining low rank representation of matrices with non-negative or positive elements. Such matrices are common in a variety of applications of interest. For example, images are nothing but matrices of positive integer numbers representing pixel intensities. In information retrieval and text mining, we rely on term-document matrices for representing document collections. In recommendation systems, we have utility matrices showing customers’ preferences for items.
Given a data matrix A of m rows and n columns with each and every element aij ≥ 0, NMF seeks matrices W and H of size m rows and k columns, and k rows and n columns, respectively, such that A≈WH, and every element of matrices W and H is either zero or positive. The quantity k is set by the user and is required to be equal or less than the smallest of m and n. The matrix W is generally called the dictionary or basis matrix, and H is known as expansion or coefficient matrix. The underlying idea of this terminology is that a given data matrix A can be expressed in terms of summation of k basis vectors (columns of W) multiplied by the corresponding coefficients (columns of H).
The matrices W and H are determined by minimizing the Frobenius norm:
||A–WH||2
The minimization is carried out by some suitable iterative search process and the solution, i.e. matrices W and H, is not unique. There are many variations to the basic NMF approach wherein additional constraints, for example sparsity, are imposed on the resulting solution to limit the solution space for W and H.
As an example, consider a gray level image of size 128 x 128 as our data matrix A. I carried out NMF factorization in Matlab using the code available at http://www.csie.ntu.edu.tw/~cjlin/nmf. Matlab also has a built-in function for performing such factorization; however I found cjlin’s factorization perform better. If you do not have Matlab, you can try open source versions at scikit-learn or R of NMF. I obtained three low rank approximations of A for k = 32, 16, and 8, respectively. These approximations along with the original image are given below. You can see that the quality of approximations, particularly for k = 32 and 16 is excellent.
NMF and SVD
The singular value decomposition (SVD) is another and more established method of matrix factorization for obtaining low rank approximations. In SVD, a real-valued matrix A of size m x n is factorized as
A = UDVt
where U is an orthogonal matrix of size m x m of left singular vectors and V is an orthogonal matrix of size n x n of right singular vectors. The matrix D is a diagonal matrix of size m x n of singular values. A low rank approximation to A can be obtained by using only a subset of singular values and the corresponding left and right singular vectors.
To compare the approximations produced by NMF and SVD, I used the same mandrill image to perform singular value decomposition and reconstruct the image using 32, 16, and 8 top singular values. The results of this reconstruction along with the original image are shown below.
As you can note, both NMF and SVD provide almost identical results leading to a question about what is it that NMF offers over SVD. One big difference between the two factorization methods is that SVD does not impose restriction that factored matrices be positive while NMF imposes this restriction. The consequence of this restriction is that the NMF based factorization offers a better interpretation of the original data matrix as it is represented/approximated as a sum of positive matrices/vectors. On the other hand, SVD based representation leads to basis images of positive and negative elements that makes interpretation difficult. Let me illustrate this by using a simple example of seven segment display images. Such displays are common place to display numbers in various electronic appliances. As shown below, different digit patterns can be obtained by selectively turning on/off segments of a display arranged in a certain fashion.

I created the following matrix A of 7 rows and 10 columns to describe digits 0-9. A “1” implies the corresponding segment is on and a “0” means the segment is off.
Using the above A matrix, I performed NMF factorization with k = 4. The columns of the resulting W matrix represent the four basis vectors which when combined according to coefficients of H matrix give rise to digit patterns. A visual representation of the four basis vectors of matrix W is shown below. It is easy to visualize that different combination of basis images added together with suitable weights will reproduce 0-9 digit patterns.

To see how SVD based basis vectors will look like, the matrix A was also decomposed using SVD. In this case, both U and V matrices were found to have positive and negative elements. A visual representation of the first four column vectors of matrix U forming basis vectors turned out to be as shown below. In this case, I have included gray scale used in visualization right next to each image to highlight the presence of positive and negative components. [The shading in the center of images is due to scaling of background] It is not hard to see that the interpretation of different digit patterns as a combination of different basis vectors in this case is difficult as it involve adding and subtracting segments with different multipliers.

To summarize the major difference between NMF and SVD, it is suffice to say that NMF decomposition emphasizes representation by parts which helps in better understanding the factorization results. Another related outcome is that resulting factored matrices tend to be sparse.
NMF Applications
NMF has been used to perform document clustering, making recommendations, visual pattern recognition such as face recognition, gene expression analysis, feature extraction, source separation etc. Basically, it can be used in any application where data matrix A has no negative elements.
I am going to use a small example to demonstrate how NMF can be used for clustering. Consider the 7 x 9 matrix showing some of nutritional components of nine different types of food items.
Applying NMF factorization to this matrix with k = 3, we get W matrix as:
Each column of W matrix stands for a cluster. I have highlighted the highest entry in each row using a different color. The highlighted entries under each column denote the features emphasized by the respective clusters. Let us take a look at the 3 x 9 H matrix resulting from factorization.
The interpretation of this matrix is that the row index of the highest entry under each column indicates the cluster membership of that column. I have highlighted the highest entry under each column with one of the three colors used in W matrix. Thus, NMF factorization shows that apple, banana, and pear form one (yellow) cluster, blue crab and shrimp second (red) cluster, and bell pepper, broccoli, and carrot form third (green) cluster.
Thus, to the question “what can you do with NMF?”, the answer is that NMF can be used to perform a variety of machine learning tasks as long as we have a positive data matrix. Not only that, NMF factorization can be extended to factorizing tensors (stacks of matrices) to perform tasks such as video segmentation and activity recognition.
how are the values in the W and H matrices calculated?
These are calculated by an iterative process to minimize the difference between actual matrix elements and their approximated values.