
How to Build a Labelled Training Data Set with a Few Labelled Examples

Published on:
April 10, 2023
Published by:
Professor Ishwar Sethi
This post was originally published by one of our partners on:
https://iksinc.tech/how-to-build-a-labelled-training-data-set-with-a-few-labelled-examples/

When building a classifier, we assume a large enough training data set with labels is available. This situation is what we call supervised learning. In a real-world setting, such labeled training examples need to be acquired. In any application domain where labeling requires domain expertise, such as medicine, gathering a large labeled training set is an expensive and time-consuming task. In such cases, it is not uncommon to use a small set of correctly labeled examples to label the rest of the training examples. This type of learning is referred to as semi-supervised learning, and it falls somewhere between supervised and unsupervised learning. Often the term semi-supervised classification is used to describe the process of labeling training examples using a small set of labeled examples, to differentiate it from semi-supervised clustering. In semi-supervised clustering, the goal is to group a given set of examples into different clusters with the condition that certain examples must be clustered together and certain examples must be put in different clusters. In other words, constraints are imposed on the resulting clusters in terms of the cluster memberships of certain specified examples.

In this blog post, I am going to illustrate semi-supervised classification, leaving semi-supervised clustering for another post. When we have a small set of labeled examples and we want to rely on them to label a much larger set of unlabeled training examples, we need to make some assumptions. For example, we might assume that training examples close to each other are likely to have similar class labels, the same assumption made when applying k-nearest neighbor classification. Alternatively, one might assume the classes follow Gaussian distributions and try to iteratively estimate the distribution parameters. We must remember that our result will only be as good as our assumptions.

Label Propagation Algorithm (LPA)

One semi-supervised classification method is label propagation, which I will explain here. This method is based on the assumption that examples near each other are likely to have similar class labels. The basic idea is to consider all examples, labeled and unlabeled, as interconnected nodes in a network. Each node in the network tries to propagate its label to other nodes. How much a node's label influences other nodes is determined by their respective closeness or proximity.

We will work through a series of steps to illustrate the working of the label propagation algorithm. Let us consider the following nine training examples, each with two features:

[5.4 3.9], [4.8 3.0], [5.1 3.3], [5.7 2.8], [5.7 3.0], [5.9 3.2], [6.9 3.2], [6.4 2.7], [6.7 3.0]

We know the labels of only three examples: the third example belongs to class 1, the fifth to class 2, and the ninth to class 3. (In the original post these were shown above in color, each color representing a different class label.) What the label propagation algorithm does is try to determine the labels of the six unlabeled examples.
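To follow along in code, here is one way to set up the data and the partial labels. The array name X matches the post's later code snippet; the label vector y, with -1 marking unlabeled examples, is my own convention for this sketch.

import numpy as np

# The nine two-feature training examples
X = np.array([[5.4, 3.9], [4.8, 3.0], [5.1, 3.3],
              [5.7, 2.8], [5.7, 3.0], [5.9, 3.2],
              [6.9, 3.2], [6.4, 2.7], [6.7, 3.0]])

# Known labels: third example -> class 1, fifth -> class 2, ninth -> class 3;
# -1 marks the six unlabeled examples (this encoding is my own choice)
y = np.array([-1, -1, 0, -1, 1, -1, -1, -1, 2])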

The first step is to calculate closeness between each pair of examples. In LPA, the closeness between examples is measured by the following formula:

w_ij = exp( −d_ij² / σ² )

where d_ij² is the squared Euclidean distance between the example pair i-j and σ² is a parameter that scales proximity. Since we have nine examples, we end up with a 9×9 symmetric weight matrix W. The following code snippet shows the calculation of W. The examples are in array X and sigma equals 0.4.

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

D = euclidean_distances(X, X, squared=True)   # squared pairwise distances
W = np.exp(-D / 0.16)                         # sigma = 0.4, so sigma squared = 0.16
print(np.round(W, 2))

[[1.   0.   0.06 0.   0.   0.01 0.   0.   0.  ]
 [0.   1.   0.32 0.   0.01 0.   0.   0.   0.  ]
 [0.06 0.32 1.   0.02 0.06 0.02 0.   0.   0.  ]
 [0.   0.   0.02 1.   0.78 0.29 0.   0.04 0.  ]
 [0.   0.01 0.06 0.78 1.   0.61 0.   0.03 0.  ]
 [0.01 0.   0.02 0.29 0.61 1.   0.   0.04 0.01]
 [0.   0.   0.   0.   0.   0.   1.   0.04 0.61]
 [0.   0.   0.   0.04 0.03 0.04 0.04 1.   0.32]
 [0.   0.   0.   0.   0.   0.01 0.61 0.32 1.  ]]

Next, we associate with each node a c-dimensional label vector, c being the number of classes, that reflects the class probabilities for that node's training example. In our example, we have three classes. We set the label vectors so that each vector sums to one, and the examples/nodes with known labels have 0's and 1's only in their respective vectors. These initial nine three-dimensional vectors are shown below as a matrix Y.

[[0.33 0.33 0.34]
[0.33 0.33 0.34]
[1.   0.   0.  ]
[0.33 0.33 0.34]
[0.   1.   0.  ]
[0.33 0.33 0.34]
[0.33 0.33 0.34]
[0.33 0.33 0.34]
[0.   0.   1.  ]]
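As a rough sketch, again using my own variable names from above, the initial Y matrix can be built with uniform rows for the unlabeled nodes and one-hot rows for the three labeled nodes:

n_samples, n_classes = X.shape[0], 3
Y = np.full((n_samples, n_classes), 1.0 / n_classes)  # uniform guess for unlabeled nodes
labeled = y != -1
Y[labeled] = 0.0
Y[labeled, y[labeled]] = 1.0                          # one-hot rows for the labeled nodes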

To perform label propagation, we convert the weight matrix W into a transition matrix T using the following relationship:

t_ij = w_ij / Σ_k w_kj

where t_ij denotes the probability of a label propagating from node j to node i.
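In code, this column normalization of W might look like the following one-liner (a sketch under the same variable names as above):

T = W / W.sum(axis=0, keepdims=True)   # each column of T now sums to one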

Having initialized the label probabilities and determined the transition matrix, we are now ready to propagate label information in the network. We do this by updating the Y matrix via the relationship Y ← TY. The rows of the updated Y matrix are normalized to ensure the sum of each row equals 1. The rows corresponding to nodes with known labels are reset to have 0's and 1's only in their respective vectors, as the labels of these nodes are known and fixed. The result of these steps is the following Y matrix, shown here in transposed form.

[[0.36 0.48 1.   0.23 0.   0.25 0.22 0.27 0.  ]
[0.32 0.26 0.   0.54 1.   0.5  0.22 0.28 0.  ]
[0.33 0.26 0.   0.23 0.   0.25 0.56 0.46 1.  ]]

You can check that each column adds up to one, giving the probabilities of the three class labels for each node. We also note that the probability values have begun to move either upwards towards one or downwards towards zero. The above steps of updating the Y matrix, normalizing its rows, and resetting the label vectors of known nodes complete one iteration of label propagation. Repeating these steps a few times moves the probabilities sufficiently upward or downward to let us terminate the algorithm. In the present example, we obtain the following Y matrix after 10 iterations.

[[0.58 0.93 1.   0.04 0.   0.04 0.01 0.02 0.  ]
[0.24 0.05 0.   0.91 1.   0.89 0.02 0.22 0.  ]
[0.18 0.02 0.   0.05 0.   0.07 0.97 0.76 1.  ]]

Looking at the Y matrix, we can safely assign the class 1 label to the first two examples, the class 2 label to the fourth and sixth examples, and the class 3 label to the seventh and eighth examples, thus completing the task of assigning labels to the training examples that had none.
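One possible way to code the full propagation loop, under the variable names introduced above, is sketched below. It is not the original post's implementation, but it carries out the same steps and should produce values close to the matrices shown above.

for _ in range(10):
    Y = T @ Y                                # propagate labels: Y <- T Y
    Y = Y / Y.sum(axis=1, keepdims=True)     # row-normalize so each row sums to one
    Y[labeled] = 0.0                         # clamp the known labels back to 0/1
    Y[labeled, y[labeled]] = 1.0

print(np.round(Y.T, 2))                      # compare with the transposed matrix shown above
labels = Y.argmax(axis=1)                    # final class assignment for every example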

You might wonder whether the algorithm will converge or not. It turns out that there is a theoretical result that establishes the convergence of LPA. The choice of sigma is important as it affects the convergence rate as well as the final result. This method is available in scikit-learn and you may want to check it out.
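For reference, a minimal example using scikit-learn's implementation is shown below. Note that its rbf kernel uses gamma = 1/σ², and its normalization details differ slightly from the hand computation above, so the inferred labels may not match the numbers exactly.

from sklearn.semi_supervised import LabelPropagation

model = LabelPropagation(kernel='rbf', gamma=1 / 0.16)   # gamma = 1/sigma^2 with sigma = 0.4
model.fit(X, y)                                          # y uses -1 for unlabeled examples
print(model.transduction_)                               # labels inferred for all nine examples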
