LabelPropagation

seqlearner.SemiSupervisedLearner.label_propagation(kernel, gamma, n_neighbors, alpha, max_iter, tol, n_jobs)

LabelPropagation classifier for semi-supervised learning. It is one of the basic semi-supervised learning algorithms and assigns labels to previously unlabeled data points. At the start of the algorithm, a (generally small) subset of the data points has labels (or classifications). Over the course of the algorithm, these labels are propagated to the unlabeled points.
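The propagation step can be illustrated with a minimal pure-Python sketch. This is not seqlearner's implementation: the helper names `rbf_weights` and `propagate_labels` are hypothetical, the sketch assumes an RBF affinity with hard clamping of the labeled points (so the `alpha` argument is not modeled), and it marks unlabeled points with `-1`.

```python
import math


def rbf_weights(points, gamma):
    """Pairwise RBF affinities: w_ij = exp(-gamma * ||x_i - x_j||^2)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def propagate_labels(points, labels, n_classes, gamma=20.0, max_iter=1000, tol=1e-3):
    """Return a class index per point; labels[i] == -1 marks an unlabeled point."""
    n = len(points)
    weights = rbf_weights(points, gamma)
    # Row-normalize the affinities so each row sums to 1.
    transition = [[w / sum(row) for w in row] for row in weights]
    # One-hot rows for labeled points, all-zero rows for unlabeled ones.
    clamped = [
        [1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)
    ]
    dist = [row[:] for row in clamped]

    for _ in range(max_iter):
        # Each point takes the weighted average of its neighbors' distributions.
        new = [
            [sum(transition[i][j] * dist[j][c] for j in range(n)) for c in range(n_classes)]
            for i in range(n)
        ]
        for i in range(n):
            if labels[i] != -1:
                # Hard clamping: labeled points keep their original label.
                new[i] = clamped[i][:]
            else:
                total = sum(new[i])
                if total > 0:
                    new[i] = [v / total for v in new[i]]
        # Stop once the label distributions have (nearly) stopped changing.
        change = sum(abs(a - b) for ra, rb in zip(new, dist) for a, b in zip(ra, rb))
        dist = new
        if change < tol:
            break

    return [max(range(n_classes), key=lambda c: row[c]) for row in dist]


# Two well-separated 1-D clusters, with one labeled point in each.
points = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
labels = [0, -1, -1, 1, -1, -1]
print(propagate_labels(points, labels, n_classes=2))  # → [0, 0, 0, 1, 1, 1]
```

In this sketch `gamma`, `max_iter` and `tol` play the roles described under Arguments: `gamma` controls how quickly affinity decays with distance, and iteration stops when the total change falls below `tol` or after `max_iter` rounds.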

Arguments

  • kernel: String or callable, String identifier for the kernel function to use, or the kernel function itself. Only 'rbf' and 'knn' are valid strings. A callable should take two inputs, each of shape [n_samples, n_features], and return a weight matrix of shape [n_samples, n_samples].
  • gamma: Float, Parameter for rbf kernel
  • n_neighbors: Positive integer, Parameter for knn kernel
  • alpha: Float, Clamping factor
  • max_iter: Positive integer, Maximum number of iterations allowed
  • tol: Float, Convergence tolerance: threshold to consider the system at steady state
  • n_jobs: Positive integer, The number of parallel jobs to run
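As an illustration of the callable form of `kernel`, the following sketch defines a kernel function with the input and output shapes described above. The name `laplacian_kernel` and its `gamma` default are hypothetical choices for this example and are not part of seqlearner; NumPy is assumed to be available.

```python
import numpy as np


def laplacian_kernel(X, Y, gamma=0.5):
    """Weight matrix with entries exp(-gamma * L1 distance between rows of X and Y)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Broadcast to shape (len(X), len(Y), n_features), then sum over features.
    l1_dist = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)
    return np.exp(-gamma * l1_dist)


X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
W = laplacian_kernel(X, X)
print(W.shape)  # → (3, 3)
```

When both inputs are the same sample matrix, the result is a symmetric [n_samples, n_samples] matrix with ones on the diagonal, which is the shape a custom kernel is expected to return.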

Example: predict the unlabeled sequences

from sklearn.model_selection import train_test_split
from seqlearner import MultiTaskLearner
labeled_path = "../data/labeled.csv"
unlabeled_path = "../data/unlabeled.csv"
mtl = MultiTaskLearner(labeled_path, unlabeled_path)
encoding = mtl.embed(word_length=5)
# train_test_split returns X_train, X_test, y_train, y_test, in that order
X, X_t, y, y_t = train_test_split(mtl.sequences, mtl.labels, test_size=0.33)
score = mtl.semi_supervised_learner(X, y, X_t, y_t, ssl="label_propagation")

See Also