instancelib.machinelearning.base module

class instancelib.machinelearning.base.AbstractClassifier(*args, **kwds)[source]

Bases: ABC, Generic[IT, KT, DT, VT, RT, LT, LMT, PMT]

This class provides an interface that can be used to connect your model to InstanceProvider, LabelProvider, and Environment objects.

The examples below show its main methods: fit_provider, predict, predict_proba, and predict_proba_raw.

Examples

Fit a classifier on train data:

>>> model.fit_provider(train, env.labels)

Predict the class labels for a list of instances:

>>> model.predict([ins])
[(20, frozenset({"Games"}))]

Return the class labels and probabilities:

>>> model.predict_proba(test)
[(20, frozenset({("Games", 0.66), ("Bedrijfsnieuws", 0.22), ("Smartphones", 0.12)})), ... ]

Return the raw prediction matrix:

>>> preds = model.predict_proba_raw(test, batch_size=512)
>>> next(preds)
([3, 4, 5, ...], array([[0.143, 0.622, 0.233],
                        [0.278, 0.546, 0.175],
                        [0.726, 0.126, 0.146],
                        ...]))

abstract fit_instances(instances, labels)[source]

Fit the classifier on the given instances and their accompanying labels.

Parameters:
  • instances (Iterable[Instance[KT, DT, VT, RT]]) – The training data

  • labels (Iterable[Iterable[LT]]) – The labels of the training data

Return type:

None

abstract fit_provider(provider, labels, batch_size=200)[source]

Fit the classifier on the instances in the InstanceProvider, using the labels in the LabelProvider.

Parameters:
  • provider (InstanceProvider[IT, KT, DT, VT, RT]) – The provider that contains the training data

  • labels (LabelProvider[KT, LT]) – The provider that contains the labels of the training data

  • batch_size (int, optional) – A batch size for the training process, by default 200

Return type:

None
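
To illustrate how provider-based fitting relates to fit_instances, the sketch below aligns instance data with label sets by key and produces the two parallel sequences that a fit call would consume. The plain dicts stand in for an InstanceProvider and a LabelProvider, and `paired_training_data` is a hypothetical helper; none of this is instancelib's own implementation.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical stand-ins: a dict of key -> data plays the role of an
# InstanceProvider, and a dict of key -> label set plays the role of a
# LabelProvider. Neither is the real instancelib class.
instances: Dict[int, str] = {
    1: "new console released",
    2: "quarterly results",
    3: "phone review",
    4: "unlabeled document",
}
labels: Dict[int, FrozenSet[str]] = {
    1: frozenset({"Games"}),
    2: frozenset({"Bedrijfsnieuws"}),
    3: frozenset({"Smartphones"}),
}


def paired_training_data(
    instances: Dict[int, str], labels: Dict[int, FrozenSet[str]]
) -> Tuple[List[str], List[FrozenSet[str]]]:
    """Align instance data and label sets by key, skipping unlabeled keys."""
    keys = [key for key in instances if key in labels]
    return [instances[k] for k in keys], [labels[k] for k in keys]


# Two parallel sequences: the i-th label set belongs to the i-th instance.
train_x, train_y = paired_training_data(instances, labels)
```

Keeping the two sequences aligned by position matches the (instances, labels) argument pair that fit_instances documents.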

fit_val_provider(provider, labels, validation=None, batch_size=200)[source]
Parameters:
Return type:

None

abstract property fitted: bool

Return true if the classifier has been fitted

Returns:

True if the classifier has been fitted

Return type:

bool

abstract get_label_column_index(label)[source]

Return the index of the column in which the given label is stored in the label and prediction matrices.

Parameters:

label (LT) – The label

Returns:

The column index of the label

Return type:

int
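
A minimal sketch of what a label-to-column mapping is used for: selecting one label's probabilities from a prediction matrix. The sorted-label ordering and the `build_column_index` helper are assumptions made for illustration; a concrete classifier may order its columns differently.

```python
from typing import Dict, List, Sequence


def build_column_index(labels: Sequence[str]) -> Dict[str, int]:
    """Map each label to a column index.

    Assumption: columns follow sorted label order. This is illustrative
    only and not necessarily how a real classifier assigns columns.
    """
    return {label: idx for idx, label in enumerate(sorted(labels))}


column_index = build_column_index(["Games", "Bedrijfsnieuws", "Smartphones"])

# Toy prediction matrix: one row per instance, one column per label.
matrix: List[List[float]] = [
    [0.22, 0.66, 0.12],
    [0.70, 0.10, 0.20],
]

# Look up the column for "Games" and read that column for every row.
games_col = column_index["Games"]
games_probs = [row[games_col] for row in matrix]
```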

property name: str

The name of the classifier

Returns:

A name that can be used to identify the classifier

Return type:

str

predict(instances, batch_size=200)[source]

Predict the labels on input instances.

Parameters:
  • instances (InstanceInput[IT, KT, DT, VT, RT]) – An InstanceProvider or Iterable of Instance objects.

  • batch_size (int, optional) – A batch size, by default 200

Returns:

A sequence of tuples, each pairing an instance key with its predicted labels

Return type:

Sequence[Tuple[KT, FrozenSet[LT]]]

Raises:

ValueError – If you supply incorrectly formatted arguments
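
The documented return type, Sequence[Tuple[KT, FrozenSet[LT]]], converts directly into lookup structures. The sketch below uses hand-written predictions in that shape (the keys and labels are made up, not model output) and builds a key-to-labels dict plus a label-to-keys index.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List, Sequence, Tuple

# Hand-written data in the documented return shape; not real model output.
predictions: Sequence[Tuple[int, FrozenSet[str]]] = [
    (20, frozenset({"Games"})),
    (21, frozenset({"Smartphones"})),
    (22, frozenset({"Games", "Smartphones"})),
]

# Each element is a (key, labels) pair, so dict() gives key -> labels.
by_key: Dict[int, FrozenSet[str]] = dict(predictions)

# Invert the mapping: label -> keys of instances predicted with that label.
by_label: DefaultDict[str, List[int]] = defaultdict(list)
for key, labelset in predictions:
    for label in labelset:
        by_label[label].append(key)
```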

abstract predict_instances(instances, batch_size=200)[source]

Predict the labels for an iterable of instances.

Parameters:
  • instances (Iterable[Instance[KT, DT, VT, RT]]) – The instances

  • batch_size (int, optional) – The batch size, by default 200

Returns:

A sequence of (identifier, prediction) pairs

Return type:

Sequence[Tuple[KT, FrozenSet[LT]]]

predict_proba(instances, batch_size=200)[source]

Predict the labels and corresponding probabilities on input instances.

Parameters:
  • instances (InstanceInput[IT, KT, DT, VT, RT]) – An InstanceProvider or Iterable of Instance objects.

  • batch_size (int, optional) – A batch size, by default 200

Returns:

A sequence of tuples, each pairing an instance key with a set of (label, probability) tuples

Return type:

Sequence[Tuple[KT, FrozenSet[Tuple[LT, float]]]]

Raises:

ValueError – If you supply incorrectly formatted arguments
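
Because each instance maps to an unordered frozenset of (label, probability) pairs, picking the most likely label requires an explicit comparison on the probability. A small sketch, using the values from the class-level example as hand-written input; `most_probable` is a hypothetical helper, not part of the library.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

# Hand-written data in the documented return shape of predict_proba.
probabilities: Sequence[Tuple[int, FrozenSet[Tuple[str, float]]]] = [
    (20, frozenset({("Games", 0.66), ("Bedrijfsnieuws", 0.22), ("Smartphones", 0.12)})),
]


def most_probable(pairs: FrozenSet[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the (label, probability) pair with the highest probability."""
    return max(pairs, key=lambda pair: pair[1])


# key -> (label, probability) of the top-ranked label
top_labels: Dict[int, Tuple[str, float]] = {
    key: most_probable(pairs) for key, pairs in probabilities
}
```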

abstract predict_proba_instances(instances, batch_size=200)[source]

Predict the labels for each of the given instances and return the probability for each label.

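Parameters:
  • instances (Iterable[Instance[KT, DT, VT, RT]]) – The instances

  • batch_size (int, optional) – The batch size, by default 200

Returns: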

A sequence of tuples consisting of:

  • The instance identifier

  • The class labels and their probabilities

Return type:

Sequence[Tuple[KT, FrozenSet[Tuple[LT, float]]]]

abstract predict_proba_instances_raw(instances, batch_size=200)[source]

Generator function that predicts the labels for each instance. The generator lazily evaluates the prediction function on batches of instances and yields class probabilities in matrix form.

Parameters:
  • instances (Iterable[Instance[KT, DT, VT, RT]]) – Input instances

  • batch_size (int, optional) – The batch size in which instances are processed, by default 200. This also influences the shape of the resulting probability matrix.

Yields:

Tuple[Sequence[KT], PMT]

An iterator yielding tuples consisting of:

  • A sequence of keys that match the rows of the probability matrix

  • The Probability matrix with shape (batch_size, n_labels)

Return type:

Iterator[Tuple[Sequence[KT], PMT]]
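
The lazy, batched evaluation described above can be sketched with a plain generator. This is an illustration of the pattern under stated assumptions, not the library's implementation: `predict_raw_batched` and `uniform_score` are hypothetical names, instances are reduced to (key, data) tuples, and the matrix is a list of lists rather than an array type.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Matrix = List[List[float]]


def predict_raw_batched(
    items: Iterable[Tuple[int, str]],
    score: Callable[[Sequence[str]], Matrix],
    batch_size: int = 200,
) -> Iterator[Tuple[List[int], Matrix]]:
    """Yield (keys, probability matrix) per batch, scoring only on demand."""
    iterator = iter(items)
    while True:
        # Pull at most batch_size items; nothing is scored ahead of time.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        keys = [key for key, _ in batch]
        data = [datum for _, datum in batch]
        yield keys, score(data)


def uniform_score(data: Sequence[str]) -> Matrix:
    """Hypothetical scorer: equal probability for each of three labels."""
    return [[1 / 3, 1 / 3, 1 / 3] for _ in data]


items = [(i, f"doc {i}") for i in range(5)]
batches = list(predict_raw_batched(items, uniform_score, batch_size=2))
```

With five items and a batch size of two, the final batch holds a single row, so the first matrix dimension equals the number of keys in that batch and can be smaller than batch_size.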

abstract predict_proba_provider(provider, batch_size=200)[source]

Predict the labels for each instance in the provider and return the probability for each label.

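Parameters:
  • provider (InstanceProvider[IT, KT, DT, VT, RT]) – The input InstanceProvider

  • batch_size (int, optional) – The batch size, by default 200

Returns: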

A sequence of tuples consisting of:

  • The instance identifier

  • The class labels and their probabilities

Return type:

Sequence[Tuple[KT, FrozenSet[Tuple[LT, float]]]]

abstract predict_proba_provider_raw(provider, batch_size=200)[source]

Generator function that predicts the labels for each instance in the provider. The generator lazily evaluates the prediction function on batches of instances and yields class probabilities in matrix form.

Parameters:
  • provider (InstanceProvider[IT, KT, DT, VT, RT]) – The input InstanceProvider

  • batch_size (int, optional) – The batch size in which instances are processed, by default 200. This also influences the shape of the resulting probability matrix.

Yields:

Iterator[Tuple[Sequence[KT], PMT]]

An iterator yielding tuples consisting of:

  • A sequence of keys that match the rows of the probability matrix

  • The Probability matrix with shape (len(keys), n_labels)

Return type:

Iterator[Tuple[Sequence[KT], PMT]]

predict_proba_raw(instances, batch_size=200)[source]

Generator function that predicts the labels for each instance. The generator lazily evaluates the prediction function on batches of instances and yields class probabilities in matrix form.

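Parameters:
  • instances (InstanceInput[IT, KT, DT, VT, RT]) – An InstanceProvider or Iterable of Instance objects.

  • batch_size (int, optional) – The batch size in which instances are processed, by default 200

Yields: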

Tuple[Sequence[KT], PMT]

An iterator yielding tuples consisting of:

  • A sequence of keys that match the rows of the probability matrix

  • The Probability matrix with shape (batch_size, n_labels)

Return type:

Iterator[Tuple[Sequence[KT], PMT]]
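
A sketch of how the yielded (keys, matrix) batches could be turned back into per-instance label probabilities. The batches reuse the numbers from the class-level example as hand-written input, the label-to-column mapping is an assumption (in practice it would come from get_label_column_index), and `decode_batches` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

Matrix = List[List[float]]

# Hand-written batches in the yielded shape: (keys, probability matrix).
raw_batches: List[Tuple[Sequence[int], Matrix]] = [
    ([3, 4], [[0.143, 0.622, 0.233], [0.278, 0.546, 0.175]]),
    ([5], [[0.726, 0.126, 0.146]]),
]

# Assumed mapping from label to matrix column, for illustration only.
column_index: Dict[str, int] = {"Bedrijfsnieuws": 0, "Games": 1, "Smartphones": 2}


def decode_batches(
    batches: Iterable[Tuple[Sequence[int], Matrix]],
    column_index: Dict[str, int],
) -> List[Tuple[int, FrozenSet[Tuple[str, float]]]]:
    """Pair each key with its row and attach a label to every column."""
    results = []
    for keys, matrix in batches:
        # Row i of the matrix belongs to keys[i].
        for key, row in zip(keys, matrix):
            pairs = frozenset(
                (label, row[col]) for label, col in column_index.items()
            )
            results.append((key, pairs))
    return results


decoded = decode_batches(raw_batches, column_index)
```

The result has the same shape as the return type documented for predict_proba.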

abstract predict_provider(provider, batch_size=200)[source]

Predict the labels for all instances in an InstanceProvider.

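Parameters:
  • provider (InstanceProvider[IT, KT, DT, VT, RT]) – The input InstanceProvider

  • batch_size (int, optional) – The batch size, by default 200

Returns: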

A sequence of (identifier, prediction) pairs

Return type:

Sequence[Tuple[KT, FrozenSet[LT]]]

abstract set_target_labels(labels)[source]

Set the target labels of the classifier

Parameters:

labels (Iterable[LT]) – The class labels that the classifier can predict

Return type:

None