instancelib.feature_extraction package

Submodules

Module contents

class instancelib.feature_extraction.BaseVectorizer

Bases: ABC, Generic[DT]

This ABC specifies a generic vectorizer. Vectorizers transform raw data examples into feature vectors. For a given data type DT, it declares the method fit(), which initializes or fits the vectorizer, and the method transform(), which transforms the data into vector form.
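The fit/transform contract described above can be illustrated with a small stand-alone sketch. It does not import instancelib: SketchVectorizer and LengthVectorizer are hypothetical names, and plain nested lists stand in for the NumPy feature matrix that the real classes return.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Sequence, TypeVar

DT = TypeVar("DT")


class SketchVectorizer(ABC, Generic[DT]):
    """Hypothetical stand-in that mirrors the fit/transform contract."""

    @abstractmethod
    def fit(self, x_data: Sequence[DT], **kwargs) -> "SketchVectorizer[DT]":
        ...

    @abstractmethod
    def transform(self, x_data: Sequence[DT], **kwargs) -> List[List[float]]:
        ...


class LengthVectorizer(SketchVectorizer[str]):
    """One feature per string: its length relative to the longest string seen in fit."""

    def __init__(self) -> None:
        self.max_len = 0

    def fit(self, x_data: Sequence[str], **kwargs) -> "LengthVectorizer":
        # Remember the longest example so transform can scale by it.
        self.max_len = max((len(x) for x in x_data), default=0)
        return self

    def transform(self, x_data: Sequence[str], **kwargs) -> List[List[float]]:
        scale = self.max_len or 1  # avoid dividing by zero when fitted on empty data
        return [[len(x) / scale] for x in x_data]


vectorizer = LengthVectorizer().fit(["ab", "abcd"])
x_mat = vectorizer.transform(["ab", "abcd", "a"])  # one row per example, one column per feature
```

Because fit() returns the vectorizer itself, fitting and transforming can be chained as in the last two lines.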

abstract fit(x_data, **kwargs)

Fit the vectorizer according to the data in the given Sequence.

Parameters:

x_data (Sequence[DT]) – A Sequence of examples with type DT.

Returns:

A fitted vectorizer for data with type DT

Return type:

BaseVectorizer[DT]

Examples

Assume that data_list holds a sequence of data examples. BaseVectorizer is abstract, so in practice a concrete subclass takes its place in the first line:

>>> vectorizer = BaseVectorizer[DT]()
>>> vectorizer = vectorizer.fit(data_list)

Parameters:

kwargs (Any) – Additional keyword arguments.

abstract fit_transform(x_data, **kwargs)

Fit the vectorizer on x_data and transform that same data to a feature matrix. Subsequent calls to transform() use the fit obtained in this call.

Parameters:

x_data (Sequence[DT]) – A sequence of raw data examples with length n_examples

Returns:

A feature matrix with shape (n_examples, n_features)

Return type:

npt.NDArray[Any]

Examples

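Assume a vectorizer has been created; it does not need to be fitted beforehand, because this call performs the fit:

>>> x_mat = vectorizer.fit_transform(x_data)

Parameters:

kwargs (Any) – Additional keyword arguments.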
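The relationship between fit(), transform() and fit_transform() can be sketched with a toy bag-of-words vectorizer. VocabularyCounter is a hypothetical class written for this illustration only; it is not part of instancelib and returns nested lists rather than a NumPy array.

```python
from typing import Dict, List, Sequence


class VocabularyCounter:
    """Hypothetical bag-of-words vectorizer used only for illustration."""

    def __init__(self) -> None:
        self.vocabulary: Dict[str, int] = {}

    def fit(self, x_data: Sequence[str], **kwargs) -> "VocabularyCounter":
        # Assign one column index to every distinct token, in sorted order.
        tokens = sorted({tok for doc in x_data for tok in doc.split()})
        self.vocabulary = {tok: i for i, tok in enumerate(tokens)}
        return self

    def transform(self, x_data: Sequence[str], **kwargs) -> List[List[int]]:
        rows = []
        for doc in x_data:
            row = [0] * len(self.vocabulary)
            for tok in doc.split():
                # Tokens that were not seen during fitting are ignored.
                if tok in self.vocabulary:
                    row[self.vocabulary[tok]] += 1
            rows.append(row)
        return rows

    def fit_transform(self, x_data: Sequence[str], **kwargs) -> List[List[int]]:
        # Fit on x_data, then transform the same data.
        return self.fit(x_data, **kwargs).transform(x_data, **kwargs)


vectorizer = VocabularyCounter()
x_mat = vectorizer.fit_transform(["a b", "b b c"])  # shape (2, 3)
later = vectorizer.transform(["c d"])               # reuses the vocabulary from the call above
```

The second call shows the behaviour stated above: the token "d" was not in the data passed to fit_transform(), so it does not receive a column.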

property fitted: bool

Check if the vectorizer has been fitted

Returns:

True if the vectorizer has been fitted

Return type:

bool
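One possible use of a fitted flag is to avoid refitting a vectorizer that is already fitted. The sketch below assumes nothing about how instancelib tracks this state internally; FlaggedVectorizer and ensure_fitted are hypothetical names.

```python
from typing import Any, Sequence


class FlaggedVectorizer:
    """Hypothetical vectorizer that only tracks whether fit() has been called."""

    def __init__(self) -> None:
        self._fitted = False

    @property
    def fitted(self) -> bool:
        return self._fitted

    def fit(self, x_data: Sequence[Any], **kwargs) -> "FlaggedVectorizer":
        self._fitted = True
        return self


def ensure_fitted(vectorizer: FlaggedVectorizer, x_data: Sequence[Any]) -> FlaggedVectorizer:
    """Fit only when the vectorizer reports that it has not been fitted yet."""
    if not vectorizer.fitted:
        vectorizer.fit(x_data)
    return vectorizer


vectorizer = FlaggedVectorizer()
state_before = vectorizer.fitted
ensure_fitted(vectorizer, ["first", "batch"])
state_after = vectorizer.fitted
```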

property name: str

abstract transform(x_data, **kwargs)

Transform a list of raw data points to a feature matrix according to the fitted vectorizer.

Parameters:

x_data (Sequence[DT]) – A sequence of raw data examples with length n_examples

Returns:

A feature matrix with shape (n_examples, n_features)

Return type:

npt.NDArray[Any]

Examples

Assume the vectorizer is fitted

>>> x_mat = vectorizer.transform(x_data)

Parameters:

kwargs (Any) – Additional keyword arguments.

class instancelib.feature_extraction.SklearnVectorizer(vectorizer, storage_location=None, filename=None)

Bases: BaseVectorizer[str], SaveableInnerModel
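The signature and the innermodel attribute suggest a wrapper that delegates to a scikit-learn style estimator. The following stand-alone sketch shows that delegation pattern under this assumption; it does not import instancelib or scikit-learn, InnerModelWrapper and ToyCharCounter are hypothetical names, and the storage behaviour of SaveableInnerModel is not modelled.

```python
from typing import List, Sequence


class ToyCharCounter:
    """Hypothetical inner model exposing scikit-learn style fit/transform methods."""

    def fit(self, raw_documents: Sequence[str]) -> "ToyCharCounter":
        # One column per distinct character seen during fitting.
        self.alphabet_ = sorted({ch for doc in raw_documents for ch in doc})
        return self

    def transform(self, raw_documents: Sequence[str]) -> List[List[int]]:
        return [[doc.count(ch) for ch in self.alphabet_] for doc in raw_documents]


class InnerModelWrapper:
    """Hypothetical wrapper that forwards every call to its inner model."""

    def __init__(self, innermodel: ToyCharCounter) -> None:
        self.innermodel = innermodel
        self._fitted = False

    @property
    def fitted(self) -> bool:
        return self._fitted

    def fit(self, x_data: Sequence[str], **kwargs) -> "InnerModelWrapper":
        self.innermodel.fit(x_data)
        self._fitted = True
        return self

    def transform(self, x_data: Sequence[str], **kwargs) -> List[List[int]]:
        return self.innermodel.transform(x_data)

    def fit_transform(self, x_data: Sequence[str], **kwargs) -> List[List[int]]:
        return self.fit(x_data).transform(x_data)


wrapper = InnerModelWrapper(ToyCharCounter())
x_mat = wrapper.fit_transform(["ab", "b"])
x_new = wrapper.transform(["abb"])
```

Real scikit-learn text vectorizers typically return sparse matrices; how SklearnVectorizer converts them to the documented return type is not stated on this page.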

fit(x_data, **kwargs)

Fit the vectorizer according to the data in the given Sequence.

Parameters:

x_data (Sequence[DT]) – A Sequence of examples with type DT.

Returns:

A fitted vectorizer for data with type DT

Return type:

BaseVectorizer[DT]

Examples

Assume that data_list holds a sequence of data examples. BaseVectorizer is abstract, so in practice a concrete subclass takes its place in the first line:

>>> vectorizer = BaseVectorizer[DT]()
>>> vectorizer = vectorizer.fit(data_list)

Parameters:

kwargs (Any) – Additional keyword arguments.

fit_transform(x_data, **kwargs)

Fit the vectorizer on x_data and transform that same data to a feature matrix. Subsequent calls to transform() use the fit obtained in this call.

Parameters:

x_data (Sequence[DT]) – A sequence of raw data examples with length n_examples

Returns:

A feature matrix with shape (n_examples, n_features)

Return type:

npt.NDArray[Any]

Examples

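Assume a vectorizer has been created; it does not need to be fitted beforehand, because this call performs the fit:

>>> x_mat = vectorizer.fit_transform(x_data)

Parameters:

kwargs (Any) – Additional keyword arguments.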

innermodel: BaseEstimator

property name: str

transform(x_data, **kwargs)

Transform a list of raw data points to a feature matrix according to the fitted vectorizer.

Parameters:

x_data (Sequence[DT]) – A sequence of raw data examples with length n_examples

Returns:

A feature matrix with shape (n_examples, n_features)

Return type:

npt.NDArray[Any]

Examples

Assume the vectorizer is fitted

>>> x_mat = vectorizer.transform(x_data)

Parameters:

kwargs (Any) – Additional keyword arguments.

class instancelib.feature_extraction.TextInstanceVectorizer(vectorizer)

Bases: BaseVectorizer[Instance[Any, str, ndarray, Any]]

Parameters:

vectorizer (BaseVectorizer[str]) –
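The base class and the constructor argument suggest that this class adapts a string vectorizer to instances whose data is a string. A minimal stand-alone sketch of that adapter idea follows. It is an assumption about the behaviour, not the library's implementation: TextRecord, UpperCaseRatio and RecordVectorizer are hypothetical names, and the attribute name data is chosen for this illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TextRecord:
    """Hypothetical stand-in for an instance whose data is a string."""
    identifier: int
    data: str


class UpperCaseRatio:
    """Hypothetical string vectorizer: one feature, the share of upper-case characters."""

    def fit(self, x_data: Sequence[str], **kwargs) -> "UpperCaseRatio":
        return self  # nothing to learn for this feature

    def transform(self, x_data: Sequence[str], **kwargs) -> List[List[float]]:
        return [[sum(c.isupper() for c in t) / max(len(t), 1)] for t in x_data]


class RecordVectorizer:
    """Hypothetical adapter: extracts the text of each record, then delegates."""

    def __init__(self, vectorizer: UpperCaseRatio) -> None:
        self.vectorizer = vectorizer

    def fit(self, x_data: Sequence[TextRecord], **kwargs) -> "RecordVectorizer":
        self.vectorizer.fit([record.data for record in x_data], **kwargs)
        return self

    def transform(self, x_data: Sequence[TextRecord], **kwargs) -> List[List[float]]:
        return self.vectorizer.transform([record.data for record in x_data], **kwargs)


records = [TextRecord(1, "AB"), TextRecord(2, "abCD")]
x_mat = RecordVectorizer(UpperCaseRatio()).fit(records).transform(records)
```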

fit(x_data, **kwargs)

Fit the vectorizer according to the data in the given Sequence.

Parameters:

x_data (Sequence[DT]) – A Sequence of examples with type DT.

Returns:

A fitted vectorizer for data with type DT

Return type:

BaseVectorizer[DT]

Examples

Assume that data_list holds a sequence of data examples. BaseVectorizer is abstract, so in practice a concrete subclass takes its place in the first line:

>>> vectorizer = BaseVectorizer[DT]()
>>> vectorizer = vectorizer.fit(data_list)

Parameters:

kwargs (Any) – Additional keyword arguments.

fit_transform(x_data, **kwargs)

Fit the vectorizer on x_data and transform that same data to a feature matrix. Subsequent calls to transform() use the fit obtained in this call.

Parameters:

x_data (Sequence[DT]) – A sequence of raw data examples with length n_examples

Returns:

A feature matrix with shape (n_examples, n_features)

Return type:

npt.NDArray[Any]

Examples

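Assume a vectorizer has been created; it does not need to be fitted beforehand, because this call performs the fit:

>>> x_mat = vectorizer.fit_transform(x_data)

Parameters:

kwargs (Any) – Additional keyword arguments.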

property fitted: bool

Check if the vectorizer has been fitted

Returns:

True if the vectorizer has been fitted

Return type:

bool

transform(x_data, **kwargs)

Transform a list of raw data points to a feature matrix according to the fitted vectorizer.

Parameters:

x_data (Sequence[DT]) – A sequence of raw data examples with length n_examples

Returns:

A feature matrix with shape (n_examples, n_features)

Return type:

npt.NDArray[Any]

Examples

Assume the vectorizer is fitted

>>> x_mat = vectorizer.transform(x_data)

Parameters:

kwargs (Any) – Additional keyword arguments.