instancelib.instances.hdf5 module

class instancelib.instances.hdf5.ExternalVectorInstanceProvider(*args, **kwds)[source]

Bases: InstanceProvider[IT, KT, DT, VT, RT], ABC, Generic[IT, KT, DT, VT, RT, MT]

abstract bulk_add_vectors(keys, values)[source]

This method adds the vectors in values to the instances specified in keys.

In some use cases, vectors are not known beforehand. This library provides several vectorizers that convert raw data points into feature vectors. Once these vectors are available, they can be added to the provider with this method.

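Parameters:
  • keys – The keys of the instances that receive a vector

  • values – The vectors to add, in the same order as keys

Return type: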

None

Warning

We assume that the indices and length of the parameters keys and values match.
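Because the method trusts the caller to align keys and values, it can help to check the lengths before handing them over. The following is a minimal sketch of such a check, not part of instancelib: `checked_pairs` and `toy_store` are hypothetical names, and a plain dict stands in for the provider's storage.

```python
from typing import Dict, List, Sequence, Tuple


def checked_pairs(
    keys: Sequence[int], values: Sequence[List[float]]
) -> List[Tuple[int, List[float]]]:
    """Pair each key with its vector, refusing mismatched lengths."""
    if len(keys) != len(values):
        raise ValueError(f"got {len(keys)} keys but {len(values)} vectors")
    return list(zip(keys, values))


# Hypothetical stand-in for a provider's vector storage.
toy_store: Dict[int, List[float]] = {}

keys = [10, 11, 12]
values = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# values[i] is stored as the vector of the instance with key keys[i].
for key, vector in checked_pairs(keys, values):
    toy_store[key] = vector
```

With a real provider, the validated keys and values would be passed to bulk_add_vectors instead of being written into a dict.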

bulk_get_vectors(keys)[source]

Given a list of instance keys, return the vectors

Parameters:

keys (Sequence[KT]) – A list of instance keys

Returns:

A tuple of two sequences, one with keys and one with vectors. The indices match, so vectors[2] is the vector of the instance with key keys[2].

Return type:

Tuple[Sequence[KT], Sequence[VT]]

Warning

Some underlying implementations do not preserve the ordering of the parameter keys. Therefore, always use the keys variable from the returned tuple for the correct matching.
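The ordering caveat can be illustrated with a small sketch. `toy_bulk_get_vectors` and `_TOY_VECTORS` are hypothetical stand-ins, not instancelib code. The stand-in deliberately returns results in storage order, so the caller has to match on the returned keys.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in data; not part of instancelib.
_TOY_VECTORS: Dict[str, List[float]] = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.5, 0.5],
}


def toy_bulk_get_vectors(
    keys: Sequence[str],
) -> Tuple[Sequence[str], Sequence[List[float]]]:
    """Mimic a backend that answers in storage order, not request order."""
    found = sorted(k for k in keys if k in _TOY_VECTORS)
    return found, [_TOY_VECTORS[k] for k in found]


requested = ["c", "a"]
ret_keys, ret_vectors = toy_bulk_get_vectors(requested)

# Match on the returned keys, never on the order of the request.
by_key = dict(zip(ret_keys, ret_vectors))
vectors_in_request_order = [by_key[k] for k in requested]
```

Building a dict from the returned pair is one simple way to restore the requested order when the caller needs it.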

abstract load_vectors()[source]

Return type:

VectorStorage[KT, VT, MT]

vector_chunker(batch_size=200)[source]

Iterate over all pairs of keys and vectors in this provider

Parameters:

batch_size (int) – The batch size; the generator returns lists of size batch_size

Returns:

An iterator over sequences of (key, vector) tuples

Return type:

Iterator[Sequence[Tuple[KT, VT]]]

Yields:

Sequence[Tuple[KT, VT]] – Sequences of (key, vector) tuples
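The batching behaviour described above can be sketched with a plain generator. `toy_vector_chunker` is a hypothetical stand-in that works on a dict. It is not the library's implementation and only illustrates the shape of the yielded batches.

```python
from itertools import islice
from typing import Dict, Iterator, List, Sequence, Tuple


def toy_vector_chunker(
    store: Dict[int, List[float]], batch_size: int = 200
) -> Iterator[Sequence[Tuple[int, List[float]]]]:
    """Yield lists of (key, vector) tuples, at most batch_size per list."""
    pairs = iter(store.items())
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical data: five instances with one-dimensional vectors.
toy_store = {i: [float(i)] for i in range(5)}
batch_sizes = [len(b) for b in toy_vector_chunker(toy_store, batch_size=2)]
```

In this sketch the final batch is shorter whenever the number of vectors is not a multiple of batch_size.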

vector_chunker_selector(keys, batch_size=200)[source]

Iterate over all instances (with or without vectors) that belong to the identifiers in the keys parameter.

Parameters:
  • keys (Iterable[KT]) – The keys that should be chunked

  • batch_size (int) – The batch size; the generator returns lists of size batch_size

Yields:

Sequence[Tuple[KT, VT]] – A sequence of (key, vector) tuples with length batch_size. The last sequence may be shorter.

Returns:

An iterator over sequences of (key, vector) tuples

Return type:

Iterator[Sequence[Tuple[KT, VT]]]
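A selector variant restricts the batches to the requested keys. The sketch below uses a hypothetical `toy_chunker_selector` over a dict. Skipping keys that have no stored vector is an assumption of this sketch, not documented library behaviour.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple


def toy_chunker_selector(
    store: Dict[str, List[float]], keys: Iterable[str], batch_size: int = 200
) -> Iterator[Sequence[Tuple[str, List[float]]]]:
    """Yield batches of (key, vector) tuples for the selected keys only."""
    batch: List[Tuple[str, List[float]]] = []
    for key in keys:
        if key not in store:
            # Assumption of this sketch: keys without a vector are skipped.
            continue
        batch.append((key, store[key]))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # The last batch may hold fewer than batch_size items.
        yield batch


# Hypothetical data; "x" has no stored vector.
toy_store = {"a": [1.0], "b": [2.0], "c": [3.0], "d": [4.0]}
selected = ["d", "a", "x", "b"]
batches = list(toy_chunker_selector(toy_store, selected, batch_size=2))
```

Here the batches follow the order of the selected keys, and "c" never appears because it was not requested.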

vectorstorage: Optional[VectorStorage[KT, VT, MT]]

class instancelib.instances.hdf5.HDF5VectorInstanceProvider(*args, **kwds)[source]

Bases: ExternalVectorInstanceProvider[IT, KT, DT, ndarray, RT, ndarray], Generic[IT, KT, DT, RT]

bulk_add_vectors(keys, values)[source]

This method adds the vectors in values to the instances specified in keys.

In some use cases, vectors are not known beforehand. This library provides several vectorizers that convert raw data points into feature vectors. Once these vectors are available, they can be added to the provider with this method.

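Parameters:
  • keys – The keys of the instances that receive a vector

  • values – The vectors to add, in the same order as keys

Return type: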

None

Warning

We assume that the indices and length of the parameters keys and values match.

load_vectors()[source]

Return type:

HDF5VectorStorage[KT, Any]

vector_storage_location: PathLike[str]