instancelib.instances.tablebacked module

class instancelib.instances.tablebacked.RowInstance(provider, data, vector=None)[source]

Bases: Mapping[str, Any], Instance[KT, Mapping[str, Any], VT, Mapping[str, Any]], Generic[IT, KT, DT, VT, RT, MT]

Parameters:
property columns: Sequence[str]
property data: Mapping[str, Any]

Return the raw data of this instance

Returns:

The Raw Data

Return type:

DT

property representation: Mapping[str, Any]

Return a representation for annotation

Returns:

A representation of the raw data

Return type:

RT

property vector: VT | None

Get the vector representation of the raw data

Returns:

The Vector

Return type:

Optional[VT]
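Because the class derives from Mapping[str, Any], a row can be read like a read-only dictionary keyed by column name. The following is a minimal sketch of that idea only. `RowView` is a hypothetical stand-in, not the library's RowInstance, and the assumption that `columns` equals the row's keys is made purely for illustration.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterator, Optional, Sequence


class RowView(Mapping):
    """Hypothetical read-only row; NOT instancelib's RowInstance."""

    def __init__(self, data: Dict[str, Any],
                 vector: Optional[Sequence[float]] = None) -> None:
        self._data = dict(data)  # copy so later changes to `data` do not leak in
        self._vector = vector

    # The three methods below are all that Mapping requires;
    # get(), keys(), items(), `in` and == are derived from them.
    def __getitem__(self, column: str) -> Any:
        return self._data[column]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    @property
    def columns(self) -> Sequence[str]:
        # Assumption: the columns are simply the keys of the row.
        return list(self._data)

    @property
    def vector(self) -> Optional[Sequence[float]]:
        return self._vector


row = RowView({"title": "A", "year": 2020})
```

With this stand-in, `row["title"]`, `row.get("missing")`, and `dict(row)` all work, while item assignment raises a TypeError because no `__setitem__` is defined.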

class instancelib.instances.tablebacked.TableInstance(identifier, data, vector=None, data_extractor=<instancelib.instances.extractors.ColumnExtractor object>, repr_extractor=<instancelib.instances.extractors.ColumnExtractor object>)[source]

Bases: MutableMapping[str, Any], UpdateHookInstance[KT, DT, VT, RT], Generic[IT, KT, DT, VT, RT, MT]

Parameters:
property columns: Sequence[str]
property data: DT

Return the raw data of this instance

Returns:

The Raw Data

Return type:

DT

property identifier: KT

Get the identifier of the instance

Returns:

The identifier key of the instance

Return type:

KT

property representation: RT

Return a representation for annotation

Returns:

A representation of the raw data

Return type:

RT

property vector: VT | None

Get the vector representation of the raw data

Returns:

The Vector

Return type:

Optional[VT]

class instancelib.instances.tablebacked.TableProvider(storage, columns, vectors, builder, children, parents)[source]

Bases: MemoryChildrenMixin[IT, KT, DT, VT, RT], TableProviderRO[IT, KT, DT, VT, RT, MT], InstanceProvider[IT, KT, DT, VT, RT], Generic[IT, KT, DT, VT, RT, MT]

Parameters:
builder: Callable[[InstanceProvider[TypeVar(IT, bound= UpdateHookInstance[Any, Any, Any, Any]), TypeVar(KT), TypeVar(DT), TypeVar(VT), TypeVar(RT)], TypeVar(KT), Mapping[str, Any], Optional[TypeVar(VT)]], TypeVar(IT, bound= UpdateHookInstance[Any, Any, Any, Any])]
construct(**kwargs)[source]
Parameters:
  • args (Any) –

  • kwargs (Any) –

Return type:

TypeVar(IT, bound= UpdateHookInstance[Any, Any, Any, Any])

create(*args, **kwargs)[source]

Create a new instance of type InstanceType. The created instance is subsequently added to the provider.

Note: The number of arguments and keyword arguments may differ in actual implementation, so there are no standard arguments.

Returns:

The newly created instance

Return type:

InstanceType

Parameters:
  • args (Any) –

  • kwargs (Any) –

storage: MutableMapping[TypeVar(KT), MutableMapping[str, Any]]
class instancelib.instances.tablebacked.TableProviderRO(storage, columns, vectors, builder)[source]

Bases: ROInstanceProvider[IT, KT, DT, VT, RT], Generic[IT, KT, DT, VT, RT, MT]

Parameters:
builder: Callable[[TypeVar(KT), Mapping[str, Any], Optional[TypeVar(VT)]], TypeVar(IT, bound= UpdateHookInstance[Any, Any, Any, Any])]
bulk_get_all()[source]

Returns a list of all instances in this provider.

Returns:

A list of all instances in this provider

Return type:

List[Instance[KT, DT, VT, RT]]

Warning

When using this method on very large providers with lazily loaded instances, this may yield Out of Memory errors, as all the data will be loaded into RAM. Use with caution!

bulk_get_vectors(keys)[source]

Given a list of instance keys, return the vectors

Parameters:

keys (Sequence[KT]) – A list of instance keys whose vectors should be returned

Returns:

A tuple of two sequences, one with keys and one with vectors. The indices match, so the instance with key keys[2] has the vector vectors[2]

Return type:

Tuple[Sequence[KT], Sequence[VT]]

Warning

Some underlying implementations do not preserve the ordering of the parameter keys. Therefore, always use the keys variable from the returned tuple for the correct matching.
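The ordering caveat can be illustrated with a small stand-in. The sketch below does not call instancelib: `store` and `fake_bulk_get_vectors` are hypothetical names that only mimic the documented return shape (a tuple of keys and vectors with matching indices) and deliberately return results in storage order rather than request order.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a provider's vector storage (not instancelib code).
store: Dict[int, List[float]] = {1: [0.1, 0.2], 2: [0.3, 0.4], 3: [0.5, 0.6]}


def fake_bulk_get_vectors(
    keys: Sequence[int],
) -> Tuple[List[int], List[List[float]]]:
    """Mimic the documented return shape; order follows storage, not `keys`."""
    wanted = set(keys)
    found = [k for k in store if k in wanted]
    return found, [store[k] for k in found]


requested = [3, 1]
ret_keys, ret_vectors = fake_bulk_get_vectors(requested)

# Pair each vector with the key returned alongside it, never with `requested`.
vector_by_key = dict(zip(ret_keys, ret_vectors))
```

Zipping `requested` with `ret_vectors` instead would attach the vector of key 1 to key 3 in this example, which is the mismatch the warning describes.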

clear()[source]

Removes all instances from the provider.

Return type:

None

Warning

Use this operation with caution! This operation is intended for providers that function as temporary user queues, not for large portions of the dataset such as the unlabeled and labeled sets.

columns: Sequence[str]
property empty: bool

Determines whether the provider contains no instances

Returns:

True if the provider is empty

Return type:

bool

get_all()[source]

Get an iterator that iterates over all instances

Yields:

InstanceType – An iterator that iterates over all instances

Return type:

Iterator[TypeVar(IT, bound= UpdateHookInstance[Any, Any, Any, Any])]
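The practical difference between an iterator over all instances and a list of all instances is when the rows get loaded. The sketch below shows this with a plain generator. `table`, `iter_rows`, and `loaded` are hypothetical names, and nothing here reflects how the library loads data internally.

```python
from typing import Any, Dict, Iterator, List, Mapping

# Hypothetical table: identifier -> row (not instancelib code).
table: Dict[int, Dict[str, Any]] = {i: {"text": f"doc {i}"} for i in range(5)}
loaded: List[int] = []  # records which rows have actually been touched


def iter_rows() -> Iterator[Mapping[str, Any]]:
    """Yield rows one at a time, noting each load as it happens."""
    for key, row in table.items():
        loaded.append(key)
        yield row


rows = iter_rows()
first = next(rows)                 # only the first row has been loaded so far
loaded_after_first = list(loaded)  # snapshot of the load log at this point

remaining = list(rows)             # materializing the rest loads everything
```

Consuming the iterator item by item keeps one row in memory at a time, whereas wrapping it in `list(...)` holds every row at once.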

storage: Mapping[TypeVar(KT), Mapping[str, Any]]
vector_chunker_selector(keys, batch_size=200)[source]

Iterate in batches over the instances (with or without vectors) whose identifiers appear in the keys parameter.

Parameters:
  • keys (Iterable[KT]) – The keys that should be chunked

  • batch_size (int) – The batch size; the generator returns lists of size batch_size

Yields:

Sequence[Instance[KT, DT, VT, RT]] – A sequence of instances with length batch_size. The last list may have a shorter length.

Returns:

An iterator over sequences of key vector tuples

Return type:

Iterator[Sequence[Tuple[KT, VT]]]
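The batching behaviour described above (lists of batch_size items, with a possibly shorter final list) can be sketched independently of the library. `chunk` is a hypothetical helper written for this illustration. It is generic over the element type and is not the method's actual implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk(items: Iterable[T], batch_size: int = 200) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items; only the last may be shorter."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:  # source exhausted, stop without yielding an empty list
            return
        yield batch


batches = list(chunk(range(450), batch_size=200))
batch_lengths = [len(b) for b in batches]
```

Because the helper pulls from an iterator, it works on any iterable of keys without first copying it into a list.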

vectors: VectorStorage[TypeVar(KT), TypeVar(VT), TypeVar(MT)]