instancelib.environment.base module

class instancelib.environment.base.AbstractEnvironment(*args, **kwds)[source]

Bases: Environment[InstanceType, KT, DT, VT, RT, LT], ABC, Generic[InstanceType, KT, DT, VT, RT, LT]

Environments provide an interface that enables you to access all data stored in the datasets. If there are labels stored in the environment, you can access these here as well.

There are two important properties in every Environment:

  • dataset(): Contains all Instances of the original dataset

  • labels(): Contains an object that allows you to access labels easily

Besides these properties, this object also provides methods to create new InstanceProvider objects that contain a subset of all instances stored in this environment.

Examples

Access the dataset:

>>> dataset = env.dataset
>>> instance = next(iter(dataset.values()))

Access the labels:

>>> labels = env.labels
>>> ins_lbls = labels.get_labels(instance)

Create a train-test split on the dataset (70% train, 30% test):

>>> train, test = env.train_test_split(dataset, 0.70)

class instancelib.environment.base.Environment(*args, **kwds)[source]

Bases: MutableMapping[str, InstanceProvider[InstanceType, KT, DT, VT, RT]], ABC, Generic[InstanceType, KT, DT, VT, RT, LT]

Environments provide an interface that enables you to access all data stored in the datasets. If there are labels stored in the environment, you can access these here as well.

There are two important properties in every Environment:

  • dataset(): Contains all Instances of the original dataset

  • labels(): Contains an object that allows you to access labels easily

Besides these properties, this object also provides methods to create new InstanceProvider objects that contain a subset of all instances stored in this environment.

Examples

Access the dataset:

>>> dataset = env.dataset
>>> instance = next(iter(dataset.values()))

Access the labels:

>>> labels = env.labels
>>> ins_lbls = labels.get_labels(instance)

Create a train-test split on the dataset (70% train, 30% test):

>>> train, test = env.train_test_split(dataset, 0.70)
add_vectors(keys, vectors)[source]

This method adds feature vectors or embeddings to the instances associated with the keys in the first parameter. The sequences keys and vectors must have the same length.

Parameters:
  • keys (Sequence[KT]) – A sequence of keys

  • vectors (Sequence[VT]) – A sequence of vectors that should be associated with the instances of the sequence keys

Return type:

None
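The contract of add_vectors can be illustrated with a small stdlib-only sketch. This is not instancelib's implementation; the in-memory dict store and the function name are hypothetical, chosen only to show how the two equal-length sequences pair up.

```python
# Illustrative sketch (not instancelib's implementation) of the add_vectors
# contract: keys and vectors are paired positionally and must have the same
# length. The dict-based store is a stand-in for the environment's storage.
from typing import Dict, List, Sequence

def add_vectors_sketch(store: Dict[int, List[float]],
                       keys: Sequence[int],
                       vectors: Sequence[List[float]]) -> None:
    # The two sequences must have the same length, as the docstring requires.
    if len(keys) != len(vectors):
        raise ValueError("keys and vectors must have the same length")
    for key, vector in zip(keys, vectors):
        store[key] = vector

store: Dict[int, List[float]] = {}
add_vectors_sketch(store, [1, 2], [[0.1, 0.2], [0.3, 0.4]])
```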

property all_datapoints: InstanceProvider[InstanceType, KT, DT, VT, RT]

This provider should include all instances from all providers. If any synthetic datapoints have been constructed, they should be included here as well.

Returns:

The all_datapoints InstanceProvider

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

Warning

Deprecated, use the all_instances property instead!

abstract property all_instances: InstanceProvider[InstanceType, KT, DT, VT, RT]

This provider should include all instances from all providers. If any synthetic datapoints have been constructed, they should be included here as well.

Returns:

The all_instances InstanceProvider

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

combine(*providers)[source]

Combine Providers into a single Provider

Parameters:

providers (InstanceProvider[InstanceType, KT, DT, VT, RT]) – The providers that should be combined into a single provider

Returns:

The provider that contains all elements of the supplied Providers

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]
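Combining providers can be sketched with plain dicts standing in for InstanceProviders. This is an illustrative sketch, not instancelib's implementation; the function name and merge-by-update behavior are assumptions.

```python
# Illustrative sketch (not instancelib's implementation): merging several
# dict-like providers into one provider that contains all their elements.
from typing import Dict

def combine_sketch(*providers: Dict[int, str]) -> Dict[int, str]:
    combined: Dict[int, str] = {}
    for provider in providers:
        combined.update(provider)  # later providers win on duplicate keys
    return combined

a = {1: "first", 2: "second"}
b = {3: "third"}
merged = combine_sketch(a, b)
```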

create(*args, **kwargs)[source]

Create a new Instance

Returns:

A new instance

Return type:

InstanceType

Parameters:
  • args (Any) – Positional arguments for the new Instance

  • kwargs (Any) – Keyword arguments for the new Instance

abstract create_bucket(keys)[source]

Create an InstanceProvider that contains certain keys found in this environment.

Parameters:

keys (Iterable[KT]) – The keys that should be included in this bucket

Returns:

An InstanceProvider that contains the instances specified in keys

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]
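The bucket idea can be sketched in a few lines: a bucket is a provider restricted to the given keys of an existing environment. This stdlib-only sketch is illustrative, not instancelib's implementation, and the function name is hypothetical.

```python
# Illustrative sketch (not instancelib's implementation): a "bucket" is a
# provider that contains only the requested keys from the environment.
from typing import Dict, Iterable

def create_bucket_sketch(all_instances: Dict[int, str],
                         keys: Iterable[int]) -> Dict[int, str]:
    # Select only the instances whose keys were requested.
    return {key: all_instances[key] for key in keys}

env_data = {1: "a", 2: "b", 3: "c"}
bucket = create_bucket_sketch(env_data, [1, 3])
```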

abstract create_empty_provider()[source]

Use this method to create an empty InstanceProvider

Returns:

The newly created provider

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

abstract create_named_provider(name, keys=[])[source]

Create a new InstanceProvider that is registered in this environment under the given name.

Parameters:
  • name (str) – The name under which the provider should be registered

  • keys (Iterable[KT]) – The keys that should be included in the new provider

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

abstract property dataset: InstanceProvider[InstanceType, KT, DT, VT, RT]

This property contains the InstanceProvider that contains the original dataset. This provider should include all original instances.

Returns:

The dataset InstanceProvider

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

discard_children(parent)[source]

Discard all children from this parent

Parameters:

parent (Union[KT, Instance[KT, DT, VT, RT]]) – The parent Instance

Return type:

None

get_children(parent)[source]

Get the children that are registered to this parent

Parameters:

parent (Union[KT, Instance[KT, DT, VT, RT]]) – The parent from which you want to get the children

Returns:

A Provider that contains all children

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

get_parent(child)[source]

Get the parent of a child

Parameters:

child (Union[KT, Instance[KT, DT, VT, RT]]) – The child instance whose parent you want to retrieve

Returns:

The parent of this child instance

Return type:

InstanceType
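The bookkeeping behind get_children, get_parent, and discard_children can be sketched with two plain dicts: one mapping a parent to its children, one mapping each child back to its parent. This is an illustrative sketch, not instancelib's data structures; the class and method names mirror the documented API but the internals are assumptions.

```python
# Illustrative sketch (not instancelib's implementation) of parent/child
# bookkeeping: a forward map (parent -> children) and a reverse map
# (child -> parent) kept in sync.
from typing import Dict, List

class ParentChildRegistry:
    def __init__(self) -> None:
        self._children: Dict[int, List[int]] = {}
        self._parent: Dict[int, int] = {}

    def register(self, parent: int, child: int) -> None:
        self._children.setdefault(parent, []).append(child)
        self._parent[child] = parent

    def get_children(self, parent: int) -> List[int]:
        return list(self._children.get(parent, []))

    def get_parent(self, child: int) -> int:
        return self._parent[child]

    def discard_children(self, parent: int) -> None:
        # Drop the forward entry and clean up each reverse entry.
        for child in self._children.pop(parent, []):
            del self._parent[child]

registry = ParentChildRegistry()
registry.register(parent=1, child=10)
registry.register(parent=1, child=11)
```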

get_subset_by_labels(provider, *labels, labelprovider=None)[source]

Create an InstanceProvider that contains the instances from provider that are labeled with the given labels.

Parameters:
  • provider (InstanceProvider[InstanceType, KT, DT, VT, RT]) – The provider from which the subset should be taken

  • labels (LT) – The labels on which the instances should be selected

  • labelprovider (Optional[LabelProvider[KT, LT]]) – An alternative LabelProvider; if None, the environment's own labels are used

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]
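Selecting a subset by labels can be sketched with dicts: keep the instances whose label set intersects the requested labels. This is an illustrative sketch, not instancelib's implementation; the function name and the dict-based label map are assumptions.

```python
# Illustrative sketch (not instancelib's implementation): keep only the
# instances whose labels intersect the requested labels.
from typing import Dict, Set

def subset_by_labels_sketch(provider: Dict[int, str],
                            labelmap: Dict[int, Set[str]],
                            *labels: str) -> Dict[int, str]:
    wanted = set(labels)
    return {key: value for key, value in provider.items()
            if labelmap.get(key, set()) & wanted}

provider = {1: "spam mail", 2: "normal mail", 3: "more spam"}
labelmap = {1: {"spam"}, 2: {"ham"}, 3: {"spam"}}
spam_only = subset_by_labels_sketch(provider, labelmap, "spam")
```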

abstract property labels: LabelProvider[KT, LT]

This property contains a provider that maps instances to labels and vice versa.

Returns:

The label provider

Return type:

LabelProvider[KT, LT]

property named_providers: Mapping[str, InstanceProvider[InstanceType, KT, DT, VT, RT]]

A mapping from provider names to the corresponding InstanceProviders.

abstract set_named_provider(name, value)[source]

Parameters:
  • name (str) – The name under which the provider should be registered

  • value (InstanceProvider[InstanceType, KT, DT, VT, RT]) – The provider that should be registered under this name
to_pandas(provider=None, labels=None, instance_viewer=<function default_instance_viewer>, label_viewer=<function default_label_viewer>, provider_hooks=[])[source]

Convert the instances in a provider (and optionally their labels) to a pandas DataFrame.
Return type:

DataFrame

train_test_split(source, train_size)[source]

Divide an InstanceProvider into two providers that contain a random split of the input, according to the parameter train_size.

Parameters:
  • source (InstanceProvider[InstanceType, KT, DT, VT, RT]) – The InstanceProvider that should be divided

  • train_size (Union[float, int]) – Either the number (int) of instances that should be included in the training set, or the fraction (float between 0 and 1) of the source that should be used for training.

Examples

Example usage

>>> train_val, test = env.train_test_split(provider, 0.70)
>>> train, val = env.train_test_split(train_val, 0.70)

Returns:

A Tuple containing two InstanceProviders:
  • The training set (containing train_size instances)

  • The test set

Return type:

Tuple[InstanceProvider[InstanceType, KT, DT, VT, RT], InstanceProvider[InstanceType, KT, DT, VT, RT]]
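The train_size semantics above (absolute count as an int, train fraction as a float) can be sketched with a stdlib-only random split over a dict. This is an illustrative sketch, not instancelib's implementation; the function name and the fixed seed are assumptions added for reproducibility.

```python
# Illustrative sketch (not instancelib's implementation): a random split that
# accepts either an absolute count (int) or a train fraction (float), as the
# train_size parameter is documented to do.
import random
from typing import Dict, Tuple, Union

def train_test_split_sketch(source: Dict[int, str],
                            train_size: Union[int, float],
                            seed: int = 42) -> Tuple[Dict[int, str], Dict[int, str]]:
    keys = list(source)
    # int -> absolute number of training instances; float -> fraction of source.
    if isinstance(train_size, int):
        n_train = train_size
    else:
        n_train = round(train_size * len(keys))
    rng = random.Random(seed)
    train_keys = set(rng.sample(keys, n_train))
    train = {k: source[k] for k in keys if k in train_keys}
    test = {k: source[k] for k in keys if k not in train_keys}
    return train, test

data = {i: f"doc {i}" for i in range(10)}
train, test = train_test_split_sketch(data, 0.70)
```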