instancelib.environment.memory module

class instancelib.environment.memory.AbstractMemoryEnvironment(*args, **kwds)[source]

Bases: AbstractEnvironment[InstanceType, KT, DT, VT, RT, LT], ABC, Generic[InstanceType, KT, DT, VT, RT, LT]

Environments provide an interface that enables you to access all data stored in the datasets. If labels are stored in the environment, you can access them from here as well.

There are two important properties in every Environment:

  • dataset: Contains all Instances of the original dataset

  • labels: Contains an object that allows you to access labels easily

Besides these properties, this object also provides methods to create new InstanceProvider objects that contain a subset of the set of all instances stored in this environment.

Variables:
  • _public_dataset – An InstanceProvider that contains all original Instances

  • _dataset – An InstanceProvider that contains all instances

  • _labelprovider – This object contains all labels

  • _named_provider – All user generated providers that were given a name

Examples

Access the dataset:

>>> dataset = env.dataset
>>> instance = next(iter(dataset.values()))

Access the labels:

>>> labels = env.labels
>>> ins_lbls = labels.get_labels(instance)

Create a train-test split on the dataset (70 % train, 30 % test):

>>> train, test = env.train_test_split(dataset, 0.70)
property all_instances: InstanceProvider[InstanceType, KT, DT, VT, RT]

This provider should include all instances in all providers. If any synthetic data points have been constructed, they should be included here as well.

Returns:

The all_instances InstanceProvider

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

create_bucket(keys)[source]

Create an InstanceProvider that contains certain keys found in this environment.

Parameters:

keys (Iterable[KT]) – The keys that should be included in this bucket

Returns:

An InstanceProvider that contains the instances specified in keys

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]
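
As a rough illustration of what a bucket is, the sketch below models a dataset as a plain dictionary from keys to instances and builds a subset from an iterable of keys. ToyProvider and ToyEnvironment are hypothetical stand-ins written for this example; they are not instancelib classes. Skipping unknown keys is an assumption of this toy, not documented behaviour of create_bucket.

```python
from typing import Dict, Generic, Iterable, TypeVar

KT = TypeVar("KT")
VT = TypeVar("VT")


class ToyProvider(Generic[KT, VT]):
    """Hypothetical stand-in for an InstanceProvider: a keyed collection."""

    def __init__(self, items: Dict[KT, VT]) -> None:
        self._items = dict(items)

    def __contains__(self, key: object) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)

    def keys(self):
        return self._items.keys()

    def values(self):
        return self._items.values()


class ToyEnvironment(Generic[KT, VT]):
    """Hypothetical stand-in for an environment that holds one dataset."""

    def __init__(self, dataset: ToyProvider[KT, VT]) -> None:
        self.dataset = dataset

    def create_bucket(self, keys: Iterable[KT]) -> ToyProvider[KT, VT]:
        # Assumption of this toy: requested keys that are not in the
        # dataset are silently skipped.
        subset = {k: self.dataset._items[k] for k in keys if k in self.dataset}
        return ToyProvider(subset)


env = ToyEnvironment(ToyProvider({1: "first", 2: "second", 3: "third"}))
bucket = env.create_bucket([1, 3, 99])  # key 99 is not in the dataset
print(sorted(bucket.keys()))
```

The bucket is a new provider of the same kind as the dataset, so it can be iterated or split like any other provider in this toy model.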

create_empty_provider()[source]

Use this method to create an empty InstanceProvider

Returns:

The newly created provider

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

create_named_provider(name)[source]
Parameters:

name (str) –

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

property dataset: InstanceProvider[InstanceType, KT, DT, VT, RT]

This property contains the InstanceProvider that contains the original dataset. This provider should include all original instances.

Returns:

The dataset InstanceProvider

Return type:

InstanceProvider[InstanceType, KT, DT, VT, RT]

property labels: LabelProvider[KT, LT]

This property contains the provider that maps instances to labels and vice versa.

Returns:

The label provider

Return type:

LabelProvider[KT, LT]
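
To make the two-way mapping concrete, here is a minimal sketch of an index that answers both "which labels does this key have?" and "which keys carry this label?". ToyLabelStore is a hypothetical class written for this example and is not instancelib's LabelProvider. Only the name get_labels appears in the examples on this page; set_labels and get_instances_by_label are assumed names, and this toy looks labels up by key rather than by instance.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Hashable, Set


class ToyLabelStore:
    """Hypothetical two-way index between instance keys and labels."""

    def __init__(self) -> None:
        self._by_key: DefaultDict[Hashable, Set[str]] = defaultdict(set)
        self._by_label: DefaultDict[str, Set[Hashable]] = defaultdict(set)

    def set_labels(self, key: Hashable, *labels: str) -> None:
        # Update both directions so the two lookups stay consistent.
        for label in labels:
            self._by_key[key].add(label)
            self._by_label[label].add(key)

    def get_labels(self, key: Hashable) -> FrozenSet[str]:
        # .get avoids creating an empty entry for unknown keys.
        return frozenset(self._by_key.get(key, set()))

    def get_instances_by_label(self, label: str) -> FrozenSet[Hashable]:
        return frozenset(self._by_label.get(label, set()))


store = ToyLabelStore()
store.set_labels("doc1", "spam")
store.set_labels("doc2", "spam", "urgent")

print(sorted(store.get_labels("doc2")))
print(sorted(store.get_instances_by_label("spam")))
```

Keeping two dictionaries costs extra memory but makes both lookups direct, which is the usual trade-off for this kind of index.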

set_named_provider(name, value)[source]
Parameters:
class instancelib.environment.memory.MemoryEnvironment(dataset, labelprovider)[source]

Bases: AbstractMemoryEnvironment[InstanceType, KT, DT, VT, RT, LT], Generic[InstanceType, KT, DT, VT, RT, LT]

This class implements the ABC Environment. In this implementation, all data is loaded and kept in RAM and is not preserved on disk. There are two important properties in every Environment:

  • dataset: Contains all Instances of the original dataset

  • labels: Contains an object that allows you to access labels easily

Besides these properties, this object also provides methods to create new InstanceProvider objects that contain a subset of the set of all instances stored in this environment.

Parameters:
  • dataset (InstanceProvider[InstanceType, KT, DT, VT, RT]) – An InstanceProvider that contains all Instances

  • labelprovider (MemoryLabelProvider[KT, LT]) – The label provider that contains the labels associated with the instances from the dataset variable

Variables:
  • _public_dataset – An InstanceProvider that contains all original Instances

  • _dataset – An InstanceProvider that contains all instances

  • _labelprovider – This object contains all labels

  • _named_provider – All user generated providers that were given a name

Examples

Access the dataset:

>>> dataset = env.dataset
>>> instance = next(iter(dataset.values()))

Access the labels:

>>> labels = env.labels
>>> ins_lbls = labels.get_labels(instance)

Create a train-test split on the dataset (70 % train, 30 % test):

>>> train, test = env.train_test_split(dataset, 0.70)

Store the environment to disk:

>>> import pickle
>>> with open("file.pkl", "wb") as fh:
...     pickle.dump(env, fh)
>>> print("The file is saved to file.pkl")

Load the environment from disk:

>>> import pickle
>>> with open("file.pkl", "rb") as fh:
...     env = pickle.load(fh)
>>> dataset = env.dataset
classmethod shuffle(env, rng=Generator(PCG64))[source]

Shuffle the identifiers according to a mapping

Parameters:
  • env (Self) – The environment that needs to be shuffled

  • mapping (Mapping[KT, KT]) – The mapping that is to be used

  • rng (Generator) –

Returns:

A shuffled environment

Return type:

Self
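
The idea of shuffling identifiers according to a mapping can be sketched on plain dictionaries: draw a random one-to-one mapping over the existing keys, then rewrite the data and the labels with the same mapping so they stay aligned. This is only an illustration under stated assumptions, not the implementation of shuffle. The helpers random_key_mapping and remap are hypothetical, and the standard library's random.Random stands in for the NumPy Generator shown in the signature.

```python
import random
from typing import Dict, Hashable, List, Mapping, Set, Tuple


def random_key_mapping(
    keys: List[Hashable], rng: random.Random
) -> Dict[Hashable, Hashable]:
    """Build a random one-to-one mapping from each key to a key of the same set."""
    shuffled = list(keys)
    rng.shuffle(shuffled)
    return dict(zip(keys, shuffled))


def remap(
    data: Mapping[Hashable, str],
    labels: Mapping[Hashable, Set[str]],
    mapping: Mapping[Hashable, Hashable],
) -> Tuple[Dict[Hashable, str], Dict[Hashable, Set[str]]]:
    """Apply the same key mapping to the data and to the labels."""
    new_data = {mapping[k]: v for k, v in data.items()}
    new_labels = {mapping[k]: set(v) for k, v in labels.items()}
    return new_data, new_labels


data = {1: "a", 2: "b", 3: "c"}
labels = {1: {"x"}, 2: {"y"}, 3: {"x", "y"}}

mapping = random_key_mapping(list(data), random.Random(0))
new_data, new_labels = remap(data, labels, mapping)

# Every instance keeps its labels; only the identifier changes.
for old_key, new_key in mapping.items():
    print(old_key, "->", new_key, new_data[new_key], sorted(new_labels[new_key]))
```

Because the mapping is a permutation of the original keys, the set of identifiers is unchanged and no two instances collide.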