instancelib.environment.memory module

class instancelib.environment.memory.AbstractMemoryEnvironment(*args, **kwds)[source]

Bases: AbstractEnvironment[InstanceType, KT, DT, VT, RT, LT], ABC, Generic[InstanceType, KT, DT, VT, RT, LT]

Environments provide an interface that enables you to access all data stored in the datasets. If labels are stored in the environment, you can access them from here as well.

There are two important properties in every Environment:

dataset: Contains all Instances of the original dataset
labels: Contains an object that allows you to access labels easily

Besides these properties, this object also provides methods to create new InstanceProvider objects that contain a subset of the set of all instances stored in this environment.

Variables:

_public_dataset – An InstanceProvider that contains all original Instances
_dataset – An InstanceProvider that contains all instances
_labelprovider – This object contains all labels
_named_provider – All user generated providers that were given a name

Examples

Access the dataset:

>>> dataset = env.dataset
>>> instance = next(iter(dataset.values()))

Access the labels:

>>> labels = env.labels
>>> ins_lbls = labels.get_labels(instance)

Create a train-test split on the dataset (70 % train, 30 % test):

>>> train, test = env.train_test_split(dataset, 0.70)

property all_instances: InstanceProvider[InstanceType, KT, DT, VT, RT]

This provider should include all instances in all providers. If any synthetic datapoints have been constructed, they should be included here as well.

Returns: The all_instances InstanceProvider
Return type: InstanceProvider[InstanceType, KT, DT, VT, RT]

create_bucket(keys)[source]

Create an InstanceProvider that contains certain keys found in this environment.

Parameters: keys (Iterable[KT]) – The keys that should be included in this bucket
Returns: An InstanceProvider that contains the instances specified in keys
Return type: InstanceProvider[InstanceType, KT, DT, VT, RT]
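
The idea behind create_bucket can be sketched without the library. The following is a minimal illustration, assuming a plain dict as a stand-in for an InstanceProvider; ToyEnvironment is a hypothetical class and not the instancelib implementation. How the real method handles keys that are absent from the environment is not documented here, so the sketch simply lets a missing key raise KeyError.

```python
from typing import Dict, Iterable


class ToyEnvironment:
    """Hypothetical stand-in for an environment; not the instancelib class."""

    def __init__(self, instances: Dict[int, str]) -> None:
        # A dict plays the role of the InstanceProvider holding all instances
        self._dataset = dict(instances)

    def create_bucket(self, keys: Iterable[int]) -> Dict[int, str]:
        # Build a new container holding only the requested keys.
        # Assumption: a key that is not in the environment raises KeyError.
        return {key: self._dataset[key] for key in keys}


env = ToyEnvironment({1: "first", 2: "second", 3: "third"})
bucket = env.create_bucket([1, 3])
```

The bucket is a separate container, so adding or removing keys in it leaves the toy environment's own dataset untouched.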

create_empty_provider()[source]

Use this method to create an empty InstanceProvider

Returns: The newly created provider
Return type: InstanceProvider[InstanceType, KT, DT, VT, RT]

create_named_provider(name)[source]

Parameters: name (str) –
Return type: InstanceProvider[InstanceType, KT, DT, VT, RT]

property dataset: InstanceProvider[InstanceType, KT, DT, VT, RT]

This property contains the InstanceProvider that contains the original dataset. This provider should include all original instances.

Returns: The dataset InstanceProvider
Return type: InstanceProvider[InstanceType, KT, DT, VT, RT]

property labels: LabelProvider[KT, LT]

This property contains the provider that maps instances to labels and vice versa.

Returns: The label provider
Return type: LabelProvider[KT, LT]

set_named_provider(name, value)[source]

Parameters:

name (str) –
value (InstanceProvider[InstanceType, KT, DT, VT, RT]) –
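
Neither create_named_provider nor set_named_provider carries a description, so the following sketch rests on an assumption drawn from the _named_provider variable ("All user generated providers that were given a name"): the first creates an empty provider and registers it under the name, the second registers an existing provider. ToyEnvironment and the dict-based providers are hypothetical stand-ins, not the library's code.

```python
from typing import Dict

Provider = Dict[int, str]  # stand-in for an InstanceProvider


class ToyEnvironment:
    """Hypothetical model of the named-provider registry."""

    def __init__(self) -> None:
        # Mirrors the documented _named_provider variable
        self._named_provider: Dict[str, Provider] = {}

    def create_empty_provider(self) -> Provider:
        return {}

    def create_named_provider(self, name: str) -> Provider:
        # Assumption: create an empty provider and register it under `name`
        provider = self.create_empty_provider()
        self._named_provider[name] = provider
        return provider

    def set_named_provider(self, name: str, value: Provider) -> None:
        # Assumption: register (or overwrite) an existing provider
        self._named_provider[name] = value


env = ToyEnvironment()
unlabeled = env.create_named_provider("unlabeled")
unlabeled[1] = "first document"
env.set_named_provider("labeled", {2: "second document"})
```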

class instancelib.environment.memory.MemoryEnvironment(dataset, labelprovider)[source]

Bases: AbstractMemoryEnvironment[InstanceType, KT, DT, VT, RT, LT], Generic[InstanceType, KT, DT, VT, RT, LT]

This class implements the ABC Environment. In this implementation, all data is loaded and stored in RAM and is not preserved on disk. There are two important properties in every Environment:

dataset: Contains all Instances of the original dataset
labels: Contains an object that allows you to access labels easily

Besides these properties, this object also provides methods to create new InstanceProvider objects that contain a subset of the set of all instances stored in this environment.

Parameters:

dataset (InstanceProvider[InstanceType, KT, DT, VT, RT]) – An InstanceProvider that contains all Instances
labelprovider (MemoryLabelProvider[KT, LT]) – The label provider that contains the labels associated with the instances from the dataset variable

Variables:

_public_dataset – An InstanceProvider that contains all original Instances
_dataset – An InstanceProvider that contains all instances
_labelprovider – This object contains all labels
_named_provider – All user generated providers that were given a name

Examples

Access the dataset:

>>> dataset = env.dataset
>>> instance = next(iter(dataset.values()))

Access the labels:

>>> labels = env.labels
>>> ins_lbls = labels.get_labels(instance)

Create a train-test split on the dataset (70 % train, 30 % test):

>>> train, test = env.train_test_split(dataset, 0.70)

Store the environment to disk:

>>> import pickle
>>> with open("file.pkl", "wb") as fh:
...     pickle.dump(env, fh)
>>> print("The file is saved to file.pkl")

Load the environment from disk:

>>> import pickle
>>> with open("file.pkl", "rb") as fh:
...     env = pickle.load(fh)
>>> dataset = env.dataset

classmethod shuffle(env, rng=Generator(PCG64))[source]

Shuffle the identifiers according to a mapping

Parameters:

env (Self) – The environment that needs to be shuffled
mapping (Mapping[KT, KT]) – The mapping that is to be used
rng (Generator) –

Returns:

A shuffled environment

Return type:

Self
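
What "shuffling the identifiers according to a mapping" could look like is sketched below. This is an illustration under stated assumptions, not the library's implementation: the helper names shuffle_mapping and apply_mapping are hypothetical, a dict stands in for the environment's data, and the standard-library random.Random replaces the NumPy Generator so the example has no dependencies.

```python
import random
from typing import Dict, List, Mapping, TypeVar

KT = TypeVar("KT")
DT = TypeVar("DT")


def shuffle_mapping(keys: List[KT], rng: random.Random) -> Dict[KT, KT]:
    """Map every key to a key drawn from a random permutation of the keys."""
    shuffled = list(keys)
    rng.shuffle(shuffled)  # in-place permutation of the copy
    return dict(zip(keys, shuffled))


def apply_mapping(data: Mapping[KT, DT], mapping: Mapping[KT, KT]) -> Dict[KT, DT]:
    """Re-key the data: each value moves from its old key to mapping[old key]."""
    return {mapping[key]: value for key, value in data.items()}


rng = random.Random(42)  # fixed seed so the shuffle is reproducible
data = {1: "a", 2: "b", 3: "c", 4: "d"}
mapping = shuffle_mapping(list(data), rng)
shuffled = apply_mapping(data, mapping)
```

Because the mapping is a permutation, the set of identifiers and the set of values are unchanged; only the pairing between them differs.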