instancelib.environment.base module
- class instancelib.environment.base.AbstractEnvironment(*args, **kwds)[source]
Bases: Environment[InstanceType, KT, DT, VT, RT, LT], ABC, Generic[InstanceType, KT, DT, VT, RT, LT]
Environments provide an interface that enables you to access all data stored in the datasets. If there are labels stored in the environment, you can access these from here as well.
There are two important properties in every Environment:
- dataset(): Contains all Instances of the original dataset
- labels(): Contains an object that allows you to access labels easily
Besides these properties, this object also provides methods to create new InstanceProvider objects that contain a subset of the set of all instances stored in this environment.
Examples
Access the dataset:
>>> dataset = env.dataset
>>> instance = next(iter(dataset.values()))
Access the labels:
>>> labels = env.labels
>>> ins_lbls = labels.get_labels(instance)
Create a train-test split on the dataset (70% train, 30% test):
>>> train, test = env.train_test_split(dataset, 0.70)
- class instancelib.environment.base.Environment(*args, **kwds)[source]
Bases: MutableMapping[str, InstanceProvider[InstanceType, KT, DT, VT, RT]], ABC, Generic[InstanceType, KT, DT, VT, RT, LT]
Environments provide an interface that enables you to access all data stored in the datasets. If there are labels stored in the environment, you can access these from here as well.
There are two important properties in every Environment:
- dataset(): Contains all Instances of the original dataset
- labels(): Contains an object that allows you to access labels easily
Besides these properties, this object also provides methods to create new InstanceProvider objects that contain a subset of the set of all instances stored in this environment.
Examples
Access the dataset:
>>> dataset = env.dataset
>>> instance = next(iter(dataset.values()))
Access the labels:
>>> labels = env.labels
>>> ins_lbls = labels.get_labels(instance)
Create a train-test split on the dataset (70% train, 30% test):
>>> train, test = env.train_test_split(dataset, 0.70)
- add_vectors(keys, vectors)[source]
This method adds feature vectors or embeddings to the instances associated with the keys in the first parameter. The sequences keys and vectors must have the same length.
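As an illustrative stdlib-only sketch (not instancelib's actual implementation), the contract of add_vectors — pairing each key positionally with the vector at the same index — could look like this; the `VectorStore` class and `store` variable are hypothetical stand-ins:

```python
# Illustrative sketch of the add_vectors contract: each key in `keys` is
# paired positionally with the vector at the same index in `vectors`.
# `VectorStore` is a hypothetical stand-in, not part of instancelib.

class VectorStore:
    def __init__(self):
        self.vectors = {}

    def add_vectors(self, keys, vectors):
        # Both sequences must have the same length, as the docstring requires.
        if len(keys) != len(vectors):
            raise ValueError("keys and vectors must have the same length")
        for key, vector in zip(keys, vectors):
            self.vectors[key] = vector

store = VectorStore()
store.add_vectors(["doc1", "doc2"], [[0.1, 0.2], [0.3, 0.4]])
print(store.vectors["doc1"])  # [0.1, 0.2]
```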
- property all_datapoints: InstanceProvider[InstanceType, KT, DT, VT, RT]
This provider should include all instances in all providers. If any synthetic datapoints have been constructed, they should also be included here.
- Returns:
The all_datapoints InstanceProvider
- Return type:
InstanceProvider[InstanceType, KT, DT, VT, RT]
Warning
Deprecated, use the all_instances property instead!
- abstract property all_instances: InstanceProvider[InstanceType, KT, DT, VT, RT]
This provider should include all instances in all providers. If any synthetic datapoints have been constructed, they should also be included here.
- Returns:
The all_instances InstanceProvider
- Return type:
InstanceProvider[InstanceType, KT, DT, VT, RT]
- combine(*providers)[source]
Combine multiple providers into a single provider.
- Parameters:
providers (InstanceProvider[InstanceType, KT, DT, VT, RT]) – The providers that should be combined into a single provider
- Returns:
The provider that contains all elements of the supplied providers
- Return type:
InstanceProvider[InstanceType, KT, DT, VT, RT]
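A minimal stdlib sketch of the combine semantics, with plain dicts standing in for InstanceProvider objects (which are Mappings from keys to instances); this is illustrative only, not instancelib's code:

```python
# Illustrative sketch of combine(): the result contains every instance
# from every supplied provider. Plain dicts stand in for InstanceProviders.

def combine(*providers):
    combined = {}
    for provider in providers:
        combined.update(provider)
    return combined

train = {1: "instance-1", 2: "instance-2"}
test = {3: "instance-3"}
everything = combine(train, test)
print(sorted(everything))  # [1, 2, 3]
```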
- abstract create_bucket(keys)[source]
Create an InstanceProvider that contains certain keys found in this environment.
- abstract create_empty_provider()[source]
Use this method to create an empty InstanceProvider
- Returns:
The newly created provider
- Return type:
InstanceProvider[InstanceType, KT, DT, VT, RT]
- abstract property dataset: InstanceProvider[InstanceType, KT, DT, VT, RT]
This property contains the InstanceProvider that contains the original dataset. This provider should include all original instances.
- Returns:
The dataset InstanceProvider
- Return type:
InstanceProvider[InstanceType, KT, DT, VT, RT]
- get_subset_by_labels(provider, *labels, labelprovider=None)[source]
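This method is undocumented in the source; based on its signature alone, it plausibly selects the instances in `provider` carrying any of the requested labels. A hedged stdlib sketch of that guessed behavior, with dicts standing in for the provider and label provider:

```python
# Hedged sketch of what get_subset_by_labels plausibly does, inferred from
# its signature: keep only instances whose label set intersects the
# requested labels. Dicts stand in for providers; not instancelib's code.

def get_subset_by_labels(provider, labelprovider, *labels):
    wanted = set(labels)
    return {
        key: instance
        for key, instance in provider.items()
        if wanted & labelprovider.get(key, set())
    }

provider = {1: "a", 2: "b", 3: "c"}
label_map = {1: {"spam"}, 2: {"ham"}, 3: {"spam", "ham"}}
subset = get_subset_by_labels(provider, label_map, "spam")
print(sorted(subset))  # [1, 3]
```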
- abstract property labels: LabelProvider[KT, LT]
This property contains a provider that has a mapping from instances to labels and vice versa.
- Returns:
The label provider
- Return type:
LabelProvider[KT, LT]
- to_pandas(provider=None, labels=None, instance_viewer=<function default_instance_viewer>, label_viewer=<function default_label_viewer>, provider_hooks=[])[source]
- Parameters:
provider (Optional[InstanceProvider[InstanceType, KT, DT, VT, RT]]) –
labels (Optional[LabelProvider[KT, LT]]) –
instance_viewer (Callable[[Instance[KT, DT, VT, RT]], Mapping[str, Any]]) –
label_viewer (Callable[[KT, LabelProvider[KT, LT]], Mapping[str, Any]]) –
provider_hooks (Sequence[Callable[[InstanceProvider[InstanceType, KT, DT, VT, RT]], Mapping[KT, Mapping[str, Any]]]]) –
- Return type:
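The parameter types above suggest a row-building step: for each instance, instance_viewer yields a mapping of column values and label_viewer contributes label columns, after which the rows would be handed to pandas. A stdlib sketch of that implied logic (the helper `build_rows` and both lambdas are hypothetical stand-ins, not instancelib internals):

```python
# Sketch of the row-building logic implied by to_pandas's parameters:
# instance_viewer maps an instance to column values, label_viewer adds
# label columns keyed by the instance's key. Hypothetical helper names.

def build_rows(provider, labelprovider, instance_viewer, label_viewer):
    rows = {}
    for key, instance in provider.items():
        row = dict(instance_viewer(instance))
        row.update(label_viewer(key, labelprovider))
        rows[key] = row
    return rows

provider = {1: "hello", 2: "world"}
labels = {1: {"greeting"}, 2: {"noun"}}
rows = build_rows(
    provider,
    labels,
    lambda ins: {"data": ins},
    lambda key, lp: {"labels": sorted(lp[key])},
)
print(rows[1])  # {'data': 'hello', 'labels': ['greeting']}
```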
- train_test_split(source, train_size)[source]
Divide an InstanceProvider into two disjoint providers containing a random division of the input according to the parameter train_size.
- Parameters:
Examples
Example usage
>>> train_val, test = env.train_test_split(provider, 0.70)
>>> train, val = env.train_test_split(train_val, 0.70)
- Returns:
- A Tuple containing two InstanceProviders:
The training set (containing a train_size fraction of the instances)
The test set
- Return type:
Tuple[InstanceProvider[InstanceType, KT, DT, VT, RT], InstanceProvider[InstanceType, KT, DT, VT, RT]]
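The split semantics can be sketched with the stdlib alone: shuffle the keys, keep the first train_size fraction for training and the rest for testing. Dicts stand in for InstanceProviders, and the fixed seed is an assumption for reproducibility, not part of instancelib's API:

```python
import random

# Stdlib sketch of train_test_split semantics: shuffle the keys, put the
# first train_size fraction in the training provider and the remainder in
# the test provider. Dicts stand in for InstanceProviders.

def train_test_split(source, train_size, seed=42):
    keys = list(source)
    random.Random(seed).shuffle(keys)
    cutoff = round(train_size * len(keys))
    train = {k: source[k] for k in keys[:cutoff]}
    test = {k: source[k] for k in keys[cutoff:]}
    return train, test

source = {i: f"doc-{i}" for i in range(10)}
train, test = train_test_split(source, 0.70)
print(len(train), len(test))  # 7 3
```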