instancelib.ingest.spreadsheet module
- instancelib.ingest.spreadsheet.build_environment(df, label_mapper, labels, data_cols, label_cols)[source]
Build an environment from a data frame
- Parameters:
- Returns:
A MemoryEnvironment that contains the texts and labels from the data frame
- Return type:
MemoryEnvironment[int, str, npt.NDArray[Any], str]
- instancelib.ingest.spreadsheet.build_environment_with_id(df, label_mapper, labels, id_col, data_cols, label_cols)[source]
- instancelib.ingest.spreadsheet.build_from_multiple_dfs(df_dict, label_mapper, labels, data_cols, label_cols)[source]
Build an environment from multiple data frames
- Parameters:
- Returns:
A MemoryEnvironment that contains the texts and labels from the data frames
- Return type:
MemoryEnvironment[int, str, npt.NDArray[Any], str]
- instancelib.ingest.spreadsheet.build_from_multiple_dfs_with_ids(df_dict, label_mapper, labels, id_col, data_cols, label_cols)[source]
Build an environment from multiple data frames, taking instance identifiers from id_col
- Parameters:
df (pd.DataFrame) – A data frame that contains all texts and labels
label_mapping (Mapping[int, str]) – A mapping from indices to label strings
data_cols (Sequence[str]) – A sequence of columns that contain the texts
label_col (str) – The name of the column that contains the label data
id_col (str) –
- Returns:
A MemoryEnvironment that contains the texts and labels from the data frames
- Return type:
MemoryEnvironment[int, str, npt.NDArray[Any], str]
- instancelib.ingest.spreadsheet.extract_data(dataset_df, data_cols, labelfunc)[source]
Extract text data and labels from a dataframe
- instancelib.ingest.spreadsheet.extract_data_with_id(dataset_df, id_col, data_cols, labelfunc)[source]
Extract text data and labels from a dataframe
- Parameters:
- Returns:
[description]
- Return type:
- instancelib.ingest.spreadsheet.identity_mapper(value)[source]
Coerces any value to its string representation
- Parameters:
value (Any) – Any value that can be coerced into a string
- Returns:
The string representation of the value, or None if the coercion fails.
- Return type:
Optional[str]
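The coercion described above can be illustrated with a minimal sketch. The function name identity_mapper_sketch is hypothetical, and the try/except is an assumption about how a failed coercion could be turned into None; this is not the library's own implementation.

```python
from typing import Any, Optional


def identity_mapper_sketch(value: Any) -> Optional[str]:
    """Hypothetical stand-in that mirrors the documented contract:
    return str(value), or None if the coercion raises."""
    try:
        return str(value)
    except Exception:
        # Assumption: any error during coercion is reported as None.
        return None


print(identity_mapper_sketch(3))       # prints 3 (the string "3")
print(identity_mapper_sketch("spam"))  # prints spam
```

A mapper with this shape can be passed wherever a label_mapper of type Callable[[Any], Optional[str]] is expected.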
- instancelib.ingest.spreadsheet.instance_extractor(df, id_extractor, data_extractor, vector_extractor, repr_extractor, label_extractor, builder)[source]
- Parameters:
- Return type:
Iterator[Tuple[TypeVar(KT),TypeVar(IT, bound= Instance[Any, Any, Any, Any]),FrozenSet[TypeVar(LT)]]]
- instancelib.ingest.spreadsheet.inv_transform_mapping(columns, row, label_mapper=<function identity_mapper>)[source]
Convert the coded labels found in the given columns of row to strings by applying label_mapper to each value.
- Parameters:
- Returns:
A set of labels that belong to the row
- Return type:
FrozenSet[str]
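As a rough illustration of the mapping step, the sketch below uses a plain dict in place of a data frame row. The names row_labels_sketch and to_str are hypothetical, and dropping values that the mapper turns into None is an assumption, not confirmed behaviour of the library.

```python
from typing import Any, Callable, FrozenSet, Mapping, Optional, Sequence


def to_str(value: Any) -> Optional[str]:
    """Hypothetical default mapper: plain string coercion."""
    return str(value)


def row_labels_sketch(
    columns: Sequence[str],
    row: Mapping[str, Any],
    label_mapper: Callable[[Any], Optional[str]] = to_str,
) -> FrozenSet[str]:
    """Map the coded value in each label column to a label string."""
    mapped = (label_mapper(row[column]) for column in columns)
    # Assumption: values that map to None are left out of the result.
    return frozenset(label for label in mapped if label is not None)


# A dict stands in for one row; the codes 0 and 1 are made-up examples.
codes = {0: "irrelevant", 1: "relevant"}
row = {"text": "some document", "label_a": 1, "label_b": 0}
labels = row_labels_sketch(["label_a", "label_b"], row, codes.get)
```

Here codes.get serves as the label_mapper, so an unknown code would map to None and be omitted from the resulting set.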
- instancelib.ingest.spreadsheet.pandas_to_env(df, data_cols, label_cols, labels=None)[source]
- instancelib.ingest.spreadsheet.pandas_to_env_with_id(df, id_col, data_cols, label_cols, labels=None)[source]
- instancelib.ingest.spreadsheet.read_csv_dataset(path, data_cols, label_cols, labels=None, label_mapper=<function identity_mapper>)[source]
Read CSV files that contain text data
- Parameters:
path (Union[str, PathLike[str]]) – The path to the CSV file
data_cols (Sequence[str]) – The columns that contain the text data
label_cols (Sequence[str]) – The columns that contain the labels
labels (Optional[Iterable[str]], optional) – The set of possible labels. If None, the set is inferred from the data. Defaults to None.
label_mapper (Callable[[Any], Optional[str]], optional) – A function that transforms labels into another representation. Defaults to identity_mapper(), which returns the string representation of its input.
- Returns:
An environment that contains all the information from the CSV file
- Return type:
AbstractEnvironment[TextInstance[int, npt.NDArray[Any]], Union[int, UUID], str, npt.NDArray[Any], str, str]
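The roles of data_cols and label_cols can be sketched with the standard library alone. The code below does not call instancelib; the helper load_rows_sketch, the sample column names, joining several data columns with a space, and skipping empty label cells are all assumptions made for illustration. Per the signature above, the real call would have the shape read_csv_dataset(path, data_cols=[...], label_cols=[...]).

```python
import csv
import io
from typing import Dict, FrozenSet, Iterable, Sequence, Tuple


def load_rows_sketch(
    handle: Iterable[str],
    data_cols: Sequence[str],
    label_cols: Sequence[str],
) -> Tuple[Dict[int, str], Dict[int, FrozenSet[str]]]:
    """Collect texts and label sets per row, keyed by row index."""
    texts: Dict[int, str] = {}
    labels: Dict[int, FrozenSet[str]] = {}
    for idx, row in enumerate(csv.DictReader(handle)):
        # Assumption: several data columns are joined with a space.
        texts[idx] = " ".join(row[col] for col in data_cols)
        # Assumption: empty label cells are skipped.
        labels[idx] = frozenset(row[col] for col in label_cols if row[col] != "")
    return texts, labels


# Made-up sample data; a real call would open a CSV file instead.
SAMPLE = "title,abstract,label\nA,first text,relevant\nB,second text,irrelevant\n"
texts, labels = load_rows_sketch(io.StringIO(SAMPLE), ["title", "abstract"], ["label"])

# With labels=None the label set is inferred from the data, roughly like this:
inferred_labelset = frozenset().union(*labels.values())
```

Passing an explicit labels argument would be useful when some possible labels never occur in the file, since inference can only find labels that are present.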
- instancelib.ingest.spreadsheet.read_excel_dataset(path, data_cols, label_cols, labels=None, label_mapper=<function identity_mapper>)[source]
Read Excel files that contain text data
- Parameters:
path (Union[str, PathLike[str]]) – The path to the Excel file
data_cols (Sequence[str]) – The columns that contain the text data
label_cols (Sequence[str]) – The columns that contain the labels
labels (Optional[Iterable[str]], optional) – The set of possible labels. If None, the set is inferred from the data. Defaults to None.
label_mapper (Callable[[Any], Optional[str]], optional) – A function that transforms labels into another representation. Defaults to identity_mapper(), which returns the string representation of its input.
- Returns:
An environment that contains all the information from the Excel file
- Return type:
AbstractEnvironment[TextInstance[int, npt.NDArray[Any]], Union[int, UUID], str, npt.NDArray[Any], str, str]
- instancelib.ingest.spreadsheet.text_builder(identifier, data, vector, representation, row, idx)[source]
- instancelib.ingest.spreadsheet.text_from_pandas_multilabel(df_dict, text_cols, label_cols, labelset)[source]
- instancelib.ingest.spreadsheet.to_environment(prov_builder, labelprov_builder, dictionaries)[source]
- Parameters:
prov_builder (Callable[[Mapping[TypeVar(KT),TypeVar(IT, bound= Instance[Any, Any, Any, Any])]],InstanceProvider[TypeVar(IT, bound= Instance[Any, Any, Any, Any]),TypeVar(KT),TypeVar(DT),TypeVar(VT),TypeVar(RT)]]) –
labelprov_builder (Callable[[Mapping[TypeVar(KT),FrozenSet[TypeVar(LT)]]],LabelProvider[TypeVar(KT),TypeVar(LT)]]) –
dictionaries (Tuple[Mapping[TypeVar(KT),TypeVar(IT, bound= Instance[Any, Any, Any, Any])],Mapping[TypeVar(KT),FrozenSet[TypeVar(LT)]]]) –
- Return type:
AbstractEnvironment[TypeVar(IT, bound= Instance[Any, Any, Any, Any]),TypeVar(KT),TypeVar(DT),TypeVar(VT),TypeVar(RT),TypeVar(LT)]