Utilities

Library-specific

create_tutorial_dataset(root: str) → None

Create tutorial dataset at root.

Parameters:

root (str) – The path where the dataset will be created.

Dataframe Manipulation

Dataframe manipulation utility functions.

common_rows(child: DataFrame, parent: DataFrame, child_on: Any | None = None, parent_on: Any | None = None) → Series

Return a boolean mask that is True where a record is common to both child and parent.

Parameters:
  • child (pd.DataFrame) – The child dataframe.

  • parent (pd.DataFrame) – The parent dataframe.

  • child_on (Any) – Column or index level names in child to join on.

  • parent_on (Any) – Column or index level names in parent to join on.

Returns:

A Boolean series indicating common rows.

Return type:

pd.Series
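
To illustrate the kind of mask described above, here is a minimal sketch in plain pandas for the single-column case. It does not call common_rows; the frames and column names are made up, and it assumes the mask is aligned to the child's rows.

```python
import pandas as pd

# Hypothetical data: each child row points at a parent through parent_id.
child = pd.DataFrame({"parent_id": [1, 2, 5], "value": ["a", "b", "c"]})
parent = pd.DataFrame({"id": [1, 2, 3]})

# True where the child's key also occurs among the parent's keys.
mask = child["parent_id"].isin(parent["id"])

# Inverting the mask selects child rows with no matching parent.
orphans = child[~mask]
```

A mask of this shape can be used directly for boolean indexing, as in the last line.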

convert_column_type(series: Series, type_string: str, **kwargs) → Series

Convert a series to the pandas dtype named by a type string.

Parameters:
  • series (pd.Series) – Series whose dtype is to be set.

  • type_string (str) – Supported dtype to which the series will be converted.

  • kwargs (key, value mappings) – Other keyword arguments are passed down to pandas.to_datetime if the dtype is ‘datetime’.

Returns:

The converted series with the appropriate dtype.

Return type:

pd.Series
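
The two conversion paths described above can be sketched with plain pandas calls. This does not call convert_column_type; the sample values are made up, and whether the library uses astype for non-datetime types is an assumption.

```python
import pandas as pd

raw_dates = pd.Series(["01/02/2023", "15/03/2023"])

# Keyword arguments such as `format` are options of pandas.to_datetime.
dates = pd.to_datetime(raw_dates, format="%d/%m/%Y")

# For other dtypes, a plain astype call is one way to convert (assumed here).
counts = pd.Series(["1", "2", "3"]).astype("int64")
```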

python_to_pandas_type(python_type: Any) → str

Return the pandas type equivalent of a Python type.

Parameters:

python_type (Any) – Python type for which pandas equivalent is required.

Returns:

String representation of a pandas dtype.

Return type:

str
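
A lookup of this kind could be sketched as a small table. The mapping below is hypothetical; the library's actual table and its handling of unknown types may differ.

```python
from datetime import datetime

# Hypothetical mapping from Python types to pandas dtype strings.
_TYPE_MAP = {
    int: "Int64",
    float: "float64",
    bool: "boolean",
    str: "string",
    datetime: "datetime64[ns]",
}


def python_to_pandas_type_sketch(python_type):
    """Return a pandas dtype string for a Python type (illustrative only)."""
    try:
        return _TYPE_MAP[python_type]
    except KeyError:
        # Assumed behaviour: reject types that have no entry in the table.
        raise TypeError(f"no pandas equivalent known for {python_type!r}") from None
```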

Logging

Logging utility functions.

log_df(df: DataFrame, msg: str, hide: List[str] | str = [], level: int = 40, **kwargs) → None

Log DataFrame records with a message.

Parameters:
  • df (pandas.DataFrame) – DataFrame to be logged.

  • msg (str) – Log message compatible with string formatting.

  • hide (list-like or scalar, optional) – Sensitive column or columns to hide.

  • level (int, optional) – Logging level to use. Accepts the numeric level constants of the logging module, such as logging.ERROR (40, the default).

  • kwargs (key, value mappings) – Other keyword arguments are passed to str.format().
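
How msg, level, and the extra keyword arguments fit together can be sketched with the standard logging module alone. The helper below is hypothetical and leaves out the DataFrame and hide handling; it only shows the str.format() step and the numeric level.

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Collect (level, message) pairs so the result can be inspected."""

    def emit(self, record):
        records.append((record.levelno, record.getMessage()))


logger = logging.getLogger("log_sketch")
logger.addHandler(ListHandler())
logger.propagate = False


def log_message_sketch(msg, level=logging.ERROR, **kwargs):
    # Keyword arguments fill the placeholders in msg via str.format().
    logger.log(level, msg.format(**kwargs))


log_message_sketch("{count} rows failed validation in {table}", count=3, table="visits")
```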

log_col(series: Series, msg: str, hide: bool = False, level: int = 40, **kwargs) → None

Log column values with a message.

Parameters:
  • series (pd.Series) – Series to be logged.

  • msg (str) – Log message compatible with string formatting.

  • hide (bool, optional) – If True, hides the values of the series.

  • level (int, optional) – Logging level to use. Accepts the numeric level constants of the logging module, such as logging.ERROR (40, the default).

  • kwargs (key, value mappings) – Other keyword arguments are passed to str.format().

General-purpose

General-purpose utility functions.

commafy(sequence: List[Any]) → str

Return the comma-separated string version of a sequence.
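
A minimal sketch of such a helper, assuming a comma followed by a space as the separator (the library's exact separator is not stated here):

```python
def commafy_sketch(sequence):
    """Join the items of a sequence into one comma-separated string."""
    # str() is applied so that non-string items can be joined too.
    return ", ".join(str(item) for item in sequence)
```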

copy(src: str, dst: str, overwrite: bool = False) → None

Copy content from source path to destination path.
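
One way the overwrite flag could behave is sketched below for single files using shutil. Raising when the destination exists is an assumption; the library may skip silently or handle directories as well.

```python
import os
import shutil


def copy_sketch(src, dst, overwrite=False):
    """Copy one file, refusing to replace dst unless overwrite is True."""
    if os.path.exists(dst) and not overwrite:
        # Assumed behaviour: signal the conflict instead of clobbering dst.
        raise FileExistsError(dst)
    shutil.copy2(src, dst)
```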

deep_get(dictionary: Dict[Any, Any], keys: str, default: Any | None = None) → Any

dict.get() for nested dictionaries.
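
A nested lookup of this kind might look like the sketch below. It assumes keys is a dot-separated path such as "db.port"; the separator the library actually expects is not stated here.

```python
def deep_get_sketch(dictionary, keys, default=None):
    """Walk a nested dict along a dot-separated key path."""
    current = dictionary
    for key in keys.split("."):
        # Stop as soon as the path leaves the nested dicts.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

For example, with config = {"db": {"port": 5432}}, the call deep_get_sketch(config, "db.port") walks one level down, and a missing path falls back to default.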

denest_dict(dictionary: Dict[Any, Any]) → Dict[Any, Any]

Return denested dict with all nested elements removed.

filename(path: str) → str

Return the filename without extension given the path.
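
The same result can be obtained with os.path, as in this sketch. How the library treats multi-part extensions such as .tar.gz is not stated; os.path.splitext removes only the last one.

```python
import os


def filename_sketch(path):
    """Return the final path component with its last extension removed."""
    base = os.path.basename(path)
    # splitext splits off only the last extension.
    return os.path.splitext(base)[0]
```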

get_dir_contents(root: str, pattern: str, skip: List[str] | None = None) → List[str]

Return list of contents in a directory that match pattern.
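
A directory listing filtered by pattern could be sketched as follows. Treating pattern as a shell-style wildcard, treating skip as a list of entry names, and returning sorted paths joined with root are all assumptions.

```python
import fnmatch
import os


def get_dir_contents_sketch(root, pattern, skip=None):
    """List entries directly under root whose names match pattern."""
    skipped = set(skip or [])
    return sorted(
        os.path.join(root, name)
        for name in os.listdir(root)
        # Assumed: shell-style wildcards, matched against the entry name.
        if fnmatch.fnmatch(name, pattern) and name not in skipped
    )
```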

get_incomplete_keys(dict: Dict[Any, Any]) → List[Any]

Return keys whose values are None.
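
A minimal sketch of the described behaviour, using a hypothetical function name:

```python
def get_incomplete_keys_sketch(mapping):
    """Return the keys of mapping whose value is None."""
    # Identity comparison keeps falsy values such as 0 or "" out of the result.
    return [key for key, value in mapping.items() if value is None]
```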

get_metadata(path: str) → Dict[str, Any]

Return dict equivalent of the JSON in file.
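
Reading a JSON file into a dict can be sketched with the standard json module. The UTF-8 encoding and a JSON object at the top level of the file are assumptions.

```python
import json


def get_metadata_sketch(path):
    """Load the JSON document stored at path."""
    # Assumed: the file is UTF-8 and holds a JSON object at the top level.
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```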

listify(dictionary: dict) → Dict[str, List]

Return a dict in which every value is a list.
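
One plausible reading is that scalar values are wrapped in a one-element list while existing lists are kept, as sketched here. How the library treats tuples or None is not stated.

```python
def listify_sketch(dictionary):
    """Return a copy of dictionary in which every value is a list."""
    return {
        # Existing lists pass through; anything else is wrapped.
        key: value if isinstance(value, list) else [value]
        for key, value in dictionary.items()
    }
```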

read_yaml(path: str) → Dict[Any, Any]

Return dict equivalent of the YAML in file.

read_multi_yaml(path: str) → List[Dict[Any, Any]]

Return a list of dict equivalents of the YAML documents in file.

run_shell(cmd: str, suppress_output: bool = True) → CompletedProcess

Execute shell command in background.
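
A helper with this signature could be sketched with subprocess.run, which also returns a CompletedProcess. Discarding output through DEVNULL is an assumption about what suppress_output means, and this sketch blocks until the command finishes.

```python
import subprocess


def run_shell_sketch(cmd, suppress_output=True):
    """Run cmd through the shell and return the CompletedProcess."""
    # Assumed: suppressing output means discarding stdout and stderr.
    sink = subprocess.DEVNULL if suppress_output else None
    return subprocess.run(cmd, shell=True, stdout=sink, stderr=sink)
```

The returncode attribute of the result holds the exit status of the command.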