Database and DBManager

class DBManager[source]

Bases: object

Manages database connections and sessions.

Session = None
property connection: Connection

Provides a context-managed connection to the database.

property session: None

Provides a context-managed session for performing database operations.

Database functionality for data write and access.

class Database[source]

Bases: Component

Generic database representation.

id: Mapped[int]
name: Mapped[str]
host: Mapped[str]
backend: Mapped[str]
property connection
property worksheet
property meta
property token
build_column(name, dtype, **kwargs)[source]

Build SQLalchemy Column object given column description.

build_constraint(cols, links)[source]

Build SQLalchmy constraint objects given links.

create_table(table, cols, refs=[])[source]

Create or extend table within database given the description.

Parameters:
  • table_name (str) – Name of the table to create or extend.

  • columns (list of dict) – List of columns specifications to add to the table.

  • constraints (list of dict, optional) – List of table constraints.

connect(username=None, password=None, keyfile=None)[source]

Establish a connection to the database.

Parameters:
  • username (str) – The database username.

  • password (str) – The database password.

  • keyfile (str) – Path to the service account keyfile. Required if backend is gsheet.

Raises:
  • SQLAlchemyError – If the connection cannot be established at database.

  • ValueError – If the connection cannot be established at URL endpoint.

get_primary(table)[source]

Return priamry keys for table.

get_records(table, cols=None)[source]

Retrieve records from table in database as a DataFrame.

Parameters:
  • table (str) – Table in database from which to retrieve records.

  • cols (list of str) – Column names to select from table.

to_table(df, table, check_dups=True, resolve_dups=False, check_fks=True, resolve_fks=False, insert_ignore=False, drop_na=None, threshold=None, if_exists='append', index=False, insert_method=None, **kwargs)[source]

Write records in DataFrame to a table.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing records.

  • table (str) – Table in database into which the records will be inserted.

  • check_dups (bool, default True) – Check for duplicates in records.

  • resolve_dups ([False, 'first', 'last'], default False) – Resolution method for duplicates if found.

  • check_fks (bool, default False) – Check if foreign keys present in parent.

  • resolve_fks (bool, default False) – Attempt to resolve missing foreign keys by inserting to parent.

  • insert_ignore (bool, default Flase) – Ignore insertion of records already present in table.

  • drop_na (list of df column names, default None) – If provided, records with na in all given columns are dropped.

  • threshold (int, None) – If non-NA values < threshold, then record dropped.

  • if_exists (['append', 'replace', 'fail'], default 'append') – Insert behavior in case table exists. - ‘append’ : Insert new values to the existing table. - ‘replace’ : Drop the table before inserting new values. - ‘fail’ : Raise a ValueError if table exists.

  • index (bool, default False) – Write DataFrame index as a column. Uses index_label as the column name in the table.

  • insert_method ({None, 'multi', callable}, optional) – Controls the SQL insertion clause used. - None : Uses standard SQL INSERT clause (one per row). - ‘multi’: Pass multiple values in a single INSERT clause. - callable with signature (pd_table, conn, keys, data_iter). Details and a sample callable implementation can be found on Insertion method section of Insertion method.

  • kwargs (key, value mappings) – Other keyword arguments are passed down to pandas.DataFrame.to_sql.

Returns:

rows – Number of rows affected by to_sql. None is returned if the callable passed into insert_method does not return an integer number of rows.

Return type:

None or int

Raises:
  • ValueError

    • When values provided are not sufficient for insert operation. - When the table already exists and if_exists is ‘fail’.

  • OperationalError

    • Most likely there are duplicates records in the DataFrame. Other reasons are related to the database operation and are detailed on OperationalError.

resolve_dups(df, table, resolve=False)[source]

Resolve duplicate primary keys.

Parameters:
  • df (pandas.DataFrame) – DataFrame to check for duplicate records.

  • table (str) – Table the records will be inserted into.

  • resolve ([False, 'first', 'last'], default False) – Determines resolution method. * False : Mark all duplicates as False. * ‘first’ : Mark duplicates as False except for first occurence. * ‘last’ : Mark duplicates as False except for last occurence.

Returns:

mask – Series of booleans showing whether each record in the Dataframe is not a duplicate.

Return type:

pandas.Series

Raises:

ValueError – When primary key duplicates are found and resolve is True.

resolve_fks(df, table, resolve=False)[source]

Resolve missing foreign keys.

Parameters:
  • df (pandas.DataFrame) – Non-duplicated records to be checked.

  • table (str) – Table the records will be inserted into.

  • resolve (bool, optional) – If True, attempt to resolve missing parent records by inserting.

Returns:

mask – Series of booleans showing whether each record in the DataFrame has all required parent records.

Return type:

pandas.Series

Warning

Resolution fails with a ValueError when a foreign key constraint on a table does not reference all the columns that provide the required values to insert records into the parent table.

report()[source]

Generate report for database.

query(returns=None, **filters)[source]

Return table records that fit filter criteria.

classmethod get(**filters) Base | None

Return an instance from index if found, else None.

Parameters:

filters (dict) – Dictionary of filter conditions used for querying.

Return type:

An instance of cls or None if not found.

classmethod get_identifiers(**kwargs) Dict[str, Any]

Return dictionary of class identifier attributes and their values.

Parameters:

kwargs (key, value pairs) – Key-value pairs of identifier attributes and their values.

Return type:

A dictionary of the identifiers with their respective values.

classmethod options(**filters) List[Base]

Return all existing instances from index.

Parameters:

filters (dict) – Dictionary of filter conditions used for querying.

Return type:

A list of instances of cls that match the filters.

type: Mapped[str]
check_for_key(key, mapping)[source]

Return columns in table mapping which contain the key provided.

migrate(src, dst, mapping, dry_run=False, na_vals=[], dtype_kws=None, **kwargs)[source]

Transform and migrate records from one database to another.

Parameters:
  • src (str) – Connection established source Database instance from which data is to be transferred.

  • dst (str) – Connection established destination Database instance to which data is to be transferred.

  • mapping (dict) – Dictionary providing information on column mappings, type, transformations, and validation checks.

  • dry_run (bool) – If dry run, no records are inserted into target and tables are not created, but errors and invalid records are logged.

  • na_vals (list, optional) – List of values to be considered as record not available. By default, the values ‘’, ‘None’, ‘NONE’, ‘NA’, and ‘Not Applicable’ are considered.

  • dtype_kws (dict, optional) – Key, value pairs that will be passed to almirah.utils.df.convert_column_type() kwargs.

  • kwargs (key, value mappings) – Other keyword arguments are passed down to almirah.Database.to_table.

reshape(records, steps)[source]

Reshape records into appropriate shape.

replace_value(value, column, mapping, file)[source]

Return unique replacement for given value based on mapping in file.

replace_column(series, value, to, file, strict=True)[source]

Replace values in series based on mapping in file.

transform(records, dtype_kws, mapping)[source]

Transform table into appropriate format.

transform_column(series, dtype_kws={}, **kwargs)[source]

Transform column to appropriate datatype.

validate(records, mapping)[source]

Validate table and return mask where True means valid.

validate_column(series, **kwargs)[source]

Validate column and return mask where True means valid.