User Guide

Installing

almirah can be installed with pip:

$ python -m pip install almirah

Organizing the dataset

The first step to using any dataset is getting it in shape to allow manual exploration or automated retrieval of a subset. To get started, import the almirah module:

from almirah import Dataset, Layout, Specification

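almirah relies on two components to organize a dataset: the specification details, from which a Specification is created, and the organization rules, which are passed to organize():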

spec = Specification.create_from_file("/path to details")
spec.organize("/path to rules")

Note

For modalities such as eye tracking and genomics, where the BIDS specification is still in the proposal stage or is only present as a descriptor, the config uses a custom BIDS-like specification that mimics the general pattern of the specification.

Indexing

The indexing operation crawls through the organized dataset and stores, in a database, the files and directories whose paths match the specification. This enables easy querying and filtering.

At an abstract level, each dataset can be thought of as a Layout of files. Each File is associated with a set of tags. Each Tag is a name:value pair that is derived from the filename and from the metadata files associated with a file. To create an instance of Layout, pass the root directory path of the dataset and the Specification name to Layout:

lay = Layout(root="/path to dataset", specification_name="name")
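To make the idea of tags concrete, here is a minimal sketch of how name:value pairs could be derived from a BIDS-like filename. It is not almirah's implementation: the function tags_from_filename and the example path are assumptions made for illustration, and the tag names almirah actually produces depend on the specification in use.

```python
from pathlib import PurePosixPath


def tags_from_filename(path):
    """Derive name:value tags from a BIDS-like filename (illustrative only)."""
    name = PurePosixPath(path).name
    # Split at the first dot so that a compound extension such as ".nii.gz" stays whole.
    stem, dot, ext = name.partition(".")
    tags = {"extension": dot + ext} if dot else {}
    for part in stem.split("_"):
        # BIDS-like entities look like "key-value"; a part without a hyphen is the suffix.
        key, sep, value = part.partition("-")
        if sep:
            tags[key] = value
        else:
            tags["suffix"] = part
    return tags


# Hypothetical path, used only to show the shape of the result.
example_tags = tags_from_filename("/data/sub-A3456/anat/sub-A3456_ses-01_T1w.nii.gz")
print(example_tags)
```

In this sketch the keys are taken verbatim from the filename; a real specification may map them to more descriptive tag names.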

almirah automatically retrieves a previous index if the layout is found. If not, the Layout can be indexed using index(). Index changes and additions are not written unless committed using commit().

Tip

Setting valid_only=False removes the restriction that only files with matching paths in the associated specification are indexed. This can act as a quick way to index the whole directory, or as a dirty trick when you do not have time to redefine the specification to accommodate a new path.

It is also possible to have a collection of layouts as a Dataset:

ds = Dataset(name="name")
ds.add(layout_1, layout_2,..., layout_n)

By this, parts of a dataset located in diverse paths can be virtually collected into one for querying.

Tip

Objects can be retrieved from the index once committed, or from the current session if present, by using get(). To retrieve a Layout with the specification name bids, you can use Layout.get(specification_name='bids').

Filter and Query

To retrieve a subset of files that match certain tags, provide the criteria as keyword arguments to query(). The File objects of the matching files will be returned:

lay.query(subject="A3456", extension=".png")
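The matching logic behind a tag-based query can be pictured with plain Python. The following is a sketch under stated assumptions, not almirah's code: files are modeled as dictionaries, the query function here is a hypothetical stand-in, and a file passes only if every keyword criterion equals the corresponding tag value.

```python
def query(files, **criteria):
    """Return the files whose tags match every keyword criterion (illustrative only)."""
    return [
        f for f in files
        if all(f["tags"].get(name) == value for name, value in criteria.items())
    ]


# Hypothetical in-memory index; a real index lives in a database.
files = [
    {"path": "sub-A3456_face.png", "tags": {"subject": "A3456", "extension": ".png"}},
    {"path": "sub-A3456_T1w.dcm", "tags": {"subject": "A3456", "extension": ".dcm"}},
    {"path": "sub-B0001_face.png", "tags": {"subject": "B0001", "extension": ".png"}},
]

matches = query(files, subject="A3456", extension=".png")
print([f["path"] for f in matches])
```

Adding more keyword arguments narrows the result, since all criteria must hold at once.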

Tip

If you do not know the possible tags, options() might be of help. It is available for all objects and can be used as a look-up.

Converting file formats

Sometimes you want to convert the file format of a file. For example, from DICOM to NIfTI, or from EDF (EyeLink Data Format) to ASCII. This is possible by providing the files to be converted, the desired output format, and the output directory as arguments to convert():

from almirah.utils.convert import convert

files = lay.query(extension=".dcm")
# out_layout stands for the Layout of the output directory
convert(files, "NIfTI", out_layout)

Currently, the following conversions are supported:

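==================================================  =====================  =====================================
Input extensions                                    Output formats         Datatype
==================================================  =====================  =====================================
dcm                                                 NIfTI                  Magnetic resonance imaging
bdf, cnt, data, edf, gdf, mat, mff, nxe, set, vhdr  BrainVision, EDF, FIF  Electroencephalography
edf                                                 ASCII                  Eye tracking
nirx                                                SNIRF                  Functional near-infrared spectroscopy
==================================================  =====================  =====================================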
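The supported conversions listed above can also be written down as a small lookup, which makes it easy to see that an extension such as edf is ambiguous between two datatypes. This is only a sketch for reasoning about the table: SUPPORTED_CONVERSIONS and output_formats are hypothetical names and are not part of almirah.

```python
# Each entry: (input extensions, output formats, datatype), copied from the table above.
SUPPORTED_CONVERSIONS = [
    ({"dcm"}, {"NIfTI"}, "Magnetic resonance imaging"),
    (
        {"bdf", "cnt", "data", "edf", "gdf", "mat", "mff", "nxe", "set", "vhdr"},
        {"BrainVision", "EDF", "FIF"},
        "Electroencephalography",
    ),
    ({"edf"}, {"ASCII"}, "Eye tracking"),
    ({"nirx"}, {"SNIRF"}, "Functional near-infrared spectroscopy"),
]


def output_formats(extension):
    """Map each datatype that accepts the extension to its sorted output formats."""
    ext = extension.lstrip(".").lower()
    return {
        datatype: sorted(outputs)
        for inputs, outputs, datatype in SUPPORTED_CONVERSIONS
        if ext in inputs
    }


edf_options = output_formats(".edf")
print(edf_options)
```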

Interfacing with a Database

Through Database, almirah can connect to databases supported by SQLAlchemy, Google Sheets, and URL endpoints that support retrieval of database contents. During object creation, name, host, and backend have to be provided to Database. Later, before querying, a connection needs to be established using connect() by providing the credentials:

from almirah import Database

db = Database(name="db_name", host="db_host", backend="db_type")

# Create connection with database
db.connect("username", "password")

Only reading is supported in databases that are Google Sheets or URL endpoints. Operations such as table creation, writing, and metadata manipulation are only available in SQLAlchemy-valid databases.

To create a table in the database, the table schema is described by the Database mapping config and passed to create_table():

db.create_table({"mapping":"dict"})

To insert records into a table in the database, a pandas.DataFrame object whose columns match the table columns is provided as an argument to to_table() along with the table name:

db.to_table(df, "table name")

Important

If no table with the given name is present, a table is created automatically. This might not be desirable if you would like to define relationships between tables, as the automatically created table is plain and lacks these.
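To illustrate why a table defined up front can matter, the sketch below uses Python's built-in sqlite3 module rather than almirah. The table and column names are hypothetical. It shows a relationship (a foreign key) rejecting a record that refers to an unknown subject, which a plain table without relationships would accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every scan must refer to an existing subject.
conn.execute("CREATE TABLE subjects (subject_id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE scans ("
    " scan_id INTEGER PRIMARY KEY,"
    " subject_id TEXT NOT NULL REFERENCES subjects(subject_id))"
)

conn.execute("INSERT INTO subjects VALUES ('A3456')")
conn.execute("INSERT INTO scans (subject_id) VALUES ('A3456')")

try:
    # This record points at a subject that does not exist.
    conn.execute("INSERT INTO scans (subject_id) VALUES ('UNKNOWN')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```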

get_records() can retrieve records from a table in the database given the table name. A subset of table columns can be provided via cols, if not all columns are to be retrieved.

db.get_records("table_name")

Reporting

High-level summaries of a dataset can be reported by using dataset.Database.report().

obj.report()

The tags on which the summary is based can be provided via the tags argument. subject is used if no tags are provided.

Errors and Exceptions

almirah wraps built-in Python exceptions with appropriate messages, for example:

raise ValueError(f"Unsupported transform value {transform}")

See Exception for context.
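Because these are built-in exception types, they can be handled with an ordinary try/except. The sketch below assumes a hypothetical helper, apply_transform, that raises in the style shown above. It is not an almirah function and exists only to make the example runnable.

```python
def apply_transform(value, transform):
    """Hypothetical helper that raises in the same style as the example above."""
    transforms = {"upper": str.upper, "lower": str.lower}
    if transform not in transforms:
        raise ValueError(f"Unsupported transform value {transform}")
    return transforms[transform](value)


try:
    apply_transform("A3456", "reverse")
except ValueError as err:
    # The message names the offending value, which helps locate the bad config entry.
    message = str(err)

print(message)
```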

Logging

If you are using the standard library logging module, almirah will emit several logs. In some cases, this can be undesirable. You can use the standard logger interface to change the log level for almirah’s logger:

import logging

logging.getLogger("almirah").setLevel(logging.WARNING)
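The effect of this setting can be checked with the standard library alone. In the sketch below, the child logger name almirah.core is an assumption used for illustration. The point is that loggers below almirah in the hierarchy inherit the level when they have none of their own.

```python
import logging

# The application logs at INFO, but almirah is limited to WARNING and above.
logging.basicConfig(level=logging.INFO)
logging.getLogger("almirah").setLevel(logging.WARNING)

# Hypothetical child logger; it has no level set, so it inherits from "almirah".
child = logging.getLogger("almirah.core")
info_enabled = child.isEnabledFor(logging.INFO)
warning_enabled = child.isEnabledFor(logging.WARNING)

print(info_enabled, warning_enabled)
```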

The CALM-Brain Resource

If you would like to use almirah to access the CALM-Brain resource, visit the CALM-Brain wiki.