Specification details
A Specification is an agreed-upon standard that describes how a dataset should be organized. It includes the allowed file paths, the tags associated with a file, and their role in file name generation. This document details how to provide each of these options.
A minimal configuration of tags for a dataset looks like this:
tags:
  - name: filename
    pattern: "[/\\\\](.*)\\."
  - name: extension
    pattern: "(\\.[^/\\\\]+)$"
path_patterns:
  - "{filename}{extension}"
With the minimal specification above, a path can be built with build_path():
# Build path according to specification with tags as parameters
path = specification.build_path(filename="file", extension=".txt")
# Print the built path.
print(path)
# file.txt
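The tag patterns in the minimal configuration are regular expressions, each with one capture group. The sketch below shows what they capture once the YAML double-quote escapes are resolved. It rests on assumptions: the source does not say how the library applies the patterns, so Python's re.search is used as a stand-in, and the names TAG_PATTERNS and extract_tags are hypothetical helpers, not part of the library.

```python
import re

# The two regexes as they read after YAML double-quote unescaping
# ("\\\\" becomes "\\" and "\\." becomes "\.").
TAG_PATTERNS = {
    "filename": r"[/\\](.*)\.",
    "extension": r"(\.[^/\\]+)$",
}


def extract_tags(path):
    """Return the first capture group of every tag pattern found in path.

    Assumption: patterns are applied with re.search; the real library
    may apply them differently.
    """
    tags = {}
    for name, pattern in TAG_PATTERNS.items():
        match = re.search(pattern, path)
        if match:
            tags[name] = match.group(1)
    return tags


print(extract_tags("/file.txt"))
# {'filename': 'file', 'extension': '.txt'}
```

Note that the filename pattern requires a leading path separator (/ or \), so a bare "file.txt" would yield only the extension tag under this sketch.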
Top-level keys
path_patterns
All details regarding permissible file paths sit inside the
path_patterns key. The path_patterns key consists of a
sequence of paths relative to the dataset root. Usage of tag values in
paths is supported.
A path can contain both template and ordinary patterns. The template patterns are:
[contents]
    Used to indicate that the contents are optional.
    The path /dir[/subdir]/file will match both /dir/file and /dir/subdir/file.
{name<values>|default}
    Used to indicate that the template will be filled in by a tag value. name refers to the name of the tag, values refers to the set of valid values separated by |, and default refers to the value chosen when the tag is not supplied while building a path. values and default are optional.
    The path /dir/{filename<file1|file2>|file1} will match /dir/file1 and /dir/file2 but not /dir/file3. If no filename tag is provided during path building, file1 is chosen as the default.
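The two template patterns can be illustrated with a small, self-contained sketch. It is not the library's implementation: template_to_regex and fill_placeholders are hypothetical helpers written only to mirror the matching and default behaviour described above. The sketch assumes that a placeholder without a value list matches any run of non-separator characters, and it does not handle nested optional sections.

```python
import re

# {name<v1|v2>|default}: the value list and the default are both optional.
PLACEHOLDER = re.compile(r"\{(\w+)(?:<([^>]*)>)?(?:\|([^}]*))?\}")
# A placeholder, or one of the brackets that delimit an optional section.
TOKEN = re.compile(PLACEHOLDER.pattern + r"|(\[)|(\])")


def template_to_regex(template):
    """Translate a path template into a compiled regex for matching."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(template):
        # Literal text between tokens is matched verbatim.
        parts.append(re.escape(template[pos:m.start()]))
        name, values, _default, open_br, close_br = m.groups()
        if open_br:
            parts.append("(?:")   # start of an optional section
        elif close_br:
            parts.append(")?")    # end of an optional section
        elif values:
            allowed = "|".join(re.escape(v) for v in values.split("|"))
            parts.append(f"(?P<{name}>{allowed})")
        else:
            # Assumption: an unconstrained tag stays within one path segment.
            parts.append(rf"(?P<{name}>[^/\\]+)")
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


def fill_placeholders(template, tags):
    """Fill placeholders from tags, falling back to each default."""
    def substitute(m):
        name, values, default = m.groups()
        value = tags.get(name, default)
        if value is None:
            raise KeyError(name)
        if values and value not in values.split("|"):
            raise ValueError(f"{value!r} is not a valid value for {name!r}")
        return value
    return PLACEHOLDER.sub(substitute, template)


optional = template_to_regex("/dir[/subdir]/file")
print(bool(optional.fullmatch("/dir/file")))         # True
print(bool(optional.fullmatch("/dir/subdir/file")))  # True

template = "/dir/{filename<file1|file2>|file1}"
print(bool(template_to_regex(template).fullmatch("/dir/file3")))  # False
print(fill_placeholders(template, {}))                            # /dir/file1
```

Matching ignores the default, which only matters when a path is built and the tag is missing.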