2. Metadata schemas: technical specifications#

This article describes how the metadata schemas used in the ManGO portal are stored and represented, from the folder structure that supports the schemas lifecycle to the JSON format that codes the different fields and their characteristics.

Section 1 gives a brief overview of the lifecycle of a schema and how that is coded into the folder structure and filename system. Section 2 follows with a technical description of the JSON file that represents each version of a specific schema, followed by Section 3, in which the different kinds of fields are described. Finally, Section 4 shows an example of the JSON file for a draft version.

Note

If you would like to design your own metadata schema for Tier-1 Data without using the Metadata Schema Manager, you should focus on Section 3 and create a JSON file that matches the value of properties in the main JSON. On upload to Tier-1 Data, you will be able to provide the name and title of your schema, and the versioning will be taken care of in the backend.

Before we go into these sections, here is some useful vocabulary:

(metadata) schema

A set of rules to apply metadata in a systematic way; a collection of fields with format instructions for a specific AVU.

schema version

A specific version of a schema, with a given status. Multiple versions of a given schema may co-exist, only one can be in a ‘published’ status, meaning that it can be applied to data.

field

A component of a schema with instructions for a specific AVU or for multiple AVUs that have the same name (or prefix, in the case of a composite field).

to apply a schema / annotate with a schema

The action of adding or editing metadata of a data object or collection based on a given schema.

2.1. Lifecycle and folder structure#

In the Tier-1 Data infrastructure, schemas belong to “realms”, such as projects (other implementations of this infrastructure could extend this to personal collections). When designing a schema, one must first select a realm; at the moment, a schema designed within a certain realm can only be used to apply metadata to data of that realm.

For each realm, there is a directory “schemas” that contains all the schemas designed within it. Each schema has its own subdirectory, which contains one JSON file for each existing version of the schema. In the example used for illustration in this article, the directory would be called “book”, which is also the schema_name attribute in the JSON file of each version.

There can be any number of versions of a schema, following semantic versioning (although for now only major versions are supported), and each version can have one of three states: “draft”, “published” or “archived”.

The draft status is the first, default state of a schema, although it is possible to publish a schema directly without saving the draft first. It can be edited, viewed and deleted, but it cannot be applied.

Once a draft is published, it cannot be edited or deleted anymore, but it can be viewed and, more importantly, it can be applied. Attempts to edit the published version of a schema will result in the creation of a new draft with a higher version number. If this draft is then published, the current published version is archived. Moreover, the metadata schema manager also allows you to clone (or “copy”) a published schema into a draft of a whole new schema with a different name and version 1.0.0.

Archived versions cannot be edited, deleted or used. At the moment they cannot be viewed either, but this will be addressed in the future. A published version can also be purposefully archived, without having to publish a draft that replaces it. You can still have data with metadata based on an archived version of a schema, but if you try to reapply the metadata schema you will only be able to use the current published version, overriding any differences between the version originally used for annotation and the current version.

Table 1 summarizes what can be done with a metadata schema version depending on its stage.

Table 2.1 Table 1: Summary of metadata schema version states in Tier-1 Data.#

draft

published

archived

when:

on creation

on publication

by archiving

can be edited

can be viewed

can be applied

can be deleted

The name of a file corresponding to a version of a schema includes the name, version and (unless archived) the status, accordign to the following convention: {schema_name}-v{version}(-{status}).json, with status being one of “draft” or “published”. For example, when we first create the “book” schema used for illustration in this document, a file will be created called “book-v1.0.0-draft.json”, which will be stored inside the “book” subdirectory of the “schemas” directory. As shown in Section 2, the version and status are also registered as attributes inside the JSON file. Once we are ready to make it available for annotation, we can publish the version in the metadata schema manager, which will update the status inside the JSON file and rename the file itself as “book-v1.0.0-published.json”. If we want to create a new version, this will generate a new file “book-v2.0.0-draft.json”, which will have the same name, title and realm as the previous version but a different version number and status. Publishing this new version will change its status and rename the file as “book-v2.0.0-published.json”, but it will also archive the first version. This means that the older file will become “book-v1.0.0.json” (without a suffix indicating the status) and change the status inside the JSON file to “archived”.

As mentioned above, if we have already annotated data using version 1.0.0 of the “book” schema, that metadata will remain unchanged unless we try to update it. In that case, fields that have not changed between versions will be untouched, whereas fields that were deleted in version 2.0.0 will be permanently deleted, and those that were added will become available.

2.2. JSON format#

A specific version of a metadata schema will be stored in a json file with a series of key-value pairs.

{
    "schema_name" : "book",
    "version" : "1.0.0",
    "status" : "draft",
    "properties" : {...},
    "title" : "Book schema as an example",
    "edited_by" : "username",
    "realm" : "project_collection",
    "parent" : ""
}

The schema_name attributes indicates the name or ID of the schema, i.e. the namespace of the AVUs assigned via this schema. In this example, all the attribute names generated with this schema will be prefixed with mgs.book., where mgs refers to “ManGO schema”. The status attribute refers to the state in the lifecycle as described in Section 1, and with version they constitute the main characteristics to distinguish between versions of a schema.

The title of a schema is used in the UI of the schema manager and when implementing schemas as a the user-facing label. The edited-by attribute is self-explanatory. As introduced above, realm refers to the space (such as a project) to which the schema belongs and in which it can be used. The parent attribute is relevant when a schema has been initialized as clone of an existing schema; in that case, it records the name and version of the schema it originated from.

The value of the properties element is itself a series of key-value pairs indicating fields of the metadata schema. The key is the ID of the field (how it is defined in the namespace of the schema) and the value is itself a series of key-value pairs describing the field. The format of these objects is documented in Section 3.

The order of the attributes is not important, but the order of the fields inside properties will define the order they take when rendering the form used to assing metadata from the schema.

2.3. Schema fields#

There are three main kinds of fields that can be included in a metadata schema: simple fields, multiple-choice fields and composite fields. Simple fields, described in Section 3.2, include any form of text or numeric input for which a pattern or range may be defined but not, strictly speaking, the possible values. It also includes single (boolean) checkboxes. Multiple-choice fields (Section 3.3) include any field that provides a specific, limited selection of possible values. Finally, the composite fields, described in Section 3.4, are mini-schemas: collections of fields of other kinds related to each other.

Each field is represented by a key-value pair in the properties element of the schema JSON. Before going through the specific characteristics of each kind of field, Section 3.1 offers an overview of their common attributes.

2.3.1. General Attributes#

The following attributes are used in at least two different kinds of fields.

title

All fields in a metadata schema must include the title attribute, which provides a user-facing, human-readable label. While the ID or name of the field is used in the AVU itself, the title is used in the schema manager, during annotation and when we inspect the existing metadata in the ManGO portal.

type

All fields need a type attribute indicating the kind of field they represent. The possible values are discussed in the sections dedicated to each type of field.

required

Simple fields and single-value multiple-choice fields may contain an optional boolean required attribute indicating whether the field is required when assigning metadata from the schema. A required field needs to be filled for the metadata form to be submitted. If this attribute is missing, it is read as “false”.

default

Simple fields and single-value multiple-choice fields, if required is true, may also contain a default attribute providing a default value for the field.

In the metadata schema manager, the title, id and (if relevant) default attributes are provided via text input fields and required via a switch button. In contrast, type is defined by the choice of field in the metadata schema manager, except for simple fields, in which there is an additional dropdown to select among its various subtypes.

2.3.2. Simple fields#

The prototypical example of a simple field is a text field, such as the example below. They key “title” indicates that, when assigning metadata via this field, the name will be msg.book.title.

"title" : {
    "type" : "text",
    "title" : "Book title",
    "required" : true
}

The type attribute can have one of several different values, to be selected from a dropdown menu when designing an instance of this field. Next to the basic “text” value, other standard inputs are available that provide minimal validation: “date”, “time”, “email”, or “url”. For a longer-form, non-restricted text output, the “textarea” value is also an option; in that case, it is not longer possible to provide a default value.

For numeric inputs, the possible types are “integer” or “float”. Fields with these types also have two optional key-value pairs indicating the range of allowed values:

"copies_published": {
    "type": "integer",
    "title": "Number of copies published",
    "minimum": "100"
},
"market_price": {
    "type": "float",
    "title": "Market price (in euros)",
    "minimum": "0.99",
    "maximum": "999.99"
}

Finally, it is also possible to create an individual checkbox (with type “checkbox”), which takes the value “true” when checked and no value when unchecked.

Except for the “checkbox”, all the other simple field types can additionally have a repeatable attribute. If “true”, the field can be copied when assinging the metadata to a collection or data object, in order to generate multiple AVUs with the same attribute name and different values.

In the metadata schema manager, minimum and maximum values for numeric types can be provided via numeric input fields, whereas the repeatable attribute is selected via a switch button.

2.3.3. Multiple-choice#

Multiple-choice fields are indicated by providing the “select” value to the type attribute. They are characterized by a restricted selection of possible values for the metadata field they define. These values are indicated as a list in the values attribute:

"ebook": {
    "type": "select",
    "multiple": false,
    "ui": "radio",
    "values": [
        "Available",
        "Unavailable"
        ],
    "title": "Is there an e-book?",
    "required": true
}

The metadata schema manager offers two types of multiple-choice fields: single-value and multiple-value. The former represents radio buttons and classic dropdowns in which the user must choose up to one of the possible options. The latter, in contrast, represents checkboxes and dropdowns in which the user may choose more than one of the possible options. This choice is coded in the multiple attribute, which takes the “false” value in the first case and “true” in the second.

In addition, the ui attribute indicates what the field will look like in the form used to apply the schema. Its value can be “dropdown”, “checkbox” (if multiple is “true”) or “radio” (if multiple is “false”). This choice is made via a switch button in the metadata schema manager.

In the metadata schema manager, each value of the list of options must be provided manually and then can be edited, deleted or reordered. It is not yet possible to import a list of values from an external source.

2.3.4. Composite field#

Composite fields are miniature schemas nested inside schemas (or other composite fields) and are meant to bring together multiple fields that conceptually come together. They take the type “object”, which is assigned when the composite field is selected in the metadata schema manager. Like for schemas, they have a properties attribute describing the fields it is composed of.

"author": {
    "type": "object",
    "title": "Author",
    "properties": {
        "name": {
            "type": "text",
            "title": "Name and Surname",
            "required": true
        },
        "age": {
            "type": "integer",
            "title": "Age",
            "minimum": "12",
            "maximum": "99"
        },
        "email": {
            "type": "email",
            "title": "Email address",
            "required": true,
            "repeatable": true
        }
    }
}

Composite fields cannot be required: this is a property of their components. Currently, they cannot be repeatable either, but that might change in the future.

In practical terms, composite fields generate a nested namespace for the AVUs they contain. As an example, the fields shown in Section 3.2 would be coded with the names msg.book.title, msg.book.copies_published and msg.book.market_price, and the one shown in Section 3.3 as msg.book.ebook. In contrast, the composite field shown above results in AVUs with attribute names msg.book.author.name, msg.book.author.age and msg.book.author.email.

2.4. Full example#

This section contains the full example of a JSON file that represents a schema draft.

  1{
  2    "schema_name": "book",
  3    "version" : "1.0.0",
  4    "status" : "draft",
  5    "properties": {
  6        "title": {
  7            "type": "text",
  8            "title": "Book title",
  9            "required": true
 10        },
 11        "cover_colors": {
 12            "type": "select",
 13            "multiple": true,
 14            "ui": "checkbox",
 15            "title": "Colors in the cover",
 16            "values": [
 17                "red",
 18                "blue",
 19                "green",
 20                "yellow"
 21            ]
 22        },
 23        "publisher": {
 24            "type": "select",
 25            "multiple": false,
 26            "ui": "dropdown",
 27            "values": [
 28                "Penguin House",
 29                "Tor",
 30                "Corgi",
 31                "Nightshade books"
 32            ],
 33            "title": "Publishing house",
 34            "required": true
 35        },
 36        "author": {
 37            "type": "object",
 38            "title": "Author",
 39            "properties": {
 40                "name": {
 41                    "type": "text",
 42                    "title": "Name and Surname",
 43                    "required": true
 44                },
 45                "age": {
 46                    "type": "integer",
 47                    "title": "Age",
 48                    "minimum": "12",
 49                    "maximum": "99"
 50                },
 51                "email": {
 52                    "type": "email",
 53                    "title": "Email address",
 54                    "required": true,
 55                    "repeatable": true
 56                }
 57            }
 58        },
 59        "ebook": {
 60            "type": "select",
 61            "multiple": false,
 62            "ui": "radio",
 63            "values": [
 64                "Available",
 65                "Unavailable"
 66            ],
 67            "title": "Is there an e-book?",
 68            "required": true
 69        },
 70        "genre": {
 71            "type": "select",
 72            "multiple": true,
 73            "ui": "dropdown",
 74            "values": [
 75                "Speculative fiction",
 76                "Mystery",
 77                "Non-fiction",
 78                "Encyclopaedia",
 79                "Memoir",
 80                "Literary fiction"
 81            ],
 82            "title": "Genre"
 83        },
 84        "publishing_date": {
 85            "type": "date",
 86            "title": "Publishing date",
 87            "required": true,
 88            "repeatable": true
 89        },
 90        "copies_published": {
 91            "type": "integer",
 92            "title": "Number of copies published",
 93            "minimum": "100"
 94        },
 95        "market_price": {
 96            "type": "float",
 97            "title": "Market price (in euros)",
 98            "minimum": "0.99",
 99            "maximum": "999.99"
100        },
101        "website": {
102            "type": "url",
103            "title": "Website"
104        },
105        "synopsis": {
106            "type": "textarea",
107            "title": "Synopsis"
108        }
109    },
110    "title": "Book schema as an example",
111    "edited_by" : "username",
112    "realm" : "project_collection",
113    "parent" : ""
114}