Hierarchical enrichment

Hierarchical enrichment encodes properties that reference multiple levels of annotation. For instance, the speech rate of an utterance requires referencing both the utterance and the rate per second of a lower annotation type, usually syllables. Likewise, encoding the number of syllables in a word or the position of a phone in a word references multiple levels of annotation.

Note

See Annotation Graphs for details on the implementation and representations of the annotation graph hierarchy that PolyglotDB uses.

Encode count

Count enrichment creates a property on the higher annotation that is the number of lower annotations of a given type that it contains. For instance, to encode how many phones each word contains, use the following code:

from polyglotdb import CorpusContext

with CorpusContext('corpus') as c:
    c.encode_count('word', 'phone', 'number_of_phones')

Following enrichment, all word tokens will have a property for number_of_phones that can be referenced in queries and exports.
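
As a rough illustration of what the count represents, the sketch below computes it on toy data in plain Python. It does not use PolyglotDB: the `(label, begin, end)` tuples and the `count_contained` helper are hypothetical, and containment is assumed here to mean that a phone's interval falls inside the word's interval, whereas PolyglotDB works from its annotation graph hierarchy.

```python
# Toy data for illustration only, not PolyglotDB objects:
# each annotation is (label, begin, end), with times in seconds.
words = [("cat", 0.0, 0.3), ("sat", 0.3, 0.7)]
phones = [
    ("k", 0.0, 0.1), ("ae", 0.1, 0.2), ("t", 0.2, 0.3),
    ("s", 0.3, 0.45), ("ae", 0.45, 0.6), ("t", 0.6, 0.7),
]


def count_contained(higher, lower):
    """Count the lower annotations whose interval lies inside each higher one."""
    counts = []
    for label, begin, end in higher:
        # Assumption: "contained" means the interval falls within the higher one.
        n = sum(1 for _, b, e in lower if b >= begin and e <= end)
        counts.append((label, n))
    return counts


number_of_phones = count_contained(words, phones)
print(number_of_phones)  # [('cat', 3), ('sat', 3)]
```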

Encode rate

Rate enrichment creates a property on the higher annotation that is the number of lower annotations per second. It is calculated as the count of lower annotations contained in the higher annotation divided by the duration of the higher annotation. For instance, to encode the number of phones per second in each word:

with CorpusContext('corpus') as c:
    c.encode_rate('word', 'phone', 'phones_per_second')

Following enrichment, all word tokens will have a property for phones_per_second that can be referenced in queries and exports.
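
The arithmetic behind the rate can be sketched in a few lines of plain Python. The `rate` helper below is hypothetical and not part of PolyglotDB, and raising an error for a non-positive duration is a choice made for this sketch rather than documented library behaviour.

```python
def rate(count, begin, end):
    """Lower annotations per second: count divided by the higher annotation's duration."""
    duration = end - begin
    if duration <= 0:
        # Sketch-only guard; how PolyglotDB treats such cases is not covered here.
        raise ValueError("higher annotation must have a positive duration")
    return count / duration


# A word containing 3 phones that spans 1.0 s to 1.25 s lasts 0.25 s.
phones_per_second = rate(3, 1.0, 1.25)
print(phones_per_second)  # 12.0
```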

Encode position

Position enrichment creates a property on the lower annotation that gives its position among the annotations of the same type within the higher annotation. Positions start at 1 for the first element. For instance, to encode the position of each phone within its word:

with CorpusContext('corpus') as c:
    c.encode_position('word', 'phone', 'position_in_word')

The encoded property can then be used in queries and exports, as follows:

with CorpusContext('corpus') as c:
    q = c.query_graph(c.phone).filter(c.phone.position_in_word == 1)
    print(q.all())

The query above matches all phones in the first position of a word, giving the same results as a query using alignment (see Hierarchical queries for more details on those).
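
To make the 1-based numbering concrete, here is a minimal plain-Python sketch on toy data. The `positions` helper and the `(label, begin, end)` tuples are hypothetical and independent of PolyglotDB, and ordering phones by their begin time within each word is an assumption of the sketch.

```python
# Toy data for illustration only, not PolyglotDB objects:
# each annotation is (label, begin, end), with times in seconds.
words = [("cat", 0.0, 0.3), ("sat", 0.3, 0.7)]
phones = [
    ("k", 0.0, 0.1), ("ae", 0.1, 0.2), ("t", 0.2, 0.3),
    ("s", 0.3, 0.45), ("ae", 0.45, 0.6), ("t", 0.6, 0.7),
]


def positions(higher, lower):
    """Number the lower annotations inside each higher annotation, starting at 1."""
    result = []
    for _, begin, end in higher:
        # Assumption: order the contained annotations by their begin time.
        contained = sorted(
            (p for p in lower if p[1] >= begin and p[2] <= end),
            key=lambda p: p[1],
        )
        for index, (label, _, _) in enumerate(contained, start=1):
            result.append((label, index))
    return result


position_in_word = positions(words, phones)

# Selecting position 1 mirrors the filter in the query above.
initial = [label for label, pos in position_in_word if pos == 1]
print(initial)  # ['k', 's']
```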