Corpus API

Corpus classes

Base corpus

class polyglotdb.corpus.BaseContext(*args, **kwargs)[source]

Base CorpusContext class. Inherit from this and extend to create more functionality.


args

If the first argument is not a CorpusConfig object, it is the name of the corpus


kwargs

If a CorpusConfig object is not specified, all arguments and keyword arguments are passed to a CorpusConfig object
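The argument handling described above can be sketched as follows. This is a simplified illustration, not PolyglotDB's implementation: both classes are hypothetical stand-ins defined here so the example runs without a database, and `graph_host` is only an illustrative keyword.

```python
class CorpusConfig:
    """Hypothetical stand-in for polyglotdb.config.CorpusConfig."""

    def __init__(self, corpus_name, **kwargs):
        self.corpus_name = corpus_name
        # Every keyword becomes an attribute of the config object.
        for key, value in kwargs.items():
            setattr(self, key, value)


class BaseContext:
    """Stand-in that shows only the argument dispatch, nothing else."""

    def __init__(self, *args, **kwargs):
        if not args:
            raise ValueError("A CorpusConfig object or a corpus name is required.")
        if isinstance(args[0], CorpusConfig):
            # A ready-made config object is used as-is.
            self.config = args[0]
        else:
            # Otherwise everything is forwarded to CorpusConfig.
            self.config = CorpusConfig(*args, **kwargs)
        self.corpus_name = self.config.corpus_name


by_name = BaseContext("my_corpus", graph_host="localhost")
by_config = BaseContext(CorpusConfig("my_corpus"))
```

Both calls end up with an equivalent configuration; passing a config object is convenient when the same settings are shared by several contexts.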

Phonological functionality

class polyglotdb.corpus.PhonologicalContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with phones

Syllabic functionality

class polyglotdb.corpus.SyllabicContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with syllables

Lexical functionality

class polyglotdb.corpus.LexicalContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with words

Pause functionality

class polyglotdb.corpus.PauseContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with non-speech elements

Utterance functionality

class polyglotdb.corpus.UtteranceContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with utterances

Audio functionality

class polyglotdb.corpus.AudioContext(*args, **kwargs)[source]

Class that contains methods for dealing with audio files for corpora

Summarization functionality

class polyglotdb.corpus.SummarizedContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with summary measures for linguistic items

Spoken functionality

class polyglotdb.corpus.SpokenContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with speaker and sound file metadata

Structured functionality

class polyglotdb.corpus.StructuredContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with metadata for the corpus

Annotation functionality

class polyglotdb.corpus.AnnotatedContext(*args, **kwargs)[source]

Class that contains methods for dealing specifically with annotations on linguistic items (termed “subannotations” in PolyglotDB)

Omnibus class

class polyglotdb.corpus.CorpusContext(*args, **kwargs)[source]

Main corpus context, inherits from the more specialized contexts.


args

Either a CorpusConfig object or a sequence of arguments to be passed to a CorpusConfig object


kwargs

Sequence of keyword arguments to be passed to a CorpusConfig object
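As a rough picture of what "inherits from the more specialized contexts" means, the sketch below combines two small classes into one through multiple inheritance. All class and method names are hypothetical stand-ins; the actual base classes of CorpusContext and their order are not stated here.

```python
class BaseSketch:
    """Stand-in for the shared base context."""

    def __init__(self, corpus_name):
        self.corpus_name = corpus_name


class PhonologicalSketch(BaseSketch):
    """Stand-in for a specialized context with phone-level methods."""

    def describe_phones(self):
        return f"phone-level methods for {self.corpus_name}"


class LexicalSketch(BaseSketch):
    """Stand-in for a specialized context with word-level methods."""

    def describe_words(self):
        return f"word-level methods for {self.corpus_name}"


class OmnibusSketch(PhonologicalSketch, LexicalSketch):
    """Combines the specialized classes and adds nothing of its own."""


corpus = OmnibusSketch("my_corpus")
print(corpus.describe_phones())
print(corpus.describe_words())
```

A single object then exposes the methods of every specialized class, which is why user code typically only needs the omnibus class.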

Corpus structure class

class polyglotdb.structure.Hierarchy(data=None, corpus_name=None)[source]

Class containing information about how a corpus is structured.

Hierarchical data is stored in the form of a dictionary with keys for linguistic types, and values for the linguistic type that contains them. If no other type contains a given type, its value is None.

Subannotation data is stored in the form of a dictionary with keys for linguistic types, and values of sets of types of subannotations.


data

Information about the hierarchy of linguistic types


corpus_name

Name of the corpus
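The two dictionary layouts described for Hierarchy can be illustrated with plain Python dictionaries. The type names (`phone`, `syllable`, `word`, `utterance`), the subannotation names, and the helper `highest_to_lowest` are assumptions made for this example and are not part of the Hierarchy class; the helper also assumes a strictly linear hierarchy in which each type contains at most one other type.

```python
# Keys are linguistic types; each value is the type that contains the key.
hierarchy_data = {
    "phone": "syllable",
    "syllable": "word",
    "word": "utterance",
    "utterance": None,  # no other type contains utterances
}

# Keys are linguistic types; values are sets of subannotation types.
subannotation_data = {
    "phone": {"burst", "closure"},
}


def highest_to_lowest(data):
    """Order the types from the top of the hierarchy to the bottom."""
    # Invert the mapping: container type -> the type it contains.
    contains = {parent: child for child, parent in data.items()}
    ordered = []
    current = contains.get(None)  # the type that nothing else contains
    while current is not None:
        ordered.append(current)
        current = contains.get(current)
    return ordered


print(highest_to_lowest(hierarchy_data))
```

Storing the containing type as the value keeps lookups upward through the hierarchy to a single dictionary access.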

Corpus config class

class polyglotdb.config.CorpusConfig(corpus_name, data_dir=None, **kwargs)[source]

Class for storing configuration information about a corpus.


corpus_name

Identifier for the corpus

kwargs : keyword arguments

All keywords will be converted to attributes of the object


Identifier of the corpus


Username for connecting to the graph database


Password for connecting to the graph database


Host for the graph database


Port for connecting to the graph database


Type of SQL database


Base directory to store information and temporary files for the corpus; defaults to “.pgdb” under the current user’s home directory
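The behaviour described for CorpusConfig (keywords become attributes, and the base directory falls back to ".pgdb" in the home directory) might look roughly like the sketch below. `ConfigSketch` is a hypothetical stand-in rather than the library class, and the keyword names `graph_host` and `graph_port` as well as the `data_dir` attribute name are assumptions chosen for illustration.

```python
import os


class ConfigSketch:
    """Hypothetical stand-in mirroring the documented CorpusConfig behaviour."""

    def __init__(self, corpus_name, data_dir=None, **kwargs):
        self.corpus_name = corpus_name
        if data_dir is None:
            # Fall back to ".pgdb" under the current user's home directory.
            data_dir = os.path.join(os.path.expanduser("~"), ".pgdb")
        self.data_dir = data_dir
        # All remaining keywords are converted to attributes of the object.
        for key, value in kwargs.items():
            setattr(self, key, value)


config = ConfigSketch("my_corpus", graph_host="localhost", graph_port=7474)
print(config.corpus_name, config.graph_host, config.graph_port)
print(config.data_dir)
```

Because arbitrary keywords are accepted, a misspelled keyword silently creates a new attribute instead of raising an error, so it is worth double-checking names when configuring a connection.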