Corpus API

Corpus classes

Base corpus

class polyglotdb.corpus.BaseContext(*args, **kwargs)

Base CorpusContext class. Inherit from this and extend to create more functionality.

Parameters:
*args

If the first argument is not a CorpusConfig object, it is the name of the corpus.

**kwargs

If a CorpusConfig object is not specified, all arguments and keyword arguments are passed to a CorpusConfig object.
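
The argument handling described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: StubConfig and resolve_config are hypothetical stand-ins written for this example, and "my_corpus" and the port value are placeholders.

```python
class StubConfig:
    """Hypothetical stand-in for CorpusConfig; not the library class."""

    def __init__(self, corpus_name, **kwargs):
        self.corpus_name = corpus_name
        # Keep the remaining keyword arguments so they can be inspected.
        self.options = dict(kwargs)


def resolve_config(*args, **kwargs):
    """Return a config object following the rule described above."""
    if args and isinstance(args[0], StubConfig):
        # A ready-made config object was supplied: use it as-is.
        return args[0]
    # Otherwise the first positional argument is the corpus name, and
    # all arguments are forwarded to the config constructor.
    return StubConfig(*args, **kwargs)


by_name = resolve_config("my_corpus", graph_port=7474)
by_object = resolve_config(StubConfig("my_corpus"))
```

Both calls end up with a config whose corpus_name is "my_corpus"; only the first one builds the config from the raw arguments.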

Phonological functionality

class polyglotdb.corpus.PhonologicalContext(*args, **kwargs)

Class that contains methods for dealing specifically with phones.

Syllabic functionality

class polyglotdb.corpus.SyllabicContext(*args, **kwargs)

Class that contains methods for dealing specifically with syllables.

Lexical functionality

class polyglotdb.corpus.LexicalContext(*args, **kwargs)

Class that contains methods for dealing specifically with words.

Pause functionality

class polyglotdb.corpus.PauseContext(*args, **kwargs)

Class that contains methods for dealing specifically with non-speech elements.

Utterance functionality

class polyglotdb.corpus.UtteranceContext(*args, **kwargs)

Class that contains methods for dealing specifically with utterances.

Audio functionality

class polyglotdb.corpus.AudioContext(*args, **kwargs)

Class that contains methods for dealing with audio files for corpora.

Summarization functionality

class polyglotdb.corpus.SummarizedContext(*args, **kwargs)

Class that contains methods for dealing specifically with summary measures for linguistic items.

Spoken functionality

class polyglotdb.corpus.SpokenContext(*args, **kwargs)

Class that contains methods for dealing specifically with speaker and sound file metadata.

Structured functionality

class polyglotdb.corpus.StructuredContext(*args, **kwargs)

Class that contains methods for dealing specifically with metadata for the corpus.

Annotation functionality

class polyglotdb.corpus.AnnotatedContext(*args, **kwargs)

Class that contains methods for dealing specifically with annotations on linguistic items (termed “subannotations” in PolyglotDB).

Omnibus class

class polyglotdb.corpus.CorpusContext(*args, **kwargs)

Main corpus context, inherits from the more specialized contexts.

Parameters:
*args

Either a CorpusConfig object or a sequence of arguments to be passed to a CorpusConfig object.

**kwargs

Keyword arguments to be passed to a CorpusConfig object.

Corpus structure class

class polyglotdb.structure.Hierarchy(data=None, corpus_name=None)

Class containing information about how a corpus is structured.

Hierarchical data is stored in the form of a dictionary with keys for linguistic types, and values for the linguistic type that contains them. If no other type contains a given type, its value is None.

Subannotation data is stored in the form of a dictionary with keys for linguistic types, and values of sets of types of subannotations.

Parameters:
data : dict

Information about the hierarchy of linguistic types.

corpus_name : str

Name of the corpus.
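
To make the two dictionary layouts described above concrete, here is a small sketch using plain Python dictionaries. The type names ("phone", "syllable", "word", "utterance") and subannotation names are illustrative assumptions, and ordered_types is a hypothetical helper written for this example, not a method of Hierarchy. It assumes a linear hierarchy in which each type is contained by at most one other type.

```python
# Keys are linguistic types; each value is the type that contains the key.
hierarchy_data = {
    "phone": "syllable",
    "syllable": "word",
    "word": "utterance",
    "utterance": None,  # no other type contains utterances
}

# Keys are linguistic types; values are sets of subannotation types.
subannotation_data = {
    "phone": {"burst", "closure"},
}


def ordered_types(data):
    """List types from the highest (contained by nothing) to the lowest."""
    # Invert the mapping: container -> the type it directly contains.
    contains = {parent: child for child, parent in data.items()}
    ordered = []
    current = contains.get(None)  # the type whose container is None
    while current is not None:
        ordered.append(current)
        current = contains.get(current)
    return ordered


top_down = ordered_types(hierarchy_data)
```

A dictionary shaped like hierarchy_data is the kind of value the data parameter describes.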

Corpus config class

class polyglotdb.config.CorpusConfig(corpus_name, data_dir=None, **kwargs)

Class for storing configuration information about a corpus.

Parameters:
corpus_name : str

Identifier for the corpus.

**kwargs : keyword arguments

All keywords will be converted to attributes of the object.

Attributes:
corpus_name : str

Identifier of the corpus.

graph_user : str

Username for connecting to the graph database.

graph_password : str

Password for connecting to the graph database.

graph_host : str

Host for the graph database.

graph_port : int

Port for connecting to the graph database.

engine : str

Type of SQL database.

base_dir : str

Base directory to store information and temporary files for the corpus; defaults to “.pgdb” under the current user’s home directory.
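
The keyword-to-attribute behaviour described above can be illustrated with a short sketch. ConfigSketch is a hypothetical stand-in written for this example, not the CorpusConfig implementation; the host and port values are placeholders, and the base_dir default simply follows the description above.

```python
import os


class ConfigSketch:
    """Hypothetical stand-in illustrating keyword-to-attribute conversion."""

    def __init__(self, corpus_name, **kwargs):
        self.corpus_name = corpus_name
        # Assumed default, following the description of base_dir above.
        self.base_dir = os.path.join(os.path.expanduser("~"), ".pgdb")
        # Every keyword argument becomes an attribute, overriding defaults.
        for name, value in kwargs.items():
            setattr(self, name, value)


config = ConfigSketch("my_corpus", graph_host="localhost", graph_port=7474)
```

After construction, config.graph_host and config.graph_port are ordinary attributes, and passing base_dir as a keyword would replace the default directory.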