Corpus API
Corpus classes
Base corpus
- class polyglotdb.corpus.BaseContext(*args, **kwargs)
Base CorpusContext class. Inherit from this and extend to create more functionality.
- Parameters:
- *args
If the first argument is not a CorpusConfig object, it is the name of the corpus.
- **kwargs
If a CorpusConfig object is not specified, all arguments and keyword arguments are passed to a CorpusConfig object.
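The argument handling described for `*args` and `**kwargs` can be sketched with stand-in classes. This is a minimal illustration under stated assumptions, not PolyglotDB's implementation: `SketchConfig` and `SketchContext` are hypothetical names, the `options` attribute is invented for the sketch, and no database connection is made.

```python
class SketchConfig:
    """Hypothetical stand-in for CorpusConfig (not the real class)."""

    def __init__(self, corpus_name, **kwargs):
        self.corpus_name = corpus_name
        # Invented for this sketch: keep the raw keyword arguments around.
        self.options = dict(kwargs)


class SketchContext:
    """Hypothetical stand-in for a corpus context."""

    def __init__(self, *args, **kwargs):
        if args and isinstance(args[0], SketchConfig):
            # A ready-made config object was supplied: use it directly.
            self.config = args[0]
        else:
            # Otherwise forward everything to the config constructor,
            # so the first positional argument is the corpus name.
            self.config = SketchConfig(*args, **kwargs)


# Both call styles end up with a config attached to the context.
by_name = SketchContext("my_corpus", graph_host="localhost")
by_config = SketchContext(SketchConfig("my_corpus"))
```

Accepting either form lets a caller build one configuration object and reuse it, or pass a corpus name directly for quick use.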
Phonological functionality
Syllabic functionality
Lexical functionality
Pause functionality
Utterance functionality
Audio functionality
Summarization functionality
Spoken functionality
Structured functionality
Annotation functionality
Omnibus class
- class polyglotdb.corpus.CorpusContext(*args, **kwargs)
Main corpus context, inherits from the more specialized contexts.
- Parameters:
- args : args
Either a CorpusConfig object or a sequence of arguments to be passed to a CorpusConfig object
- kwargs : kwargs
Sequence of keyword arguments to be passed to a CorpusConfig object
Corpus structure class
- class polyglotdb.structure.Hierarchy(data=None, corpus_name=None)
Class containing information about how a corpus is structured.
Hierarchical data is stored in the form of a dictionary with keys for linguistic types, and values for the linguistic type that contains them. If no other type contains a given type, its value is None.
Subannotation data is stored in the form of a dictionary with keys for linguistic types, and values of sets of types of subannotations.
- Parameters:
- data : dict
Information about the hierarchy of linguistic types
- corpus_name : str
Name of the corpus
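The two dictionary layouts described for the hierarchy can be illustrated with plain Python dictionaries. This is only a sketch of the data shape: the type names (`phone`, `syllable`, `word`, `utterance`), the subannotation names, and the helper functions `containing_types` and `top_type` are assumptions made for the example and are not part of the Hierarchy class.

```python
# Hypothetical hierarchy: each key is a linguistic type, each value is the
# type that contains it; None marks a type that nothing else contains.
hierarchy = {
    "phone": "syllable",
    "syllable": "word",
    "word": "utterance",
    "utterance": None,
}

# Hypothetical subannotations: linguistic type -> set of subannotation types.
subannotations = {"phone": {"burst", "closure"}}


def containing_types(data, annotation_type):
    """Return the types that contain annotation_type, innermost first."""
    chain = []
    parent = data[annotation_type]
    while parent is not None:
        chain.append(parent)
        parent = data[parent]
    return chain


def top_type(data):
    """Return the type whose value is None (no containing type)."""
    return next(key for key, value in data.items() if value is None)
```

With this layout, walking upward from any type is a simple chain of dictionary lookups that stops at `None`.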
Corpus config class
- class polyglotdb.config.CorpusConfig(corpus_name, data_dir=None, **kwargs)
Class for storing configuration information about a corpus.
- Parameters:
- corpus_name : str
Identifier for the corpus
- kwargs : keyword arguments
All keywords will be converted to attributes of the object
- Attributes:
- corpus_name : str
Identifier of the corpus
- graph_user : str
Username for connecting to the graph database
- graph_password : str
Password for connecting to the graph database
- graph_host : str
Host for the graph database
- graph_port : int
Port for connecting to the graph database
- engine : str
Type of SQL database
- base_dir : str
Base directory to store information and temporary files for the corpus; defaults to “.pgdb” under the current user’s home directory
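The statement that all keywords are converted to attributes can be sketched as follows. This is a hedged illustration, not the library's code: `SketchConfig` is a hypothetical stand-in that mirrors only the behavior stated in this section (keyword-to-attribute conversion and the `base_dir` default), and the values passed in are arbitrary examples.

```python
import os


class SketchConfig:
    """Hypothetical stand-in; mirrors only behavior stated in this section."""

    def __init__(self, corpus_name, **kwargs):
        self.corpus_name = corpus_name
        # Default described in the text: ".pgdb" under the user's home directory.
        self.base_dir = os.path.join(os.path.expanduser("~"), ".pgdb")
        # Every keyword becomes an attribute, overriding any default set above.
        for key, value in kwargs.items():
            setattr(self, key, value)


# Arbitrary example values; the attribute names come from the list above.
config = SketchConfig("my_corpus", graph_host="localhost", graph_port=7474)
custom = SketchConfig("my_corpus", base_dir="/tmp/pgdb_example")
```

Setting defaults first and applying keywords afterwards means an explicit keyword always takes precedence in this sketch.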