Parser Classes

Base parser

class polyglotdb.io.parsers.base.BaseParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)

Base parser; extend this class to write new parsers.

Parameters:

annotation_tiers (list): Annotation types of the files to parse
hierarchy (Hierarchy): Details of how linguistic types relate to one another
make_transcription (bool, defaults to True): If true, create a word attribute for transcription based on segments that are contained by the word
stop_check (callable, optional): Function to check whether to halt parsing
call_back (callable, optional): Function to output progress messages
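
As a rough illustration of the two optional callables, the sketch below shows how a parsing loop might consult stop_check and report progress through call_back. The parse_files function, the loop, and the message passed to call_back are assumptions made for illustration; they are not taken from PolyglotDB.

```python
def parse_files(paths, stop_check=None, call_back=None):
    """Hypothetical parsing loop; not PolyglotDB's implementation."""
    parsed = []
    for i, path in enumerate(paths):
        # Halt early if the caller-supplied stop_check returns True
        if stop_check is not None and stop_check():
            break
        # Report progress (the message format is an assumption)
        if call_back is not None:
            call_back("Parsing {} ({} of {})".format(path, i + 1, len(paths)))
        parsed.append(path)
    return parsed


# Collect progress messages in a list instead of printing them
messages = []
result = parse_files(["a.TextGrid", "b.TextGrid"], call_back=messages.append)
```

Passing both callables as optional keyword arguments lets a GUI or a long-running script cancel or monitor parsing without changing the parser itself.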

TextGrid parser

class polyglotdb.io.parsers.textgrid.TextgridParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)

Parser for Praat TextGrid files.

Parameters:

annotation_tiers (list): Annotation types of the files to parse
hierarchy (Hierarchy): Details of how linguistic types relate to one another
make_transcription (bool, defaults to True): If true, create a word attribute for transcription based on segments that are contained by the word
stop_check (callable, optional): Function to check whether to halt parsing
call_back (callable, optional): Function to output progress messages

Forced alignment output parser

class polyglotdb.io.parsers.aligner.AlignerParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Base class for parsing TextGrid output from forced aligners.

Parameters:

annotation_tiers (list): List of the annotation tiers to store data from the TextGrid
hierarchy (Hierarchy): Basic hierarchy of the TextGrid
make_transcription (bool): Flag for whether to add a transcription property to words based on phones they contain
stop_check (callable): Function to check for whether parsing should stop
call_back (callable): Function to report progress in parsing
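
To make the make_transcription flag more concrete, here is a minimal sketch of how a transcription could be assembled from the phones a word contains. The add_transcriptions function, the (label, begin, end) tuple layout, and the "." separator are all assumptions for this example, not PolyglotDB's actual data structures.

```python
def add_transcriptions(words, phones, separator="."):
    """Attach a transcription to each word from the phones it contains.

    words and phones are lists of (label, begin, end) tuples; this layout
    and the separator are assumptions made for the sketch.
    """
    result = []
    for label, w_begin, w_end in words:
        # Keep phones whose interval lies entirely inside the word's interval
        contained = [p for p, p_begin, p_end in phones
                     if p_begin >= w_begin and p_end <= w_end]
        result.append({"label": label,
                       "transcription": separator.join(contained)})
    return result


words = [("cat", 0.0, 0.3), ("sat", 0.3, 0.6)]
phones = [("k", 0.0, 0.1), ("ae", 0.1, 0.2), ("t", 0.2, 0.3),
          ("s", 0.3, 0.4), ("ae", 0.4, 0.5), ("t", 0.5, 0.6)]
annotated = add_transcriptions(words, phones)
```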

Attributes:

word_label (str): Label identifying word tiers
phone_label (str): Label identifying phone tiers
name (str): Name of the aligner the TextGrids are from
speaker_first (bool): Whether speaker names precede tier types in the TextGrid when multiple speakers are present
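
The speaker_first attribute can be pictured with a small sketch that splits a tier name into a speaker and a tier type. The split_tier_name helper and the " - " separator are hypothetical; the separator a given aligner actually uses in its tier names may differ.

```python
def split_tier_name(tier_name, speaker_first=True, separator=" - "):
    """Return (speaker, tier_type) from a multi-speaker tier name.

    Hypothetical helper; the separator is an assumption for this sketch.
    """
    first, second = tier_name.split(separator, 1)
    if speaker_first:
        # e.g. "Speaker1 - words": the speaker name comes before the tier type
        return first, second
    # e.g. "words - Speaker1": the tier type comes before the speaker name
    return second, first
```

For example, split_tier_name("Speaker1 - words") and split_tier_name("words - Speaker1", speaker_first=False) would both yield the speaker "Speaker1" and the tier type "words".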

load_textgrid(path)

Load a TextGrid file

Parameters:

path (str): Path to the TextGrid file

Returns:

TextGrid: TextGrid object

match_extension(filename)

Checks whether filename ends with an acceptable extension

Parameters:

filename (str): The filename of the file being checked

Returns:

bool: True if filename has an acceptable extension, False otherwise
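
A check of this kind can be sketched in a few lines. The version below is a standalone illustration rather than the library's code; the ACCEPTED_EXTENSIONS tuple and the case-insensitive comparison are assumptions.

```python
import os

# Assumed set of acceptable extensions, for illustration only
ACCEPTED_EXTENSIONS = (".textgrid",)


def match_extension(filename, extensions=ACCEPTED_EXTENSIONS):
    """Return True if filename ends with one of the accepted extensions."""
    _, ext = os.path.splitext(filename)
    # Compare case-insensitively so ".TextGrid" and ".textgrid" both match
    return ext.lower() in extensions
```

Under these assumptions, match_extension("speaker1.TextGrid") is accepted and match_extension("speaker1.wav") is rejected.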

parse_discourse(path, types_only=False)

Parse a forced-aligned TextGrid file for later importing.

Parameters:

path (str): Path to the TextGrid file
types_only (bool): Flag for whether to only save type information, ignoring the token information

Returns:

DiscourseData: Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus

Parameters:

path (str): A path to the corpus
corpus_name (str): Name of the corpus

Returns:

data.types (list): A list of data types

MFA

class polyglotdb.io.parsers.mfa.MfaParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by the Montreal Forced Aligner.

load_textgrid(path)

Load a TextGrid file

Parameters:

path (str): Path to the TextGrid file

Returns:

TextGrid: TextGrid object

match_extension(filename)

Checks whether filename ends with an acceptable extension

Parameters:

filename (str): The filename of the file being checked

Returns:

bool: True if filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a forced-aligned TextGrid file for later importing.

Parameters:

path (str): Path to the TextGrid file
types_only (bool): Flag for whether to only save type information, ignoring the token information

Returns:

DiscourseData: Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus

Parameters:

path (str): A path to the corpus
corpus_name (str): Name of the corpus

Returns:

data.types (list): A list of data types

FAVE

class polyglotdb.io.parsers.fave.FaveParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by FAVE-align.

load_textgrid(path)

Load a TextGrid file

Parameters:

path (str): Path to the TextGrid file

Returns:

TextGrid: TextGrid object

match_extension(filename)

Checks whether filename ends with an acceptable extension

Parameters:

filename (str): The filename of the file being checked

Returns:

bool: True if filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a forced-aligned TextGrid file for later importing.

Parameters:

path (str): Path to the TextGrid file
types_only (bool): Flag for whether to only save type information, ignoring the token information

Returns:

DiscourseData: Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus

Parameters:

path (str): A path to the corpus
corpus_name (str): Name of the corpus

Returns:

data.types (list): A list of data types

MAUS

class polyglotdb.io.parsers.maus.MausParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by the Web-MAUS aligner.

load_textgrid(path)

Load a TextGrid file

Parameters:

path (str): Path to the TextGrid file

Returns:

TextGrid: TextGrid object

match_extension(filename)

Checks whether filename ends with an acceptable extension

Parameters:

filename (str): The filename of the file being checked

Returns:

bool: True if filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a forced-aligned TextGrid file for later importing.

Parameters:

path (str): Path to the TextGrid file
types_only (bool): Flag for whether to only save type information, ignoring the token information

Returns:

DiscourseData: Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus

Parameters:

path (str): A path to the corpus
corpus_name (str): Name of the corpus

Returns:

data.types (list): A list of data types

TIMIT parser

class polyglotdb.io.parsers.timit.TimitParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)

Parser for the TIMIT corpus.

Has annotation types for word labels and surface transcription labels.

Parameters:

annotation_tiers (list): Annotation types of the files to parse
hierarchy (Hierarchy): Details of how linguistic types relate to one another
stop_check (callable, optional): Function to check whether to halt parsing
call_back (callable, optional): Function to output progress messages

Buckeye parser

class polyglotdb.io.parsers.buckeye.BuckeyeParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)

Parser for the Buckeye corpus.

Has annotation types for word labels, word transcription, word part of speech, and surface transcription labels.

Parameters:

annotation_tiers (list): Annotation types of the files to parse
hierarchy (Hierarchy): Details of how linguistic types relate to one another
stop_check (callable, optional): Function to check whether to halt parsing
call_back (callable, optional): Function to output progress messages

LaBB-CAT parser

class polyglotdb.io.parsers.labbcat.LabbCatParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids exported from LaBB-CAT.

Parameters:

annotation_tiers (list): List of the annotation tiers to store data from the TextGrid
hierarchy (Hierarchy): Basic hierarchy of the TextGrid
make_transcription (bool): Flag for whether to add a transcription property to words based on phones they contain
stop_check (callable): Function to check for whether parsing should stop
call_back (callable): Function to report progress in parsing

Speaker parsers

Filename Speaker Parser

class polyglotdb.io.parsers.speaker.FilenameSpeakerParser(number_of_characters, left_orientation=True)

Class for parsing a speaker name from a path that gets a specified number of characters from either the left or the right of the base file name.

Parameters:

number_of_characters (int): Number of characters to include in the speaker designation; set to 0 to use the full file name
left_orientation (bool): Whether to pull characters from the left (True) or the right (False) of the base file name, defaults to True
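
The behaviour described above can be sketched as a plain function. This is an illustrative re-implementation, not the class itself: the speaker_from_filename name is hypothetical, and treating the "base file name" as the file name without its extension is an assumption.

```python
import os


def speaker_from_filename(path, number_of_characters, left_orientation=True):
    """Derive a speaker name from the base file name of path (sketch only)."""
    # Assumption: the base file name excludes the directory and the extension
    base = os.path.splitext(os.path.basename(path))[0]
    if number_of_characters == 0:
        # 0 means use the full base file name
        return base
    if left_orientation:
        return base[:number_of_characters]
    return base[-number_of_characters:]
```

With a file named s01_interview.TextGrid, three characters from the left would give "s01", nine from the right would give "interview", and 0 would give "s01_interview".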

Directory Speaker Parser

class polyglotdb.io.parsers.speaker.DirectorySpeakerParser

Class for parsing a speaker name from a path that gets the directory immediately containing the file and uses its name as the speaker name.
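
As a minimal sketch of the directory-based approach, the function below takes the name of the directory that immediately contains a file. The speaker_from_directory name is hypothetical and the code is an illustration using os.path, not the class's implementation.

```python
import os


def speaker_from_directory(path):
    """Return the name of the directory immediately containing path."""
    # dirname drops the file name; basename keeps the last directory component
    return os.path.basename(os.path.dirname(path))
```

This layout suits corpora organised as one directory per speaker, for example a file at /corpus/speaker1/rec.TextGrid would be assigned to "speaker1".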