Parser Classes

Base parser

class polyglotdb.io.parsers.base.BaseParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)

Base parser. Extend this class to create new parsers.

Parameters:

annotation_tiers : list: Annotation types of the files to parse
hierarchy : Hierarchy: Details of how linguistic types relate to one another
make_transcription : bool, defaults to True: If True, create a transcription attribute on each word from the segments that the word contains
stop_check : callable, optional: Function to check whether to halt parsing
call_back : callable, optional: Function to output progress messages
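
The stop_check and call_back parameters are plain callables, so they can be sketched without the library. The example below is a minimal illustration, not PolyglotDB's own code: `parse_items` is a hypothetical stand-in for a parser loop, and the arguments actually passed to call_back by the real parsers are an assumption here.

```python
import threading

# An Event that another thread (for example a GUI "Cancel" button) can set.
stop_event = threading.Event()


def stop_check():
    """Return True when parsing should halt."""
    return stop_event.is_set()


# Collected progress reports, kept in a list so they can be inspected later.
progress_log = []


def call_back(*args):
    """Record whatever progress information the parser reports."""
    progress_log.append(args)


def parse_items(items, stop_check=None, call_back=None):
    """Hypothetical loop showing where a parser might consult the callables."""
    parsed = []
    for i, item in enumerate(items):
        # Halt early if the caller has requested a stop.
        if stop_check is not None and stop_check():
            break
        # Report the index of the item about to be processed.
        if call_back is not None:
            call_back(i)
        parsed.append(item.upper())
    return parsed
```

Passing both callables as None, the documented default, simply disables cancellation and progress reporting in this sketch.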

match_extension(filename)

Check whether filename ends with an acceptable extension.

Parameters:	filename : str: Name of the file being checked
Returns:	bool: True if filename has an acceptable extension, False otherwise
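
An extension check of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: the `ACCEPTED_EXTENSIONS` list and the case-insensitive comparison are hypothetical choices.

```python
import os

# Hypothetical list of accepted extensions; the attribute that holds
# these in the real parser classes may be named and populated differently.
ACCEPTED_EXTENSIONS = ['.textgrid']


def match_extension(filename, extensions=ACCEPTED_EXTENSIONS):
    """Return True if filename ends with one of the accepted extensions."""
    # Compare in lower case so 'a.TextGrid' and 'a.textgrid' both match.
    ext = os.path.splitext(filename)[1].lower()
    return ext in extensions
```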

parse_discourse(name, types_only=False)

Parse annotations for later importing.

Parameters:

name : str: Name of the discourse
types_only : bool: Flag for whether to save only type information, ignoring token information

Returns:	`DiscourseData`: Parsed data

parse_information(path, corpus_name)

Parses types out of a corpus.

Parameters:

path : str: Path to the corpus
corpus_name : str: Name of the corpus

Returns:	data.types : list: A list of data types

TextGrid parser

class polyglotdb.io.parsers.textgrid.TextgridParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)

Parser for Praat TextGrid files.

Parameters:

annotation_tiers : list: Annotation types of the files to parse
hierarchy : Hierarchy: Details of how linguistic types relate to one another
make_transcription : bool, defaults to True: If True, create a transcription attribute on each word from the segments that the word contains
stop_check : callable, optional: Function to check whether to halt parsing
call_back : callable, optional: Function to output progress messages

load_textgrid(path)

Load a TextGrid file

Parameters:	path : str Path to the TextGrid file
Returns:	`TextGrid` TextGrid object

parse_discourse(path, types_only=False)

Parse a TextGrid file for later importing.

Parameters:

path : str: Path to the TextGrid file
types_only : bool: Flag for whether to save only type information, ignoring token information

Returns:	`DiscourseData`: Parsed data from the file

Forced alignment output parser

class polyglotdb.io.parsers.aligner.AlignerParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Base class for parsing TextGrid output from forced aligners.

Parameters:

annotation_tiers : list: List of the annotation tiers to store data from the TextGrid
hierarchy : Hierarchy: Basic hierarchy of the TextGrid
make_transcription : bool: Flag for whether to add a transcription property to words based on phones they contain
stop_check : callable: Function to check for whether parsing should stop
call_back : callable: Function to report progress in parsing

Attributes:

word_label : str: Label identifying word tiers
phone_label : str: Label identifying phone tiers
name : str: Name of the aligner the TextGrids are from
speaker_first : bool: Whether speaker names precede tier types in the TextGrid when multiple speakers are present
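
The role of speaker_first can be illustrated with a small sketch that splits a multi-speaker tier name into a speaker and a tier type. The function name `split_tier_name` and the " - " separator are assumptions made for this example; the actual logic inside AlignerParser may differ.

```python
def split_tier_name(tier_name, speaker_first=True, separator=' - '):
    """Split a tier name into (speaker, tier_type).

    Returns (None, tier_name) when no separator is present, which is
    treated here as a single-speaker TextGrid.
    """
    if separator not in tier_name:
        return None, tier_name
    if speaker_first:
        # 'Alice - words': the tier type is the last component.
        speaker, tier_type = tier_name.rsplit(separator, 1)
    else:
        # 'words - Alice': the tier type is the first component.
        tier_type, speaker = tier_name.split(separator, 1)
    return speaker, tier_type
```

In this sketch the tier type would then be compared against word_label or phone_label to decide which annotation type the tier feeds.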

load_textgrid(path)

Load a TextGrid file

Parameters:	path : str Path to the TextGrid file
Returns:	`TextGrid` TextGrid object

match_extension(filename)

Check whether filename ends with an acceptable extension.

Parameters:	filename : str: Name of the file being checked
Returns:	bool: True if filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a force-aligned TextGrid file for later importing.

Parameters:

path : str: Path to the TextGrid file
types_only : bool: Flag for whether to save only type information, ignoring token information

Returns:	`DiscourseData`: Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus.

Parameters:

path : str: Path to the corpus
corpus_name : str: Name of the corpus

Returns:	data.types : list: A list of data types

MFA

class polyglotdb.io.parsers.mfa.MfaParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by the Montreal Forced Aligner.

load_textgrid(path)

Load a TextGrid file

Parameters:	path : str Path to the TextGrid file
Returns:	`TextGrid` TextGrid object

match_extension(filename)

Check whether filename ends with an acceptable extension.

Parameters:	filename : str: Name of the file being checked
Returns:	bool: True if filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a force-aligned TextGrid file for later importing.

Parameters:

path : str: Path to the TextGrid file
types_only : bool: Flag for whether to save only type information, ignoring token information

Returns:	`DiscourseData`: Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus.

Parameters:

path : str: Path to the corpus
corpus_name : str: Name of the corpus

Returns:	data.types : list: A list of data types

FAVE

class polyglotdb.io.parsers.fave.FaveParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by FAVE-align.

load_textgrid(path)

Load a TextGrid file

Parameters:	path : str Path to the TextGrid file
Returns:	`TextGrid` TextGrid object

match_extension(filename)

Check whether filename ends with an acceptable extension.

Parameters:	filename : str: Name of the file being checked
Returns:	bool: True if filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a force-aligned TextGrid file for later importing.

Parameters:

path : str: Path to the TextGrid file
types_only : bool: Flag for whether to save only type information, ignoring token information

Returns:	`DiscourseData`: Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus.

Parameters:

path : str: Path to the corpus
corpus_name : str: Name of the corpus

Returns:	data.types : list: A list of data types

MAUS

class polyglotdb.io.parsers.maus.MausParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by the Web-MAUS aligner.

load_textgrid(path)

Load a TextGrid file

Parameters:	path : str Path to the TextGrid file
Returns:	`TextGrid` TextGrid object

match_extension(filename)

Check whether filename ends with an acceptable extension.

Parameters:	filename : str: Name of the file being checked
Returns:	bool: True if filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a force-aligned TextGrid file for later importing.

Parameters:

path : str: Path to the TextGrid file
types_only : bool: Flag for whether to save only type information, ignoring token information

Returns:	`DiscourseData`: Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus.

Parameters:

path : str: Path to the corpus
corpus_name : str: Name of the corpus

Returns:	data.types : list: A list of data types

TIMIT parser

class polyglotdb.io.parsers.timit.TimitParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)

Parser for the TIMIT corpus.

Has annotation types for word labels and surface transcription labels.

Parameters:

annotation_tiers : list: Annotation types of the files to parse
hierarchy : `Hierarchy`: Details of how linguistic types relate to one another
stop_check : callable, optional: Function to check whether to halt parsing
call_back : callable, optional: Function to output progress messages

parse_discourse(word_path, types_only=False)

Parse a TIMIT file for later importing.

Parameters:

word_path : str: Path to the TIMIT .wrd file
types_only : bool: Flag for whether to save only type information, ignoring token information

Returns:	`DiscourseData`: Parsed data from the file
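
For orientation, a TIMIT .wrd file lists one word per line as a begin sample, an end sample, and the word label, with audio sampled at 16 kHz. The sketch below converts one such line to seconds. It is an illustration only: `parse_wrd_line` is a hypothetical helper, and TimitParser may read these files differently.

```python
TIMIT_SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz


def parse_wrd_line(line, sample_rate=TIMIT_SAMPLE_RATE):
    """Convert one '.wrd' line into (label, begin_seconds, end_seconds)."""
    # Each line has the form '<begin_sample> <end_sample> <word>'.
    begin, end, label = line.strip().split(maxsplit=2)
    return label, int(begin) / sample_rate, int(end) / sample_rate
```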

Buckeye parser

class polyglotdb.io.parsers.buckeye.BuckeyeParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)

Parser for the Buckeye corpus.

Has annotation types for word labels, word transcription, word part of speech, and surface transcription labels.

Parameters:

annotation_tiers : list: Annotation types of the files to parse
hierarchy : `Hierarchy`: Details of how linguistic types relate to one another
stop_check : callable, optional: Function to check whether to halt parsing
call_back : callable, optional: Function to output progress messages

parse_discourse(word_path, types_only=False)

Parse a Buckeye file for later importing.

Parameters:

word_path : str: Path to the Buckeye .words file
types_only : bool: Flag for whether to save only type information, ignoring token information

Returns:	`DiscourseData`: Parsed data

LaBB-CAT parser

class polyglotdb.io.parsers.labbcat.LabbCatParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids exported from LaBB-CAT.

Parameters:

annotation_tiers : list: List of the annotation tiers to store data from the TextGrid
hierarchy : Hierarchy: Basic hierarchy of the TextGrid
make_transcription : bool: Flag for whether to add a transcription property to words based on phones they contain
stop_check : callable: Function to check for whether parsing should stop
call_back : callable: Function to report progress in parsing

load_textgrid(path)

Load a TextGrid file, ignoring duplicated tier names, which LaBB-CAT sometimes exports erroneously.

Parameters:	path : str Path to the TextGrid file
Returns:	`TextGrid` TextGrid object

Speaker parsers

Filename Speaker Parser

class polyglotdb.io.parsers.speaker.FilenameSpeakerParser(number_of_characters, left_orientation=True)

Parses a speaker name from a path by taking a specified number of characters from either the left or the right of the base file name.

Parameters:

number_of_characters : int: Number of characters to include in the speaker designation; set to 0 to use the full file name
left_orientation : bool, defaults to True: Whether to take characters from the left (True) or the right (False) of the base file name

parse_path(path)

Parses a file path and returns a speaker name

Parameters:	path : str File path
Returns:	str Substring of path that is the speaker name
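
The behaviour described for the filename speaker parser can be sketched with the standard library alone. `speaker_from_filename` is a hypothetical stand-in, not the class itself, and it assumes that the base file name excludes the extension.

```python
import os


def speaker_from_filename(path, number_of_characters, left_orientation=True):
    """Return a speaker name taken from the base file name of path."""
    # Assumption: the base file name is the file name without its extension.
    name = os.path.splitext(os.path.basename(path))[0]
    if number_of_characters == 0:
        # 0 means "use the whole base file name".
        return name
    if left_orientation:
        return name[:number_of_characters]
    return name[-number_of_characters:]
```

Taking characters from the right suits corpora that put the speaker code at the end of the file name, such as `interview_s01.TextGrid` in this sketch.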

Directory Speaker Parser

class polyglotdb.io.parsers.speaker.DirectorySpeakerParser

Parses a speaker name from a path by using the name of the directory that immediately contains the file.

parse_path(path)

Parses a file path and returns a speaker name

Parameters:	path : str File path
Returns:	str Directory that is the name of the speaker
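
A directory-based lookup of this kind can be approximated as below. `speaker_from_directory` is a hypothetical helper written for illustration; it is not the library's implementation.

```python
import os


def speaker_from_directory(path):
    """Return the name of the directory that immediately contains path."""
    # dirname drops the file name; basename keeps only the last directory.
    return os.path.basename(os.path.dirname(path))
```

This layout fits corpora organised with one directory per speaker, where every file inside a directory belongs to the same person.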