Parser Classes
Base parser
- class polyglotdb.io.parsers.base.BaseParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)
Base parser, extend this class for new parsers.
- Parameters
- annotation_tiers : list
Annotation types of the files to parse
- hierarchy : Hierarchy
Details of how linguistic types relate to one another
- make_transcription : bool, defaults to True
If True, create a word attribute for transcription based on the segments contained by the word
- stop_check : callable, optional
Function to check whether to halt parsing
- call_back : callable, optional
Function to output progress messages
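As a rough illustration of how `stop_check` and `call_back` might be supplied, the sketch below drives a stand-in loop rather than a real parser. The function `process_items` is hypothetical and not part of PolyglotDB, and passing a single message string to `call_back` is an assumption made for illustration; the arguments the library actually passes may differ.

```python
import threading


def process_items(items, stop_check=None, call_back=None):
    """Hypothetical stand-in for a parsing loop; not part of PolyglotDB."""
    done = []
    for i, item in enumerate(items):
        # Halt early if the caller's stop_check reports True.
        if stop_check is not None and stop_check():
            break
        done.append(item)
        # Report progress if a call_back was supplied (assumed: one message string).
        if call_back is not None:
            call_back("Processed {} of {}".format(i + 1, len(items)))
    return done


stop_event = threading.Event()
messages = []


def report(message):
    # Collect progress messages and request a stop after the second one.
    messages.append(message)
    if len(messages) == 2:
        stop_event.set()


result = process_items(["a", "b", "c"], stop_check=stop_event.is_set, call_back=report)
```

Using an event's `is_set` method as the `stop_check` lets another thread (for example, a GUI cancel button) request a halt without the loop knowing where the request came from.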
TextGrid parser
- class polyglotdb.io.parsers.textgrid.TextgridParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)
Parser for Praat TextGrid files.
- Parameters
- annotation_tiers : list
Annotation types of the files to parse
- hierarchy : Hierarchy
Details of how linguistic types relate to one another
- make_transcription : bool, defaults to True
If True, create a word attribute for transcription based on the segments contained by the word
- stop_check : callable, optional
Function to check whether to halt parsing
- call_back : callable, optional
Function to output progress messages
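The idea behind `make_transcription` can be sketched as follows: a word's transcription is assembled from the segments whose time span falls inside the word. This is a minimal illustration, not the library's implementation; the function `word_transcription`, the `(label, begin, end)` tuple layout, and the `"."` separator are all assumptions.

```python
def word_transcription(word, phones, separator="."):
    """Hypothetical helper: join labels of phones contained by the word."""
    _, w_begin, w_end = word
    # Keep only phones that start and end within the word's boundaries.
    contained = [
        label for label, begin, end in phones
        if begin >= w_begin and end <= w_end
    ]
    return separator.join(contained)


# Illustrative intervals as (label, begin, end) in seconds.
phones = [("k", 0.0, 0.1), ("ae", 0.1, 0.25), ("t", 0.25, 0.3), ("s", 0.3, 0.4)]
word = ("cat", 0.0, 0.3)
transcription = word_transcription(word, phones)
```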
Forced alignment output parser
- class polyglotdb.io.parsers.aligner.AlignerParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Base class for parsing TextGrid output from forced aligners.
- Parameters
- annotation_tiers : list
List of the annotation tiers to store data from the TextGrid
- hierarchy : Hierarchy
Basic hierarchy of the TextGrid
- make_transcription : bool
Flag for whether to add a transcription property to words based on the phones they contain
- stop_check : callable
Function to check whether parsing should stop
- call_back : callable
Function to report parsing progress
- Attributes
- word_label : str
Label identifying word tiers
- phone_label : str
Label identifying phone tiers
- name : str
Name of the aligner the TextGrids are from
- speaker_first : bool
Whether speaker names precede tier types in the TextGrid when multiple speakers are present
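To make the role of `speaker_first` concrete, the sketch below splits a multi-speaker tier name into a speaker and a tier type. It is only an illustration: the function `split_tier_name` is hypothetical, and the `" - "` separator between the two parts is an assumption about how such tiers are named.

```python
def split_tier_name(tier_name, speaker_first=True, separator=" - "):
    """Hypothetical helper returning (speaker, tier_type) for a tier name."""
    if separator not in tier_name:
        # Single-speaker file: the tier name is just the tier type.
        return None, tier_name
    first, second = tier_name.split(separator, 1)
    if speaker_first:
        return first, second
    return second, first


speaker_a = split_tier_name("Speaker1 - words")
speaker_b = split_tier_name("words - Speaker1", speaker_first=False)
single = split_tier_name("phones")
```

With the illustrative names above, both multi-speaker layouts resolve to the same speaker and tier type, which is the point of the flag.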
- load_textgrid(path)
Load a TextGrid file
- Parameters
- path : str
Path to the TextGrid file
- Returns
TextGrid
TextGrid object
- match_extension(filename)
Checks whether the filename ends with an acceptable extension
- Parameters
- filename : str
Name of the file being checked
- Returns
- bool
True if the filename has an acceptable extension, False otherwise
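A check of this kind can be sketched with the standard library alone. The function `has_textgrid_extension` is hypothetical, and both the accepted extension (`.textgrid`) and the case-insensitive comparison are assumptions rather than the documented behaviour of `match_extension`.

```python
import os


def has_textgrid_extension(filename, extensions=(".textgrid",)):
    """Hypothetical helper: True if filename ends with an accepted extension."""
    _, ext = os.path.splitext(filename)
    # Compare case-insensitively so "file.TextGrid" and "file.textgrid" both match.
    return ext.lower() in extensions


is_textgrid = has_textgrid_extension("speaker1.TextGrid")
is_wav_accepted = has_textgrid_extension("speaker1.wav")
```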
- parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
- Parameters
- path : str
Path to the TextGrid file
- types_only : bool
Flag for whether to save only type information, ignoring token information
- Returns
DiscourseData
Parsed data from the file
- parse_information(path, corpus_name)
Parses types out of a corpus
- Parameters
- path : str
Path to the corpus
- corpus_name : str
Name of the corpus
- Returns
- data.types : list
A list of data types
MFA
- class polyglotdb.io.parsers.mfa.MfaParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by the Montreal Forced Aligner.
- load_textgrid(path)
Load a TextGrid file
- Parameters
- path : str
Path to the TextGrid file
- Returns
TextGrid
TextGrid object
- match_extension(filename)
Checks whether the filename ends with an acceptable extension
- Parameters
- filename : str
Name of the file being checked
- Returns
- bool
True if the filename has an acceptable extension, False otherwise
- parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
- Parameters
- path : str
Path to the TextGrid file
- types_only : bool
Flag for whether to save only type information, ignoring token information
- Returns
DiscourseData
Parsed data from the file
- parse_information(path, corpus_name)
Parses types out of a corpus
- Parameters
- path : str
Path to the corpus
- corpus_name : str
Name of the corpus
- Returns
- data.types : list
A list of data types
FAVE
- class polyglotdb.io.parsers.fave.FaveParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by FAVE-align.
- load_textgrid(path)
Load a TextGrid file
- Parameters
- path : str
Path to the TextGrid file
- Returns
TextGrid
TextGrid object
- match_extension(filename)
Checks whether the filename ends with an acceptable extension
- Parameters
- filename : str
Name of the file being checked
- Returns
- bool
True if the filename has an acceptable extension, False otherwise
- parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
- Parameters
- path : str
Path to the TextGrid file
- types_only : bool
Flag for whether to save only type information, ignoring token information
- Returns
DiscourseData
Parsed data from the file
- parse_information(path, corpus_name)
Parses types out of a corpus
- Parameters
- path : str
Path to the corpus
- corpus_name : str
Name of the corpus
- Returns
- data.types : list
A list of data types
MAUS
- class polyglotdb.io.parsers.maus.MausParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by the Web-MAUS aligner.
- load_textgrid(path)
Load a TextGrid file
- Parameters
- path : str
Path to the TextGrid file
- Returns
TextGrid
TextGrid object
- match_extension(filename)
Checks whether the filename ends with an acceptable extension
- Parameters
- filename : str
Name of the file being checked
- Returns
- bool
True if the filename has an acceptable extension, False otherwise
- parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
- Parameters
- path : str
Path to the TextGrid file
- types_only : bool
Flag for whether to save only type information, ignoring token information
- Returns
DiscourseData
Parsed data from the file
- parse_information(path, corpus_name)
Parses types out of a corpus
- Parameters
- path : str
Path to the corpus
- corpus_name : str
Name of the corpus
- Returns
- data.types : list
A list of data types
TIMIT parser
- class polyglotdb.io.parsers.timit.TimitParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)
Parser for the TIMIT corpus.
Has annotation types for word labels and surface transcription labels.
- Parameters
- annotation_tiers : list
Annotation types of the files to parse
- hierarchy : Hierarchy
Details of how linguistic types relate to one another
- stop_check : callable, optional
Function to check whether to halt parsing
- call_back : callable, optional
Function to output progress messages
Buckeye parser
- class polyglotdb.io.parsers.buckeye.BuckeyeParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)
Parser for the Buckeye corpus.
Has annotation types for word labels, word transcription, word part of speech, and surface transcription labels.
- Parameters
- annotation_tiers : list
Annotation types of the files to parse
- hierarchy : Hierarchy
Details of how linguistic types relate to one another
- stop_check : callable, optional
Function to check whether to halt parsing
- call_back : callable, optional
Function to output progress messages
LaBB-CAT parser
- class polyglotdb.io.parsers.labbcat.LabbCatParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids exported from LaBB-CAT.
- Parameters
- annotation_tiers : list
List of the annotation tiers to store data from the TextGrid
- hierarchy : Hierarchy
Basic hierarchy of the TextGrid
- make_transcription : bool
Flag for whether to add a transcription property to words based on the phones they contain
- stop_check : callable
Function to check whether parsing should stop
- call_back : callable
Function to report parsing progress
Speaker parsers
Filename Speaker Parser
- class polyglotdb.io.parsers.speaker.FilenameSpeakerParser(number_of_characters, left_orientation=True)
Parses a speaker name from a file path by taking a specified number of characters from either the left or the right of the base file name.
- Parameters
- number_of_characters : int
Number of characters to include in the speaker designation; set to 0 to use the full file name
- left_orientation : bool, defaults to True
If True, take characters from the left of the base file name; if False, take them from the right
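The behaviour described above can be approximated in a few lines of standard-library Python. This is a sketch, not the class's actual code: the function `speaker_from_path` is hypothetical, and treating the "base file name" as the file name with its extension removed is an assumption.

```python
import os


def speaker_from_path(path, number_of_characters, left_orientation=True):
    """Hypothetical helper mimicking the documented speaker-name extraction."""
    # Assumed: the base file name excludes the directory and the extension.
    base = os.path.splitext(os.path.basename(path))[0]
    if number_of_characters == 0:
        # 0 means "use the full base file name".
        return base
    if left_orientation:
        return base[:number_of_characters]
    return base[-number_of_characters:]


from_left = speaker_from_path("corpus/s0101a.TextGrid", 3)
from_right = speaker_from_path("corpus/s0101a.TextGrid", 2, left_orientation=False)
full_name = speaker_from_path("corpus/s0101a.TextGrid", 0)
```

The explicit check for 0 matters because a slice such as `base[-0:]` would return the whole string only by accident, while `base[:0]` would return an empty string.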