Parser Classes
Base parser
- class polyglotdb.io.parsers.base.BaseParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)
Base parser; extend this class to implement new parsers.
- Parameters:
- annotation_tiers: list
Annotation types of the files to parse
- hierarchy: Hierarchy
Details of how linguistic types relate to one another
- make_transcription: bool, defaults to True
If true, create a word attribute for transcription based on segments that are contained by the word
- stop_check: callable, optional
Function to check whether to halt parsing
- call_back: callable, optional
Function to output progress messages
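As a rough illustration of the two callable parameters, the sketch below builds a `stop_check` backed by a `threading.Event` and a `call_back` that records whatever it is given. It assumes that `stop_check` is called with no arguments and returns a truthy value to halt, and that `call_back` may receive one or more positional arguments. The names `make_stop_check` and `ProgressLog` are hypothetical and not part of polyglotdb.

```python
import threading


def make_stop_check(event):
    """Return a zero-argument callable that reports whether `event` is set."""
    def stop_check():
        return event.is_set()
    return stop_check


class ProgressLog:
    """Collects the positional arguments passed on each progress report."""

    def __init__(self):
        self.messages = []

    def __call__(self, *args):
        self.messages.append(args)


cancel = threading.Event()
stop_check = make_stop_check(cancel)
call_back = ProgressLog()

# Simulate what a parser might do (assumption): report progress, then poll.
call_back("Parsing files...")
call_back(0, 10)
cancel.set()  # e.g. a user requested cancellation from another thread
```

Objects like these could then be passed as `stop_check=stop_check, call_back=call_back` when constructing a parser.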
TextGrid parser
- class polyglotdb.io.parsers.textgrid.TextgridParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)
Parser for Praat TextGrid files.
- Parameters:
- annotation_tiers: list
Annotation types of the files to parse
- hierarchy: Hierarchy
Details of how linguistic types relate to one another
- make_transcription: bool, defaults to True
If true, create a word attribute for transcription based on segments that are contained by the word
- stop_check: callable, optional
Function to check whether to halt parsing
- call_back: callable, optional
Function to output progress messages
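The effect of `make_transcription` can be pictured with a small stand-alone sketch: collect the segments whose time span lies inside a word and join their labels. The `Interval` tuple, the `build_transcription` helper, and the `.` separator are all assumptions made for illustration; the library's own data structures and separator may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotated interval on a tier.
Interval = namedtuple("Interval", ["label", "begin", "end"])


def build_transcription(word, segments, separator="."):
    """Join the labels of segments that fall inside the word's time span."""
    contained = [
        s.label
        for s in sorted(segments, key=lambda s: s.begin)
        if s.begin >= word.begin and s.end <= word.end
    ]
    return separator.join(contained)


word = Interval("cat", 0.10, 0.45)
phones = [
    Interval("k", 0.10, 0.18),
    Interval("ae", 0.18, 0.33),
    Interval("t", 0.33, 0.45),
    Interval("dh", 0.45, 0.50),  # starts where the word ends, so excluded
]
transcription = build_transcription(word, phones)
```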
Forced alignment output parser
- class polyglotdb.io.parsers.aligner.AlignerParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Base class for parsing TextGrid output from forced aligners.
- Parameters:
- annotation_tiers: list
List of the annotation tiers to store data from the TextGrid
- hierarchy: Hierarchy
Basic hierarchy of the TextGrid
- make_transcription: bool
Flag for whether to add a transcription property to words based on phones they contain
- stop_check: callable
Function to check whether parsing should stop
- call_back: callable
Function to report progress in parsing
- Attributes:
- word_label: str
Label identifying word tiers
- phone_label: str
Label identifying phone tiers
- name: str
Name of the aligner the TextGrids are from
- speaker_first: bool
Whether speaker names precede tier types in the TextGrid when multiple speakers are present
- load_textgrid(path)
Load a TextGrid file
- Parameters:
- path: str
Path to the TextGrid file
- Returns:
TextGrid
TextGrid object
- match_extension(filename)
Checks whether filename ends with an acceptable extension
- Parameters:
- filename: str
The filename of the file being checked
- Returns:
- bool
True if filename has an acceptable extension, False otherwise
- parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
- Parameters:
- path: str
Path to TextGrid file
- types_only: bool
Flag for whether to only save type information, ignoring the token information
- Returns:
DiscourseData
Parsed data from the file
- parse_information(path, corpus_name)
Parses types out of a corpus
- Parameters:
- path: str
A path to the corpus
- corpus_name: str
Name of the corpus
- Returns:
- data.types: list
A list of data types
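To make the `speaker_first` attribute concrete, here is a minimal sketch of splitting a multi-speaker tier name into a speaker and a tier type. The ` - ` separator, the example tier names, and the helper `split_tier_name` are assumptions for illustration only; the actual naming scheme depends on the aligner that produced the TextGrid.

```python
def split_tier_name(tier_name, speaker_first=True, separator=" - "):
    """Split a tier name into (speaker, tier_type).

    Returns None when the name carries no speaker information.
    """
    if separator not in tier_name:
        return None
    first, second = tier_name.split(separator, 1)
    if speaker_first:
        return first, second
    return second, first


# Speaker name before the tier type.
speaker_then_type = split_tier_name("speaker1 - words", speaker_first=True)
# Tier type before the speaker name.
type_then_speaker = split_tier_name("phones - speaker1", speaker_first=False)
# Single-speaker file with plain tier names.
no_speaker = split_tier_name("words")
```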
MFA
- class polyglotdb.io.parsers.mfa.MfaParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by the Montreal Forced Aligner.
- load_textgrid(path)
Load a TextGrid file
- Parameters:
- path: str
Path to the TextGrid file
- Returns:
TextGrid
TextGrid object
- match_extension(filename)
Checks whether filename ends with an acceptable extension
- Parameters:
- filename: str
The filename of the file being checked
- Returns:
- bool
True if filename has an acceptable extension, False otherwise
- parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
- Parameters:
- path: str
Path to TextGrid file
- types_only: bool
Flag for whether to only save type information, ignoring the token information
- Returns:
DiscourseData
Parsed data from the file
- parse_information(path, corpus_name)
Parses types out of a corpus
- Parameters:
- path: str
A path to the corpus
- corpus_name: str
Name of the corpus
- Returns:
- data.types: list
A list of data types
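The kind of check that `match_extension` describes can be sketched as a stand-alone function. The default extension tuple and the case-insensitive comparison are assumptions here, not a statement of what the parser classes actually accept.

```python
def match_extension(filename, extensions=(".textgrid",)):
    """Return True if `filename` ends with one of `extensions`, ignoring case."""
    lowered = tuple(ext.lower() for ext in extensions)
    return filename.lower().endswith(lowered)


is_textgrid = match_extension("speaker1_session2.TextGrid")
is_audio = match_extension("speaker1_session2.wav")
```

A check like this lets a corpus loader skip audio or metadata files that sit alongside the TextGrids in the same directory.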
FAVE
- class polyglotdb.io.parsers.fave.FaveParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by FAVE-align.
- load_textgrid(path)
Load a TextGrid file
- Parameters:
- path: str
Path to the TextGrid file
- Returns:
TextGrid
TextGrid object
- match_extension(filename)
Checks whether filename ends with an acceptable extension
- Parameters:
- filename: str
The filename of the file being checked
- Returns:
- bool
True if filename has an acceptable extension, False otherwise
- parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
- Parameters:
- path: str
Path to TextGrid file
- types_only: bool
Flag for whether to only save type information, ignoring the token information
- Returns:
DiscourseData
Parsed data from the file
- parse_information(path, corpus_name)
Parses types out of a corpus
- Parameters:
- path: str
A path to the corpus
- corpus_name: str
Name of the corpus
- Returns:
- data.types: list
A list of data types
MAUS
- class polyglotdb.io.parsers.maus.MausParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by the Web-MAUS aligner.
- load_textgrid(path)
Load a TextGrid file
- Parameters:
- path: str
Path to the TextGrid file
- Returns:
TextGrid
TextGrid object
- match_extension(filename)
Checks whether filename ends with an acceptable extension
- Parameters:
- filename: str
The filename of the file being checked
- Returns:
- bool
True if filename has an acceptable extension, False otherwise
- parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
- Parameters:
- path: str
Path to TextGrid file
- types_only: bool
Flag for whether to only save type information, ignoring the token information
- Returns:
DiscourseData
Parsed data from the file
- parse_information(path, corpus_name)
Parses types out of a corpus
- Parameters:
- path: str
A path to the corpus
- corpus_name: str
Name of the corpus
- Returns:
- data.types: list
A list of data types
TIMIT parser
- class polyglotdb.io.parsers.timit.TimitParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)
Parser for the TIMIT corpus.
Has annotation types for word labels and surface transcription labels.
- Parameters:
- annotation_tiers: list
Annotation types of the files to parse
- hierarchy: Hierarchy
Details of how linguistic types relate to one another
- stop_check: callable, optional
Function to check whether to halt parsing
- call_back: callable, optional
Function to output progress messages
Buckeye parser
- class polyglotdb.io.parsers.buckeye.BuckeyeParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)
Parser for the Buckeye corpus.
Has annotation types for word labels, word transcription, word part of speech, and surface transcription labels.
- Parameters:
- annotation_tiers: list
Annotation types of the files to parse
- hierarchy: Hierarchy
Details of how linguistic types relate to one another
- stop_check: callable, optional
Function to check whether to halt parsing
- call_back: callable, optional
Function to output progress messages
LaBB-CAT parser
- class polyglotdb.io.parsers.labbcat.LabbCatParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids exported from LaBB-CAT.
- Parameters:
- annotation_tiers: list
List of the annotation tiers to store data from the TextGrid
- hierarchy: Hierarchy
Basic hierarchy of the TextGrid
- make_transcription: bool
Flag for whether to add a transcription property to words based on phones they contain
- stop_check: callable
Function to check whether parsing should stop
- call_back: callable
Function to report progress in parsing
Speaker parsers
Filename Speaker Parser
- class polyglotdb.io.parsers.speaker.FilenameSpeakerParser(number_of_characters, left_orientation=True)
Parses a speaker name from a path by taking a specified number of characters from either the left or the right of the base file name.
- Parameters:
- number_of_characters: int
Number of characters to include in the speaker designation; set to 0 to use the full file name
- left_orientation: bool, defaults to True
If true, take characters from the left of the base file name; otherwise take them from the right
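The behaviour described above can be sketched as a plain function. This is an illustration only: `speaker_from_path` is a hypothetical helper, and stripping the file extension before counting characters is an assumption about what "base file name" means here.

```python
import os


def speaker_from_path(path, number_of_characters, left_orientation=True):
    """Derive a speaker name from the base file name of `path`."""
    # Drop the directory part and (by assumption) the file extension.
    base = os.path.splitext(os.path.basename(path))[0]
    if number_of_characters == 0:
        return base  # 0 means "use the whole base name"
    if left_orientation:
        return base[:number_of_characters]
    return base[-number_of_characters:]


from_left = speaker_from_path("corpus/s0101a.TextGrid", 3)
from_right = speaker_from_path("corpus/s0101a.TextGrid", 2, left_orientation=False)
full_name = speaker_from_path("corpus/s0101a.TextGrid", 0)
```

With file names that encode the speaker in a fixed-width prefix, a left-oriented count is usually the convenient choice; the right-oriented option covers naming schemes that put the speaker code at the end.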