Parser Classes
Base parser
class polyglotdb.io.parsers.base.BaseParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)
Base parser; extend this class to write new parsers.
Parameters: - annotation_tiers : list
Annotation types of the files to parse
- hierarchy : Hierarchy
Details of how linguistic types relate to one another
- make_transcription : bool, defaults to True
If True, create a transcription attribute on each word from the segments that the word contains
- stop_check : callable, optional
Function to check whether to halt parsing
- call_back : callable, optional
Function to output progress messages
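As a rough illustration of the two callback parameters, the sketch below drives a stand-in loop rather than a real parser. The zero-argument `stop_check` and the single-argument `call_back` are assumptions made for this example, so the calling conventions PolyglotDB actually uses should be checked against its source. `run_with_callbacks`, `record_progress`, and `progress_log` are hypothetical names.

```python
import threading


# Hypothetical stand-in for a long-running parse; NOT part of PolyglotDB.
def run_with_callbacks(items, stop_check=None, call_back=None):
    """Process items, reporting progress and halting early on request."""
    processed = []
    for index, item in enumerate(items):
        # Assumed convention: stop_check() returns True when work should halt.
        if stop_check is not None and stop_check():
            break
        processed.append(item.upper())
        # Assumed convention: call_back receives a progress value.
        if call_back is not None:
            call_back(index + 1)
    return processed


stop_event = threading.Event()
progress_log = []


def record_progress(value):
    """Store each progress value and request a stop after two items."""
    progress_log.append(value)
    if value == 2:
        stop_event.set()


result = run_with_callbacks(
    ["a", "b", "c"],
    stop_check=stop_event.is_set,
    call_back=record_progress,
)
```

Using a `threading.Event` keeps the stop request safe to set from another thread, which is the usual situation when parsing runs behind a user interface.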
match_extension(filename)
Checks that the filename ends with an acceptable extension.
Parameters: - filename : str
Name of the file being checked
Returns: bool
True if the filename ends with an acceptable extension, False otherwise
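The check that `match_extension` describes can be pictured with a small standalone function. This is a minimal sketch and not PolyglotDB's implementation: `has_allowed_extension` and its `allowed_extensions` argument are hypothetical, and the case-insensitive comparison is an assumption.

```python
import os


# Hypothetical helper for illustration; NOT PolyglotDB's implementation.
def has_allowed_extension(filename, allowed_extensions):
    """Return True if filename ends with one of the allowed extensions."""
    extension = os.path.splitext(filename)[1].lower()
    # Assumption: extensions are compared without regard to case.
    return extension in {ext.lower() for ext in allowed_extensions}


textgrid_only = [".TextGrid"]
is_textgrid = has_allowed_extension("speaker1.TextGrid", textgrid_only)
is_audio_accepted = has_allowed_extension("speaker1.wav", textgrid_only)
```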
parse_discourse(name, types_only=False)
Parse annotations for later importing.
Parameters: - name : str
Name of the discourse
- types_only : bool
Flag for whether to only save type information, ignoring the token information
Returns: DiscourseData
Parsed data
TextGrid parser
class polyglotdb.io.parsers.textgrid.TextgridParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)
Parser for Praat TextGrid files.
Parameters: - annotation_tiers : list
Annotation types of the files to parse
- hierarchy : Hierarchy
Details of how linguistic types relate to one another
- make_transcription : bool, defaults to True
If True, create a transcription attribute on each word from the segments that the word contains
- stop_check : callable, optional
Function to check whether to halt parsing
- call_back : callable, optional
Function to output progress messages
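To make the `make_transcription` option concrete, the sketch below joins the labels of the segments whose time span falls inside a word. It is only an illustration of the idea: the `(label, begin, end)` tuples, the `build_transcription` name, and the `.` delimiter are assumptions, not PolyglotDB's data structures or output format.

```python
# Hypothetical interval representation: (label, begin, end), times in seconds.
def build_transcription(word, segments, delimiter="."):
    """Join the labels of segments that fall inside the word's time span."""
    _, word_begin, word_end = word
    contained = [
        label
        for label, begin, end in segments
        if begin >= word_begin and end <= word_end
    ]
    # Assumption: segment labels are joined with a delimiter character.
    return delimiter.join(contained)


segments = [
    ("k", 0.00, 0.08),
    ("ae", 0.08, 0.20),
    ("t", 0.20, 0.27),
    ("s", 0.30, 0.41),  # belongs to a later word
]
word = ("cat", 0.00, 0.27)
transcription = build_transcription(word, segments)
```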
load_textgrid(path)
Load a TextGrid file.
Parameters: - path : str
Path to the TextGrid file
Returns: TextGrid
TextGrid object
parse_discourse(path, types_only=False)
Parse a TextGrid file for later importing.
Parameters: - path : str
Path to TextGrid file
- types_only : bool
Flag for whether to only save type information, ignoring the token information
Returns: DiscourseData
Parsed data from the file
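The distinction behind `types_only` is between types (the distinct labels) and tokens (each timed occurrence of a label). The sketch below shows that distinction on toy data. `summarize` and the dictionary it returns are hypothetical and do not reflect the structure of `DiscourseData`.

```python
# Hypothetical token representation: (label, begin, end), times in seconds.
tokens = [
    ("the", 0.00, 0.12),
    ("cat", 0.12, 0.40),
    ("the", 0.40, 0.51),
]


def summarize(tokens, types_only=False):
    """Collect distinct labels, and optionally every timed occurrence."""
    types = sorted({label for label, _, _ in tokens})
    if types_only:
        # Only the inventory of distinct labels is kept.
        return {"types": types}
    return {"types": types, "tokens": list(tokens)}


type_summary = summarize(tokens, types_only=True)
full_summary = summarize(tokens)
```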
Forced alignment output parser
class polyglotdb.io.parsers.aligner.AlignerParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Base class for parsing TextGrid output from forced aligners.
Parameters: - annotation_tiers : list
List of the annotation tiers to store data from the TextGrid
- hierarchy : Hierarchy
Basic hierarchy of the TextGrid
- make_transcription : bool
Flag for whether to add a transcription property to words based on phones they contain
- stop_check : callable
Function to check for whether parsing should stop
- call_back : callable
Function to report progress in parsing
Attributes: - word_label : str
Label identifying word tiers
- phone_label : str
Label identifying phone tiers
- name : str
Name of the aligner the TextGrids are from
- speaker_first : bool
Whether speaker names precede tier types in the TextGrid when multiple speakers are present
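The `speaker_first` attribute matters when one TextGrid holds tiers for several speakers and each tier name combines a speaker name with a tier type. The sketch below shows how such a name might be split in either order. The `" - "` separator and the `split_tier_name` helper are assumptions for illustration; the tier names a given aligner writes should be checked against real output.

```python
# Hypothetical helper; the " - " separator is an assumption.
def split_tier_name(tier_name, speaker_first=True, separator=" - "):
    """Split a tier name into a (speaker, tier_type) pair."""
    first, found, second = tier_name.partition(separator)
    if not found:
        # No separator: treat the whole name as the tier type.
        return None, first
    if speaker_first:
        return first, second
    return second, first


speaker_then_type = split_tier_name("speaker1 - words", speaker_first=True)
type_then_speaker = split_tier_name("words - speaker1", speaker_first=False)
single_speaker = split_tier_name("phones")
```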
load_textgrid(path)
Load a TextGrid file.
Parameters: - path : str
Path to the TextGrid file
Returns: TextGrid
TextGrid object
match_extension(filename)
Checks that the filename ends with an acceptable extension.
Parameters: - filename : str
Name of the file being checked
Returns: bool
True if the filename ends with an acceptable extension, False otherwise
parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
Parameters: - path : str
Path to TextGrid file
- types_only : bool
Flag for whether to only save type information, ignoring the token information
Returns: DiscourseData
Parsed data from the file
parse_information(path, corpus_name)
Parses types out of a corpus.
Parameters: - path : str
Path to the corpus
- corpus_name : str
Name of the corpus
Returns: - data.types : list
List of data types
MFA
class polyglotdb.io.parsers.mfa.MfaParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by the Montreal Forced Aligner.
load_textgrid(path)
Load a TextGrid file.
Parameters: - path : str
Path to the TextGrid file
Returns: TextGrid
TextGrid object
match_extension(filename)
Checks that the filename ends with an acceptable extension.
Parameters: - filename : str
Name of the file being checked
Returns: bool
True if the filename ends with an acceptable extension, False otherwise
parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
Parameters: - path : str
Path to TextGrid file
- types_only : bool
Flag for whether to only save type information, ignoring the token information
Returns: DiscourseData
Parsed data from the file
parse_information(path, corpus_name)
Parses types out of a corpus.
Parameters: - path : str
Path to the corpus
- corpus_name : str
Name of the corpus
Returns: - data.types : list
List of data types
FAVE
class polyglotdb.io.parsers.fave.FaveParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by FAVE-align.
load_textgrid(path)
Load a TextGrid file.
Parameters: - path : str
Path to the TextGrid file
Returns: TextGrid
TextGrid object
match_extension(filename)
Checks that the filename ends with an acceptable extension.
Parameters: - filename : str
Name of the file being checked
Returns: bool
True if the filename ends with an acceptable extension, False otherwise
parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
Parameters: - path : str
Path to TextGrid file
- types_only : bool
Flag for whether to only save type information, ignoring the token information
Returns: DiscourseData
Parsed data from the file
parse_information(path, corpus_name)
Parses types out of a corpus.
Parameters: - path : str
Path to the corpus
- corpus_name : str
Name of the corpus
Returns: - data.types : list
List of data types
MAUS
class polyglotdb.io.parsers.maus.MausParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids generated by the Web-MAUS aligner.
load_textgrid(path)
Load a TextGrid file.
Parameters: - path : str
Path to the TextGrid file
Returns: TextGrid
TextGrid object
match_extension(filename)
Checks that the filename ends with an acceptable extension.
Parameters: - filename : str
Name of the file being checked
Returns: bool
True if the filename ends with an acceptable extension, False otherwise
parse_discourse(path, types_only=False)
Parse a force-aligned TextGrid file for later importing.
Parameters: - path : str
Path to TextGrid file
- types_only : bool
Flag for whether to only save type information, ignoring the token information
Returns: DiscourseData
Parsed data from the file
parse_information(path, corpus_name)
Parses types out of a corpus.
Parameters: - path : str
Path to the corpus
- corpus_name : str
Name of the corpus
Returns: - data.types : list
List of data types
TIMIT parser
class polyglotdb.io.parsers.timit.TimitParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)
Parser for the TIMIT corpus.
Has annotation types for word labels and surface transcription labels.
Parameters: - annotation_tiers: list
Annotation types of the files to parse
- hierarchy : Hierarchy
Details of how linguistic types relate to one another
- stop_check : callable, optional
Function to check whether to halt parsing
- call_back : callable, optional
Function to output progress messages
parse_discourse(word_path, types_only=False)
Parse a TIMIT file for later importing.
Parameters: - word_path : str
Path to TIMIT .wrd file
- types_only : bool
Flag for whether to only save type information, ignoring the token information
Returns: DiscourseData
Parsed data from the file
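For orientation, a TIMIT `.wrd` file lists one word per line as a begin sample, an end sample, and the word label, with audio sampled at 16 kHz. The sketch below converts such lines to times in seconds. It is a standalone illustration and not what `TimitParser` does internally; `read_wrd_lines` and the sample lines are hypothetical.

```python
TIMIT_SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz


# Hypothetical reader for illustration; NOT PolyglotDB's implementation.
def read_wrd_lines(lines, sample_rate=TIMIT_SAMPLE_RATE):
    """Convert 'begin_sample end_sample word' lines to (word, begin, end)."""
    words = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            # Skip blank or malformed lines.
            continue
        begin_sample, end_sample, label = parts
        begin = int(begin_sample) / sample_rate
        end = int(end_sample) / sample_rate
        words.append((label, begin, end))
    return words


# Made-up sample lines in the .wrd layout.
sample_lines = ["3050 5723 she", "", "5723 10337 had"]
words = read_wrd_lines(sample_lines)
```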
Buckeye parser
class polyglotdb.io.parsers.buckeye.BuckeyeParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)
Parser for the Buckeye corpus.
Has annotation types for word labels, word transcription, word part of speech, and surface transcription labels.
Parameters: - annotation_tiers: list
Annotation types of the files to parse
- hierarchy : Hierarchy
Details of how linguistic types relate to one another
- stop_check : callable, optional
Function to check whether to halt parsing
- call_back : callable, optional
Function to output progress messages
parse_discourse(word_path, types_only=False)
Parse a Buckeye file for later importing.
Parameters: - word_path : str
Path to Buckeye .words file
- types_only : bool
Flag for whether to only save type information, ignoring the token information
Returns: DiscourseData
Parsed data
LaBB-CAT parser
class polyglotdb.io.parsers.labbcat.LabbCatParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)
Parser for TextGrids exported from LaBB-CAT.
Parameters: - annotation_tiers : list
List of the annotation tiers to store data from the TextGrid
- hierarchy : Hierarchy
Basic hierarchy of the TextGrid
- make_transcription : bool
Flag for whether to add a transcription property to words based on phones they contain
- stop_check : callable
Function to check for whether parsing should stop
- call_back : callable
Function to report progress in parsing
Speaker parsers
Filename Speaker Parser
class polyglotdb.io.parsers.speaker.FilenameSpeakerParser(number_of_characters, left_orientation=True)
Parses a speaker name from a path by taking a specified number of characters from either the left or the right of the base file name.
Parameters: - number_of_characters : int
Number of characters to include in the speaker designation; set to 0 to use the full file name
- left_orientation : bool, defaults to True
If True, take the characters from the left of the base file name; otherwise take them from the right
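The behavior described for `FilenameSpeakerParser` can be sketched as a plain function. This is an illustration under stated assumptions rather than the class's actual code: `speaker_from_filename` is a hypothetical name, and dropping the extension before slicing is an assumption about what "base file name" means.

```python
import os


# Hypothetical function for illustration; NOT PolyglotDB's implementation.
def speaker_from_filename(path, number_of_characters, left_orientation=True):
    """Derive a speaker name from the base file name of a path."""
    # Assumption: the base file name excludes the directory and extension.
    base_name = os.path.splitext(os.path.basename(path))[0]
    if number_of_characters == 0:
        # Zero means the whole base file name is the speaker name.
        return base_name
    if left_orientation:
        return base_name[:number_of_characters]
    return base_name[-number_of_characters:]


path = "/corpus/s01_interview.TextGrid"
from_left = speaker_from_filename(path, 3)
from_right = speaker_from_filename(path, 9, left_orientation=False)
full_name = speaker_from_filename(path, 0)
```

Taking characters from a fixed side works when file names follow a consistent convention, for example a speaker code prefix of constant width.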