Parser Classes

Base parser

class polyglotdb.io.parsers.base.BaseParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)

Base parser, extend this class for new parsers.

Parameters:
annotation_tiers: list

Annotation types of the files to parse

hierarchy: Hierarchy

Details of how linguistic types relate to one another

make_transcription: bool, defaults to True

If true, create a word attribute for transcription based on segments that are contained by the word

stop_check: callable, optional

Function to check whether to halt parsing

call_back: callable, optional

Function to output progress messages
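
The two optional callables can be pictured with a small stand-in loop. This is only a sketch: `parse_all`, its loop, and the message passed to `call_back` are hypothetical and are not taken from PolyglotDB. It assumes that `stop_check` takes no arguments and returns a truthy value when parsing should halt.

```python
def parse_all(paths, stop_check=None, call_back=None):
    """Hypothetical driver loop, not part of PolyglotDB."""
    parsed = []
    for index, path in enumerate(paths):
        # Assumption: stop_check() returns True when parsing should halt.
        if stop_check is not None and stop_check():
            break
        # Assumption: call_back receives a progress message.
        if call_back is not None:
            call_back("Parsing {} ({} of {})".format(path, index + 1, len(paths)))
        parsed.append(path)
    return parsed


messages = []
cancelled = {"flag": False}


def stop_check():
    return cancelled["flag"]


def call_back(message):
    messages.append(message)
    # Simulate a user cancelling once the second file has been announced.
    if len(messages) == 2:
        cancelled["flag"] = True


result = parse_all(["a.TextGrid", "b.TextGrid", "c.TextGrid"], stop_check, call_back)
```

Because the check runs before each file, a cancellation takes effect at the next file boundary rather than in the middle of a file.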

TextGrid parser

class polyglotdb.io.parsers.textgrid.TextgridParser(annotation_tiers, hierarchy, make_transcription=True, make_label=False, stop_check=None, call_back=None)

Parser for Praat TextGrid files.

Parameters:
annotation_tiers: list

Annotation types of the files to parse

hierarchy: Hierarchy

Details of how linguistic types relate to one another

make_transcription: bool, defaults to True

If true, create a word attribute for transcription based on segments that are contained by the word

stop_check: callable, optional

Function to check whether to halt parsing

call_back: callable, optional

Function to output progress messages
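
What `make_transcription` describes can be illustrated without the library. In this sketch the interval tuples, the `build_transcriptions` helper, and the `.` separator are all assumptions chosen for illustration. It only shows the idea of collecting the segments whose time span falls inside each word.

```python
def build_transcriptions(words, phones, separator="."):
    """Join the labels of the phones contained in each word.

    words and phones are lists of (label, begin, end) tuples.
    Hypothetical helper; the separator is an assumption.
    """
    result = []
    for label, begin, end in words:
        # A phone counts as contained if it starts and ends within the word.
        contained = [p for p, p_begin, p_end in phones
                     if p_begin >= begin and p_end <= end]
        result.append((label, separator.join(contained)))
    return result


words = [("the", 0.0, 0.2), ("cat", 0.2, 0.7)]
phones = [("dh", 0.0, 0.1), ("ah", 0.1, 0.2),
          ("k", 0.2, 0.35), ("ae", 0.35, 0.55), ("t", 0.55, 0.7)]
transcriptions = build_transcriptions(words, phones)
```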

Forced alignment output parser

class polyglotdb.io.parsers.aligner.AlignerParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Base class for parsing TextGrid output from forced aligners.

Parameters:
annotation_tiers: list

List of the annotation tiers to store data from the TextGrid

hierarchy: Hierarchy

Basic hierarchy of the TextGrid

make_transcription: bool

Flag for whether to add a transcription property to words based on phones they contain

stop_check: callable

Function to check for whether parsing should stop

call_back: callable

Function to report progress in parsing

Attributes:
word_label: str

Label identifying word tiers

phone_label: str

Label identifying phone tiers

name: str

Name of the aligner the TextGrids are from

speaker_first: bool

Whether speaker names precede tier types in the TextGrid when multiple speakers are present
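
The effect of `speaker_first` can be sketched as follows. The `split_tier_name` helper and the `" - "` separator between speaker and tier type are assumptions made for illustration; the actual tier naming depends on the aligner that produced the TextGrid.

```python
def split_tier_name(tier_name, speaker_first=True, separator=" - "):
    """Return (speaker, tier_type) for a TextGrid tier name.

    Hypothetical helper; the separator is an assumption.
    Tiers without a separator are treated as having no speaker.
    """
    if separator not in tier_name:
        return None, tier_name
    first, second = tier_name.split(separator, 1)
    if speaker_first:
        return first, second
    return second, first


mfa_style = split_tier_name("Speaker1 - words")
reversed_style = split_tier_name("words - Speaker1", speaker_first=False)
single_speaker = split_tier_name("words")
```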

load_textgrid(path)

Load a TextGrid file

Parameters:
path: str

Path to the TextGrid file

Returns:
TextGrid

TextGrid object

match_extension(filename)

Checks whether the filename ends with an acceptable extension

Parameters:
filename: str

Name of the file being checked

Returns:
bool

True if the filename has an acceptable extension, False otherwise
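
A check of this kind can be sketched as a standalone function. The accepted extension list and the case-insensitive comparison are assumptions for illustration, not a statement of what the parser classes accept.

```python
def match_extension(filename, extensions=(".textgrid",)):
    """Return True if filename ends with one of the given extensions.

    Standalone sketch; the default extension list and the
    case-insensitive comparison are assumptions.
    """
    lowered = filename.lower()
    return any(lowered.endswith(ext.lower()) for ext in extensions)


is_textgrid = match_extension("speaker1.TextGrid")
is_wav = match_extension("speaker1.wav")
```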

parse_discourse(path, types_only=False)

Parse a force-aligned TextGrid file for later importing.

Parameters:
path: str

Path to TextGrid file

types_only: bool

Flag for whether to only save type information, ignoring the token information

Returns:
DiscourseData

Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus

Parameters:
path: str

Path to the corpus

corpus_name: str

Name of the corpus

Returns:
data.types: list

List of data types

MFA

class polyglotdb.io.parsers.mfa.MfaParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by the Montreal Forced Aligner.

load_textgrid(path)

Load a TextGrid file

Parameters:
path: str

Path to the TextGrid file

Returns:
TextGrid

TextGrid object

match_extension(filename)

Checks whether the filename ends with an acceptable extension

Parameters:
filename: str

Name of the file being checked

Returns:
bool

True if the filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a force-aligned TextGrid file for later importing.

Parameters:
path: str

Path to TextGrid file

types_only: bool

Flag for whether to only save type information, ignoring the token information

Returns:
DiscourseData

Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus

Parameters:
path: str

Path to the corpus

corpus_name: str

Name of the corpus

Returns:
data.types: list

List of data types

FAVE

class polyglotdb.io.parsers.fave.FaveParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by FAVE-align.

load_textgrid(path)

Load a TextGrid file

Parameters:
path: str

Path to the TextGrid file

Returns:
TextGrid

TextGrid object

match_extension(filename)

Checks whether the filename ends with an acceptable extension

Parameters:
filename: str

Name of the file being checked

Returns:
bool

True if the filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a force-aligned TextGrid file for later importing.

Parameters:
path: str

Path to TextGrid file

types_only: bool

Flag for whether to only save type information, ignoring the token information

Returns:
DiscourseData

Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus

Parameters:
path: str

Path to the corpus

corpus_name: str

Name of the corpus

Returns:
data.types: list

List of data types

MAUS

class polyglotdb.io.parsers.maus.MausParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids generated by the Web-MAUS aligner.

load_textgrid(path)

Load a TextGrid file

Parameters:
path: str

Path to the TextGrid file

Returns:
TextGrid

TextGrid object

match_extension(filename)

Checks whether the filename ends with an acceptable extension

Parameters:
filename: str

Name of the file being checked

Returns:
bool

True if the filename has an acceptable extension, False otherwise

parse_discourse(path, types_only=False)

Parse a force-aligned TextGrid file for later importing.

Parameters:
path: str

Path to TextGrid file

types_only: bool

Flag for whether to only save type information, ignoring the token information

Returns:
DiscourseData

Parsed data from the file

parse_information(path, corpus_name)

Parses types out of a corpus

Parameters:
path: str

Path to the corpus

corpus_name: str

Name of the corpus

Returns:
data.types: list

List of data types

TIMIT parser

class polyglotdb.io.parsers.timit.TimitParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)

Parser for the TIMIT corpus.

Has annotation types for word labels and surface transcription labels.

Parameters:
annotation_tiers: list

Annotation types of the files to parse

hierarchy: Hierarchy

Details of how linguistic types relate to one another

stop_check: callable, optional

Function to check whether to halt parsing

call_back: callable, optional

Function to output progress messages

Buckeye parser

class polyglotdb.io.parsers.buckeye.BuckeyeParser(annotation_tiers, hierarchy, stop_check=None, call_back=None)

Parser for the Buckeye corpus.

Has annotation types for word labels, word transcription, word part of speech, and surface transcription labels.

Parameters:
annotation_tiers: list

Annotation types of the files to parse

hierarchy: Hierarchy

Details of how linguistic types relate to one another

stop_check: callable, optional

Function to check whether to halt parsing

call_back: callable, optional

Function to output progress messages

LaBB-CAT parser

class polyglotdb.io.parsers.labbcat.LabbCatParser(annotation_tiers, hierarchy, make_transcription=True, stop_check=None, call_back=None)

Parser for TextGrids exported from LaBB-CAT.

Parameters:
annotation_tiers: list

List of the annotation tiers to store data from the TextGrid

hierarchy: Hierarchy

Basic hierarchy of the TextGrid

make_transcription: bool

Flag for whether to add a transcription property to words based on phones they contain

stop_check: callable

Function to check for whether parsing should stop

call_back: callable

Function to report progress in parsing

Speaker parsers

Filename Speaker Parser

class polyglotdb.io.parsers.speaker.FilenameSpeakerParser(number_of_characters, left_orientation=True)

Parses a speaker name from a file path by taking a specified number of characters from either the left or the right of the base file name.

Parameters:
number_of_characters: int

Number of characters to include in the speaker designation; set to 0 to use the full file name

left_orientation: bool

If True, take characters from the left of the base file name; otherwise take them from the right. Defaults to True
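
The described behavior can be sketched as a plain function. `speaker_from_filename` is a hypothetical name, and the sketch assumes that the file extension is stripped before characters are counted.

```python
import os


def speaker_from_filename(path, number_of_characters, left_orientation=True):
    """Derive a speaker name from the base file name of path.

    Hypothetical helper; assumes the extension is not part of the name.
    """
    base, _extension = os.path.splitext(os.path.basename(path))
    # Zero means "use the whole base file name".
    if number_of_characters == 0:
        return base
    if left_orientation:
        return base[:number_of_characters]
    return base[-number_of_characters:]


from_left = speaker_from_filename("/corpus/s01_interview.TextGrid", 3)
from_right = speaker_from_filename("/corpus/s01_interview.TextGrid", 4,
                                   left_orientation=False)
full_name = speaker_from_filename("/corpus/s01_interview.TextGrid", 0)
```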

Directory Speaker Parser

class polyglotdb.io.parsers.speaker.DirectorySpeakerParser

Parses a speaker name from a file path by using the name of the directory that immediately contains the file.
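
As a minimal illustration of that rule, the following sketch uses only the standard library. `speaker_from_directory` is a hypothetical name, not a PolyglotDB function.

```python
import os


def speaker_from_directory(path):
    """Return the name of the directory that directly contains path.

    Hypothetical helper for illustration only.
    """
    return os.path.basename(os.path.dirname(path))


speaker = speaker_from_directory("/corpus/speaker_a/recording1.TextGrid")
```

This layout suits corpora that keep one directory per speaker, since every file in a directory is assigned to the same speaker.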