Conversation¶

class convokit.model.conversation.Conversation(owner=None, id: Union[str, NoneType] = None, utterances: Union[List[str], NoneType] = None, meta: Union[Dict[KT, VT], NoneType] = None, from_db=False, storage: Union[convokit.storage.storageManager.StorageManager, NoneType] = None)¶

Represents a discrete subset of utterances in the dataset, connected by a reply-to chain.

Parameters:	owner – The Corpus that this Conversation belongs to
	id – The unique ID of this Conversation
	utterances – A list of the IDs of the Utterances in this Conversation
	meta – Table of initial values for conversation-level metadata
Variables:	id – the ID of the Conversation
	meta – A dictionary-like view object providing read-write access to conversation-level metadata.

add_meta(key: str, value) → None¶

Adds a key-value pair to the metadata of the corpus object.

Parameters:	key – name of metadata attribute
	value – value of metadata attribute
Returns:	None

add_vector(vector_name: str)¶

Records in this Corpus component object’s internal vectors list that the object has a row in the vector matrix named vector_name. Transformers that add vectors to the Corpus should call this to update the relevant component objects during the transform() step.

Parameters:	vector_name – name of vector matrix
Returns:	None

check_integrity(verbose: bool = True) → bool¶

Check the integrity of this Conversation; i.e. do the constituent utterances form a complete reply-to chain?

Parameters:	verbose – whether to print errors indicating the problems with the Conversation
Returns:	True if the conversation structure is complete else False
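
As an illustration of what a "complete reply-to chain" could mean, here is a minimal sketch over plain dictionaries. It is not ConvoKit's implementation: the function names and the `reply_to` mapping (utterance ID to the ID it replies to, `None` for the root) are hypothetical, and the exact conditions the library checks may differ.

```python
from typing import Dict, List, Optional


def find_chain_problems(reply_to: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable problems; an empty list means the chain looks complete.

    `reply_to` is a hypothetical stand-in for a conversation: it maps each
    utterance ID to the ID it replies to (None for the root utterance).
    """
    problems: List[str] = []

    # Assumption: a complete conversation has exactly one root utterance.
    roots = [utt_id for utt_id, parent in reply_to.items() if parent is None]
    if len(roots) != 1:
        problems.append(f"expected exactly 1 root, found {len(roots)}")

    # Every reply must point at an utterance that is part of the conversation.
    children: Dict[str, List[str]] = {}
    for utt_id, parent in reply_to.items():
        if parent is None:
            continue
        if parent not in reply_to:
            problems.append(f"{utt_id} replies to unknown utterance {parent}")
        else:
            children.setdefault(parent, []).append(utt_id)

    # Every utterance must be reachable from a root; this also catches cycles.
    reached = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in reached:
            continue
        reached.add(node)
        stack.extend(children.get(node, []))
    unreachable = sorted(set(reply_to) - reached)
    if unreachable:
        problems.append(f"not reachable from the root: {unreachable}")

    return problems


def is_complete(reply_to: Dict[str, Optional[str]], verbose: bool = True) -> bool:
    """Mirror the verbose/bool contract described above on the toy data."""
    problems = find_chain_problems(reply_to)
    if verbose:
        for problem in problems:
            print(problem)
    return not problems
```

With this sketch, `is_complete({"a": None, "b": "a"})` is true, while a mapping in which some utterance replies to an ID that is not present is reported as incomplete.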

delete_vector(vector_name: str)¶

Delete a vector associated with this Corpus component object.

Parameters:	vector_name – name of the vector to delete
Returns:	None

classmethod from_dbdoc(doc: convokit.storage.dbMappings.DBDocumentMapping)¶

Initialize a corpusComponent object with data contained in the DB document represented by doc.

Parameters:	cls – class to initialize: Utterance, Conversation, or Speaker
	doc – DB document to initialize the corpusComponent from
Returns:	the initialized corpusComponent object

get_chronological_speaker_list(selector: Callable[[convokit.model.speaker.Speaker], bool] = <function Conversation.<lambda>>)¶

Get the speakers in the conversation sorted in chronological order (speakers may appear more than once)

Parameters:	selector – (lambda) function for which speakers should be included; all speakers are included by default
Returns:	list of speakers for each chronological utterance

get_chronological_utterance_list(selector: Callable[[convokit.model.utterance.Utterance], bool] = <function Conversation.<lambda>>)¶

Get the utterances in the conversation sorted in increasing order of timestamp

Parameters:	selector – function for which utterances should be included; all utterances are included by default
Returns:	list of utterances, sorted by timestamp
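
The two chronological accessors can be pictured with the following sketch, which uses plain dictionaries as hypothetical stand-ins for Utterance objects (the field names `id`, `speaker`, and `timestamp` are assumptions made for this example, not the library's data model). It shows the selector filtering before the sort and a speaker appearing once per utterance.

```python
from typing import Any, Callable, Dict, List

Utt = Dict[str, Any]  # hypothetical stand-in for an Utterance


def chronological_utterances(
    utts: List[Utt], selector: Callable[[Utt], bool] = lambda utt: True
) -> List[Utt]:
    """Keep the utterances accepted by `selector`, sorted by increasing timestamp."""
    return sorted((u for u in utts if selector(u)), key=lambda u: u["timestamp"])


def chronological_speakers(
    utts: List[Utt], selector: Callable[[str], bool] = lambda speaker: True
) -> List[str]:
    """One entry per chronological utterance, so a speaker can appear repeatedly."""
    return [
        u["speaker"] for u in chronological_utterances(utts) if selector(u["speaker"])
    ]


utts = [
    {"id": "u2", "speaker": "bob", "timestamp": 20},
    {"id": "u1", "speaker": "alice", "timestamp": 10},
    {"id": "u3", "speaker": "alice", "timestamp": 30},
]

ordered_ids = [u["id"] for u in chronological_utterances(utts)]
speakers = chronological_speakers(utts)
```

Here `ordered_ids` is `["u1", "u2", "u3"]` and `speakers` is `["alice", "bob", "alice"]`.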

get_info(key)¶

Gets attribute <key> of the corpus object. Returns None if the corpus object does not have this attribute.

Parameters:	key – name of attribute
Returns:	attribute <key>

get_longest_paths() → List[List[convokit.model.utterance.Utterance]]¶

Finds the Utterances that form the longest path (i.e. root to leaf) in the Conversation tree. If there are multiple paths with tied lengths, returns all of them as a list of lists. If only one such path exists, a list containing a single list of Utterances is returned.

Returns:	a list of lists of Utterances

get_root_to_leaf_paths() → List[List[convokit.model.utterance.Utterance]]¶

Get the paths (stored as a list of lists of utterances) from the root to each of the leaves in the conversational tree

Returns:	List of lists of Utterances
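
To make the relationship between root-to-leaf paths and longest paths concrete, here is a minimal sketch over a hypothetical `reply_to` mapping (utterance ID to parent ID, `None` for the root). It assumes a single root and no cycles, works on IDs rather than Utterance objects, and is not the library's own code.

```python
from typing import Dict, List, Optional


def root_to_leaf_paths(reply_to: Dict[str, Optional[str]]) -> List[List[str]]:
    """Enumerate every path from the root down to a leaf, as lists of IDs."""
    children: Dict[str, List[str]] = {}
    root = None
    for utt_id, parent in reply_to.items():
        if parent is None:
            root = utt_id
        else:
            children.setdefault(parent, []).append(utt_id)
    if root is None:
        return []

    paths: List[List[str]] = []
    stack = [[root]]  # each stack entry is a partial path starting at the root
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)  # reached a leaf, so the path is complete
        for kid in kids:
            stack.append(path + [kid])
    return paths


def longest_paths(reply_to: Dict[str, Optional[str]]) -> List[List[str]]:
    """Keep all root-to-leaf paths that tie for the maximum length."""
    paths = root_to_leaf_paths(reply_to)
    if not paths:
        return []
    max_len = max(len(path) for path in paths)
    return [path for path in paths if len(path) == max_len]


# Toy tree: a is the root; b, c and f reply to a; d replies to b; e replies to c.
tree = {"a": None, "b": "a", "c": "a", "d": "b", "e": "c", "f": "a"}
```

For this toy tree there are three root-to-leaf paths, and two of them (`a, b, d` and `a, c, e`) tie for the longest, so both are returned. The order of the returned paths is not meaningful in this sketch.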

get_speaker(speaker_id: str) → convokit.model.speaker.Speaker¶

Looks up the Speaker with the given ID. Raises a KeyError if no speaker with that ID exists.

Returns:	the Speaker with the given speaker_id

get_speaker_ids() → List[str]¶

Produces a list of ids of all speakers in the Conversation, which can be used in calls to get_speaker() to retrieve specific speakers. Provides no ordering guarantees for the list.

Returns:	a list of speaker ids

get_speakers_dataframe(selector: Union[Callable[[convokit.model.speaker.Speaker], bool], NoneType] = <function Conversation.<lambda>>, exclude_meta: bool = False)¶

Get a DataFrame of the Speakers that have participated in the Conversation with fields and metadata attributes, with an optional selector that filters Speakers that should be included. Edits to the DataFrame do not change the corpus in any way.

Parameters:	exclude_meta – whether to exclude metadata
	selector – a (lambda) function that takes a Speaker and returns True or False (i.e. include / exclude). By default, the selector includes all Speakers in the Conversation.
Returns:	a pandas DataFrame

get_subtree(root_utt_id)¶

Get the utterance node of the specified input id

Parameters:	root_utt_id – id of the root node that the subtree starts from
Returns:	UtteranceNode object

get_usernames() → List[str]¶

Produces a list of names of all speakers in the Conversation, which can be used in calls to get_speaker() to retrieve specific speakers. Provides no ordering guarantees for the list.

Returns:	a list of usernames

get_utterance(ut_id: str) → convokit.model.utterance.Utterance¶

Looks up the Utterance associated with the given ID. Raises a KeyError if no utterance by that ID exists.

Returns:	the Utterance with the given ID

get_utterance_ids() → List[str]¶

Produces a list of the unique IDs of all utterances in the Conversation, which can be used in calls to get_utterance() to retrieve specific utterances. Provides no ordering guarantees for the list.

Returns:	a list of IDs of Utterances in the Conversation

get_utterances_dataframe(selector=<function Conversation.<lambda>>, exclude_meta: bool = False)¶

Get a DataFrame of the Utterances in the Conversation with fields and metadata attributes. Set an optional selector that filters Utterances that should be included. Edits to the DataFrame do not change the corpus in any way.

Parameters:	exclude_meta – whether to exclude metadata
	selector – a (lambda) function that takes an Utterance and returns True or False (i.e. include / exclude). By default, the selector includes all Utterances in the Conversation.
Returns:	a pandas DataFrame

get_vector(vector_name: str, as_dataframe: bool = False, columns: Union[List[str], NoneType] = None)¶

Get the vector stored as vector_name for this object.

Parameters:	vector_name – name of vector
	as_dataframe – whether to return the vector as a dataframe (True) or in its raw array form (False). False by default.
	columns – optional list of named columns of the vector to include. All columns are returned otherwise. This parameter is only used if as_dataframe is set to True.
Returns:	a numpy / scipy array

iter_speakers(selector: Callable[[convokit.model.speaker.Speaker], bool] = <function Conversation.<lambda>>) → Generator[convokit.model.speaker.Speaker, NoneType, NoneType]¶

Get Speakers that have participated in the Conversation, with an optional selector that filters for Speakers that should be included.

Parameters:	selector – a (lambda) function that takes a Speaker and returns True or False (i.e. include / exclude). By default, the selector includes all Speakers in the Conversation.
Returns:	a generator of Speakers

iter_utterances(selector: Callable[[convokit.model.utterance.Utterance], bool] = <function Conversation.<lambda>>) → Generator[convokit.model.utterance.Utterance, NoneType, NoneType]¶

Get utterances in the Conversation, with an optional selector that filters for Utterances that should be included.

Parameters:	selector – a (lambda) function that takes an Utterance and returns True or False (i.e. include / exclude). By default, the selector includes all Utterances in the Conversation.
Returns:	a generator of Utterances

print_conversation_stats()¶

Helper function for printing the number of Utterances and Speakers in the Conversation.

Returns:	None (prints output)

print_conversation_structure(utt_info_func: Callable[[convokit.model.utterance.Utterance], str] = <function Conversation.<lambda>>, limit: int = None) → None¶

Prints an indented representation of utterances in the Conversation with conversation reply-to structure determining the indented level. The details of each utterance to be printed can be configured.

If limit is set to a value other than None, this will annotate utterances with an ‘order’ metadata indicating their temporal order in the conversation, where the first utterance in the conversation is annotated with 1.

Parameters:	utt_info_func – callable function taking an utterance as input and returning a string of the desired utterance information. By default, this is a lambda function returning the utterance’s speaker’s id
	limit – maximum number of utterances to print out. If set to k, only the first k utterances are printed.
Returns:	None. Prints to stdout.
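
The indentation scheme can be sketched as follows. This is a hypothetical re-implementation on plain dictionaries (fields `id`, `speaker`, `reply_to`, `timestamp` are assumptions for the example), it returns the lines instead of printing them, and the indent width of four spaces is an arbitrary choice; the library's actual output format may differ.

```python
from typing import Any, Callable, Dict, List, Optional

Utt = Dict[str, Any]  # hypothetical stand-in for an Utterance


def structure_lines(
    utts: List[Utt],
    info: Callable[[Utt], str] = lambda utt: utt["speaker"],
    limit: Optional[int] = None,
    indent: int = 4,
) -> List[str]:
    """Build one line per utterance, indented by its depth in the reply tree."""
    # Temporal order decides which utterances survive the limit.
    ordered = sorted(utts, key=lambda u: u["timestamp"])
    if limit is not None:
        ordered = ordered[:limit]
    kept = {u["id"] for u in ordered}

    children: Dict[str, List[Utt]] = {}
    roots: List[Utt] = []
    for u in ordered:
        parent = u["reply_to"]
        if parent is None:
            roots.append(u)
        elif parent in kept:
            children.setdefault(parent, []).append(u)
        # Replies whose parent was cut off by the limit are dropped in this sketch.

    lines: List[str] = []

    def visit(utt: Utt, depth: int) -> None:
        lines.append(" " * (indent * depth) + info(utt))
        for child in children.get(utt["id"], []):
            visit(child, depth + 1)

    for root in roots:
        visit(root, 0)
    return lines


utts = [
    {"id": "u1", "speaker": "alice", "reply_to": None, "timestamp": 1},
    {"id": "u2", "speaker": "bob", "reply_to": "u1", "timestamp": 2},
    {"id": "u3", "speaker": "carol", "reply_to": "u1", "timestamp": 3},
    {"id": "u4", "speaker": "alice", "reply_to": "u2", "timestamp": 4},
]

full = structure_lines(utts)
first_two = structure_lines(utts, limit=2)
```

Calling `print("\n".join(full))` shows `alice` at the left margin, `bob` and `carol` one level in, and the second `alice` utterance nested under `bob`. Passing a different `info` callable (for example, one returning the utterance ID) changes what is shown on each line.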

retrieve_meta(key: str)¶

Retrieves a value stored under the key of the metadata of corpus object.

Parameters:	key – name of metadata attribute
Returns:	value

set_info(key, value)¶

Sets attribute <key> of the corpus object to <value>.

Parameters:	key – name of attribute
	value – value to set
Returns:	None

traverse(traversal_type: str, as_utterance: bool = True)¶

Traverse the Conversation tree structure in breadth-first (‘bfs’), depth-first (‘dfs’), pre-order (‘preorder’), or post-order (‘postorder’) order.

Parameters:	traversal_type – dfs, bfs, preorder, or postorder
	as_utterance – whether the iterator should yield the utterance (True) or the utterance node (False)
Returns:	an iterator of the utterances or utterance nodes
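
The traversal orders can be compared on a small reply tree with the sketch below. It operates on a hypothetical `children` mapping of utterance IDs rather than on Utterance or UtteranceNode objects, and it treats ‘dfs’ as equivalent to ‘preorder’, which is an assumption of this example rather than a statement about the library.

```python
from collections import deque
from typing import Dict, Iterator, List


def traverse_ids(
    children: Dict[str, List[str]], root: str, traversal_type: str
) -> Iterator[str]:
    """Yield utterance IDs from `root` in the requested order."""
    if traversal_type == "bfs":
        # Breadth-first: visit level by level using a FIFO queue.
        queue = deque([root])
        while queue:
            node = queue.popleft()
            yield node
            queue.extend(children.get(node, []))
    elif traversal_type in ("dfs", "preorder"):
        # Pre-order: visit a node, then its children from first to last.
        stack = [root]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(children.get(node, [])))
    elif traversal_type == "postorder":
        # Post-order: visit all of a node's descendants before the node itself.
        def walk(node: str) -> Iterator[str]:
            for child in children.get(node, []):
                yield from walk(child)
            yield node

        yield from walk(root)
    else:
        raise ValueError(f"unknown traversal type: {traversal_type}")


# Toy tree: b and c reply to a; d replies to b; e replies to c.
children = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}

bfs_order = list(traverse_ids(children, "a", "bfs"))
pre_order = list(traverse_ids(children, "a", "preorder"))
post_order = list(traverse_ids(children, "a", "postorder"))
```

On this tree, breadth-first order is `a, b, c, d, e`, pre-order is `a, b, d, c, e`, and post-order is `d, b, e, c, a`.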