ConvoKitMatrix

class convokit.model.convoKitMatrix.ConvoKitMatrix(name, matrix, ids: Union[List[str], NoneType] = None, columns: Union[List[str], NoneType] = None)

A ConvoKitMatrix stores the vector representations of some set of Corpus components (i.e. Utterances, Conversations, Speakers).

Parameters:
  • name – descriptive name for the matrix
  • matrix – numpy or scipy array matrix
  • ids – optional list of Corpus component object ids, where the i-th id corresponds to the i-th row of the matrix
  • columns – optional list of names for the columns of the matrix
Variables:
  • name – name of the matrix
  • matrix – the matrix data
  • ids – ids corresponding to rows
  • columns – names corresponding to columns
  • ids_to_idx – a mapping from id to the row index
  • cols_to_idx – a mapping from column name to the column index
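The two mappings can be pictured with a small stand-in written in plain Python. This is only an illustrative sketch, not ConvoKit's own code: the ids, column names, and values are hypothetical, and a list of lists stands in for the numpy or scipy matrix.

```python
# Illustrative stand-in, not ConvoKit's implementation.
ids = ["utt_0", "utt_1", "utt_2"]       # hypothetical utterance ids
columns = ["length", "politeness"]      # hypothetical column names

# Each id maps to its row position, each column name to its column position.
ids_to_idx = {id_: row for row, id_ in enumerate(ids)}
cols_to_idx = {col: j for j, col in enumerate(columns)}

# One row per id, one value per column (a list of lists stands in for the array).
matrix = [[5, 0.1], [12, 0.7], [3, 0.4]]

# Look up a single cell by id and column name.
value = matrix[ids_to_idx["utt_1"]][cols_to_idx["politeness"]]
```

With these mappings, a row or column can be located by its label without scanning the ids or columns lists.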
dump(dirpath)

Dumps the ConvoKitMatrix as a pickle file.

Parameters:dirpath – path to the Corpus directory
Returns:None
static from_dir(dirpath, matrix_name)

Initialize a ConvoKitMatrix of the specified matrix_name from a specified directory dirpath.

Parameters:
  • dirpath – path to Corpus directory
  • matrix_name – name of vector matrix
Returns:

the initialized ConvoKitMatrix

static from_file(filepath)

Initialize a ConvoKitMatrix from a file of form “vector.[name].p”.

Parameters:filepath – path to the file
Returns:the initialized ConvoKitMatrix
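As a rough illustration of the pickle round trip behind dump and from_file, the sketch below writes a stand-in object to a file named in the “vector.[name].p” pattern and reads it back. It uses only the standard library; the name "bow" and the dictionary payload are hypothetical, and what ConvoKit actually serializes is not specified here.

```python
import os
import pickle
import tempfile

name = "bow"  # hypothetical matrix name

# Hypothetical payload standing in for a matrix object.
payload = {
    "name": name,
    "ids": ["utt_0", "utt_1"],
    "columns": ["hello", "world"],
    "matrix": [[1, 0], [0, 1]],
}

with tempfile.TemporaryDirectory() as dirpath:
    # File name follows the "vector.[name].p" pattern.
    filepath = os.path.join(dirpath, f"vector.{name}.p")
    filename = os.path.basename(filepath)

    with open(filepath, "wb") as f:
        pickle.dump(payload, f)

    with open(filepath, "rb") as f:
        restored = pickle.load(f)
```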
get_vectors(ids: Union[List[str], NoneType] = None, columns: Union[List[str], NoneType] = None, as_dataframe: bool = False)
Parameters:
  • ids – optional list of object ids to get vectors for; all by default
  • columns – optional list of named columns of the vector to include; all by default
  • as_dataframe – whether to return the vector as a dataframe (True) or in its raw array form (False). False by default.
Returns:

a vector matrix (either np.ndarray or csr_matrix) or a pandas dataframe
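The selection by ids and columns can be sketched in plain Python as follows. This is an assumption-laden illustration rather than the library's implementation: select_vectors is a hypothetical helper, a list of lists stands in for the array, and returning rows in the order the ids were requested is a choice made for this sketch.

```python
def select_vectors(matrix, ids_to_idx, cols_to_idx, ids=None, columns=None):
    """Hypothetical helper: pick rows by id and columns by name (all by default)."""
    if ids is None:
        row_idx = list(range(len(matrix)))
    else:
        row_idx = [ids_to_idx[i] for i in ids]
    if columns is None:
        col_idx = list(range(len(cols_to_idx)))
    else:
        col_idx = [cols_to_idx[c] for c in columns]
    return [[matrix[r][c] for c in col_idx] for r in row_idx]


# Hypothetical data: three utterances, two named columns.
matrix = [[5, 0.1], [12, 0.7], [3, 0.4]]
ids_to_idx = {"utt_0": 0, "utt_1": 1, "utt_2": 2}
cols_to_idx = {"length": 0, "politeness": 1}

everything = select_vectors(matrix, ids_to_idx, cols_to_idx)
picked = select_vectors(
    matrix, ids_to_idx, cols_to_idx, ids=["utt_2", "utt_0"], columns=["politeness"]
)
```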

static hstack(name, matrices)

Combines multiple ConvoKitMatrices into a single ConvoKitMatrix by stacking them horizontally (i.e. each constituent matrix must have the same ids).

Parameters:
  • name – name of new matrix
  • matrices – constituent ConvoKitMatrices
Returns:

a new ConvoKitMatrix
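Horizontal stacking can be illustrated with a plain-Python sketch: rows that share an id are joined side by side, so the result has the same rows and more columns. hstack_rows is a hypothetical helper, not ConvoKit's code, and requiring the ids to appear in the same order is an assumption of this sketch.

```python
def hstack_rows(ids_a, rows_a, ids_b, rows_b):
    """Hypothetical helper: join two row lists side by side; ids must match."""
    if ids_a != ids_b:
        raise ValueError("horizontal stacking needs identical ids in the same order")
    return [row_a + row_b for row_a, row_b in zip(rows_a, rows_b)]


ids = ["utt_0", "utt_1"]        # hypothetical ids shared by both matrices
lexical = [[1, 0], [0, 1]]      # two columns per row
style = [[0.5], [0.9]]          # one column per row

combined = hstack_rows(ids, lexical, ids, style)
```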

subset(ids: Union[List[str], NoneType] = None, columns: Union[List[str], NoneType] = None)

Get a (subset) copy of the ConvoKitMatrix object according to the specified subset of ids and columns.

Parameters:
  • ids – list of ids to be included in the subset; all by default
  • columns – list of columns to be included in the subset; all by default
Returns:

a new ConvoKitMatrix object containing the specified subset of ids and columns

to_dataframe() → pandas.core.frame.DataFrame

Converts the matrix of vectors into a pandas DataFrame.

Returns:a pandas DataFrame
static vstack()

Combines multiple ConvoKitMatrices into a single ConvoKitMatrix by stacking them vertically (i.e. each constituent matrix must have the same columns).

Parameters:
  • name – name of new matrix
  • matrices – constituent ConvoKitMatrices
Returns:

a new ConvoKitMatrix
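Vertical stacking can be illustrated the same way: matrices that share their columns are concatenated row-wise, so the result has the same columns and more rows. vstack_rows is a hypothetical helper written for this sketch, not ConvoKit's implementation, and the requirement that column names match in order is an assumption.

```python
def vstack_rows(columns_a, ids_a, rows_a, columns_b, ids_b, rows_b):
    """Hypothetical helper: append the rows (and ids) of b below those of a."""
    if columns_a != columns_b:
        raise ValueError("vertical stacking needs identical columns in the same order")
    return ids_a + ids_b, rows_a + rows_b


columns = ["length", "politeness"]   # hypothetical shared column names
ids_a, rows_a = ["utt_0"], [[5, 0.1]]
ids_b, rows_b = ["utt_1", "utt_2"], [[12, 0.7], [3, 0.4]]

stacked_ids, stacked_rows = vstack_rows(columns, ids_a, rows_a, columns, ids_b, rows_b)
```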