ConvoKitMatrix
class convokit.model.convoKitMatrix.ConvoKitMatrix(name, matrix, ids: Union[List[str], NoneType] = None, columns: Union[List[str], NoneType] = None)
A ConvoKitMatrix stores the vector representations of some set of Corpus components (i.e. Utterances, Conversations, Speakers).
Parameters: - name – descriptive name for the matrix
- matrix – the matrix data, as a numpy array or scipy sparse matrix
- ids – optional list of Corpus component object ids, one id per row of the matrix
- columns – optional list of names for the columns of the matrix
Variables: - name – name of the matrix
- matrix – the matrix data
- ids – ids corresponding to rows
- columns – names corresponding to columns
- ids_to_idx – a mapping from id to the row index
- cols_to_idx – a mapping from column name to the column index
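
The two index mappings can be pictured as plain dictionaries built from the ids and columns lists. The sketch below uses only NumPy and does not call ConvoKit; the variable names mirror the attributes listed above, and the sample ids, column names, and values are made up for illustration.

```python
import numpy as np

# Hypothetical data: three components (rows), two features (columns).
ids = ["utt_a", "utt_b", "utt_c"]
columns = ["feat_x", "feat_y"]
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

# Map each id / column name to its position, in the spirit of the
# ids_to_idx and cols_to_idx variables described above.
ids_to_idx = {id_: i for i, id_ in enumerate(ids)}
cols_to_idx = {col: j for j, col in enumerate(columns)}

# Look up a single cell by id and column name.
value = matrix[ids_to_idx["utt_b"], cols_to_idx["feat_y"]]
print(value)
```

With mappings like these, a lookup by id or column name is a dictionary access followed by ordinary array indexing.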

dump(dirpath)
Dumps the ConvoKitMatrix as a pickle file.
Parameters: dirpath – directory path to Corpus
Returns: None

static from_dir(dirpath, matrix_name)
Initialize a ConvoKitMatrix of the specified matrix_name from a specified directory dirpath.
Parameters: - dirpath – path to Corpus directory
- matrix_name – name of vector matrix
Returns: the initialized ConvoKitMatrix

static from_file(filepath)
Initialize a ConvoKitMatrix from a file of form “vector.[name].p”.
Parameters: filepath – path to the file
Returns: the initialized ConvoKitMatrix
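
As a rough picture of the dump and load cycle, the sketch below writes an object to a pickle file and reads it back using only the standard library. It does not call ConvoKit: the dict is a stand-in for a matrix object, and the file name is an assumption that simply follows the “vector.[name].p” form quoted above.

```python
import os
import pickle
import tempfile

# Stand-in for a matrix object; a plain dict keeps the sketch
# independent of ConvoKit.
matrix_name = "demo"
payload = {"name": matrix_name,
           "ids": ["utt_a", "utt_b"],
           "rows": [[1, 2], [3, 4]]}

with tempfile.TemporaryDirectory() as dirpath:
    # Assumed file name, following the "vector.[name].p" form.
    filepath = os.path.join(dirpath, "vector.{}.p".format(matrix_name))

    # Write the object as a pickle file (the role dump is described as playing).
    with open(filepath, "wb") as f:
        pickle.dump(payload, f)

    # Read it back (the role from_file is described as playing).
    with open(filepath, "rb") as f:
        restored = pickle.load(f)

print(restored == payload)
```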

get_vectors(ids: Union[List[str], NoneType] = None, columns: Union[List[str], NoneType] = None, as_dataframe: bool = False)
Get the vectors for the specified ids and columns.
Parameters: - ids – optional list of object ids to get vectors for; all by default
- columns – optional list of named columns of the vector to include; all by default
- as_dataframe – whether to return the vector as a dataframe (True) or in its raw array form (False). False by default.
Returns: a vector matrix (either np.ndarray or csr_matrix) or a pandas dataframe
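
Selecting by ids and columns amounts to translating names into row and column indices and then indexing the array. The sketch below shows this for a dense NumPy array only; select_vectors is a hypothetical helper written for illustration, not ConvoKit's implementation, and sparse matrices would need different indexing.

```python
import numpy as np

# Hypothetical data: 3 rows (ids) by 3 columns.
ids = ["utt_a", "utt_b", "utt_c"]
columns = ["feat_x", "feat_y", "feat_z"]
matrix = np.arange(9, dtype=float).reshape(3, 3)

ids_to_idx = {id_: i for i, id_ in enumerate(ids)}
cols_to_idx = {col: j for j, col in enumerate(columns)}

def select_vectors(want_ids=None, want_columns=None):
    """Hypothetical helper: None means 'all', as in the parameters above."""
    row_names = ids if want_ids is None else want_ids
    col_names = columns if want_columns is None else want_columns
    row_idx = [ids_to_idx[r] for r in row_names]
    col_idx = [cols_to_idx[c] for c in col_names]
    # np.ix_ builds an open mesh so rows and columns are selected together.
    return matrix[np.ix_(row_idx, col_idx)]

sub = select_vectors(["utt_c", "utt_a"], ["feat_y"])
print(sub.shape)
```

Rows come back in the order the ids were requested, which here differs from their order in the underlying matrix.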

static hstack(name, matrices)
Combines multiple ConvoKitMatrices into a single ConvoKitMatrix by stacking them horizontally (each constituent matrix must have the same ids).
Parameters: - name – name of new matrix
- matrices – constituent ConvoKitMatrices
Returns: a new ConvoKitMatrix
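
Horizontal stacking keeps the rows fixed and appends columns, which is why the constituent matrices need the same ids. The sketch below illustrates the idea with plain NumPy arrays and made-up names; it does not call ConvoKit.

```python
import numpy as np

# Both matrices describe the same two components, in the same row order.
ids = ["utt_a", "utt_b"]

left = np.array([[1, 2],
                 [3, 4]])
left_cols = ["a1", "a2"]

right = np.array([[5],
                  [6]])
right_cols = ["b1"]

# Columns are appended side by side; the row ids are unchanged.
combined = np.hstack([left, right])
combined_cols = left_cols + right_cols

print(combined.shape, combined_cols)
```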

subset(ids: Union[List[str], NoneType] = None, columns: Union[List[str], NoneType] = None)
Get a (subset) copy of the ConvoKitMatrix object according to the specified subset of ids and columns.
Parameters: - ids – list of ids to be included in the subset; all by default
- columns – list of columns to be included in the subset; all by default
Returns: a new ConvoKitMatrix object with the specified subset of ids and columns

to_dataframe() → pandas.core.frame.DataFrame
Converts the matrix of vectors into a pandas DataFrame.
Returns: a pandas DataFrame

static vstack(name, matrices)
Combines multiple ConvoKitMatrices into a single ConvoKitMatrix by stacking them vertically (each constituent matrix must have the same columns).
Parameters: - name – name of new matrix
- matrices – constituent ConvoKitMatrices
Returns: a new ConvoKitMatrix
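
When matrices share the same columns, stacking them appends rows, so the id lists are concatenated while the column names stay fixed. The sketch below illustrates this with plain NumPy arrays and made-up names; it does not call ConvoKit.

```python
import numpy as np

# Both matrices use the same two feature columns.
columns = ["feat_x", "feat_y"]

top = np.array([[1, 2],
                [3, 4]])
top_ids = ["utt_a", "utt_b"]

bottom = np.array([[5, 6]])
bottom_ids = ["utt_c"]

# Rows are appended one below the other; the column names are unchanged.
combined = np.vstack([top, bottom])
combined_ids = top_ids + bottom_ids

print(combined.shape, combined_ids)
```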