Matrix utilities

Helpers for working with (sparse) 2d matrices

Sparse matrix helpers

get_dense_row(matrix, row[, dtype]) Extract row from the sparse matrix
sparse_matrix_to_dense(sparse_matrix) Convert sparse_matrix to a dense numpy array
sparse_matrix_to_list(sparse_matrix) Convert sparse_matrix to a list of “sparse row vectors”.
write_sparse_matrix(target, a, compress, …) Write a to the file target in matrix market format

Matrix operation helpers

col_op(m, op) Apply op to each column in the matrix.
col_sum(m) Calculate the sum of each column in the matrix.
col_sum_mean(m, return_var) Calculate the mean of the sum of each column in the matrix.
normalize_columns(matrix) Normalize the columns of the given (dense) matrix
row_op(m, op) Apply op to each row in the matrix.
row_sum(m) Calculate the sum of each row in the matrix.
row_sum_mean(m, var) Calculate the mean of the sum of each row in the matrix.
normalize_rows(matrix) Normalize the rows of the given (dense) matrix

Other helpers

matrix_multiply(m1, m2, m3) Multiply the three matrices
permute_matrix(m, is_flat, shape) Randomly permute the entries of the matrix.

Definitions

pyllars.matrix_utils.col_op(m, op)

Apply op to each column in the matrix.
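
The source does not show how op is invoked, so the following is only a minimal sketch of column-wise and row-wise application, assuming op takes a 1d array. The names apply_to_columns and apply_to_rows are hypothetical stand-ins built on numpy.apply_along_axis; they are not the pyllars implementation.

```python
import numpy as np

def apply_to_columns(m, op):
    # Hypothetical stand-in for col_op: op receives each column as a 1d array.
    return np.apply_along_axis(op, 0, m)

def apply_to_rows(m, op):
    # Hypothetical stand-in for row_op: op receives each row as a 1d array.
    return np.apply_along_axis(op, 1, m)

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_max = apply_to_columns(m, np.max)  # one value per column
row_max = apply_to_rows(m, np.max)     # one value per row
```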

pyllars.matrix_utils.col_sum(m)

Calculate the sum of each column in the matrix.

pyllars.matrix_utils.col_sum_mean(m: numpy.ndarray, return_var: bool = False) → float

Calculate the mean of the sum of each column in the matrix.

Optionally, the variance of the column sums can also be calculated.

Parameters:
  • m (numpy.ndarray) – The (2d) matrix
  • return_var (bool) – Whether to also calculate the variance of the column sums
Returns:

  • mean (float) – The mean of the column sums in the matrix
  • variance (float) – If return_var is True, then the variance of the column sums
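
As a rough illustration of the quantities described above, the sketch below computes the column sums and then their mean and variance with plain NumPy. The name column_sum_stats is hypothetical, and the use of the population variance (ddof=0) is an assumption; the source does not say which estimator is used.

```python
import numpy as np

def column_sum_stats(m, return_var=False):
    # Sum down each column, then summarize those sums.
    col_sums = m.sum(axis=0)
    mean = col_sums.mean()
    if return_var:
        # Assumption: population variance (NumPy's default, ddof=0).
        return mean, col_sums.var()
    return mean

m = np.array([[1, 2, 3],
              [4, 5, 6]])

mean_only = column_sum_stats(m)
mean, variance = column_sum_stats(m, return_var=True)
```

The same pattern with axis=1 gives the row-sum statistics.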

pyllars.matrix_utils.get_dense_row(matrix: scipy.sparse.base.spmatrix, row: int, dtype=float, max_length: Optional[int] = None) → numpy.ndarray

Extract row from the sparse matrix

Parameters:
  • matrix (scipy.sparse.spmatrix) – The sparse matrix
  • row (int) – The 0-based row index
  • dtype (type) – The base type of elements of matrix. This is used for the corner case where matrix is essentially a sparse column vector.
  • max_length (typing.Optional[int]) – The maximum number of columns to include in the returned row.
Returns:

row – The specified row (as a 1d numpy array)

Return type:

numpy.ndarray
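
The following sketch shows one way to pull a single row out of a SciPy sparse matrix as a 1d dense array. The name dense_row is hypothetical, and the sketch deliberately leaves out the dtype handling for the column-vector corner case mentioned above, since the source does not describe it.

```python
import scipy.sparse

def dense_row(matrix, row, max_length=None):
    # CSR supports efficient row indexing; other formats are converted first.
    row_matrix = matrix.tocsr()[row]
    # toarray() yields shape (1, n_cols); ravel() flattens it to 1d.
    dense = row_matrix.toarray().ravel()
    # Slicing with None keeps every column.
    return dense[:max_length]

matrix = scipy.sparse.coo_matrix([[0, 0, 3],
                                  [4, 0, 0]])

full_row = dense_row(matrix, 1)
short_row = dense_row(matrix, 0, max_length=2)
```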

pyllars.matrix_utils.matrix_multiply(m1: numpy.ndarray, m2: numpy.ndarray, m3: numpy.ndarray) → numpy.ndarray

Multiply the three matrices

This function performs the multiplications in an order such that the size of the intermediate matrix created by the first matrix multiplication is as small as possible.

Parameters:
  • m1, m2, m3 (numpy.ndarray) – The (2d) matrices
Returns:

product_matrix – The product of the three input matrices

Return type:

numpy.ndarray
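
The ordering rule described above can be sketched as follows. For shapes (a, b), (b, c) and (c, d), multiplying m1 and m2 first creates an a × c intermediate, while multiplying m2 and m3 first creates a b × d intermediate. The name multiply_three is hypothetical, and the tie-breaking choice is an assumption.

```python
import numpy as np

def multiply_three(m1, m2, m3):
    # Number of entries in each candidate intermediate matrix.
    left_first_size = m1.shape[0] * m2.shape[1]   # size of m1 @ m2
    right_first_size = m2.shape[0] * m3.shape[1]  # size of m2 @ m3
    # Assumption: ties go to the left-first order.
    if left_first_size <= right_first_size:
        return (m1 @ m2) @ m3
    return m1 @ (m2 @ m3)

m1 = np.arange(100).reshape(2, 50)
m2 = np.arange(150).reshape(50, 3)
m3 = np.arange(120).reshape(3, 40)

product = multiply_three(m1, m2, m3)  # intermediate is 2 x 3, not 50 x 40
```

Matrix multiplication is associative, so both orders give the same product; only the temporary storage differs. NumPy's numpy.linalg.multi_dot solves a related problem but chooses the order by estimated multiplication cost instead of intermediate size.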

pyllars.matrix_utils.normalize_columns(matrix: numpy.ndarray) → numpy.ndarray

Normalize the columns of the given (dense) matrix

Parameters:
  • matrix (numpy.ndarray) – The (2d) matrix
Returns:

normalized_matrix – The matrix normalized such that all column sums are 1

Return type:

numpy.ndarray

pyllars.matrix_utils.normalize_rows(matrix: numpy.ndarray) → numpy.ndarray

Normalize the rows of the given (dense) matrix

Parameters:
  • matrix (numpy.ndarray) – The (2d) matrix
Returns:

normalized_matrix – The matrix normalized such that all row sums are 1

Return type:

numpy.ndarray
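
Normalizing so that sums equal 1 amounts to dividing each row (or column) by its own sum. The snippet below is a plain NumPy sketch of that arithmetic, not the pyllars code; how the library treats rows or columns whose sum is zero is not stated in the source, and this sketch would produce NaN for them.

```python
import numpy as np

m = np.array([[1.0, 3.0],
              [2.0, 2.0]])

# keepdims=True keeps the sums 2d so they broadcast against m.
row_normalized = m / m.sum(axis=1, keepdims=True)  # each row sums to 1
col_normalized = m / m.sum(axis=0, keepdims=True)  # each column sums to 1
```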

pyllars.matrix_utils.permute_matrix(m: numpy.ndarray, is_flat: bool = False, shape: Optional[Tuple[int]] = None) → numpy.ndarray

Randomly permute the entries of the matrix. The matrix is first flattened.

For reproducibility, the random seed of numpy should be set before calling this function.

Parameters:
  • m (numpy.ndarray) – The matrix (tensor, etc.)
  • is_flat (bool) – Whether the matrix values have already been flattened. If they have been, then the desired shape must be passed.
  • shape (typing.Optional[typing.Tuple]) – The shape of the output matrix, if m is already flattened
Returns:

permuted_m – A copy of m with the values randomly permuted. It has the same shape as m, or the given shape if m was passed already flattened.

Return type:

numpy.ndarray
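
The flatten, permute, reshape sequence and the seeding advice above can be sketched as follows. The name permute_entries is hypothetical, and the use of numpy.random.permutation with the global seed is an assumption about the approach, not a description of the pyllars code.

```python
import numpy as np

def permute_entries(m, is_flat=False, shape=None):
    if is_flat:
        # A flat input carries no shape information, so it must be supplied.
        if shape is None:
            raise ValueError("shape is required when m is already flattened")
    else:
        shape = m.shape
    flat = np.ravel(m)
    # permutation() returns a shuffled copy and leaves its input untouched.
    return np.random.permutation(flat).reshape(shape)

m = np.arange(6).reshape(2, 3)

np.random.seed(8675309)  # seed first so the permutation is reproducible
p = permute_entries(m)
```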

pyllars.matrix_utils.row_op(m, op)

Apply op to each row in the matrix.

pyllars.matrix_utils.row_sum(m)

Calculate the sum of each row in the matrix.

pyllars.matrix_utils.row_sum_mean(m: numpy.ndarray, var: bool = False) → float

Calculate the mean of the sum of each row in the matrix.

Optionally, the variance of the row sums can also be calculated.

Parameters:
  • m (numpy.ndarray) – The (2d) matrix
  • var (bool) – Whether to also calculate the variance of the row sums
Returns:

  • mean (float) – The mean of the row sums in the matrix
  • variance (float) – If var is True, then the variance of the row sums

pyllars.matrix_utils.sparse_matrix_to_dense(sparse_matrix: scipy.sparse.base.spmatrix) → numpy.ndarray

Convert sparse_matrix to a dense numpy array

Parameters:
  • sparse_matrix (scipy.sparse.spmatrix) – The sparse scipy matrix
Returns:

dense_matrix – The dense (2d) numpy array

Return type:

numpy.ndarray

pyllars.matrix_utils.sparse_matrix_to_list(sparse_matrix: scipy.sparse.base.spmatrix) → List

Convert sparse_matrix to a list of “sparse row vectors”.

In this context, a “sparse row vector” is simply a sparse matrix with dimensionality (1, sparse_matrix.shape[1]).

Parameters:
  • sparse_matrix (scipy.sparse.spmatrix) – The sparse scipy matrix
Returns:

list_of_sparse_row_vectors – The list of sparse row vectors

Return type:

typing.List[scipy.sparse.spmatrix]
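
To make the “sparse row vector” idea concrete, the sketch below splits a SciPy sparse matrix into a list of 1 × n sparse matrices by row indexing. The name to_row_list is hypothetical; converting to CSR first is an assumption made so that row indexing is supported and cheap.

```python
import scipy.sparse

def to_row_list(sparse_matrix):
    # Row indexing on a CSR matrix returns a (1, n_cols) sparse matrix.
    csr = sparse_matrix.tocsr()
    return [csr[i] for i in range(csr.shape[0])]

matrix = scipy.sparse.coo_matrix([[1, 0],
                                  [0, 2],
                                  [0, 0]])

rows = to_row_list(matrix)
```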

pyllars.matrix_utils.write_sparse_matrix(target: str, a: scipy.sparse.base.spmatrix, compress: bool = True, **kwargs) → None

Write a to the file target in matrix market format

This function is a drop-in replacement for scipy.io.mmwrite. The only difference is that it gzip-compresses the output by default. It does not alter the file name, so target should normally end in “.mtx.gz” when compress is True.

If compress is True, then this function imports gzip.

Parameters:
  • target (str) – The complete path to the output file, including file extension
  • a (scipy.sparse.spmatrix) – The sparse matrix
  • compress (bool) – Whether to compress the output
  • **kwargs (<key>=<value> pairs) – These are passed through to scipy.io.mmwrite(). Please see the scipy documentation for more details.
Returns:

None, but the matrix is written to disk
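
One plausible way to get this behaviour is to hand scipy.io.mmwrite an open binary stream instead of a path, since mmwrite accepts file-like targets. The sketch below does that; the name write_matrix_market is hypothetical, and this is an assumption about the approach, not the pyllars source.

```python
import gzip
import os
import tempfile

import scipy.io
import scipy.sparse

def write_matrix_market(target, a, compress=True, **kwargs):
    # Passing an open stream keeps the file name exactly as given; when
    # handed a path, scipy.io.mmwrite appends ".mtx" if it is missing.
    opener = gzip.open if compress else open
    with opener(target, "wb") as out:
        scipy.io.mmwrite(out, a, **kwargs)

a = scipy.sparse.coo_matrix([[0.0, 1.5],
                             [2.5, 0.0]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.mtx.gz")
    write_matrix_market(path, a)
    # Decompress and read the Matrix Market header line back.
    with gzip.open(path, "rt") as handle:
        first_line = handle.readline()
```

scipy.io.mmread accepts “.mtx.gz” paths directly, so files written this way can be read back without manual decompression.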