Matrix utilities
Helpers for working with (sparse) 2d matrices
Sparse matrix helpers
get_dense_row(matrix, row[, dtype]) – Extract row from the sparse matrix
sparse_matrix_to_dense(sparse_matrix) – Convert sparse_matrix to a dense numpy array
sparse_matrix_to_list(sparse_matrix) – Convert sparse_matrix to a list of “sparse row vectors”
write_sparse_matrix(target, a, compress, …) – Write a to the file target in Matrix Market format
Matrix operation helpers
col_op(m, op) – Apply op to each column in the matrix
col_sum(m) – Calculate the sum of each column in the matrix
col_sum_mean(m, return_var) – Calculate the mean of the sum of each column in the matrix
normalize_columns(matrix) – Normalize the columns of the given (dense) matrix
row_op(m, op) – Apply op to each row in the matrix
row_sum(m) – Calculate the sum of each row in the matrix
row_sum_mean(m, var) – Calculate the mean of the sum of each row in the matrix
normalize_rows(matrix) – Normalize the rows of the given (dense) matrix
Other helpers
matrix_multiply(m1, m2, m3) – Multiply the three matrices
permute_matrix(m, is_flat, shape) – Randomly permute the entries of the matrix
Definitions
pyllars.matrix_utils.col_sum_mean(m: numpy.ndarray, return_var: bool = False) → float
Calculate the mean of the sum of each column in the matrix.
Optionally, the variance of the column sums can also be calculated.
Parameters: - m (numpy.ndarray) – The (2d) matrix
- return_var (bool) – Whether to also calculate the variance of the column sums
Returns: - mean (float) – The mean of the column sums in the matrix
- variance (float) – If return_var is True, the variance of the column sums
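A minimal NumPy sketch of the computation described above. It assumes the variance is the population variance (ddof=0) of the column sums and that a (mean, variance) tuple is returned when return_var is True; the documentation does not state either detail, and col_sum_mean_sketch is a hypothetical name, not the library's implementation.

```python
import numpy as np


def col_sum_mean_sketch(m, return_var=False):
    """Hypothetical stand-in for pyllars.matrix_utils.col_sum_mean."""
    # Sum down each column: one value per column.
    col_sums = np.asarray(m).sum(axis=0)
    mean = float(col_sums.mean())
    if return_var:
        # Assumption: population variance (ddof=0) of the column sums.
        return mean, float(col_sums.var())
    return mean


m = np.array([[1, 2], [3, 4]])
print(col_sum_mean_sketch(m))                   # column sums are [4, 6], so 5.0
print(col_sum_mean_sketch(m, return_var=True))  # (5.0, 1.0)
```

Replacing axis=0 with axis=1 gives the analogous row-sum computation described for row_sum_mean.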
pyllars.matrix_utils.get_dense_row(matrix: scipy.sparse.base.spmatrix, row: int, dtype=float, max_length: Optional[int] = None) → numpy.ndarray
Extract row from the sparse matrix as a dense array.
Parameters: - matrix (scipy.sparse.spmatrix) – The sparse matrix
- row (int) – The 0-based row index
- dtype (type) – The base type of the elements of matrix. This is used for the corner case where matrix is essentially a sparse column vector.
- max_length (typing.Optional[int]) – The maximum number of columns to include in the returned row
Returns: row – The specified row (as a 1d numpy array)
Return type: numpy.ndarray
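A minimal sketch of extracting one dense row from a SciPy sparse matrix. It assumes that max_length keeps the first max_length columns, and it does not reproduce the column-vector corner case that dtype handles in the real function; get_dense_row_sketch is a hypothetical name.

```python
import numpy as np
import scipy.sparse


def get_dense_row_sketch(matrix, row, dtype=float, max_length=None):
    """Hypothetical stand-in for pyllars.matrix_utils.get_dense_row."""
    # CSR format makes row slicing cheap; the slice is a (1, n_cols) sparse matrix.
    csr = scipy.sparse.csr_matrix(matrix)
    dense_row = csr[row, :].toarray().ravel().astype(dtype)
    if max_length is not None:
        # Assumption: keep only the first max_length columns.
        dense_row = dense_row[:max_length]
    return dense_row


m = scipy.sparse.csr_matrix([[0, 1, 0], [2, 0, 3]])
print(get_dense_row_sketch(m, 1))                # [2. 0. 3.]
print(get_dense_row_sketch(m, 1, max_length=2))  # [2. 0.]
```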
pyllars.matrix_utils.matrix_multiply(m1: numpy.ndarray, m2: numpy.ndarray, m3: numpy.ndarray) → numpy.ndarray
Multiply the three matrices.
This function performs the multiplications in an order such that the size of the intermediate matrix created by the first matrix multiplication is as small as possible.
Parameters: m{1,2,3} (numpy.ndarray) – The (2d) matrices
Returns: product_matrix – The product of the three input matrices
Return type: numpy.ndarray
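The ordering rule above can be sketched by comparing the number of entries in the two possible intermediate products, m1 @ m2 and m2 @ m3. This is an illustration of the stated rule under that reading, not the library's code; matrix_multiply_sketch is a hypothetical name.

```python
import numpy as np


def matrix_multiply_sketch(m1, m2, m3):
    """Hypothetical stand-in for pyllars.matrix_utils.matrix_multiply."""
    # Number of entries in each candidate intermediate product.
    left_size = m1.shape[0] * m2.shape[1]   # entries in m1 @ m2
    right_size = m2.shape[0] * m3.shape[1]  # entries in m2 @ m3
    if left_size <= right_size:
        return (m1 @ m2) @ m3
    return m1 @ (m2 @ m3)


rng = np.random.default_rng(0)
m1 = rng.random((500, 2))
m2 = rng.random((2, 500))
m3 = rng.random((500, 3))
# m1 @ m2 would be 500 x 500, while m2 @ m3 is only 2 x 3, so the right grouping is used.
product = matrix_multiply_sketch(m1, m2, m3)
print(product.shape)  # (500, 3)
```

Matrix multiplication is associative, so both groupings give the same product up to floating-point rounding. numpy.linalg.multi_dot addresses a related problem, choosing the order with the fewest scalar multiplications, which for some shapes differs from the smallest-intermediate rule.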
pyllars.matrix_utils.normalize_columns(matrix: numpy.ndarray) → numpy.ndarray
Normalize the columns of the given (dense) matrix.
Parameters: matrix (numpy.ndarray) – The (2d) matrix
Returns: normalized_matrix – The matrix normalized such that all column sums are 1
Return type: numpy.ndarray
pyllars.matrix_utils.normalize_rows(matrix: numpy.ndarray) → numpy.ndarray
Normalize the rows of the given (dense) matrix.
Parameters: matrix (numpy.ndarray) – The (2d) matrix
Returns: normalized_matrix – The matrix normalized such that all row sums are 1
Return type: numpy.ndarray
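Both normalize_columns and normalize_rows can be sketched with NumPy broadcasting: divide by the column sums (shape (1, n_cols)) or by the row sums (shape (n_rows, 1)). The function names are hypothetical, and how the library treats all-zero rows or columns is not documented; in this sketch they produce nan with a RuntimeWarning from the division by zero.

```python
import numpy as np


def normalize_columns_sketch(matrix):
    """Hypothetical stand-in for pyllars.matrix_utils.normalize_columns."""
    matrix = np.asarray(matrix, dtype=float)
    # keepdims=True gives shape (1, n_cols), which broadcasts across every row.
    return matrix / matrix.sum(axis=0, keepdims=True)


def normalize_rows_sketch(matrix):
    """Hypothetical stand-in for pyllars.matrix_utils.normalize_rows."""
    matrix = np.asarray(matrix, dtype=float)
    # keepdims=True gives shape (n_rows, 1), which broadcasts across every column.
    return matrix / matrix.sum(axis=1, keepdims=True)


m = np.array([[1, 3], [2, 2]])
print(normalize_rows_sketch(m))     # each row now sums to 1
print(normalize_columns_sketch(m))  # each column now sums to 1
```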
pyllars.matrix_utils.permute_matrix(m: numpy.ndarray, is_flat: bool = False, shape: Optional[Tuple[int]] = None) → numpy.ndarray
Randomly permute the entries of the matrix. The matrix is flattened before its entries are permuted.
For reproducibility, set numpy's random seed before calling this function.
Parameters: - m (numpy.ndarray) – The matrix (tensor, etc.)
- is_flat (bool) – Whether the matrix values have already been flattened. If so, the desired output shape must be passed as shape.
- shape (typing.Optional[typing.Tuple]) – The shape of the output matrix, if m is already flattened
Returns: permuted_m – A copy of m with the values randomly permuted. It has the same shape as m or, if m is already flattened, the given shape.
Return type: numpy.ndarray
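A minimal sketch of the flatten, permute, and reshape steps described above, using numpy's global random state so that seeding with np.random.seed makes the result reproducible. permute_matrix_sketch is a hypothetical name, and raising ValueError for a missing shape is an assumption.

```python
import numpy as np


def permute_matrix_sketch(m, is_flat=False, shape=None):
    """Hypothetical stand-in for pyllars.matrix_utils.permute_matrix."""
    m = np.asarray(m)
    if is_flat:
        if shape is None:
            # Assumption: a missing shape is reported as a ValueError.
            raise ValueError("shape is required when is_flat is True")
        flat = m
    else:
        shape = m.shape
        # np.random.permutation only shuffles along the first axis,
        # so flatten first to shuffle every individual entry.
        flat = m.ravel()
    # permutation returns a shuffled copy drawn from numpy's global random state.
    return np.random.permutation(flat).reshape(shape)


m = np.arange(6).reshape(2, 3)
np.random.seed(8675309)
first = permute_matrix_sketch(m)
np.random.seed(8675309)
second = permute_matrix_sketch(m)
print(np.array_equal(first, second))  # True: same seed, same permutation
print(first.shape)                    # (2, 3)
```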
pyllars.matrix_utils.row_sum_mean(m: numpy.ndarray, var: bool = False) → float
Calculate the mean of the sum of each row in the matrix.
Optionally, the variance of the row sums can also be calculated.
Parameters: - m (numpy.ndarray) – The (2d) matrix
- var (bool) – Whether to also calculate the variance of the row sums
Returns: - mean (float) – The mean of the row sums in the matrix
- variance (float) – If var is True, the variance of the row sums
pyllars.matrix_utils.sparse_matrix_to_dense(sparse_matrix: scipy.sparse.base.spmatrix) → numpy.ndarray
Convert sparse_matrix to a dense numpy array.
Parameters: sparse_matrix (scipy.sparse.spmatrix) – The sparse scipy matrix
Returns: dense_matrix – The dense (2d) numpy array
Return type: numpy.ndarray
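A minimal sketch of the conversion, assuming it amounts to SciPy's toarray(). The distinction matters because todense() on a scipy.sparse matrix returns a numpy.matrix rather than the plain numpy.ndarray promised above. sparse_matrix_to_dense_sketch is a hypothetical name.

```python
import numpy as np
import scipy.sparse


def sparse_matrix_to_dense_sketch(sparse_matrix):
    """Hypothetical stand-in for pyllars.matrix_utils.sparse_matrix_to_dense."""
    # toarray() returns a plain numpy.ndarray; todense() on a scipy sparse
    # matrix would return a numpy.matrix instead.
    return sparse_matrix.toarray()


s = scipy.sparse.csr_matrix([[0, 5], [7, 0]])
dense = sparse_matrix_to_dense_sketch(s)
print(type(dense).__name__, dense.shape)  # ndarray (2, 2)
```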
pyllars.matrix_utils.sparse_matrix_to_list(sparse_matrix: scipy.sparse.base.spmatrix) → List
Convert sparse_matrix to a list of “sparse row vectors”.
In this context, a “sparse row vector” is simply a sparse matrix with dimensionality (1, sparse_matrix.shape[1]).
Parameters: sparse_matrix (scipy.sparse.spmatrix) – The sparse scipy matrix
Returns: list_of_sparse_row_vectors – The list of sparse row vectors
Return type: typing.List[scipy.sparse.spmatrix]
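A minimal sketch that produces the (1, sparse_matrix.shape[1]) row vectors described above by slicing a CSR copy of the input. sparse_matrix_to_list_sketch is a hypothetical name, and converting to CSR first is a choice made here for cheap row slicing, not something the documentation specifies.

```python
import scipy.sparse


def sparse_matrix_to_list_sketch(sparse_matrix):
    """Hypothetical stand-in for pyllars.matrix_utils.sparse_matrix_to_list."""
    # Convert to CSR once so that each row slice is cheap.
    csr = scipy.sparse.csr_matrix(sparse_matrix)
    # Each slice i:i+1 is itself a sparse matrix of shape (1, n_cols).
    return [csr[i:i + 1, :] for i in range(csr.shape[0])]


s = scipy.sparse.csr_matrix([[1, 0], [0, 2], [3, 0]])
rows = sparse_matrix_to_list_sketch(s)
print(len(rows), rows[1].shape)  # 3 (1, 2)
```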
pyllars.matrix_utils.write_sparse_matrix(target: str, a: scipy.sparse.base.spmatrix, compress: bool = True, **kwargs) → None
Write a to the file target in Matrix Market format.
This function is a drop-in replacement for scipy.io.mmwrite. The only difference is that it gzip-compresses the output by default. It does not alter the file extension, so when compress is True, target should usually end in “.mtx.gz”.
If compress is True, this function imports gzip.
Parameters: - target (str) – The complete path to the output file, including the file extension
- a (scipy.sparse.spmatrix) – The sparse matrix
- compress (bool) – Whether to gzip-compress the output
- **kwargs (<key>=<value> pairs) – These are passed through to scipy.io.mmwrite(). Please see the scipy documentation for more details.
Returns: None, but the matrix is written to disk
Return type: None
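A minimal sketch of gzip-compressed Matrix Market output. It is not how pyllars necessarily does it: here scipy.io.mmwrite first writes a temporary “.mtx” file, which is then gzip-compressed into target unchanged. write_sparse_matrix_sketch is a hypothetical name; comment is an existing scipy.io.mmwrite keyword, passed through **kwargs.

```python
import gzip
import os
import shutil
import tempfile

import scipy.io
import scipy.sparse


def write_sparse_matrix_sketch(target, a, compress=True, **kwargs):
    """Hypothetical stand-in for pyllars.matrix_utils.write_sparse_matrix."""
    if not compress:
        # Uncompressed output: defer entirely to scipy.io.mmwrite.
        scipy.io.mmwrite(target, a, **kwargs)
        return
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Write plain Matrix Market text to a temporary ".mtx" file first...
        tmp_path = os.path.join(tmp_dir, "matrix.mtx")
        scipy.io.mmwrite(tmp_path, a, **kwargs)
        # ...then gzip it into target, leaving target's extension untouched.
        with open(tmp_path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)


out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "example.mtx.gz")
write_sparse_matrix_sketch(out_path, scipy.sparse.eye(3, format="csr"), comment="identity")
with gzip.open(out_path, "rb") as f:
    print(f.readline())  # the Matrix Market header line
```

Writing through a temporary “.mtx” file avoids depending on how a particular SciPy version treats target paths without a “.mtx” extension or binary file objects, at the cost of one extra file on disk.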