联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> javajava

日期:2018-10-23 10:04

Python Project: Matrix Decomposition

Instructor: Qi Deng

Deadline 10/21/2018 11:59am (Noon)

Update: 10/14/2018

Background

In this project, you will develop efficient algorithms for matrix decomposition and direct solver for linear

system . Walking through this project, you will be familiar with algorithm design in the world of

numerical computing.

For many applications of numerical linear algebra, we would rather keep in the decomposed

form. Here are some "simpler" matrices retaining certain structures. For example, we can view

where and are lower and upper triangular matrices, respectively. If we want to solve

, the LU decomposition allows us to solve two easier problems:

This is precisely the idea underlying Gaussian elimination we learned in the first year, but in a different

view. LU decomposition has some other advantages. Once we obtain the decomposition, the two

triangular systems only cost , which is significantly better than . However, we don't get

reduced complexity for free, as it could take to compute LU in the first place. That being said, LU is

still a great choice for practical use. Suppose now we want to solve a sequence of linear systems for

vectors while using the same . Then we only need to perform one LU while still

solving cheap triangular system each time.

In this project you will implement for general matrix and a variant called Cholesky factorization for

symmetric matrix. Note that there are many other decompositions for various purposes. For instance, QR

decomposition lets us write where is orthogonal and is upper-triangular. can be very

useful in Krylov subspace methods and eigen decomposition. One who is interested in this can find more

in courses on numerical computing 1 . The pseudocode of LU or Cholesky can be found in the book as

well.

Rules

1. You can exchange ideas with other teams but you are not allowed to exchange code and any

writings.

2. For your algorithm implementation (in src/), you are not allowed to use any external matrix

factorization such as Scipy.linalg.cholesky. However, you are allowed to examine the

correctness of your algorithm by comparing with output of those libraries (in tests/).

3. Submission after deadline will have 20 points penalty each passing day.

4. Compress your project in a single zip file for submission, remember again, don't include any

datasets.

5. Submission by one team meber is enough, if 张三 is your team leader, name the zip file

zhangsan.zip. But do include the names and IDs of all your members in the report.

Questions (95 points)

1. LU (20 points).

Implement LU decomposition for non-degenerate square matrix. We attempt to write our algorithm in an

OOP style. In the following, we provide a base class for any matrix decomposition algorithm. You will

need to complete the code for LU class. Althouth there are infinite ways to decompose into , in

your code, make sure the diagonal of is 1. (Refer to 3.2.8 of book 1 for implementations)

# decompose.py

class Decompose(object):

""" Matrix in decomposed form

"""

def __init__(self):

pass


def factorize(self, A):

# Abstract methods for matrix factorization

pass


# lu.py


class LU(Decompose):

def factorize(self, A):

# A is a non-degenerate matrix

# Return P, L, U, here P is a permutation;

# it will be the permutation list in the pivoted LU

L = np.zeros(A.shape)

U = np.zeros(A.shape)

# Todo

P = list(range(A.shape[0]))

You can use the following code to generate synthetic matrices to test your LU.

2. Cholesky. (20 points)

The above LU can be applied to any non-degenerate square matrix. If we further assume that matrix is

symmetric positive definite, it immediately follows that . The advantage of exploiting symmetry is

that now we can obtain for free. In such case, we can obtain by Cholesky decomposition.

(Refer to 4.2.5 of book 1 ).

3. Pivoting (20 points).

Let us walk through an example to see the importance of pivoting. Even though this is an artificial

example, it may still help us to understand the importance of pivoting better. Consider solving the

following equations where

return P, L, U



def testcase(p=100):

L = np.tril(np.random.normal(size=[p, p]))

np.fill_diagonal(L, np.abs(L.diagonal()))

U = np.tril(np.random.normal(size=[p, p])).T

np.fill_diagonal(U, np.abs(U.diagonal()))

A = L.dot(U)

return A

# chol.py

class Chol(Decompose):

def factorize(self, A):

""" A is a SPD matrix

return P, L, here P is a permutation.

Find L such that A = LL'

"""

L = np.zeros(A.shape)

# TODO

P = list(range(A.shape[0]))

return P, L

A = np.asarray([[1e-10, 1], [1, 0]])

b=np.array([1, 1e-7])

If we do LU without pivoting, make sure your code will generate:

In Python, we can solve the triangular system by function scipy.linalg.solve_triangular. Thus the

following two lines should give us the solution of .

This will show

The solution is wrong in sense that the second equation gets the wrong answer.

Now switch the rows of A and b in th following way, use the triangular solver to find , and x again, do

you get the same answer or not? And what is your observation?

You probably see why we need large pivots, the point is that good pivoting rule can stablize and ,

which means stronger stability in solving the linear systems. Now implement a simple pivoting rule

in LU to realize this idea. That is to say you will need to permutate rows of when you find small

number in the diagonal. Consequently, you will generate such that

L = array([[1.e+00, 0.e+00],

[1.e+10, 1.e+00]])

U = array([[ 1.e-10, 1.e+00],

[ 0.e+00, -1.e+10]])

y = solve_triangular(L, b, lower=True)

x = solve_triangular(U, y, lower=False)

print(x)

[0. 1.]

A = np.asarray([[1, 0], [1e-10, 1]])

b=np.array([ 1e-7, 1])

# pivoted lu.py

def __init__(self, permute_row=False):

"""Todo

self.permute_row = permute_rowd

permute_row is a boolean variable, if true, compute PA=LU;

otherwise compute A = LU;

"""

pass

def factorize(self, A):

# Return additional P matrix where P is a permutation matrix,

# P should be stored in a a permutation list (list or 1-dim array),

# Todo

return P, L, U

In general, a permutation matrix 2 is a sparse matrix of 0 and 1, and each column and row sums up to 1.

For example,

In practice, it is more efficient to store it as a permutation list: . In your implementation,

make sure that the returned permutation is stored in a list. Please check wikipedia 2 for more

information.

4. Sparsity (35 points).

In many applications (in fact, most of the intersting ones), is an extremely sparse matrix, their nnzs

grows sublinearly with the dimension of the matrix ( , often ).

Part a) (15 points) An important question we mentioned is how to perform pivoting and reduce "fill in"

in matrix decomposition. That is to say, we want to avoid adding new nonzeros to the factorized matrix.

Suppose we generate in the following way. Clearly it is a sparse matrix. Apply your cholesky

decomposition to to obtain . Run the following code and how many nonzeros are there in ? Is

still as sparse as ? Here np.count_nonzero counts the number of nonzeros in ndarray.

Suppose we permute to , apply your Chol code to obtain and how many number of nonzeros are

there in , how is the sparsity compared with ?

Now you may have an impression that the ordering of rows and columns affects the sparsity pattern. In

practice we would like to have decomposition as sparse as possible. Try to implement some pivoting

heuristics so that you can reduce the "fill in" . Don't worry about efficiency of ndarray in this part.

p = 100

A = np.zeros((p, p), dtype=float)

A[:, 0] = 0.1

A[0, :] = 0.1

A[np.arange(p-1), np.arange(p-1) + 1] = 0.1

A[np.arange(p-1) + 1, np.arange(p-1)] = 0.1

np.fill_diagonal(A, 2)

mychol = Chol()

La = mychol.factorize(A)

print(np.count_nonzero(La))

B = A[::-1, ::-1]

part b) (20 points) It is prohibitive to perform an algorithm for large . Try to improve the

efficiency of Chol by replacing ndarray with sparse format. Even better, use C level programming in

Cython which may significantly improve the speed. In your code, implement one or few advanced direct

methods for sparse linear system 3 . A simple strategy of your code can be that:

1. Observe the nnz pattern of ; build adjacency graph

2. Find good ordering by heurstics

3. Predict the nonzeros of (Symbolic)

4. Perform factorization

Our TA Shengyu provides the following code to test some benchmarks by calling sksparse, which is the

Python interface for SuiteSparse. You can see how the state of the art solver performs. The data are in

mm format.

You can carry out a your test in the same way, make sure that the reconstructed matrix is as close to the

original one as possible: . We will consider two perspectives when evaluating the code

performance.

1. Correctness in solving , you will solve triangular systems and , you can

class Chol(Decompose):


def __init__(self, pivoting):

"""Todo: pivoting (bool) is a flag for enable pivoting

"""

def factorize(self, A):

""" Todo: Find P, L such that PAP' = LL'

"""

return P, L

# sparsechol.py

class SparseChol(Decompose):


def __init__(self, pivoting):

"""Todo: pivoting (bool) is a flag for enable pivoting

"""

pass

def factorize(self, A):

"""A is a sparse matrix, csc, csr or coo."""

# Todo, perform sparse cholesky factorization,

# You may use it as an interface and leave the implementation details in external

pyx code

return P, L

find the corresponding functions in scipy.sparse.

2. If your code is correct, then we will check efficiency of factorization. How long does it take to run

the factorize method, and does it generate a reasonably sparse ?

About Cython

1. Here I provide a simple setup file, you don't have to use the same as mine. Suppose you have two

cython file fast_chol and cy_utils.cy under src/. You need the Extension class to wrap up

your cython configuration. In this example I need C interface of numpy and blas which is installed

in openblas. This may not be necessary for you.

import numpy as np

from scipy.io import mmread

from sksparse.cholmod import cholesky

from scipy.sparse.linalg import norm as spnorm, spsolve_triangular as spsolve

from numpy.linalg import norm

import time

def test_norm_diff(A, L, P, tol=1e-8):

permA = A[P[:, np.newaxis], P[np.newaxis, :]]

assert spnorm(permA - L.dot(L.T)) / spnorm(permA) < tol

def test_normal_equation(A, L, P, tol=1e-8):

b = np.random.randn(A.shape[0])

PT = np.zeros(P.shape, dtype=int)

PT[P] = np.arange(P.shape[0])

y = spsolve(L.tocsr(), b[P])

z = spsolve(L.T, y, lower=False)

x = z[PT]

assert norm(A.dot(x) - b) / norm(b) < tol

mat = mmread("LFAT5/LFAT5.mtx").tocsc()

tstrt = time.time()

factor = cholesky(mat)

print("time cost: {:.2e} secs".format(time.time() - tstrt))

test_norm_diff(mat, factor.L(), factor.P())

test_normal_equation(mat, factor.L(), factor.P())

"""

Setup cython file

python setup.py build_ext --inplace

"""

from distutils.core import setup

2. Use Cython interface of numpy.ndarray. Also check memoryview and C raw pointer in the Cython

book if you want to squeeze even more running time.

Structures (5 points)

For submission, you should have your project organized, please stick to the following structure.

Readme.md: let the TA knows how to run your code.

report.ipynb or report.md . Leave your algorithm description, discussion and conclusion here.

setup.py and Makefile are for configuring your code, they are optional

src/ contains all the source code

test/ test scripts to run your code

datasets/ You can put downloaded data or generated data here but do not include them for

submission!!! (Don't lose 5 points for this.)

You can make any auxiliary folder or file if necessary.

from distutils.extension import Extension

from Cython.Build import cythonize

import numpy as np

from numpy.distutils.system_info import get_info

from sklearn._build_utils import get_blas_info

blas_include = "/usr/local/opt/openblas/include"

extensions = [

Extension("src.fast_chol", sources=['src/fast_chol.pyx'],

include_dirs=[np.get_include(), blas_include]),

Extension("src.cy_utils", sources=['src/cy_utils.pyx'],

include_dirs=[np.get_include(), blas_include]),

]

setup(

name='spchol',

ext_modules=cythonize(extensions),

)

yourproject/

Readme.md

report.ipynb

setup.py

Makefile

src/

__init__.py

decompose.py

lu.py

chol.py

spchol.py

my_secret_code.pyx

1. Golub, Gene H., and Charles F. Van Loan. 1996. “Matrix Computations (3rd Ed.).”

2. Permutation matrix. https://en.wikipedia.org/wiki/Permutation_matrix

3. Davis, Timothy A., Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar. 2016. “A Survey of Direct Methods for Sparse Linear Systems.” Acta

Here I provide a simple script to load data in .mat format. I only test it on a few datasets and it seems

working.

Testing dataset

We will consider two types of datasets

1. Dense random data.

2. Sparse data from SuiteSparse matrix collection. They provide a convenient Java GUI for

downloading data.

Reference


版权所有:留学生程序网 2020 All Rights Reserved 联系方式:QQ:99515681 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。