python numpy matrix indexing: why is row indexing of a scipy csr matrix slower than for a numpy array...

Published: 2025/3/21 python 21 豆豆

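The short answer, which I demonstrate below, is that constructing a new sparse matrix is expensive. There is a significant overhead that depends neither on the number of rows nor on the number of nonzero elements in the selected row.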

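The data representation of a sparse matrix is quite different from that of a dense array. An array stores its data in one contiguous buffer and uses the shape and strides to iterate efficiently over selected values. Those values, together with the index, define exactly where in the buffer the data will be found. Copying those N bytes from one location to another is a relatively small part of the whole operation.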
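As a rough illustration of the shape-and-strides bookkeeping just described, here is a minimal sketch. The array contents are made up for the example (this is not the matrix used later), and it assumes a C-contiguous float64 array. Selecting a row amounts to an offset into the existing buffer, and the copy moves only 40 bytes.

```python
import numpy as np

# Illustrative 5x5 dense array; the values are arbitrary.
Ad = np.arange(25, dtype=np.float64).reshape(5, 5)

print(Ad.shape)    # (5, 5)
print(Ad.strides)  # (40, 8): 40 bytes to the next row, 8 bytes to the next column

# Row 4 starts 4 * 40 = 160 bytes into the buffer.
offset = 4 * Ad.strides[0]
print(offset)      # 160

row = Ad[4, :]               # a view onto the same buffer, nothing is copied
row_copy = Ad[4, :].copy()   # a new 40-byte buffer holding the 5 values

print(np.shares_memory(row, Ad))       # True
print(np.shares_memory(row_copy, Ad))  # False
```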

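A sparse matrix stores its data in several arrays (or other structures) that hold the indices and the values. Selecting a row therefore requires looking up the relevant indices and then constructing a new sparse matrix from the selected indices and data. The sparse package does contain compiled code, but not nearly as much low-level code as numpy arrays have.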

To illustrate, I'll make a small matrix that is not too sparse, so that we don't have a lot of empty rows:

In [259]: A = (sparse.rand(5,5,.4,'csr')*20).floor()

In [260]: A

Out[260]:

<5x5 sparse matrix of type '<class 'numpy.float64'>'

with 10 stored elements in Compressed Sparse Row format>

The dense equivalent, and a row copy:

In [262]: Ad=A.A

In [263]: Ad

Out[263]:

array([[  0.,   0.,   0.,   0.,  10.],
       [  0.,   0.,   0.,   0.,   0.],
       [ 17.,  16.,  14.,  19.,   6.],
       [  0.,   0.,   1.,   0.,   0.],
       [ 14.,   0.,   9.,   0.,   0.]])

In [264]: Ad[4,:]

Out[264]: array([ 14., 0., 9., 0., 0.])

In [265]: timeit Ad[4,:].copy()

100000 loops, best of 3: 4.58 µs per loop

The matrix row:

In [266]: A[4,:]

Out[266]:

<1x5 sparse matrix of type '<class 'numpy.float64'>'

with 2 stored elements in Compressed Sparse Row format>

Look at the data representation of this csr matrix (three 1d arrays):

In [267]: A.data

Out[267]: array([ 0., 10., 17., 16., 14., 19., 6., 1., 14., 9.])

In [268]: A.indices

Out[268]: array([3, 4, 0, 1, 2, 3, 4, 2, 0, 2], dtype=int32)

In [269]: A.indptr

Out[269]: array([ 0, 2, 2, 7, 8, 10], dtype=int32)

This is how the row is selected (though in compiled code):

In [270]: A.indices[A.indptr[4]:A.indptr[5]]

Out[270]: array([0, 2], dtype=int32)

In [271]: A.data[A.indptr[4]:A.indptr[5]]

Out[271]: array([ 14., 9.])
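The two slices above can be wrapped in a small helper. This is a pure-Python sketch, not scipy's compiled implementation; `csr_row_parts` and `csr_row_dense` are hypothetical names introduced here, and the matrix is rebuilt from the three arrays printed above so that the example is reproducible.

```python
import numpy as np
from scipy import sparse

# The three CSR arrays exactly as printed in the session above.
data = np.array([0., 10., 17., 16., 14., 19., 6., 1., 14., 9.])
indices = np.array([3, 4, 0, 1, 2, 3, 4, 2, 0, 2], dtype=np.int32)
indptr = np.array([0, 2, 2, 7, 8, 10], dtype=np.int32)
A = sparse.csr_matrix((data, indices, indptr), shape=(5, 5))

def csr_row_parts(A, i):
    """Return (column indices, values) stored for row i, as array slices."""
    start, stop = A.indptr[i], A.indptr[i + 1]
    return A.indices[start:stop], A.data[start:stop]

def csr_row_dense(A, i):
    """Expand row i into a dense 1d array.

    Assumes no duplicate column indices within a row; with duplicates,
    plain assignment would keep the last value instead of summing.
    """
    cols, vals = csr_row_parts(A, i)
    out = np.zeros(A.shape[1], dtype=A.dtype)
    out[cols] = vals
    return out

cols, vals = csr_row_parts(A, 4)
print(cols, vals)           # column indices [0 2] and values [14. 9.]
print(csr_row_dense(A, 4))  # the dense row, matching Ad[4,:] above
```

Neither helper builds a new sparse matrix, which is the step the rest of this answer identifies as the expensive part.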

The "row" is another sparse matrix, with the same kind of data arrays:

In [272]: A[4,:].indptr

Out[272]: array([0, 2])

In [273]: A[4,:].indices

Out[273]: array([0, 2])

Yes, the timing for the sparse matrix is slow. I don't know how much of that time is spent actually selecting the data, and how much is spent constructing the new matrix.

In [274]: timeit A[4,:]

10000 loops, best of 3: 145 µs per loop

In [275]: timeit Ad[4,:].copy()

100000 loops, best of 3: 4.56 µs per loop

The lil format may be easier to understand, since the data and indices are stored in sublists, one per row.

In [276]: Al=A.tolil()

In [277]: Al.data

Out[277]: array([[0.0, 10.0], [], [17.0, 16.0, 14.0, 19.0, 6.0], [1.0], [14.0, 9.0]], dtype=object)

In [278]: Al.rows

Out[278]: array([[3, 4], [], [0, 1, 2, 3, 4], [2], [0, 2]], dtype=object)

In [279]: Al[4,:].data

Out[279]: array([[14.0, 9.0]], dtype=object)

In [280]: Al[4,:].rows

Out[280]: array([[0, 2]], dtype=object)

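Speed comparisons like this make sense when dealing with tight compiled code, where moving bytes from one part of memory to another is a significant consumer of time. With the mix of Python and compiled code in numpy and scipy, you can't just count O(n) operations.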
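One way to check the claim that the overhead is largely independent of matrix size is to time the same row selection on matrices of different sizes. The sketch below is an assumption-laden experiment, not part of the original measurements: `time_row_select` is a hypothetical helper, the sizes and density are arbitrary, and the absolute numbers depend on the machine and the scipy version, so no expected output is shown.

```python
import timeit
from scipy import sparse

def time_row_select(n, density, number=200, repeat=3):
    """Best-of-`repeat` seconds per call of A[i, :] on a random n x n CSR matrix."""
    A = sparse.random(n, n, density=density, format='csr')
    i = n // 2  # an arbitrary row in the middle
    best = min(timeit.repeat(lambda: A[i, :], number=number, repeat=repeat))
    return best / number

results = {}
for n, density in [(100, 0.01), (1000, 0.01), (5000, 0.01)]:
    results[n] = time_row_select(n, density)
    print(f"n={n:5d}  {results[n] * 1e6:8.1f} µs per A[i, :]")
```

If the fixed cost of building the result matrix dominates, the per-call time should grow far more slowly than the number of stored elements.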

=============================

Here is an estimate of the time it takes to select the row from A, and of the time it takes to return a new sparse matrix.

Just getting the data:

In [292]: %%timeit

d1=A.data[A.indptr[4]:A.indptr[5]]

i1=A.indices[A.indptr[4]:A.indptr[5]]

.....:

100000 loops, best of 3: 4.92 µs per loop

Plus the time it takes to make a matrix:

In [293]: %%timeit

d1=A.data[A.indptr[4]:A.indptr[5]]

i1=A.indices[A.indptr[4]:A.indptr[5]]

sparse.csr_matrix((d1,([0,0],i1)),shape=(1,5))

.....:

1000 loops, best of 3: 445 µs per loop

Trying a simpler coo matrix:

In [294]: %%timeit

d1=A.data[A.indptr[4]:A.indptr[5]]

i1=A.indices[A.indptr[4]:A.indptr[5]]

sparse.coo_matrix((d1,([0,0],i1)),shape=(1,5))

.....:

10000 loops, best of 3: 135 µs per loop
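For readers who want to repeat the three measurements outside IPython, the following sketch does the same breakdown with the standard `timeit` module. The function names are hypothetical, the matrix is rebuilt from the arrays printed earlier, and the hard-coded `[0,0]` row index is generalized to an array of zeros of matching length. Timings will differ from those quoted above, so none are shown.

```python
import timeit
import numpy as np
from scipy import sparse

# Rebuild A from the data, indices and indptr arrays printed earlier.
data = np.array([0., 10., 17., 16., 14., 19., 6., 1., 14., 9.])
indices = np.array([3, 4, 0, 1, 2, 3, 4, 2, 0, 2], dtype=np.int32)
indptr = np.array([0, 2, 2, 7, 8, 10], dtype=np.int32)
A = sparse.csr_matrix((data, indices, indptr), shape=(5, 5))

def row_slices(A, i):
    """Only slice out the values and column indices of row i."""
    start, stop = A.indptr[i], A.indptr[i + 1]
    return A.data[start:stop], A.indices[start:stop]

def row_as_csr(A, i):
    """Slice, then build a 1 x ncols csr matrix from (data, (rows, cols))."""
    d1, i1 = row_slices(A, i)
    rows = np.zeros(len(i1), dtype=i1.dtype)  # every element sits in row 0
    return sparse.csr_matrix((d1, (rows, i1)), shape=(1, A.shape[1]))

def row_as_coo(A, i):
    """Slice, then build a 1 x ncols coo matrix from the same inputs."""
    d1, i1 = row_slices(A, i)
    rows = np.zeros(len(i1), dtype=i1.dtype)
    return sparse.coo_matrix((d1, (rows, i1)), shape=(1, A.shape[1]))

number = 1000
for name, fn in [("slices only", row_slices),
                 ("slices + csr_matrix", row_as_csr),
                 ("slices + coo_matrix", row_as_coo),
                 ("A[4, :]", lambda A, i: A[i, :])]:
    best = min(timeit.repeat(lambda: fn(A, 4), number=number, repeat=3))
    print(f"{name:22s} {best / number * 1e6:8.1f} µs per loop")
```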
