Posted 2021-03-14 | Updated 2021-03-14 | Python / Numpy | 6 minutes read (About 907 words)

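NDArray Scientific Computing

NumPy Numeric

Matrix and Vector Products

Function	Desc
`dot(a,b[,out])`	Dot product over the last axis of `a` and the second-to-last axis of `b`, i.e. the shapes must satisfy the usual linear-algebra rule
`inner(a,b)`	Dot product over the last axis of `a` and the last axis of `b`
`vdot(a,b)`	Vector dot product; multi-dimensional inputs are flattened first
`outer(a,b[,out])`	Vector outer product; multi-dimensional inputs are flattened first
`matmul(x1,x2,/[,out,casting,order,...])`	Matrix product
`tensordot(a,b[,axes])`	Tensor dot product (contraction) along the specified axes
`einsum(subscripts,*operands[,out,dtype,...])`	Einstein summation convention
`einsum_path(subscripts,*operands[,optimize])`	Evaluates the lowest-cost contraction order for an einsum expression, taking intermediate arrays into account
`linalg.matrix_power(a,n)`	Power of a square matrix
`kron(a,b)`	Kronecker product (block-wise outer product of matrices)
`trace(a[,offset,axis1,axis2,dtype,out])`	Trace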

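Einstein summation convention: drops the explicit summation signs from a sum expression; every index that does not appear in the output is summed over

import numpy as np
a = np.arange(0,15).reshape(3,5)
b = np.arange(1,16).reshape(3,5)
# Transpose
np.einsum("ij->ji", a)
# Sum all
np.einsum("ij->", a)
# Sum along given axis
np.einsum("ij->i", a)
np.einsum("ij->j", a)
# Element-wise multiply
np.einsum("ij,ij->ij",a,b)
# Sum of the element-wise product
np.einsum("ij,ij->",a,b)
# Inner product over the last axes, same as np.inner(a,b)
np.einsum("ik,jk->ij",a,b)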

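np.tensordot: tensor contraction, a generalized inner product over the chosen axes
- axes given as an integer
  - axes>0: contracts the last axes dimensions of a with the first axes dimensions of b
  - axes=0: tensor (outer) product; for 2-D inputs it holds the same entries as the Kronecker product, arranged as a 4-D array
- axes given as a 2-tuple: names the axes of a and of b to contract, respectively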
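The axes rules can be checked on small arrays. The following is a minimal sketch; the array shapes and variable names are chosen only for illustration.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
b = np.arange(12).reshape(3, 4)

# axes=2: contract the last two axes of a with the first two axes of b
r_int = np.tensordot(a, b, axes=2)
# axes as a pair of sequences names the contracted axes explicitly
r_pair = np.tensordot(a, b, axes=([1, 2], [0, 1]))
# the same contraction written with einsum
r_einsum = np.einsum("ijk,jk->i", a, b)

# axes=0: tensor (outer) product, the result shape is a.shape + b.shape
outer = np.tensordot(a, b, axes=0)

# for 2-D inputs, reordering and reshaping the tensor product gives np.kron
x = np.arange(4).reshape(2, 2)
y = np.arange(6).reshape(2, 3)
kron_like = np.tensordot(x, y, axes=0).transpose(0, 2, 1, 3).reshape(4, 6)
```

Here `r_int`, `r_pair` and `r_einsum` all have shape `(2,)`, and `kron_like` matches `np.kron(x, y)`.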

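Others

`np.linalg`

NumPy's linear algebra relies on BLAS and LAPACK for efficient, standard low-level implementations
- the backing library can be the C subset bundled with NumPy
- or, preferably, a library optimized for the specific platform
  - OpenBLAS
  - MKL
  - ATLAS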

`np.linalg`

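Function	Desc
`multi_dot(arrays)`	Chains the dot products of several arrays, choosing the fastest evaluation order automatically
`cholesky(a)`	Cholesky decomposition
`det(a)`	Determinant
`eig(a)`	Eigenvalues and right eigenvectors
`eigh(a[,UPLO])`	Eigenvalues and eigenvectors of a Hermitian (conjugate symmetric) or real symmetric matrix
`eigvals(a)`	Eigenvalues
`eigvalsh(a[,UPLO])`	Eigenvalues of a Hermitian (conjugate symmetric) or real symmetric matrix
`inv(a)`	Matrix inverse
`lstsq(a,b[,rcond])`	Least-squares solution
`norm(x[,ord,axis,keepdims])`	Matrix or vector norm
`pinv(a[,rcond,hermitian])`	Moore-Penrose pseudo-inverse
`solve(a,b)`	Solves a system of linear equations
`tensorsolve(a,b[,axes])`	Solves a tensor equation
`tensorinv(a[,ind])`	Tensor inverse
`svd(a[,full_matrices,compute_uv,hermitian])`	Singular value decomposition
`qr(a[,mode])`	QR decomposition
`matrix_rank(M[,tol,hermitian])`	Matrix rank, computed with the SVD method
`slogdet(a)`	Sign and natural logarithm of the determinant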

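Some linear algebra functions accept higher-dimensional arrays (stacks of matrices) and compute all the results in one call
- for higher-dimensional arrays, the last 2 (or 1) dimensions must satisfy the requirements of the computation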
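As an illustration of the stacked behaviour, the sketch below builds a stack of four 3x3 matrices and passes it to a few `np.linalg` functions. The matrices are made symmetric positive definite only so that they are safely invertible; that construction is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3, 3))
# M @ M^T + 3I is symmetric positive definite for every matrix in the stack
A = M @ M.transpose(0, 2, 1) + 3 * np.eye(3)
B = rng.standard_normal((4, 3, 1))

dets = np.linalg.det(A)     # one determinant per matrix, shape (4,)
invs = np.linalg.inv(A)     # one inverse per matrix, shape (4, 3, 3)
X = np.linalg.solve(A, B)   # one solution per system, shape (4, 3, 1)
```

The leading dimension is treated as a loop dimension, and only the last two dimensions take part in the linear algebra.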

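(Fast) Fourier Transform `np.fft`

Standard FFTs

Function	Desc
`fft(a[,n,axis,norm])`	1-D discrete Fourier transform
`fft2(a[,s,axes,norm])`	2-D discrete FFT
`fftn(a[,s,axes,norm])`	N-D discrete FFT
`ifft(a[,n,axis,norm])`	1-D inverse discrete FFT
`ifft2(a[,s,axes,norm])`	2-D inverse discrete FFT
`ifftn(a[,s,axes,norm])`	N-D inverse discrete FFT

Real FFTs

Function	Desc
`rfft(a[,n,axis,norm])`	1-D discrete Fourier transform of real input
`rfft2(a[,s,axes,norm])`	2-D discrete FFT of real input
`rfftn(a[,s,axes,norm])`	N-D discrete FFT of real input
`irfft(a[,n,axis,norm])`	Inverse of `rfft`
`irfft2(a[,s,axes,norm])`	Inverse of `rfft2`
`irfftn(a[,s,axes,norm])`	Inverse of `rfftn`

Hermitian FFTs

Function	Desc
`hfft(a[,n,axis,norm])`	FFT of a signal with Hermitian symmetry (i.e. a real spectrum)
`ihfft(a[,n,axis,norm])`	Inverse FFT of a signal with Hermitian symmetry (i.e. a real spectrum)

Others

Function	Desc
`fftfreq(n[,d])`	Sample frequencies of the discrete FFT
`rfftfreq(n[,d])`	Sample frequencies for use with `rfft`/`irfft`
`fftshift(x[,axes])`	Shifts the zero-frequency component to the center of the spectrum
`ifftshift(x[,axes])`	Inverse of `fftshift`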

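`np.lib.scimath`

np.lib.scimath contains functions with the same names as some functions in the top-level namespace
- compared with the top-level versions their domain is extended, and their range is correspondingly extended to the complex numbers
  np.emath.log(-np.e)  # approximately 1 + np.pi * 1j

np.emath is the recommended alias of the np.lib.scimath module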
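A short comparison between the top-level functions and their `np.emath` counterparts; the input values are arbitrary.

```python
import numpy as np

# the top-level sqrt stays in the reals and yields nan for negative input
with np.errstate(invalid="ignore"):
    top = np.sqrt(-4.0)

ext = np.emath.sqrt(-4.0)   # extended domain: a complex result
pos = np.emath.sqrt(4.0)    # inputs inside the real domain stay real
lg = np.emath.log(-np.e)    # log(e) + i*pi
```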

Posted 2021-03-11 | Updated 2021-03-11 | Python / Numpy | 35 minutes read (About 5293 words)

NDArray Routine

Array Manipulation

Shape Only

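Routine	Function Version	Method Version
`reshape(a,newshape[,order])`
`resize(a,new_shape)`	Sizes may differ; repeats `a` to fill the shortfall	Pads the shortfall with `0`
`ravel(a[,order])`	Flattened array, returned as a view whenever possible
`.flatten([order])`	n/a	Flattened copy
`shape(a)`
`size(a)`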

Order Alteration

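Routine	Function Version	Method Version
`transpose(a[,axes])`	Permutes the axes; by default reverses their order, i.e. transposes
`moveaxis(a,source,destination)`	Moves axes of the array to new positions	n/a
`rollaxis(a,axis[,start])`	Rolls the given axis backwards until it lies at position `start` (default 0)	n/a
`swapaxes(a,axis1,axis2)`	Swaps two axes
`flip(m[,axis])`	Reverses along the given axes, all axes by default	n/a
`fliplr(m)`	Reverses left/right (along the 2nd axis)	n/a
`flipud(m)`	Reverses up/down (along the 1st axis)	n/a
`roll(a,shift[,axis])`	Rolls the elements by `shift` along the axis	n/a
`rot90(m[,k,axes])`	Rotates by 90 degrees `k` times in the plane given by `axes`	n/a
`lib.stride_tricks.as_strided(x[,shape,...])`	Creates a view on `x` with the given shape and strides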

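Changing the Number of Dimensions

Routine	Function Version	Method Version
`atleast_1d(*arys)`	Adds length-1 dimensions until the array has at least 1 dimension	n/a
`atleast_2d(*arys)`		n/a
`atleast_3d(*arys)`		n/a
`broadcast(*arys)`	Broadcasts the inputs and returns an iterator over tuples of corresponding elements, similar to `zip`	n/a
`broadcast_to(array,shape[,subok])`	Broadcasts to the given shape	n/a
`broadcast_arrays(*args,**kwargs)`	The inputs broadcast against each other, returned as a sequence	n/a
`expand_dims(a,axis)`	Inserts a new axis at the given position	n/a
`squeeze(a[,axis])`	Removes dimensions of size 1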

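Inserting and Deleting Elements

Routine	Function Version
`delete(arr,obj[,axis])`	Deletes the part selected by `obj`; by default works on the flattened array
`insert(arr,obj,values[,axis])`	By default inserts into the flattened array
`append(arr,values[,axis])`	By default flattens `arr` and `values` before appending
`trim_zeros(filt[,trim])`	Trims leading and trailing zeros, both sides by default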

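Changing Type

Routine	Function Version	Method Version
`asarray(a[,dtype,order])`	Converts to an array	n/a
`asarray_chkfinite(a[,dtype,order])`	Converts to an array, checking for `NaN` and `inf`	n/a
`asanyarray(a[,dtype,order])`	Converts to an array; ndarray subclasses pass through unchanged	n/a
`asscalar(a)`	Converts an array of size 1 to the equivalent scalar (deprecated; `a.item()` is the replacement)
`require(a[,dtype,requirements])`	Returns an array that satisfies the requested `ndarray.flags`	n/a
`asfortranarray(a[,dtype])`	Converts to Fortran-contiguous memory layout	n/a
`ascontiguousarray(a[,dtype])`	Converts to C-contiguous memory layout	n/a
`asmatrix(data[,dtype])`		n/a
`asfarray(a[,dtype])`	Converts to a floating-point type	n/a
`.astype(dtype[,order,casting,...])`	n/a	Converts to the given type

Arrays in numpy are not limited to the C and Fortran memory layouts: shape manipulations such as slicing with a step can leave a view whose layout is neither C-contiguous nor Fortran-contiguous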
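The layout of an array can be inspected through `ndarray.flags`. A minimal sketch with an arbitrary 3x4 array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # freshly created: C-contiguous
t = a.T                           # transposed view: Fortran-contiguous
s = a[:, ::2]                     # strided view: neither layout
c = np.ascontiguousarray(s)       # copy into C-contiguous memory

layouts = {
    "a": (a.flags.c_contiguous, a.flags.f_contiguous),
    "t": (t.flags.c_contiguous, t.flags.f_contiguous),
    "s": (s.flags.c_contiguous, s.flags.f_contiguous),
    "c": (c.flags.c_contiguous, c.flags.f_contiguous),
}
```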

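Joining Arrays

Routine	Function Version
`concatenate((a1,a2,...)[,axis,out])`	Joins arrays along an existing axis
`stack(arrays[,axis,out])`	Stacks arrays along a given (new) axis
`row_stack(tup)/vstack(tup)`	Stacks along the 1st (vertical) axis
`column_stack(tup)/hstack(tup)`	Stacks along the 2nd (horizontal) axis (for 1-D inputs `column_stack` turns them into columns, while `hstack` joins them end to end)
`dstack(tup)`	Stacks along the 3rd axis
`block(arrays)`	Assembles an array from the blocks in `arrays` according to their positions

Splitting Arrays

Routine	Function Version
`split(ary,indices_or_sections[,axis])`	Splits along an axis into views
`array_split(ary,indices_or_sections[,axis])`	Same as `split`, but also handles splits that do not divide evenly
`vsplit(ary,indices_or_sections)`	Splits along the 1st (vertical) axis
`hsplit(ary,indices_or_sections)`	Splits along the 2nd (horizontal) axis
`dsplit(ary,indices_or_sections)`	Splits along the 3rd axis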

Padding

Function	Desc
`pad(array,pad_width[,mode])`

Index Routine

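Reasoning chain for the shape of a result array
- determine the number of dimensions ndim of the output array
- determine where the original axes of each argument array go and where length-1 axes are added, so that the argument axes line up
- adjust the size of each dimension
  - operating along an axis: unchanged
  - sampling along an axis: the number of samples
  - concatenating along an axis: the sizes add up
  - reducing along an axis: the dimension is removed
  - reducing over slices along an axis: the remaining dimensions are removed
(Multi-dimensional) indices in numpy are usually returned in integer advanced-index form
- np.ndarray: each component along the first dimension is the advanced index for the corresponding dimension
- list, tuple: each element is the advanced index for the corresponding dimension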

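Array-Independent Slices and Advanced Indices

Routine	Function Version	Return type
`s_[]`	Builds (multi-dimensional) slices, like `slice()`	slice, tuple
`index_exp[]`	Same as `s_`, but always returns a tuple	tuple
`r_[]`	Concatenates slices, arrays and scalars along the 1st axis	array
`c_[]`	Concatenates slices, arrays and scalars along the last axis (1-D inputs are treated as column vectors)	array
`ravel_multi_index(multi_index,dims[,mode,order])`	Positions of the advanced index `multi_index` in the flattened array of shape `dims`	array
`unravel_index(indices,shape[,order])`	Inverse of `ravel_multi_index`	tuple

Besides concatenating slices to build arrays conveniently, np.r_[] and np.c_[] accept a leading string argument that changes their behaviour
- the character r or c: a matrix is returned
  - 1-D arrays: r gives a 1 x N matrix, c gives an N x 1 matrix
  - 2-D arrays: r and c give the same matrix
- <axis>[,<ndim>,<ori_pos>]: up to three integers that determine the shape

Parameter	Meaning	np.r_[] default	np.c_[] default
<axis>	axis along which to concatenate	0	-1
<ndim>	target number of dimensions; only takes effect when larger than the number of dimensions of the data	1	2
<ori_pos>	position of the original data axes	-1, i.e. length-1 axes are prepended	0, i.e. length-1 axes are appended

- with the same parameters both give the same result, so each can emulate the other by choosing suitable parameters
  - np.r_[] behaves as if the parameters defaulted to 0,1,-1
  - np.c_[] behaves as if the parameters defaulted to -1,2,0

np.r_ and np.c_ are instances of np.lib.index_tricks.RClass and np.lib.index_tricks.CClass respectively

np.s_ and np.index_exp are both instances of np.lib.index_tricks.IndexExpression and differ only in their initialization argument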
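The string directive can be tried on two short vectors. This is a minimal sketch; `x` and `y` are illustrative inputs.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

row = np.r_[x, y]                  # default "0,1,-1": plain 1-D concatenation
col = np.c_[x, y]                  # default "-1,2,0": inputs become columns
same = np.r_["-1,2,0", x, y]       # r_ given c_'s defaults reproduces c_
as_rows = np.r_["0,2,-1", x, y]    # inputs become rows, stacked along axis 0
mixed = np.r_[0:3, 10, 20:22]      # slices expand like np.arange
```

`col` and `same` are the same `(3, 2)` array, while `as_rows` has shape `(2, 3)`.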

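Grids

Routine	Function Version	Return type
`ix_(*args)`	Builds an open grid from the sequences in `args` (only base points and dimensions are set)	tuple
`meshgrid(*xi,**kwargs)`	Builds a dense grid with `xi` as base points (advanced indices of every grid point)	list
`mgrid[]`	Builds a dense grid from slices	array
`ogrid[]`	Builds an open grid from slices	list
`indices(dimensions[,dtype,sparse])`	Builds a grid with `dimensions` as the length of each dimension	array, tuple

Broadcasting an open grid yields the dense grid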
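The relation between open and dense grids can be verified directly. A small sketch on a 3x4 grid:

```python
import numpy as np

oy, ox = np.ogrid[0:3, 0:4]   # open grid: shapes (3, 1) and (1, 4)
dy, dx = np.mgrid[0:3, 0:4]   # dense grid: both of shape (3, 4)

# broadcasting the open grid reproduces the dense grid
by, bx = np.broadcast_arrays(oy, ox)

# np.ix_ builds an open grid from explicit index sequences
X = np.arange(12).reshape(3, 4)
corners = X[np.ix_([0, 2], [0, 3])]
```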

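Value-Dependent Indices

Routine	Function Version	Method Version
`nonzero(a)`	Integer advanced indices of the non-zero elements
`where(condition,[x,y])`	Integer advanced indices where `condition` holds; when `x,y` are given, elements are picked from them instead	n/a
`flatnonzero(a)`	Positions of the non-zero elements in the flattened array	n/a

Indices of Special Positions

Routine	Function Version
`diag_indices(n[,ndim])`	Diagonal indices of an `ndim`-dimensional array with side length `n`
`diag_indices_from(arr)`	Diagonal indices of `arr`
`mask_indices(n,mask_func[,k])`	Indices into an n * n array selected by `mask_func`
`tril_indices(n[,k,m])`	Lower-triangle indices of an n * m array
`triu_indices(n[,k,m])`	Upper-triangle indices of an n * m array
`tril_indices_from(arr[,k])`	Lower-triangle indices of `arr`
`triu_indices_from(arr[,k])`	Upper-triangle indices of `arr`

np.ndindex(*args) iterates over the same index tuples as np.broadcast(*np.indices(args))

Searching

Routine	Function Version	Method Version
`argwhere(a)`	Array of the coordinates of the non-zero points	n/a
`argmax(a[,axis,out])`	Position in the flattened array when `axis` is not given; a `NaN` counts as the maximum, so the index of the first `NaN` is returned if one is present
`argmin(a[,axis])`
`nanargmax(a[,axis])`	Ignores `NaN`
`nanargmin(a[,axis])`
`searchsorted(a,v[,side,sorter])`	Positions at which to insert so that the order is preserved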

Value Manipulation

Value Extraction

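Routine	Function Version	Method Version
`take(a,indices[,axis,out,mode])`	Takes hyperplanes along the given axis by `indices` (the array is flattened by default)
`take_along_axis(arr,indices,axis)`	Matches `arr` and `indices` along `axis` and picks elements	n/a
`compress(condition,a[,axis,out])`	Selects hyperplanes along `axis` by the bool array `condition` (the array is flattened by default)
`extract(condition,arr)`	Extracts elements from the flattened array	n/a
`choose(a,choices[,out,mode])`	Fills each position from the array in `choices` selected by the value of the broadcast `a`
`select(condlist,choicelist[,default])`	Fills each position from the `choicelist` array matching the first true condition in `condlist`	n/a
`diag(v[,k])`	Extracts the diagonal of a 2-D `v`, or builds an array with a 1-D `v` as its diagonal	n/a
`diagonal(a[,offset,axis1,axis2])`	Returns the specified diagonal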

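take: fetches elements from an array along a given axis
- when axis is None, the elements named by indices are taken from the flattened array; otherwise
  - the function behaves like advanced indexing
  - specifying axis is simpler than writing out the advanced index for that axis
- the basic unit is a slice of the array along that axis

import numpy as np
a, indices, axis = np.arange(24).reshape(2, 3, 4), np.array([[0, 2], [1, 1]]), 1
Ni, Nk = a.shape[:axis], a.shape[axis+1:]
Nj = indices.shape
out = np.empty(Ni + Nj + Nk, dtype=a.dtype)
for ii in np.ndindex(Ni):
    for jj in np.ndindex(Nj):
        for kk in np.ndindex(Nk):
            # one element of the hyperplane selected by indices[jj]
            out[ii+jj+kk] = a[ii+(indices[jj],)+kk]
# out now equals np.take(a, indices, axis=axis)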

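take_along_axis: matches 1-D index slices with 1-D data slices along the given axis and picks elements

The basic unit is a single element
- indices and arr are aligned; apart from the given axis, all other dimensions must agree
- for fixed values of the other dimensions, elements are fetched at the positions that indices gives along the axis
- that is, take fetches whole hyperplanes, while take_along_axis works element by element and rearranges elements along the given axis
Functions such as np.argsort and np.argpartition return indices suited to this function

import numpy as np
arr, axis = np.array([[3, 1, 2], [9, 7, 8]]), 1
indices = np.argsort(arr, axis=axis)
Ni, M, Nk = arr.shape[:axis], arr.shape[axis], arr.shape[axis+1:]
J = indices.shape[axis]  # need not equal M
out = np.empty(Ni + (J,) + Nk, dtype=arr.dtype)
for ii in np.ndindex(Ni):
	for kk in np.ndindex(Nk):
		a_1d = arr[ii + np.s_[:,] + kk]
		indices_1d = indices[ii + np.s_[:,] + kk]
		out_1d = out[ii + np.s_[:,] + kk]  # a view, so writes land in out
		for j in range(J):
			out_1d[j] = a_1d[indices_1d[j]]
# out now equals np.take_along_axis(arr, indices, axis=axis)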

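np.choose
- choices: a sequence of arrays, which must be broadcast-compatible with a
  - if it is itself an array, its outermost dimension is treated as the sequence
- logic
  - a and the arrays in choices are broadcast together
  - the broadcast shape is the shape of the result; wherever a equals n, the position is filled from choices[n]
np.choose(a,choices) == np.array([choices[a[I]][I] for I in np.ndindex(a.shape)])
np.select
- if a is built from the position of the first true condition in condlist at each location, the result equals np.choose(a,choicelist) (ignoring the default value)
np.extract
- equivalent to np.compress(np.ravel(condition), np.ravel(arr))
- if condition is a bool array, also equivalent to arr[condition]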
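The three functions can be compared side by side. A minimal sketch; all inputs are illustrative.

```python
import numpy as np

# choose: a holds, per position, the index of the array in choices to read from
a = np.array([[0, 1], [2, 1]])
choices = [np.array([10, 11]), np.array([20, 21]), np.array([30, 31])]
picked = np.choose(a, choices)

# select: the first true condition decides which choicelist array is used
x = np.arange(6)
sel = np.select([x < 2, x < 4], [x, x * 10], default=-1)

# extract: works on the flattened array, same as boolean indexing
cond = x % 2 == 0
ext = np.extract(cond, x)
via_compress = np.compress(np.ravel(cond), np.ravel(x))
```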

Value Modification

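Routine	Function Version	Method Version
`place(arr,mask,vals)`	Replaces the elements of `arr` where `mask` is true, cycling through the values in `vals`	n/a
`put(a,ind,v[,mode])`	Like `place`, but replaces at the flattened indices `ind`
`put_along_axis(arr,indices,values,axis)`	Matches `indices` and `arr` along `axis` and replaces values	n/a
`copyto(dst,src[,casting,where])`	Copies `src` into `dst` where the bool array `where` is true	n/a
`putmask(a,mask,values)`	Similar to `copyto`	n/a
`fill_diagonal(a,val[,wrap])`	Fills the main diagonal of `a` with `val`	n/a
`clip(a,a_min,a_max[,out=None,**kwargs])`	Clips values

where, mask and condition default to, or are treated as, bool arrays

np.clip is implemented on top of a ufunc and accepts ufunc keyword arguments, although np.clip itself is not an np.ufunc instance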

Sorting

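Routine	Function Version	Method Version
`sort(a[,axis,kind,order])`	Returns a sorted copy	Sorts in place
`lexsort(keys[,axis])`	Indirect sort along `axis` using the multiple keys in `keys` (later keys take priority)	n/a
`msort(a)`	Sorts along the 1st axis	n/a
`argsort(a[,axis,kind,order])`	Indirect sort along `axis`
`sort_complex(a)`	Sorts by real part first, then by imaginary part
`partition(a,kth[,axis,kind,order])`	Partitions around the `kth` smallest element
`argpartition(a,kth[,axis,kind,order])`	Indirect partition

lexsort: indirect sort along axis, using the arrays in keys as sort keys of increasing priority
- keys: a sequence of arrays, or an array with 2 or more dimensions
  - the first (outermost) dimension of an array is treated as the sequence
  - so when keys is an array, that dimension is consumed
  - multiple arrays act as sort keys of different priority; later keys rank higher
- axis: the axis to sort along, default -1, i.e. the last axis
  - can be read as applying the indirect sort results of the arrays in keys in reverse order of priority
  - that is, for each plane spanned by the first axis and axis, the last array along the first axis is sorted on first, and the earlier ones break ties in turn
- when lexsort and argsort sort in the same direction, the lexsort result differs from argsort applied to the last key only in the order of ties (same sort direction, not the same value of the axis parameter)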
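A small worked case of the key priority; the two key arrays are illustrative.

```python
import numpy as np

a = np.array([1, 5, 1, 4, 3, 4, 4])   # primary key (passed last)
b = np.array([9, 4, 0, 4, 0, 2, 1])   # secondary key, breaks ties in a

order = np.lexsort((b, a))

# a 2-D array works too: its first dimension is read as the key sequence
order_from_array = np.lexsort(np.array([b, a]))

# a stable argsort on the primary key alone agrees except within ties
primary_only = np.argsort(a, kind="stable")
```

Here `order` is `[2, 0, 4, 6, 5, 3, 1]`: the entries with `a == 1` come first, ordered by `b`, and so on.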

Logical Test

Truth Value Testing

Routine	Function Version	Method Version
`all(a[,axis,out,keepdims])`	All elements along the given axis are true
`any(a[,axis,out,keepdims])`	Some element along the given axis is true

Array Contents

Routine	Function Version
`isfinite(x,/[,out,where,casting,order,...])`	Element-wise test for finiteness
`isinf(x,/[,out,where,casting,order,...])`
`isnan(x,/[,out,where,casting,order,...])`
`isnat(x,/[,out,where,casting,order,...])`	Element-wise test for `NaT`
`isneginf(x,/[,out])`
`isposinf(x,/[,out])`

isneginf and isposinf behave like ufuncs but are not ufuncs

Type Testing

Routine	Function Version
`iscomplex(x)`
`iscomplexobj(x)`	Complex type or array of complex values
`isfortran(a)`	Fortran contiguous
`isreal(x)`
`isrealobj(x)`	Real type or array of real values
`isscalar(x)`

Mathematics

Some of the math functions are ufuncs

UFunc Elementary Operations

Function	Desc
`add(x1,x2,/[,out,where,casting,order,...])`
`subtract(x1,x2,/[,out,where,casting,...])`
`multiply(x1,x2,/[,out,where,casting,...])`
`divide(x1,x2,/[,out,where,casting,...])`
`true_divide(x1,x2,/[,out,where,casting,...])`
`floor_divide(x1,x2,/[,out,where,casting,...])`
`logaddexp(x1,x2,/[,out,where,casting,...])`	`ln(exp(x1)+exp(x2))`
`logaddexp2(x1,x2,/[,out,where,casting,...])`	`log_2 (2^x1+2^x2)`
`negative(x,/[,out,where,casting,order,...])`
`positive(x,/[,out,where,casting,order,...])`
`power(x1,x2,/[,out,where,casting,order,...])`	`x1^x2`
`float_power(x1,x2,/[,out,where,casting,...])`	`x1^x2`, computed in floating point
`remainder(x1,x2,/[,out,where,casting,...])`	Remainder/modulo; the result takes the sign of the divisor
`mod(x1,x2,/[,out,where,casting,order,...])`	Same as `remainder`
`fmod(x1,x2,/[,out,where,casting,order,...])`	Remainder; the result takes the sign of the dividend
`divmod(x1,x2[,out1,out2],/[,out,where,...])`	Element-wise quotient and remainder
`absolute(x,/[,out,where,casting,order,...])`/`abs`
`rint(x,/[,out,where,casting,order,...])`
`sign(x,/[,out,where,casting,order,...])`
`heaviside(x1,x2,/[,out,where,casting,...])`	Step function
`conj(x,/[,out,where,casting,...])`	Complex conjugate
`exp(x,/[,out,where,casting,order,...])`
`exp2(x,/[,out,where,casting,order,...])`
`log(x,/[,out,where,casting,order,...])`
`log2(x,/[,out,where,casting,order,...])`
`log10(x,/[,out,where,casting,order,...])`
`expm1(x,/[,out,where,casting,order,...])`	Computes `exp(x)-1`
`log1p(x,/[,out,where,casting,order,...])`	Computes `ln(x+1)`
`sqrt(x,/[,out,where,casting,order,...])`	Non-negative square root
`square(x,/[,out,where,casting,order,...])`
`cbrt(x,/[,out,where,casting,order,...])`	Cube root
`reciprocal(x,/[,out,where,casting,order,...])`	Reciprocal
`gcd(x1,x2,/[,out,where,casting,order,...])`	Greatest common divisor
`lcm(x1,x2,/[,out,where,casting,order,...])`	Least common multiple

The out parameter can be used to save memory, e.g. for G=A*B+C
- the expression is equivalent to: t1=A*B; G=t1+C; del t1;
- out avoids the intermediate array: G=A*B; np.add(G,C,G)
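The two-step form with `out` can be written as follows. This is a sketch with small illustrative arrays; the saving only matters for large ones.

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)
B = np.full((2, 3), 2.0)
C = np.ones((2, 3))

expected = A * B + C        # allocates a temporary for A * B, then the result

G = np.multiply(A, B)       # the only allocation
res = np.add(G, C, out=G)   # adds C in place; the ufunc returns the out array
```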

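UFunc Floating Functions

Routine	Function Version
`fabs(x,/[,out,where,casting,order,...])`	Not usable with complex numbers
`signbit(x,/[,out,where,casting,order,...])`	Whether the sign bit is set, i.e. `<0`
`copysign(x1,x2,/[,out,where,casting,order,...])`	Sets the sign of `x1` to that of `x2`
`nextafter(x1,x2,/[,out,where,casting,order,...])`	Next floating-point value after `x1` towards `x2`, i.e. a step of the smallest precision
`spacing(x,/[,out,where,casting,order,...])`	Distance between `x` and the nearest adjacent floating-point number, i.e. the precision at that value
`modf(x[,out1,out2],/[,out,where],...)`	Returns the fractional and integral parts
`ldexp(x1,x2,/[,out,where,casting,...])`	Computes `x1 * 2**x2`, i.e. rebuilds a value from base-2 scientific notation
`frexp(x[,out1,out2],/[,out,where],...)`	Returns the mantissa and exponent of the base-2 scientific notation
`floor(x,/,out,*,where,...)`
`ceil(x,/,out,*,where,...)`
`trunc(x,/,out,*,where,...)`
`rint(x,/[,out,where,casting,order,...])`	Nearest integer
`around(a[,decimals,out])`/`round`/`round_`
`fix(x[,out])`	Rounds towards zero

np.fix is not a ufunc, but behaves like one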

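Comparison Functions

Numeric comparison
- np.equal() suits integer comparison; np.isclose() is more appropriate for floats
- np.allclose() tests whether two arrays are equal within tolerance as a whole
- array_equal(a1,a2): arrays a1 and a2 have the same shape and elements
- array_equiv(a1,a2): arrays a1 and a2 are equal after broadcasting
Logical operators
- &, |, ~: element-wise operations
  - they bind tighter than the comparison operators, so comparisons need parentheses
- and, or, not: act on the truth value of the whole array, which raises ValueError for arrays with more than one element
np.maximum(), np.minimum()
- the builtin max() finds the maximum less efficiently than np.maximum.reduce(); the same holds for min()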
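The points above can be seen on a three-element example; the values are chosen so that floating-point rounding shows up.

```python
import numpy as np

x = np.array([0.1 + 0.2, 1.0, 5.0])
y = np.array([0.3, 1.0, 4.0])

exact = np.equal(x, y)      # 0.1 + 0.2 is not exactly 0.3
close = np.isclose(x, y)    # equal within tolerance

# & binds tighter than < and >, so each comparison is parenthesized
mask = (x > 0.2) & (x < 2.0)

# the keyword `and` asks for the truth value of a whole array
try:
    x > 0.2 and x < 2.0
    and_failed = False
except ValueError:
    and_failed = True

largest = np.maximum.reduce(x)
```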

UFunc Comparison Functions

Routine	Function Version	Operator
`greater(x1,x2,/[,out,where,casting,...])`		`>`
`greater_equal(x1,x2,/[,out,where,casting,...])`		`>=`
`less(x1,x2,/[,out,where,casting,...])`		`<`
`less_equal(x1,x2,/[,out,where,casting,...])`		`<=`
`not_equal(x1,x2,/[,out,where,casting,...])`		`!=`
`equal(x1,x2,/[,out,where,casting,...])`		`==`
`logical_and(x1,x2,/[,out,where,casting,...])`	Element-wise `and`	none (the keyword `and` is not element-wise)
`logical_or(x1,x2,/[,out,where,casting,...])`	Element-wise `or`	none
`logical_xor(x1,x2,/[,out,where,casting,...])`		none
`logical_not(x,/[,out,where,casting,...])`	Element-wise `not`	none
`maximum(x1,x2,/[,out,where,casting,...])`	Element-wise larger value
`minimum(x1,x2,/[,out,where,casting,...])`	Element-wise smaller value
`fmax(x1,x2,/[,out,where,casting,...])`	Element-wise larger value, ignoring `NaN`
`fmin(x1,x2,/[,out,where,casting,...])`	Element-wise smaller value, ignoring `NaN`

Non-UFunc

Routine	Function Version
`isclose(a,b[,rtol,atol,equal_nan])`	Element-wise equality within tolerance
`allclose(a,b[,rtol,atol,equal_nan])`	`all(isclose())`
`array_equal(a1,a2[,equal_nan])`	Equality of the arrays as a whole
`array_equiv(a1,a2)`	Equality after broadcasting

UFunc Bit-twiddling Functions

Routine	Function Version
`bitwise_and(x1,x2,/[,out,where,...])`
`bitwise_or(x1,x2,/[,out,where,...])`
`bitwise_xor(x1,x2,/[,out,where,...])`
`invert(x,/[,out,where,casting,...])`
`left_shift(x1,x2,/[,out,where,casting...])`
`right_shift(x1,x2,/[,out,where,casting...])`

UFunc Trigonometric Functions

Routine	Function Version
`sin(x,/[,out,where,casting,order,...])`
`cos(x,/[,out,where,casting,order,...])`
`tan(x,/[,out,where,casting,order,...])`
`arcsin(x,/[,out,where,casting,order,...])`
`arccos(x,/[,out,where,casting,order,...])`
`arctan(x,/[,out,where,casting,order,...])`
`arctan2(x1,x2,/[,out,where,casting,order,...])`	`arctan(x1/x2)` with the quadrant taken into account
`hypot(x1,x2,/[,out,where,casting,order,...])`	Hypotenuse
`sinh(x,/[,out,where,casting,order,...])`	Hyperbolic sine
`cosh(x,/[,out,where,casting,order,...])`
`tanh(x,/[,out,where,casting,order,...])`
`arcsinh(x,/[,out,where,casting,order,...])`
`arccosh(x,/[,out,where,casting,order,...])`
`arctanh(x,/[,out,where,casting,order,...])`
`deg2rad/radians(x,/[,out,where,casting,order,...])`	Degrees to radians
`rad2deg/degrees(x,/[,out,where,casting,order,...])`	Radians to degrees

Basic Math

Routine	Function Version	Method Version
`prod(a[,axis,dtype,out,keepdims,...])`
`nanprod(a[,axis,dtype,out,keepdims,...])`		n/a
`sum(a[,axis,dtype,out,keepdims,...])`
`nansum(a[,axis,dtype,out,keepdims,...])`		n/a
`cumprod(a[,axis,dtype,out])`	Cumulative product (ufunc.accumulate also works)
`cumsum(a[,axis,dtype,out])`	Cumulative sum
`nancumprod(a[,axis,dtype,out])`	`NaN` treated as `1`	n/a
`nancumsum(a[,axis,dtype,out])`	`NaN` treated as `0`	n/a
`diff(a[,n,axis,prepend,append,...])`	Difference along the given axis, first order by default (the type is preserved, so watch for overflow)	n/a
`ediff1d(ary[,to_end,to_begin])`	First-order difference in flattened order	n/a
`gradient(f,*varargs,**kwargs)`	Gradient	n/a
`cross(a,b[,axisa,axisb,axisc,axis])`	Vector cross product	n/a
`trapz(y[,x,dx,axis])`	Definite integral by the trapezoidal rule	n/a

Complex Numbers

Routine	Function Version	Method Version
`angle(z[,deg])`	Angle (argument)	n/a
`real(val)`	Real part
`imag(val)`	Imaginary part
`conj/conjugate(x,/[,out,where,casting,order,...])`	Complex conjugate

Miscellaneous

Routine	Function Version
`nan_to_num(x[,copy,nan,posinf,neginf])`	Replaces `NaN` and `inf` with numbers
`real_if_close(a[,to])`	Drops the imaginary part when it is close to 0
`interp(x,xp,fp[,left,right,period])`	1-D linear interpolation
`polyfit(x,y,deg[,rcond,full,w,cov])`	Least-squares polynomial fit

Statistics

axis=None: the default value None means the operation runs over the whole array

Count

Routine	Function Version
`count_nonzero(a[,axis])`

Order Statistics

Routine	Function Version	Method Version
`amin/min(a[,axis,out,keepdims,initial,where])`
`amax/max(a[,axis,out,keepdims,initial,where])`
`nanmin(a[,axis,out,keepdims,initial,where])`	Ignores `NaN`
`nanmax(a[,axis,out,keepdims,initial,where])`
`ptp(a[,axis,out,keepdims])`	Range (maximum minus minimum)
`percentile(a,q[,axis,out,...])`	`q` in `[0,100]`	n/a
`nanpercentile(a,q[,axis,out,...])`		n/a
`quantile(a,q[,axis,out,overwrite_input,...])`	`q` in `[0,1]`	n/a
`nanquantile(a,q[,axis,out,...])`		n/a

Averages and Variances

Routine	Function Version	Method Version
`median(a[,axis,out,overwrite_input,keepdims])`		n/a
`average(a[,axis,weights,returned])`		n/a
`mean(a[,axis,dtype,out,keepdims])`
`std(a[,axis,dtype,out,ddof,keepdims])`	Standard deviation
`var(a[,axis,dtype,out,ddof,keepdims])`	Variance
`nanmedian(a[,axis,out,overwrite_input,...])`		n/a
`nanmean(a[,axis,dtype,out,keepdims])`		n/a
`nanstd(a[,axis,dtype,out,ddof,keepdims])`		n/a
`nanvar(a[,axis,dtype,out,ddof,keepdims])`		n/a

Routine	Function Version
`corrcoef(x[,y,rowvar,bias,ddof])`	Pearson product-moment correlation coefficients
`correlate(a,v[,mode])`	Cross-correlation
`convolve(a,v[,mode])`	Discrete, linear convolution
`cov(m[,y,rowvar,bias,ddof,fweights,...])`	Covariance matrix

Array Creation

Ones and Zeros

Routine	Function Version
`empty(shape[,dtype,order])`	Uninitialized
`empty_like(prototype[,dtype,order,subok,...])`	Same shape and type as `prototype`
`eye(N[,M,k,dtype,order])`	2-D array with ones on the diagonal
`identity(n[,dtype])`	Identity matrix as an array
`ones(shape[,dtype,order])`
`ones_like(a[,dtype,order,subok,shape])`
`zeros(shape[,dtype,order])`
`zeros_like(a[,dtype,order,subok,shape])`
`full(shape,fill_value[,dtype,order])`	Array filled with `fill_value`
`full_like(a,fill_value[,dtype,order,...])`

Numerical Ranges

Routine	Function Version
`arange([start,]stop[,step][,dtype])`	Given step
`linspace(start,stop[,num,endpoint])`	Given number of points, evenly spaced (arithmetic progression)
`geomspace(start,stop[,num,endpoint,dtype,...])`	Evenly spaced on a geometric progression
`logspace(start,stop[,num,endpoint,base,...])`	Evenly spaced on a log scale (base 10 by default), same as `np.power(base, np.linspace(start,stop))`

Repetition

Routine	Function Version	Method Version
`tile(A,reps)`	Builds an array by repeating `A` (which may be an array) `reps` times	n/a
`repeat(a,repeats[,axis])`	Repeats the elements of `a` along an existing axis

Matrix-Relative

Routine	Function Version
`diag(v[,k])`	Extracts the diagonal of a 2-D `v`, or builds an array with a 1-D `v` as its diagonal
`diagflat(v[,k])`
`tri(N[,M,k,dtype])`	Matrix with ones on and below the diagonal and zeros elsewhere
`tril(m[,k])`	Lower triangle
`triu(m[,k])`	Upper triangle
`vander(x[,N,increasing])`	Vandermonde matrix

From Existing Data

Routine	Function Version
`array(object[,dtype,copy,order,subok,ndmin])`
`copy(a[,order])`
`frombuffer(buffer[,dtype,count,offset])`	Creates an array from a buffer (e.g. a bytes object)
`fromfunction(function,shape,**kwargs)`	Creates an array by calling a function on the coordinates
`fromiter(iterable,dtype[,count])`

Changing the data type of an array can also be regarded as creating a new array

Conversion In and Out

Type Conversion

Routine	Method Version
`.item(*args)`	Copies the element selected by `args` into a standard Python scalar
`.tolist()`	Converts to an `.ndim`-level nested list of Python scalars
`.itemset(*args)`	Inserts a scalar (casting it to the array's type if possible)
`.byteswap([inplace])`	Swaps the byte order
`.view([dtype,type])`	Creates a new view
`.getfield(dtype[,offset])`	Returns a field of the array viewed as the given type
`.setflags([write,align,uic])`	Sets flags
`.fill(value)`	Fills with a scalar

Binary Packing

Function	Desc
`packbits(a[,axis,bitorder])`	Packs the elements into bits, padding with `0`, and returns a `uint8` array
`unpackbits(a[,axis,bitorder])`

Input and Output

Routine	Format	Input	Output
`dump(file)`	pickle	n/a	file
`tofile(fid[,sep,format])`	raw memory contents (`sep=""`) or delimited text	n/a	file
`fromfile(file[,dtype,count,sep,offset])`	raw bytes or delimited text	file	array
`save(file,arr[,allow_pickle,fix_imports])`	`.npy`	array	file
`savez(file,*args,**kwds)`	uncompressed `.npz`	(multiple) arrays	file
`savez_compressed(file,*args,**kwds)`	compressed `.npz`	(multiple) arrays	file
`load(file[,mmap_mode,allow_pickle,...])`	`.npy`, `.npz`, pickle	file	array
`savetxt(fname,X[,fmt,delimiter,newline,...])`	delimited text	arrays of at most 2 dimensions	file
`loadtxt(fname[,dtype,comments,delimiter,...])`	delimited text	file	array
`genfromtxt(fname[,dtype,comments,...])`	delimited text	file	array
`fromregex(file,regexp,dtype[,encoding])`	regex-structured text	file	array

Strings

Routine	Function Version	Method Version
`array2string(a[,max_line_width,precision,...])`		`__str__`
`array_repr(arr[,max_line_width,precision,...])`		`__repr__`
`array_str(arr[,max_line_width,precision,...])`		`__str__`
`dumps()`	n/a	pickle serialization
`loads(*args,**kwargs)`	Builds an array from a pickle byte string	n/a
`tobytes([order])`/`tostring`	n/a	Raw memory contents as a byte string
`fromstring(string[,dtype,count,sep])`	Creates a 1-D array from a text string, or from a byte string (`sep=""`, the default)

np.loads is simply pickle.loads; its use is discouraged
np.fromstring
- sep="": creates the array from a binary byte string, like frombuffer
- with sep set to a separator, only a single element separator can be given, and only strings of 1-D arrays can be parsed

String Output Formatting

Routine	Function Version
`format_float_positional(x[,precision,...])`	Formats in positional notation
`format_float_scientific(x[,precision,...])`	Formats in scientific notation
`set_printoptions([precision,threshold,...])`
`get_printoptions()`
`set_string_function(f[,repr])`
`printoptions(*args,**kwargs)`	Context manager for setting print options
`binary_repr(num[,width])`	Binary string
`base_repr(number[,base,padding])`

Data Source

Function	Desc
`DataSource([destpath])`	Generic data source file (file, http, ftp, etc.)

Posted 2021-03-11 | Updated 2021-03-11 | Python / Numpy | 2 minutes read (About 290 words)

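Numpy Performance

Miscellaneous

Performance Tuning

Function	Desc
`setbufsize(size)`	Sets the size of the buffer used by ufuncs
`getbufsize()`
`shares_memory(a,b[,max_work])`
`may_share_memory(a,b[,max_work])`
`byte_bounds(a)`	Returns pointers to the end-points (lowest and highest memory address) of an array

Array Mixin

Function	Desc
`lib.mixins.NDArrayOperatorsMixin`	Defines all operator special methods in terms of `__array_ufunc__`
`lib.NumpyVersion(vstring)`	Parses and compares NumPy versions
`get_include()`	Returns the directory containing the header files
`deprecate(*args,**kwargs)`	Deprecation warning
`deprecate_with_doc(msg)`
`who([vardict])`	Prints the arrays in the given dictionary
`disp(mesg[,device,linefeed])`	Displays a message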

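Floating-Point Error Handling

Error handling
- sets how errors registered on the hardware platform are handled, e.g. division by zero
- configured per thread

Function	Desc
`seterr([all,divide,over,under,invalid])`	Sets how floating-point errors are handled
`seterrcall(func)`	Sets the floating-point error callback or log object
`geterr()`	Gets the current way of handling floating-point errors
`geterrcall()`	Gets the current floating-point error callback
`errstate(**kwargs)`	Context manager for floating-point error handling
`seterrobj(errobj)`	Sets the object that defines floating-point error handling
`geterrobj()`	Gets the object that defines floating-point error handling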
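A minimal sketch of scoped error handling with `np.errstate`, using division by zero as the trigger:

```python
import numpy as np

a = np.array([1.0, 0.0])

# silence the divide-by-zero warning inside the block only
with np.errstate(divide="ignore"):
    q = 1.0 / a

# turn the same condition into an exception
raised = False
with np.errstate(divide="raise"):
    try:
        1.0 / a
    except FloatingPointError:
        raised = True

current = np.geterr()   # settings outside the blocks are unchanged
```

`np.errstate` restores the previous settings on exit, which makes it safer than calling `np.seterr` and resetting by hand.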

NumPy Help

Function	Desc
`lookfor(what[,module,import_modules])`	Searches the documentation for a keyword
`info([object,maxwidth,output,toplevel])`	Gets help information
`source(object[,output])`	Gets the source code

Posted 2021-03-11 | Updated 2021-03-11 | Python / Numpy | 2 minutes read (About 327 words)

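Numpy Add-on Libraries

Financial

These functions were deprecated in NumPy 1.18 and removed in 1.20; they now live in the separate numpy-financial package

Function	Desc
`fv(rate,nper,pmt,pv[,when])`	Future value
`pv(rate,nper,pmt[,fv,when])`	Present value
`npv(rate,values)`	Net present value
`pmt(rate,nper,pv[,fv,when])`	Payment per period for a level-payment loan (principal plus interest)
`ppmt(rate,per,nper,pv[,fv,when])`	Principal portion of the payment in period `per`
`ipmt(rate,per,nper,pv[,fv,when])`	Interest portion of the payment in period `per`
`irr(values)`	Internal rate of return
`mirr(values,finance_rate,reinvest_rate)`	Modified internal rate of return, accounting for the financing cost `finance_rate` and the reinvestment return `reinvest_rate`
`nper(rate,pmt,pv[,fv,when])`	Number of periods
`rate(nper,pmt,pv,fv[,when,guess,tol,...])`	Interest rate per period

Parameters
- pv: present value
- fv: future value
- when: payment at the beginning or at the end of each period
  - 0/end
  - 1/begin
- pmt: Payment, the payment per period
- ppmt: Principal of Payment, the principal within each payment
- ipmt: Interest of Payment, the interest within each payment
Values
- positive: income
- negative: outgoings

Histogram

Function	Desc
`histogram(a[,bins,range,normed,weights,...])`
`histogram2d(x,y[,bins,range,normed,weights,...])`
`histogramdd(sample[,bins,range,normed,weights,...])`
`bincount(x[,weights,minlength])`
`histogram_bin_edges(a[,bins,range,weights])`
`digitize(x,bins[,right])`

Set

Operation

Routine	Function Version
`in1d(ar1,ar2[,assume_unique,invert])`	Membership test; always returns a 1-D array
`isin(element,test_elements[,...])`	Membership test that keeps the shape of `element`
`intersect1d(ar1,ar2[,assume_unique,...])`	Intersection
`union1d(ar1,ar2)`	Union
`setdiff1d(ar1,ar2[,assume_unique,...])`	Set difference `ar1`-`ar2`
`setxor1d(ar1,ar2[,assume_unique,...])`	Symmetric difference

Unique

Routine	Function Version
`unique(ar[,return_index,return_inverse,return_counts,axis])`	Returns the unique values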

Posted 2021-03-11 | Updated 2021-03-11 | Python / Numpy | 8 minutes read (About 1151 words)

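Numpy Indexing

Indexing and Slicing

Basic Slicing and Indexing

Basic slice start:stop:step (essentially the same as slicing builtin types)
- negative start and stop are taken modulo the dimension length, i.e. N is added
- for step>0, start defaults to 0 and stop defaults to the dimension length N
- for step<0, start defaults to N-1 and stop defaults to -N-1
- stop and start may exceed the dimension length N
Ellipsis/...: stands for as many full slices as are needed to select all remaining dimensions
- when ... is present, the result is always an array rather than an array scalar, even if it has zero dimensions
np.newaxis/None: adds a dimension of length 1 at its position in the array produced by the slice
Slices can be used to set values in the array

Basic slicing can be viewed as slicing each dimension in turn; if the leading dimensions are indexed with integers, they can be handled separately first

Every array produced by basic slicing is a view of the original array, so the memory of the original is not released while a slice still references it

Note: basic indexing can be used to change values in the array, but the value it returns is not a reference to the corresponding element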

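Advanced Indexing

Advanced indexing is triggered when the selection object is one of the following
- a non-tuple sequence
- an ndarray (of integer or boolean type)
- a tuple containing at least one sequence or ndarray (of integer or boolean type)
Advanced indexing always returns a copy of the data
- no memory layout is guaranteed for the result

Integer Indexing

Integer indexing X[obj] selects arbitrary elements of X by their index in each dimension
- each integer index (array) gives the indices for the corresponding dimension
- the per-dimension indices are iterated together to give the element positions: zip(*obj)
- when there are fewer indices than array dimensions, sub-arrays act as the elements (the indices can be thought of as aligned with the leading dimensions of the array and then broadcast)
The shape of the result is determined by the shapes of the per-dimension indices in obj
- the per-dimension index arrays in obj are broadcast
  - their shapes may differ
  - so that they can be iterated together, their shapes must be broadcast-compatible
- consequently, wherever advanced indexing occurs
  - there is no "plain (scalar) index": a scalar is necessarily broadcast
  - slices can coexist with it
When slices (including np.newaxis) and advanced indices coexist
- by its nature, an advanced index produces result dimensions that cannot be split
  - a "scalar index" on its own would remove its dimension
  - but the advanced indices as a whole (after broadcasting) determine one shape
- the result dimensions of the advanced indices enter the result as one block
  - advanced indices separated by a slice: their result dimensions move to the front
  - advanced indices adjacent to each other: their result dimensions are placed where the indices were

If an advanced indexing operation yields no elements but an individual index is out of bounds, whether an error is raised is undefined

The memory layout of an advanced indexing result is optimized per indexing operation; no particular memory order can be assumed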
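The placement rule for the advanced-index dimensions is easiest to see from result shapes. A minimal sketch on a 4-D array; the index values are arbitrary but in bounds.

```python
import numpy as np

X = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# adjacent advanced indices (axes 1 and 2): their dimension stays in place
i = np.array([0, 1, 2, 0, 1, 2])
j = np.array([0, 1, 2, 3, 0, 1])
adjacent = X[:, i, j, :]            # shape (2, 6, 5)

# advanced indices separated by slices (axes 0 and 3): dimension moves to the front
k = np.array([0, 1, 0, 1, 0, 1])
m = np.array([0, 1, 2, 3, 4, 0])
separated = X[k, :, :, m]           # shape (6, 3, 4)

# a scalar next to an array index counts as an advanced index too
scalar_mix = X[0, :, [0, 1]]        # shape (2, 3, 5), not (3, 2, 5)
```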

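import numpy as np
X = np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11]])
rows = [0, 3]
cols = [0, 2]
# integer indexing: np.ix_ selects the cross product of rows and cols (the four corners)
X[np.ix_(rows, cols)]
# integer index arrays: pick whole rows, result shape (2, 2, 3)
X[[[1,2],[2,1]],:]
X.take([[1,2],[2,1]], axis=0)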

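Boolean Indexing

Boolean indexing with obj selects the elements at the positions where obj is True
- if obj has fewer dimensions than X, sub-arrays are extracted as the elements (obj can be thought of as aligned with the leading dimensions of the array and then broadcast)
- a True value in obj outside the bounds of X.shape raises an index error
- where obj does not cover X.shape, it counts as filled with False (recent NumPy versions instead require obj to match the indexed dimensions exactly and raise IndexError otherwise)
Boolean indexing is implemented by conversion to integer advanced indexing through the .nonzero method
- a Boolean index is equivalent to 1-D integer indices whose length is the number of True values
  - X[..., bool_obj, ...] is equivalent to X[(...,) + bool_obj.nonzero() + (...,)]
  - a Boolean index always collapses the dimensions it covers into a single dimension
- behaviour is subtle when Boolean and integer advanced indices occur together
  - the Boolean index is converted to the equivalent integer indices
  - the integer index must be broadcast-compatible with the converted indices
  - the integer index and the converted indices then produce the result together

Indexing is faster when obj and X have the same shape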
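The equivalence with `.nonzero()` can be checked on a small array; the masks below are illustrative.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)

# a mask of the same shape as X: the result is 1-D
mask = X % 5 == 0
flat = X[mask]
same = X[mask.nonzero()]

# a mask covering only the first dimension selects whole rows
row_mask = np.array([True, False, True])
rows = X[row_mask]

# combined with an integer index: row_mask becomes [0, 2] and pairs with [1, 3]
mixed = X[row_mask, [1, 3]]
```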

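Access by Field Name

When the elements of an ndarray have a structured data type, they can be accessed with string indices
- when the field is not a sub-array
  - the result has the same shape as the original array
  - it contains only the data of that field
  - its data type is that of the field
- when the field is a sub-array
  - the sub-array shape is appended to the shape of the original array
- access with a list of strings is supported
  - it returns a view of the array rather than a copy (since NumPy 1.16)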
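A short sketch of field access on a structured array. The dtype and its field names are made up for the example.

```python
import numpy as np

# hypothetical record layout: a name, a 2-vector position and a mass
dt = np.dtype([("name", "U8"), ("pos", "f8", (2,)), ("mass", "f8")])
arr = np.zeros(3, dtype=dt)
arr["mass"] = [1.0, 2.0, 3.0]

mass = arr["mass"]              # same shape as arr, dtype of the field
pos = arr["pos"]                # sub-array field: shape (3,) + (2,)
sub = arr[["name", "mass"]]     # list of field names

mass[0] = 9.0                   # single-field access is a view: writes reach arr
```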

Posted 2021-02-20 | Updated 2021-02-20 | Python / Numpy | 8 minutes read (About 1242 words)

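Universal Functions

UFunc: a function that operates element by element on arrays
- supports broadcasting, type casting, etc.
- can be seen as a vectorized wrapper around a function
- basic ufuncs operate on scalars; generalized ufuncs can also operate with sub-arrays as the basic element
ufuncs in numpy are instances of np.ufunc
- many builtin ufuncs are implemented in compiled C
- custom ufunc instances can be created with the np.frompyfunc factory
- numpy contains more than 60 ufuncs
  - some are called automatically when the corresponding operator is applied to arrays

Internal Buffering

Internal Buffers
- used for misaligned data, byte-swapped data and data that needs type conversion
- .setbufsize(size): sets the internal buffer size per thread; the default is 10,000 elements

Type Casting Rules

Each ufunc keeps an internal list of the input types (combinations) it supports and the matching output types (visible through the .types attribute)
When the given combination of input types is not in that list, the inputs must be cast safely (np.can_cast tells whether a cast is safe)
- the "S", "U", "V" types are not supported by ufuncs
- scalar-array operations use different casting rules, which ensure that a scalar does not upcast the array unless the scalar is of a different kind of data than the array

UFunc Dimensions

core dimension: a dimension on which the elementary operation of the ufunc is carried out
- core dimensions are usually written as a tuple
  - for a regular ufunc: the core dimensions are the empty tuple
  - for a generalized ufunc: non-empty tuples, possibly mixed with empty ones
- signature: a string holding the core dimensions of the input operands and of the output operands, e.g. (i),(i)->()
- core dimensions that share a name in the signature must have the same size; once they are removed, the remaining loop dimensions are broadcast together, and appending the core dimensions of the output gives the shape of the result
loop dimension: any dimension other than the core dimensions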
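The split into core and loop dimensions can be tried without writing C. The sketch below uses `np.vectorize` with a signature as a stand-in for a generalized ufunc; `inner1d` is a name chosen here, not a NumPy function.

```python
import numpy as np

# hypothetical helper: an inner product over one core dimension "i"
inner1d = np.vectorize(lambda u, v: np.dot(u, v), signature="(i),(i)->()")

a = np.arange(24.0).reshape(2, 3, 4)   # loop dims (2, 3), core dim 4
b = np.arange(4.0)                     # loop dims (),     core dim 4

out = inner1d(a, b)                    # loop dims broadcast to (2, 3)

# builtin generalized ufuncs expose their signature; regular ufuncs have none
matmul_sig = np.matmul.signature
add_sig = np.add.signature
```

Note that `np.vectorize` loops in Python, so this illustrates the shape rules rather than the performance of a compiled generalized ufunc.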

These terms come from the Perl Vector Library

https://numpy.org/doc/1.17/reference/c-api.generalized-ufuncs.html

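UFunc Prototype

NDA = def numpy.<ufunc>(
	x1 [,x2], /,
	[out1, out2,], out, *,
	where=True,
	casting="same_kind",
	order="K",
	dtype=None,
	subok=True,
	[signature, extobj]
)

where=True/False/Array[bool]
- not used for generalized ufuncs that operate on sub-arrays
keepdims=False/True
- for generalized ufuncs, only usable when all input operands have the same number of core dimensions and the output operand has none (i.e. a scalar is returned)
axes=tuple/int
- meaning: the axis indices on which the generalized ufunc operates and stores its results
  - [tuple]: each tuple holds the axis indices on which the corresponding input operand is operated on, or in which the output operand is stored
  - [int]: when the generalized ufunc operates on 1-D vectors, a plain integer can be used
- if all output operands of the generalized ufunc are scalars, their tuples may be omitted
axis=int
- meaning: the single axis index on which the generalized ufunc operates
  - int: the generalized ufunc operates on the same axis for all operands, equivalent to axes=[(axis,),(axis,),...]
signature=np.dtype/tuple[np.dtype]/str
- meaning: specifies the data types of the inputs and outputs of the ufunc
- the underlying 1-D loop is normally found by comparing the input data types and picking the loop to which all inputs can be safely cast
  - this parameter bypasses that search and names the loop directly
- the available signatures are listed in the ufunc.types attribute
extobj=list
- meaning: specifies the buffer size, the error mode integer and the error handling callback of the ufunc
  - list: a list of length 1, 2 or 3
- by default these values are looked up in a thread-specific dictionary; this parameter gives lower-level control
  - it can speed up large numbers of ufunc calls on small arrays

Some parameters have a common meaning; see README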

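UFunc Attributes

Attr	Desc
`ufunc.nin`	Number of inputs
`ufunc.nout`	Number of outputs
`ufunc.nargs`	Number of arguments
`ufunc.ntypes`	Number of types
`ufunc.types`	List of input->output type mappings
`ufunc.identity`	Identity value
`ufunc.signature`	Definition of the core elements on which a generalized ufunc operates

UFunc Methods

Method	Desc
`ufunc.reduce(a[,axis,dtype,out,...])`	Reduces the dimension by applying the ufunc along an axis
`ufunc.accumulate(array[,axis,dtype,out])`	Accumulates the results of applying the ufunc to all elements
`ufunc.reduceat(a,indice[,axis,dtype,out])`	Performs reduce over the specified slices on a single axis
`ufunc.outer(A,B,**kwargs)`	Applies the ufunc to all pairs of elements drawn from `A` and `B`
`ufunc.at(a,indices[,b])`	Performs the operation in place and unbuffered at `indices`

All ufuncs have these methods, but they are only meaningful for scalar ufuncs with 2 inputs and 1 output; otherwise they raise ValueError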
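The methods can be tried on `np.add` and `np.multiply`, both of which take 2 inputs and give 1 output. A minimal sketch with small inputs:

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)               # [[1, 2, 3], [4, 5, 6]]

total = np.add.reduce(a, axis=0)                # column sums
running = np.add.accumulate(a, axis=1)          # running sums along each row
table = np.multiply.outer([1, 2, 3], [10, 20])  # all pairwise products

v = np.arange(8)
seg = np.add.reduceat(v, [0, 4, 6])             # sums of v[0:4], v[4:6], v[6:]

# at() is unbuffered: a repeated index is applied every time it occurs
counts = np.zeros(4, dtype=int)
np.add.at(counts, [0, 1, 1, 3], 1)

# fancy-index += is buffered: the repeated index 1 is only counted once
buffered = np.zeros(4, dtype=int)
buffered[[0, 1, 1, 3]] += 1
```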

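UFunc-Related Functions

Function	Desc
`apply_along_axis(func1d,axis,arr,*args,...)`	Applies a function along the given axis
`apply_over_axes(func,a,axes)`	Applies `func(a,axis)` over the given axes in turn
`frompyfunc(func,nin,nout[,identity])`	Creates a ufunc with the given numbers of inputs and outputs
`vectorize(pyfunc[,otypes,doc,excluded,cache,signature])`	Creates a vectorized function, with more features than `frompyfunc`
`piecewise(x,condlist,funclist,*args,**kw)`	Applies the functions in `funclist` where the matching conditions in `condlist` hold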

Posted 2021-02-19 | Updated 2021-02-19 | Python / Numpy | 5 minutes read (About 801 words)

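NDArray Development

NDArray Interface/Protocol

Array interface (protocol): a specification designed for reusing data buffers
- what the interface describes
  - how to get at the contents of an ndarray
  - the array must be homogeneous, i.e. all elements share one data type
- the interface has a C part and a Python part
  - Python-API: the object exposes an __array_interface__ attribute holding a dictionary
  - C-API: the __array_struct__ structure

https://www.numpy.org.cn/en/reference/arrays/interface.html#python-side

Python API

__array_interface__: consists of 3 required fields and 5 optional fields

shape: length of each dimension (mind the value range when using it)
typestr: string naming the data type of the homogeneous array
- format and meaning mostly follow the Array-Protocol type strings, though some characters differ
- unlike custom data type strings, it specifies neither structured data nor shape; any non-basic type is simply void, with the details given by descr

Code	Type
't'	bit
'b'	boolean
'B'	unsigned byte
'i'	(signed) integer
'u'	unsigned integer
'f'	floating-point
'c'	complex-floating point
'm'	timedelta
'M'	datetime
'O'	(Python) objects
'S'/'a'	zero-terminated bytes (not recommended)
'U'	Unicode string
'V'	raw data (void)

descr: list giving a detailed description of the memory layout of each element of the homogeneous array
- each entry is a tuple of 2 or 3 elements
  - name: a string, or a tuple of the form (<fullname>,<basicname>)
  - type: a basic type string, or a nested list
  - shape: number of repetitions of this structure; no repetition if omitted
- normally used when typestr takes the form V[0-9]+; the number of bytes described must match
- defaults to [('', typestr)]
data: a 2-tuple giving the location of the data, or an object exposing the buffer interface
- first element of the tuple: the data area storing the array contents, pointing at the first element of the data (i.e. offset is ignored)
- second element of the tuple: read-only flag
- defaults to None, meaning memory is shared through the buffer interface of the object itself; offset then marks the start of the buffer
strides: tuple of the strides for stepping along each dimension
- each element is the integer number of bytes to step in that dimension; mind the value range
- defaults to None, meaning C-contiguous layout
mask: an object exposing the buffer interface that marks which data is valid
- its shape must be broadcast-compatible with the shape of the original data
- defaults to None, meaning all data is valid
offset: integer offset into the array data region
- only used when data is None or a buffer object
- defaults to 0
version: the interface version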
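The dictionary can be inspected on any ndarray, and an object that exposes it can be turned back into an array without copying. In this sketch `Wrapper` is a hypothetical class written only for the example.

```python
import numpy as np

a = np.arange(6, dtype="<i4").reshape(2, 3)
iface = a.__array_interface__   # dict with shape, typestr, data, strides, ...

class Wrapper:
    """Hypothetical object that shares its buffer through the array interface."""

    def __init__(self, arr):
        self._arr = arr         # keep the owner of the memory alive

    @property
    def __array_interface__(self):
        return self._arr.__array_interface__

w = Wrapper(a)
b = np.asarray(w)               # built from the interface, no copy
b[0, 0] = 99                    # the write is visible through a
```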

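C API

__array_struct__: a PyCObject whose voidptr points to a PyArrayInterface structure

The memory of the PyArrayInterface structure is allocated dynamically
The PyCObject has a matching destructor, so Py_DECREF should be called on it once access is finished

typedef struct{
	int two;				// holds the value 2, as a sanity check
	int nd;					// number of dimensions
	char typekind;			// kind of the data type in the array
	int itemsize;			// size of the data type
	int flags;				// flags indicating how to interpret the data
							// 5 bits give the 5 flags for data interpretation
								// `CONTIGUOUS`	0x01
								// `FORTRAN`	0x02
								// `ALIGNED`	0x100
								// `NOTSWAPPED` 0x200
								// `WRITABLE`	0x400
							// 1 bit for interface interpretation (whether a valid `descr` field is present)
								// `ARR_HAS_DESCR` 0x800
	Py_intptr_t *shape;		// shape
	Py_intptr_t *strides;	// strides
	void *data;				// points to the first element of the array
	PyObject *descr;		// NULL or a data description (requires `ARR_HAS_DESCR` in `flags`, otherwise ignored)
} PyArrayInterface;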

Posted 2021-02-18 | Updated 2021-02-18 | Python / Numpy | 21 minutes read (About 3088 words)

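NDArray Subclasses

Hook Attributes and Methods for Subclasses

`__array_*__` Methods

class.__array_ufunc__(ufunc, method, *inputs, **kwargs)
- purpose: lets a class override the behaviour of numpy ufuncs
  - returns the result of the operation, or NotImplemented (setting the method to None makes ufuncs raise TypeError for the class)
- parameters
  - ufunc: the ufunc object that was called
  - method: string naming which method of the ufunc was called
  - inputs: tuple of the positional arguments to the ufunc
  - kwargs: dict of the keyword arguments to the ufunc
- for the relation between ufuncs and __array_ufunc__ see the ufunc section
class.__array_function__(func,types,args,kwargs)
- parameters
  - func: any callable, invoked as func(*args, **kwargs)
  - types: from implementing `
  - args, kwargs: the arguments of the original call
class.__array_finalize__(obj)
- purpose: adjusts attributes of self after construction
  - called when space is allocated for an array of the type of obj
- parameters
  - obj: the ndarray subclass
class.__array_prepare__(array,context=None)
- purpose: before the ufunc computes, converts the output array to a subclass instance and updates metadata
  - called before any ufunc, on the input array of highest priority or on the specified output array; the return value is passed to the ufunc
  - default implementation: leaves the array as it is
class.__array_wrap__(array,context=None)
- purpose: before the result is returned to the user, converts the output array to a subclass instance and updates metadata
  - called before the ufunc result is returned to the user, on the input array of highest priority or on the specified output object
  - default implementation: converts the array to a new
class.__array__([dtype])
- purpose: if an output object has this method, the ufunc result is written to the object it returns

If every __array_ufunc__ involved in a ufunc call returns NotImplemented, a TypeError is raised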
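A minimal sketch of the `__array_ufunc__` hook on a container class. `Logged` is a hypothetical class for this example; it only handles plain ufunc calls and declines every other method.

```python
import numpy as np

class Logged(np.lib.mixins.NDArrayOperatorsMixin):
    """Hypothetical wrapper that records which ufuncs are applied to it."""

    calls = []

    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented       # reduce, accumulate, ... are declined
        Logged.calls.append(ufunc.__name__)
        raw = [x.value if isinstance(x, Logged) else x for x in inputs]
        return Logged(ufunc(*raw, **kwargs))

x = Logged([1, 2, 3])
y = np.add(x, 1)    # dispatched to Logged.__array_ufunc__
z = x * 2           # the mixin routes the operator to np.multiply

# every __array_ufunc__ returned NotImplemented, so numpy raises TypeError
try:
    np.add.reduce(x)
    reduce_failed = False
except TypeError:
    reduce_failed = True
```

Inheriting from `NDArrayOperatorsMixin` is a design choice of this sketch: it saves writing `__add__`, `__mul__` and the other operator methods by hand.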

`__array_*__` Attributes

class.__array_priority__
- purpose: decides the type of the returned object (when there is more than one possibility)
  - default: 0.0

Matrix

`np.matrix`

Matrix对象：继承自ndarray，具有ndarray的属性、方法

Matrix对象的特殊行为
- 维数始终为2
  - .ravel()仍然二维
  - item selection返回二维对象
- 数学操作
  - 覆盖乘法为矩阵乘法
  - 覆盖幂次为矩阵幂次
- 属性
  - 默认__array_priority__为10.0

Matrix类被设计用于与scipy.sparse交互，建议不使用

np.mat是np.matrix别名

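Matrix properties

Property	Desc
`matrix.T`	transpose
`matrix.H`	conjugate transpose
`matrix.I`	inverse
`matrix.A`	returns the data as an `ndarray`

Creating matrices

Routine	Desc
`np.mat(data[,dtype])`	interpret the input as a matrix
`np.matrix(data[,dtype,copy])`	not recommended
`np.asmatrix(data[,dtype])`	interpret the input as a matrix
`np.bmat(obj[,ldict,gdict])`	build from a string, nested sequence, or array

np.bmat: can build a Matrix from a Matlab-style string
- spaces separate columns
- ; separates rows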
`np.matlib`

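The numpy.matlib module contains all functions of the numpy namespace
- they return matrix instead of ndarray
- matrix objects are restricted to two dimensions, so functions that change the shape may not give the expected result

np.matlib is a convenience module for matrix computation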

`np.char`

`np.chararray`

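np.chararray class: an enhanced array for the string_ and unicode_ data types, inherits from ndarray
- inherits a feature introduced by Numarray: trailing whitespace of elements is ignored in item retrieval and comparison
- defines element-wise +, *, % operations
- exposes all the standard string/unicode methods, applied element-wise

Routine	Function Version
`char.array(obj[,itemsize,...])`	create a `chararray`
`char.asarray(obj[,itemsize,...])`	convert the input to a `chararray`, copying the data only when necessary
`chararray(shape[,itemsize,unicode,...])`	this constructor should not be used directly

The np.chararray class exists for backward compatibility with Numarray; prefer arrays of type object_, string_ or unicode_ and use the free functions of the numpy.char module for fast vectorized string operations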

NDArray Char Routine

The np.char/np.core.defchararray module provides vectorized string operations for arrays of type np.string_ and np.unicode_
- based on the string and unicode methods of the standard library

String operations

Routine	Function Version
`char.add(x1,x2)`
`char.multiply(a,i)`
`char.mod(a,values)`	`%` formatting (`str.__mod__` is the method invoked by `%`)
`char.capitalize(a)`	capitalize the first character
`char.title(a)`	capitalize the first character of each word
`char.center(a,width[,fillchar])`	center `a`, padding with `fillchar`
`char.ljust(a,width[,fillchar])`	left-justify `a`
`char.rjust(a,width[,fillchar])`	right-justify `a`
`char.zfill(a,width)`	pad with `0` on the left
`char.decode(a[,encoding,errors])`
`char.encode(a[,encoding,errors])`
`char.expandtabs(a[,tabsize])`	replace tabs with spaces
`char.join(sep, seq)`
`char.lower(a)`
`char.upper(a)`
`char.swapcase(a)`
`char.strip(a[,chars])`
`char.lstrip(a[,chars])`
`char.rstrip(a[,chars])`
`char.partition(a,sep)`	split once from the left, returns a 3-tuple
`char.rpartition(a,sep)`	split once from the right
`char.split(a[,sep,maxsplit])`	split from the left at most `maxsplit` times, returns a list
`char.rsplit(a[,sep,maxsplit])`
`char.splitlines(a[,keepends])`	split at line boundaries such as `\n`
`char.replace(a,old,new[,count])`
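
A few of the routines in the table, applied to a small string array. This is only a sketch; the sample data is made up for the example.

```python
import numpy as np

names = np.array(["alice smith", "bob  ", "carol"])

upper = np.char.upper(names)                # element-wise str.upper
titled = np.char.title(names)               # capitalize each word
stripped = np.char.strip(names)             # remove surrounding whitespace
greeting = np.char.add("hi ", stripped)     # element-wise concatenation
padded = np.char.zfill(np.array(["7", "42"]), 3)   # left-pad with zeros
print(greeting)
print(padded)
```

Each call returns a new array; the input array is left unchanged.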

Comparison

Function	Desc
`equal(x1,x2)`
`greater(x1,x2)`
`less(x1,x2)`
`not_equal(x1,x2)`
`greater_equal(x1,x2)`
`less_equal(x1,x2)`
`compare_chararrays(a,b,cmp_op,rstrip)`	`cmp_op` selects the comparison

String information

Function	Desc
`count(a,sub[,start,end])`	count non-overlapping occurrences of `sub`
`startswith(a,prefix[,start,end])`
`endswith(a,suffix[,start,end])`
`find(a,sub[,start,end])`	position of the first `sub`, `-1` if absent
`rfind(a,sub[,start,end])`	`find` from the right
`index(a,sub[,start,end])`	like `find`, raises `ValueError` if absent
`rindex(a,sub[,start,end])`	`index` from the right
`isalpha(a)`
`isalnum(a)`
`isdecimal(a)`
`isdigit(a)`
`islower(a)`
`isnumeric(a)`
`isspace(a)`
`istitle(a)`	whether each word starts with an uppercase character
`isupper(a)`
`str_len(a)`

`np.rec`

np.rec/np.core.records

`np.recarray`

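np.recarray class: allows the fields of a structured array to be accessed as attributes

Routine	Function Version
`np.recarray`	`ndarray` that allows field access through attributes
`np.record`	data type scalar that allows field lookup through attributes

Record Arrays

Routine	Function Version
`core.records.array(obj[,dtype,shape,...])`	build from a wide variety of objects
`core.records.fromarrays(arrayList[,dtype,...])`	build from a list of arrays
`core.records.fromrecords(recList[,dtype])`	build from a list of records in text form
`core.records.fromstring(datastring[,dtype,...])`	build a read-only record array from a binary data string
`core.records.fromfile(fd[,dtype,shape,...])`	build from a binary file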
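
A small sketch of attribute access on a record array, built here with `np.rec.fromarrays`. The field names `id` and `score` are made up for the example.

```python
import numpy as np

ids = np.array([1, 2, 3])
scores = np.array([0.5, 0.75, 1.0])
rec = np.rec.fromarrays([ids, scores], names="id,score")

by_attr = rec.id            # field accessed as an attribute
by_key = rec["score"]       # regular field indexing still works
first_score = rec[0].score  # a single record supports attribute access too
print(by_attr, by_key, first_score)
```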

`np.ma`

`ma.MaskedArray`

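ma.MaskedArray: the masked array, the core of np.ma and a subclass of ndarray
- a ma.MaskedArray consists of a standard np.ndarray plus a mask
The mask, .mask
- the mask is either a hard mask or a soft mask, as reported by the read-only attribute hardmask
  - hard mask: masked values cannot be unmasked by assignment
  - soft mask: assigning to a masked entry overwrites it and unmasks it
- .mask can be set to
  - a bool array, indicating for each position whether the element is masked
  - ma.masked/ma.nomask/True/False, masking or unmasking the array as a whole

ma.nomask is False of type np.bool_, ma.masked is a special constant

ma.MaskType is an alias of np.bool_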
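
The difference between a soft and a hard mask shows up on assignment. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy.ma as ma

# Soft mask (the default): assigning to a masked entry unmasks it
soft = ma.array([1, 2, 3], mask=[False, True, False])
soft[1] = 20

# Hard mask: the assignment is ignored and the entry stays masked
hard = ma.array([1, 2, 3], mask=[False, True, False], hard_mask=True)
hard[1] = 20

print(soft, soft.mask)
print(hard, hard.mask)
```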

https://www.numpy.org.cn/reference/arrays/maskedarray.html

https://www.numpy.org.cn/reference/routines/ma.html

Attributes

Attr	Desc
`.hardmask`	hard-mask flag
`.data`	array of the values
`.mask`	the mask array, or `ma.nomask`, `ma.masked`
`.recordmask`	for structured arrays: a record is masked if all of its named fields are masked

Creating masked arrays

Routine	Function Version	Method Version
`ma.MaskedArray(data[,mask,dtype,...])`	class	none
`ma.masked_array(data[,mask,dtype,...])`	alias of `MaskedArray`	none
`ma.array(data[,dtype,copy,...])`	constructor	none
`ma.frombuffer(buffer[,dtype,count,offset])`		none
`ma.fromfunction(function,shape,dtype)`		none
`ma.fromflex(fxarray)`	build from a structured `fxarray` with `_data` and `_mask` fields	none
`copy(a[,order])`

Ones and Zeros

Routine	Function Version
`ma.empty(shape[,dtype,order])`	uninitialized
`ma.empty_like(prototype[,dtype,order,subok,...])`	same shape and type as `prototype`
`ma.ones(shape[,dtype,order])`
`ma.zeros(shape[,dtype,order])`
`ma.masked_all(shape[,dtype])`	all elements masked
`ma.masked_all_like(arr)`	all elements masked, shape and type of `arr`

MaskedArray Routine

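The functions of the np.ma module and the methods of ma.MaskedArray resemble those of ndarray, but may behave differently
- some functions of the np namespace (hstack etc.) applied to a MaskedArray
  - ignore the mask (i.e. also operate on masked elements)
  - return a result whose mask is set to False

Only what is additional in the ma module, or needs extra explanation, is recorded here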

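Inspecting arrays

Routine	Function Version	Method Version
`ma.all(a[,axis,out,keepdims])`	returns `ma.masked` when everything is masked
`ma.any(a[,axis,out,keepdims])`	masked values count as `False`; returns `ma.masked` when everything is masked
`ma.count(arr,[axis,keepdims])`	count the unmasked elements along the given axis
`ma.count_masked(arr,[axis])`	count the masked elements along the given axis
`ma.nonzero(a)`	indices of the unmasked non-zero elements
`ma.is_mask(m)`	whether `m` is a valid standard mask	none
`ma.is_masked(x)`	whether `x` contains masked elements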

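Getting, creating, modifying masks

Routine	Function Version	Method Version
`ma.getmask(a)`	returns the mask, or `ma.nomask`	`.mask` attribute
`ma.getmaskarray(arr)`	returns the mask, or a full array of `False`	none
`ma.make_mask(m[,copy,shrink,dtype])`	create a mask from an array	none
`ma.make_mask_none(newshape[,dtype])`	create an all-`False` mask of the given shape	none
`ma.make_mask_descr(ndtype)`	create the mask dtype for the given dtype	none
`ma.mask_rowcols(a[,axis])`	mask the rows and/or columns (per `axis`) that contain masked elements	none
`ma.mask_rows(a[,axis])`	`mask_rowcols()` with `axis` defaulting to `0`	none
`ma.mask_cols(a[,axis])`	`mask_rowcols()` with `axis` defaulting to `1`	none
`ma.mask_or(m1,m2[,copy,shrink])`	logical or of two masks	none
`ma.harden_mask(a)`
`ma.soften_mask(a)`
`.shrink_mask()`	none	shrink the mask to `nomask` when possible
`.unshare_mask()`	none	copy the mask and set `sharedmask=False`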

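Getting and creating indices

Indexing an unstructured masked array
- mask is False: returns an array scalar
- mask is True: returns ma.masked
Indexing a structured masked array
- the mask of every field is False: returns an np.void object
- the mask of some field is True: returns a zero-dimensional masked array
Slicing
- .data attribute: a view of the original data
- .mask attribute: ma.nomask or a view of the original mask

Routine	Function Version	Method Version
`ma.nonzero(a)`	indices of the unmasked non-zero elements
`ma.mr_[]`	concatenate slices, arrays and scalars along the first axis, like `np.r_[]`	none
`ma.flatnotmasked_contiguous(a)`	slices of the unmasked runs after flattening	none
`ma.flatnotmasked_edges(a)`	first and last unmasked position after flattening	none
`ma.notmasked_contiguous(a[,axis])`	slices of the unmasked runs along the given axis	none
`ma.notmasked_edges(a[,axis])`	first and last unmasked position along the given axis	none
`ma.clump_masked(a)`	slices of the masked runs after flattening	none
`ma.clump_unmasked(a)`	slices of the unmasked runs after flattening	none

ma.mr_[] resembles np.r_[], but np.r_[] returns a result whose mask is set to False, whereas ma.mr_[] also handles the masks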

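Getting and modifying values

Accessing only the valid data
- index with the inverted mask, ~X.mask
- use the .compressed method to get a 1-D ndarray
import numpy.ma as ma
X = ma.array([1, 2, 3], mask=[False, True, False])
print(X[~X.mask])
print(X.compressed())
Accessing the data
- through the .data attribute: may be a view of an ndarray or of one of its subclasses
  - equivalent to creating an ndarray (or subclass) view of the masked array directly
- the __array__ method: an ndarray
- the ma.getdata function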

Routine	Function Version	Method Version
`ma.getdata(a[,subok])`	returns the data of a masked array	`.data` attribute
`ma.fix_invalid(a[,mask,copy,fill_value])`	replace the invalid values in `a` and mask them	none
`ma.masked_equal(x,value[,copy])`		none
`ma.masked_greater(x,value[,copy])`		none
`ma.masked_greater_equal(x,value[,copy])`		none
`ma.masked_inside(x,v1,v2[,copy])`		none
`ma.masked_outside(x,v1,v2[,copy])`		none
`ma.masked_invalid(x[,copy])`		none
`ma.masked_less(x,value[,copy])`		none
`ma.masked_less_equal(x,value[,copy])`		none
`ma.masked_not_equal(x,value[,copy])`		none
`ma.masked_values(x,value[,rtol,atol,...])`		none
`ma.masked_object(x,value[,copy,shrink])`	like `masked_values`, suited to values of type `object`	none
`ma.masked_where(condition,a[,copy])`	mask the values selected by `condition`	none

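Other attributes and methods

Routine	Function Version	Method Version
`ma.common_fill_value(a,b)`	returns the fill value if `a` and `b` share it, otherwise `None`	none
`ma.default_fill_value(obj)`	default fill value	none
`ma.maximum_fill_value(obj)`	maximum value representable by the type of the object	none
`ma.minimum_fill_value(obj)`	minimum value representable by the type of the object	none
`ma.set_fill_value(a,fill_value)`
`.get_fill_value()`/`.fill_value`	none
`ma.allequal(a,b[,fill_value])`	whether all elements of `a` and `b` are equal; `fill_value` decides how masked entries compare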

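`np.ma` arithmetic

Masked arrays support arithmetic and comparison operations
- masked elements do not take part in the operation and keep their value
- masked arrays support the standard ufuncs, which return masked arrays
  - if any operand element is masked, the corresponding result element is masked
  - if the ufunc returns the optional context output, the context is processed and undefined results are masked
The np.ma module has its own implementation of most ufuncs
- for unary and binary operations with a restricted domain, undefined results are masked automatically
import numpy.ma as ma
ma.log([-1, 0, 1, 2])

Routine	Function Version	Method Version
`ma.anom(a[,axis,dtype])`	deviation from the arithmetic mean along the given axis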
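
The two masking rules for arithmetic (mask propagation and domain masking) can be seen in a short sketch. The sample values are arbitrary.

```python
import numpy.ma as ma

x = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
y = ma.array([10.0, 20.0, 30.0], mask=[False, False, True])

# The result is masked wherever either operand is masked
total = x + y

# Entries outside the domain of log (x <= 0) are masked automatically
logs = ma.log([-1.0, 0.0, 1.0, 2.0])

print(total)
print(logs)
```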

`np.memmap`

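np.memmap: memory-mapped file array; uses a memory-mapped file as the array's data buffer
- for large files, memory mapping avoids loading the whole file into memory
Methods

Method	Desc
`np.memmap(filename[,dtype,mode,shape])`	create a memory-mapped array stored in a file on disk
`memmap.flush()`	flush in-memory changes to disk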
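
A minimal round trip through a memory-mapped file, as a sketch. The scratch file `demo.dat` in a temporary directory is an assumption made for the example.

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.dat")   # hypothetical scratch file

# mode="w+" creates (or overwrites) the file and maps it read/write
writer = np.memmap(path, dtype="float32", mode="w+", shape=(3, 4))
writer[:] = np.arange(12, dtype="float32").reshape(3, 4)
writer.flush()          # push the changes to disk
del writer              # drop the mapping

# mode="r" maps the existing file read-only; dtype and shape must be given again
reader = np.memmap(path, dtype="float32", mode="r", shape=(3, 4))
row_sum = float(reader[1].sum())
print(row_sum)
```

The file itself stores only raw bytes, which is why dtype and shape have to be supplied again when reopening it.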

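Standard container class

np.lib.user_array.container
- introduced for backward compatibility and as a standard container class
- its self.array attribute is an ndarray
- easier to use with multiple inheritance than ndarray itself
Classes, methods, functions

Method	Desc
`np.lib.user_array.container(data[,...])`	standard container class that simplifies multiple inheritance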

Posted 2021-02-01 | Updated 2021-02-01 | Python / Numpy | 20 minutes read (About 3066 words)

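NDArray Scalars

NDArray scalar types

numpy defines 24 new Python types (the NDArray scalar types)
- the type descriptors are mostly based on the types available in C, the language CPython is written in
Scalars have the same attributes and methods as ndarray
- array scalars are immutable, so none of the attributes can be set

Figure: numpy_dtype_hierarchy

Built-in scalar types

Routine	Desc
`iinfo(int_type)`	range and other limits of an integer type
`finfo(float_type)`	range and other limits of a floating-point type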
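
A short sketch of what `iinfo` and `finfo` report for a couple of fixed-width types:

```python
import numpy as np

i32 = np.iinfo(np.int32)
f32 = np.finfo(np.float32)
u8_max = np.iinfo(np.uint8).max

print(i32.min, i32.max)     # representable range of int32
print(f32.bits, f32.eps)    # width and machine epsilon of float32
print(u8_max)
```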

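Relation to Python types

NumPy type	Python type	Fixed-width NumPy type on 64-bit	Desc
`int_`	inherits from `int` (Python 2 only)	`int64`
`float_`	inherits from `float`	`float64`
`complex_`	inherits from `complex`	`complex128`
`bytes_`	inherits from `bytes`	`"S#"`/`"a#"`	Python byte string
`unicode_`	inherits from `str`	`"U#"`	Python string
`void`		`"V#"`	Python buffer type
`object_`	inherits from `object` (Python 3)	`"O"`	reference to a Python object

np.bool_ resembles the Python bool type but does not inherit from it
- the Python bool type cannot be subclassed
- np.bool_ and bool differ in size
np.int_ does not inherit from int, because the width of the latter is no longer fixed
- NumPy arrays have no true np.int type, because its width is no longer fixed
bytes_, unicode_ and void are flexible types whose width is configurable
- the length cannot change once specified; assigning a longer value truncates it
- unicode_: the content is a text string
- bytes_: the content is a byte string
- void: the content is raw binary data, but not a byte string
object_ stores references to Python objects rather than the objects themselves
- the referenced objects need not have the same Python type
- the fallback type

Python built-in types used to have same-named aliases in the NumPy namespace, e.g. np.unicode == np.str == str (these aliases were deprecated in NumPy 1.20 and later removed)

The dtype of a NumPy array cannot really be set to Python's int: every element must have the same width, so a variable-width type is ruled out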

Relation to C types

The primitive types supported by NumPy are closely tied to the primitive types of C

NumPy type	C type	Fixed-width alias on 64-bit	Desc	Character code	Fixed-width string code
`bool_`	`bool`	`bool8`	bool stored as a byte	`"?"`	none
`byte`	`signed char`	`int8`		`"b"`	`"i1"`
`short`	`short`	`int16`		`"h"`	`"i2"`
`intc`	`int`	`int32`		`"i"`	`"i4"`
`int_`	`long`	`int64`		`"l"`	`"i8"`
`longlong`	`long long`	none		`"q"`	none
`ubyte`	`unsigned char`	`uint8`		`"B"`	`"u1"`
`ushort`	`unsigned short`	`uint16`		`"H"`	`"u2"`
`uintc`	`unsigned int`	`uint32`		`"I"`	`"u4"`
`uint`	`unsigned long`	`uint64`		`"L"`	`"u8"`
`ulonglong`	`unsigned long long`	none		`"Q"`	none
`half`	none	`float16`	half-precision float: 1+5+10	`"e"`	`"f2"`
`single`	`float`	`float32`	single-precision float, usually 1+8+23	`"f"`	`"f4"`
`double`	`double`	`float64`	double-precision float, usually 1+11+52	`"d"`	`"f8"`
`longdouble`/`longfloat`	`long double`	`float128`	platform-defined extended-precision float	`"g"`	`"f16"`
`csingle`	`float complex`	`complex64`	two single-precision floats	`"F"`	`"c8"`
`cdouble`/`cfloat`	`double complex`	`complex128`	two double-precision floats	`"D"`	`"c16"`
`clongdouble`/`clongfloat`	`long double complex`	`complex256`	two extended-precision floats	`"G"`	`"c32"`

The float complex and double complex types are defined in complex.h

The fixed-width type aliases of C are defined in stdint.h

Other types

Python type	Desc	Character code	Fixed-width string code
`timedelta64`	time delta	`"m"`	`"m8"`
`datetime64`	date and time	`"M"`	`"M8"`

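Attributes, indexing, methods

Array scalars have essentially the same attributes as ndarray
Array scalars support indexing like 0-dimensional arrays
- X[()] returns an array scalar (a copy)
- X[...] returns a 0-dimensional array
- X[<field-name>] returns the array scalar of the corresponding field
Array scalars have exactly the same methods as ndarray
- by default the scalar is converted internally to the equivalent 0-dimensional array and the corresponding array method is called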
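
The indexing forms listed above, tried on a single `np.float64` scalar as a sketch:

```python
import numpy as np

x = np.float64(1.5)

same = x[()]        # empty-tuple index: an array scalar again
zero_d = x[...]     # Ellipsis index: a 0-dimensional ndarray
total = x.sum()     # array methods work on scalars as well

print(type(same).__name__, type(zero_d).__name__, zero_d.ndim)
```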

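Defining array scalar types

Compose a structured type from the built-in types
Subclass ndarray
- some internal behaviour will be taken over by the array type
Define a completely new data type and register it with numpy
- only possible in C, through the numpy C-API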

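Data type functions

Data type information

Function	Desc
`finfo(dtype)`	machine limits for floating-point types
`iinfo(type)`	machine limits for integer types
`MachAr([float_conv,int_conv])`	diagnose machine parameters
`typename(char)`	description of the given data type character code

Data type tests

Function	Desc
`can_cast(from_,to[,casting])`	whether the cast is allowed
`issctype(rep)`	whether `rep` (which must not be a convertible string) represents a scalar data type
`issubdtype(arg1,arg2)`	whether `arg1` is lower or equal in the type hierarchy (the `dtype` counterpart of `issubclass`)
`issubsctype(arg1,arg2)`	like `issubdtype`, but also accepts objects with a `dtype` attribute
`issubclass_(arg1,arg2)`	like the built-in `issubclass`, but returns `False` instead of raising `TypeError` when an argument is not a class

np.int64 and np.int32 are distinct types at the same level of the hierarchy, so neither is a subdtype of the other; issubdtype(np.int64, int) -> True only because int is mapped to np.int_ (int64 on most 64-bit platforms), and the same test on np.int32 gives False

np.can_cast determines which casts are safe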

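import numpy as np

def print_casting(ntypes):
	# header row: one column per type code
	print("X", end=" ")
	for char in ntypes:
		print(char, end=" ")
	print("")
	# one row per source type: 1 if it can be safely cast to the column type
	for row in ntypes:
		print(row, end=" ")
		for col in ntypes:
			print(int(np.can_cast(row, col)), end=" ")
		print("")
print_casting(np.typecodes["All"])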

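Determining data types

Function	Params	ReturnType	ReturnDesc
`min_scalar_type(a)`	scalar value	dtype instance	smallest type that can hold the value
`promote_types(type1,type2)`	dtype-like	dtype instance	smallest type both can be safely cast to
`result_type(*array_and_dtypes)`	dtype-like, scalar values, arrays	dtype instance	type obtained by applying the promotion rules
`find_common_type(array_types,scalar_types)`	lists of dtype-like	dtype instance	common type taking both scalar and array types into account
`common_type(*arrays)`	numeric arrays (with a `dtype` attribute)	predefined type	highest-precision type among those that qualify
`maximum_sctype(t)`	dtype-like, scalar values, arrays	predefined type	highest-precision type among those that qualify
`obj2sctype(rep[,default])`	dtype-like, scalar values, arrays	predefined type	type of the object
`sctype2char(sctype)`	dtype-like, scalar values, arrays	type character code	smallest type that qualifies
`mintypecode(typechars[,typeset,default])`	dtype-like, scalar values, arrays	type character code	chosen from `typeset`

A scalar cannot upcast the dtype of an array unless the scalar and the array belong to different kinds of data type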
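
A sketch of the casting and promotion helpers on explicit dtypes (explicit dtypes are used here so the results do not depend on value-based rules):

```python
import numpy as np

widening_ok = np.can_cast(np.int32, np.int64)         # widening is a safe cast
narrowing_ok = np.can_cast(np.float64, np.int32)      # float -> int is not safe
promoted = np.promote_types(np.int32, np.float32)     # smallest type both cast to safely
combined = np.result_type(np.int8, np.float32)
smallest = np.min_scalar_type(300)                    # smallest type holding the value

print(widening_ok, narrowing_ok, promoted, combined, smallest)
```

`int32` and `float32` promote to `float64` because `float32` cannot represent every `int32` value exactly.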

https://numpy.org/devdocs/reference/generated/numpy.issubdtype.html#numpy.issubdtype

https://numpy.org/devdocs/reference/generated/numpy.issubsctype.html#numpy.issubsctype

https://numpy.org/devdocs/reference/generated/numpy.find_common_type.html#numpy.find_common_type

https://numpy.org/devdocs/reference/generated/numpy.result_type.html#numpy.result_type

https://numpy.org/devdocs/reference/generated/numpy.common_type.html#numpy.common_type

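The data type class `np.dtype`

class dtype(obj[,align,copy])

The numpy.dtype class describes how the bytes in the memory block of an array item are interpreted
- size of the data
- byte order of the data: little-endian, big-endian
- type of the data
  - structured data
    - name of each field
    - data type of each field
    - the part of the memory block each field occupies
  - subarray
    - shape
    - data type
Wherever a numpy.dtype instance is expected, most call sites also accept an equivalent value that can be converted to one
- predefined scalar types and generic types of python and numpy
- strings, dicts and lists that describe a dtype
- classes and instances that have a dtype attribute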
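
Several of the equivalent spellings listed above, as a sketch. All of them go through `np.dtype`, which is also what functions taking a `dtype=` argument do internally.

```python
import numpy as np

from_type = np.dtype(np.int32)        # NumPy scalar type
from_string = np.dtype("i4")          # array-protocol type string
from_python = np.dtype(float)         # Python built-in type
from_list = np.dtype([("a", "i4"), ("b", "f8")])                     # list of fields
from_dict = np.dtype({"names": ["r", "g"], "formats": ["u1", "u1"]}) # dict of lists

print(from_type == from_string, from_python, from_list.names, from_dict.itemsize)
```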

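Elements of a data type

Type classes

NumPy built-in types
- the 24 built-in array scalar types
- generic types

Generic type	Converted type
`number`, `inexact`, `floating`	`float_`
`complexfloating`	`complex_`
`integer`, `signedinteger`	`int_`
`unsignedinteger`	`uint`
`character`	`string`
`generic`, `flexible`	`void`
Python built-in types, equivalent to the corresponding array scalar
- conversion rules are the same as for the NumPy built-in array scalar types
- None: the default, converted to float_

Python built-in type	Converted type
`int`	`int_`
`bool`	`bool_`
`float`	`float_`
`complex`	`complex_`
`bytes`	`bytes_`
`str`	`unicode_`
`unicode`	`unicode_`
`buffer`	`void`
Others	`object_`
Types with a .dtype attribute: the attribute is accessed and used directly
- the attribute must return something convertible to a dtype object

Strings convertible to a type

Strings in numpy.sctypeDict.keys()
Array-protocol type strings, see NumPy array scalar types for details
- the first character specifies the kind of data
- for characters that support it, the number of bytes per item may follow
  - fixed-width types only accept byte counts supported by the platform
  - flexible types accept any byte count

Code	Type
`'?'`	boolean
`'b'`	(signed) byte, equivalent to `'i1'`
`'B'`	unsigned byte, equivalent to `'u1'`
`'i'`	(signed) integer
`'u'`	unsigned integer
`'f'`	floating-point
`'c'`	complex-floating point
`'m'`	timedelta
`'M'`	datetime
`'O'`	(Python) objects
`'S'`/`'a'`	zero-terminated bytes (not recommended)
`'U'`	Unicode string
`'V'`	raw data (void)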

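Structured data types

Function	Desc
`format_parser(formats,names,titles[,aligned,byteorder])`	create a data type
`dtype(obj[,align,copy])`	create a data type object

Structured data types
- contain one or more typed fields, each with a name by which it can be accessed
- the parent data type must be large enough to contain all fields
- the parent data type is almost always based on the void type
When the type contains only a single, unnamed basic type, the array structure passes through
- the field is not implicitly given a name
- the subarray shape is appended to the shape of the array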

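Argument formats

A string convertible to a data type, specifying type and shape
- each field consists of four parts, in order
  - shape of the field
  - byte-order character: <, >, |
  - basic type character
  - number of bytes of the data type
    - for fixed-width types it must match what the type allows
    - for flexible types it is the number of elements the field holds
- commas separate multiple fields
- fields can only get the default field names
- for a flexible type, giving only a shape makes it be read as the byte length
dt = np.dtype("i4, (2,3)f8, f4")
A tuple specifying field type and shape
- the tuple elements give field name, data type and shape: (<field_name>, <dtype>, <shape>)
  - if the name is the empty string '', a standard field name is assigned
- several tuples in a list specify several fields: [(<field_name>, <dtype>, <shape>),...]
- the data type dtype may itself nest other data types
  - a string convertible to a type
  - a tuple/list
dt = np.dtype(("U10", (2,2)))
dt = np.dtype(("i4, (2,3)f8, f4", (2,3)))
dt = np.dtype([("big", ">i4"), ("little", "<i4")])
A dict whose values are lists of names, types and shapes
- like the format_parser function, the key-value pairs give the list of names, the list of types, etc.: {"names":...,"formats":...,"offsets":...,"titles":...,"itemsize":...}
  - "names" and "formats" are required
  - "itemsize" gives the total size and must be large enough
- alternatively give each field separately: "field_1":..., "field_2":...
  - discouraged, it easily clashes with the previous form
dt = np.dtype({
	"names": ['r', 'g', 'b', 'a'],
	"formats": ["u1", "u1", "u1", "u1"]
})
Interpreting a base data type as a structured data type: (<base_dtype>, <new_dtype>)
- this form makes unions possible
dt = np.dtype(("i4", [("r", "u1"), ("g", "u1"), ("b", "u1"), ("a", "u1")]))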
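
To show what such a dtype is used for, here is a sketch that builds a small structured array from the list-of-tuples form. The field names and records are invented for the example.

```python
import numpy as np

# (name, dtype) and (name, dtype, shape) tuples; "scores" is a subarray field
person = np.dtype([("name", "U10"), ("age", "i4"), ("scores", "f8", (2,))])

people = np.array(
    [("Ann", 31, (1.0, 2.0)), ("Bob", 25, (3.0, 4.0))],
    dtype=person,
)

ages = people["age"]            # one field across all records
first = people[0]               # one record across all fields
score_block = people["scores"]  # the subarray shape is appended: (2, 2)

print(person.itemsize, ages, first["name"], score_block.shape)
```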

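Attributes

Describing the data type

Attribute	Desc
`.type`	array scalar type used to instantiate this data type
`.kind`	character code of the general kind of data (one of `biufcmMOSUV`)
`.char`	unique character code of the built-in type
`.num`	unique number of the built-in type
`.str`	array-protocol type string
Data size

Attribute	Desc
`.name`	bit-width name of the data type
`.itemsize`	size of an element in bytes
Byte order

Attribute	Desc
`.byteorder`	character indicating the byte order
Fields

Attribute	Desc
`.fields`	dict of the named fields
`.names`	ordered list of field names
Subarray (non-structured) description

Attribute	Desc
`.subdtype`	`(item_dtype,shape)`
`.shape`	shape of the subarray
Additional information

Attribute	Desc
`.hasobject`	whether the type contains any reference-counted objects
`.flags`	flags describing how the data type is interpreted
`.isbuiltin`	how the type relates to the built-in data types
`.isnative`	whether the byte order is native to the platform
`.descr`	`__array_interface__` description of the data type
`.alignment`	required alignment in bytes (decided by the compiler)
`.base`	dtype of the base element

Methods

Changing byte order

Method	Desc
`.newbyteorder([new_order])`	create a data type with a different byte order
Pickle protocol

Method	Desc
`.__reduce__()`	pickling support
`.__setstate__()`	unpickling support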
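
A sketch that reads a few of these attributes from a simple dtype and from a subarray dtype:

```python
import numpy as np

dt = np.dtype("i4")
swapped = dt.newbyteorder()        # same type, opposite byte order
print(dt.kind, dt.itemsize, dt.name)
print(dt.isnative, swapped.isnative)

sub = np.dtype(("f8", (2, 3)))     # subarray dtype: 2x3 block of float64
print(sub.subdtype, sub.shape, sub.base)
```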

Datetime

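Time-related data types in NumPy
- support a large number of time units
- store date-times based on POSIX time
- store the value in a 64-bit integer, which also determines the representable time span

https://www.numpy.org.cn/reference/arrays/datetime.html

`np.datetime64`

np.datetime64 represents a single moment in time
- two date-times with different units may still denote the same moment
- converting from a larger unit to a smaller unit is a safe cast

Creation

Creation rules
- the unit of the internal storage is chosen automatically from the form of the string
- the string "NAT" is accepted and denotes a "not a time" value
- a specific unit can be forced

Basic form: a string in ISO 8601 format

np.datetime64("2020-05-23T14:23")
np.datetime64("2020-05-23T14:23", "D")

Creating date-time arrays from strings

np.array(["2020-01-23", "2020-04-23"], dtype="datetime64")
np.array(["2020-01-23", "2020-04-23"], dtype="datetime64[D]")
np.arange("2020-01-01", "2020-05-03", dtype="datetime64[D]")

np.datetime64 still parses time zone offsets for backward compatibility (deprecated behaviour)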

`np.timedelta64`

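np.timedelta64: a time delta

np.timedelta64 complements np.datetime64 and makes up for NumPy's lack of a physical-quantities system

Creation

Creation rules
- the string "NAT" is accepted and denotes a "not a time" value
- a specific unit can be forced
Directly from a number
np.timedelta64(100, "D")
From an existing np.timedelta64, specifying the unit
- note that months and larger units cannot be converted to days, because the number of days they contain depends on the point in time
a = np.timedelta64(1, "Y")
np.timedelta64(a, "M")

Arithmetic

np.datetime64 and np.timedelta64 can be combined

np.datetime64("2020-05-14") - np.datetime64("2020-01-12")
np.datetime64("2020-05-14") + np.timedelta64(2, "D")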
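
The same arithmetic with the results kept in variables, plus two unit conversions, as a sketch:

```python
import numpy as np

start = np.datetime64("2020-01-12")
end = np.datetime64("2020-05-14")

span = end - start                       # datetime - datetime gives a timedelta64 in days
later = end + np.timedelta64(2, "D")     # datetime + timedelta gives a datetime64
hours = span.astype("timedelta64[h]")    # days to hours is an exact conversion

year = np.timedelta64(1, "Y")
months = np.timedelta64(year, "M")       # years to months is allowed, years to days is not

print(span, later, hours, months)
```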

NDArray

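class ndarray(shape[,dtype,buffer,offset])

ndarray: a multidimensional container of items of the same type and (fixed) size
- an ndarray consists of a contiguous one-dimensional segment of computer memory combined with an indexing scheme that maps N integers to the position of an item in the block
- it can share its data segment, i.e. be a view of another data area
  - another ndarray
  - an object implementing the buffer interface
Attributes
- shape: the size of each dimension, and with it the number of items
- dtype (data-type object): the type of the items
- strides: the step of each dimension, used to compute offsets into the contiguous data segment

https://www.numpy.org.cn/reference/arrays/ndarray.html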

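Broadcast rules

Broadcasting: 4 rules for handling arrays of different shape

Inputs with fewer dimensions than the largest one get 1s prepended to their shape
Each dimension of the output shape is the maximum of the corresponding input dimensions
Each dimension of an input must equal the corresponding output dimension, or be 1
For an input whose dimension is 1, its (first) entry is used for every computation along that axis (i.e. the stride is 0 and the ufunc does not step along that dimension)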

shape(3, 2, 2, 1) + shape(1, 3)
	-> shape(3, 2, 2, 1) + shape(1, 1, 1, 3)
	-> shape(3, 2, 2, 3) + shape(1, 1, 2, 3)
	-> shape(3, 2, 2, 3) + shape(1, 2, 2, 3)
	-> shape(3, 2, 2, 3) + shape(3, 2, 2, 3)
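
The shape arithmetic above can be checked directly, and `np.broadcast_to` makes the stride-0 trick from the last rule visible. A minimal sketch:

```python
import numpy as np

a = np.ones((3, 2, 2, 1))
b = np.ones((1, 3))
result_shape = (a + b).shape
iter_shape = np.broadcast(a, b).shape    # same shape, without computing the sum

# The repeated axis of a broadcast view has stride 0: no data is copied
row = np.arange(3)
tiled = np.broadcast_to(row, (4, 3))

print(result_shape, iter_shape, tiled.strides)
```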

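Array attributes

Memory layout

Attribute	Desc
`ndarray.flags`	information about the memory layout of the array
`ndarray.shape`	array dimensions (tuple)
`ndarray.strides`	bytes to step in each dimension when traversing the array (tuple)
`ndarray.ndim`	number of dimensions
`ndarray.data`	Python buffer object pointing to the start of the array's data
`ndarray.size`	number of elements
`ndarray.itemsize`	length of one element in bytes
`ndarray.nbytes`	total bytes consumed by the elements
`ndarray.base`	base object if the memory comes from another object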
数据类型

属性	描述
`ndarray.dtype`	元素数据类型

其他属性

属性	描述
`ndarray.T`	转置
`ndarray.real`	实数部分
`ndarray.imag`	虚数部分
`ndarray.flat`	数组的一维迭代器

数组接口

属性	描述
`__array_interface__`	数组接口python端
`__array_struct__`	数组接口C语言端

`ctypes`外部函数接口

属性	描述
`ndarray.ctypes`	简化数组和`ctypes`模块交互的对象

`np.nditer`

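The default iterator of an ndarray object is the default iterator of a sequence type
- i.e. iterating over the object itself behaves like
  for i in range(X.shape[0]):
      pass

Routine	Function Version	Method Version
`nditer(op[,flags,op_flags,...])`	high-performance iterator	none
`nested_iters(op,axes[,flags,op_flags,...])`	nested `nditer` iterators over several groups of axes	none
`ndenumerate(arr)`	`(idx,val)` iterator	none
`lib.Arrayterator(var[,buf_size])`	buffered iteration suited to large arrays
`flat`	none	returns an `np.flatiter` iterator
`ndindex(*shape)`	iterates over the indices of an array of the given shape	none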

`np.nditer`

class np.nditer(
	op,
	flags=None,
	op_flags=None,
	op_dtypes=None,
	order='K'/'C'/'F'/'A',
	casting='safe',
	op_axes=None,
	itershape=None,
	buffersize=0
)

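Ways of iterating
- through the standard python iteration interface, yielding each array scalar element
- using the iterator object explicitly, through its attributes and methods
  - it[0] accesses the result of the current iteration
  - it.iternext() advances to the next element
Special attributes and methods give extra information (some require iterator flags)
- index tracking: np.nditer.index, np.nditer.multi_index
- manual iteration: np.nditer.iternext() advances the iterator and returns whether elements remain
- operands: np.nditer.operands, not accessible once the iterator is closed, so take a reference before closing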

https://www.numpy.org.cn/reference/arrays/nditer.html

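Parameters

flags: iterator flags
- buffered: enable buffering
  - enlarges the chunks of data the iterator hands to the inner loop
  - reduces overhead and improves performance
- c_index: track a C-order index
- f_index: track a Fortran-order index
- multi_index: track a multi-index
- common_dtype: convert all operands to a common data type
  - requires copying or buffered
- copy_if_overlap: the iterator checks whether a read operand overlaps a write operand and makes temporary copies as needed to avoid the overlap
- delay_bufalloc: delay buffer allocation until reset() is called
  - lets allocate operands be initialized before their values are copied into the buffers
- external_loop: iterate over one-dimensional arrays instead of zero-dimensional array scalars
  - helps vectorized operations
  - the chunks returned depend on the iteration order
- grow_inner: allow the iterated arrays to be larger than the buffer size
  - when both buffered and external_loop are set
- ranged: allow the iterator to be restricted to a sub-range of the iterindex values
- refs_ok: allow iterating over reference types, e.g. object arrays
- reduce_ok: allow iterating over readwrite operands that are broadcast (i.e. reduction operands)
- zerosize_ok: allow an iteration size of 0
op_flags
- readonly: the operand is only read
- readwrite: the operand is read and written
- writeonly: the operand is only written
- no_broadcast: prevent the operand from being broadcast
- contig: force the operand data to be contiguous
- aligned: force the operand data to be aligned
- nbo: force the operand data to be in native byte order
- copy: allow a temporary read-only copy
- updateifcopy: allow a temporary read-write copy
- allocate: allow the array to be allocated if op contains None
  - the iterator allocates space for None only, never for a non-empty operand, even when it is too small for the broadcast assignment
  - a None in op gets op_flags ["allocate", "writeonly"] by default
- no_subtype: prevent allocate operands from using a subtype
- arraymask: marks the operand as the mask array
  - used to select what is written back for operands with the writemasked flag
- writemasked: only elements selected by the arraymask operand are written back
- overlap_assume_elementwise: marks the operand as accessed only in iteration order
  - allows less conservative copying when copy_if_overlap is set
op_dtypes: the data types required of the operands
- converting single values inside the loop is inefficient
- the iterator converts in bulk through buffering or copying, which is faster
- "copy" or "buffered" must be set as well; otherwise, when the types differ, an error is raised because the iterator can neither copy nor buffer (conversion does not modify the original array, so the converted values need extra space)
order: iteration order
- C/F: C style, Fortran style
- A: Fortran style if all arrays are Fortran contiguous, C style otherwise
- K: as close to the memory layout as possible
- the memory layout of allocate operands follows this parameter
casting: the data type conversions allowed when copying or buffering (including conversions when reading from and writing back to the arrays)
- no: no conversion at all
- equiv: only byte-order changes
- safe: only casts that preserve values
- same_kind: only safe casts or casts within a kind
- unsafe: any conversion
op_axes: maps iterator dimensions to operand dimensions
- a mapping must be given for every operand
itershape: the shape of the iterator
buffersize: the buffer size
- when buffered is set
- 0 means the default size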

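Usage

Controlling the iteration order

Set the order parameter
By default the iteration follows the memory layout
- more efficient
- suitable when the iteration order does not matter

# both iterate in exactly the same order
np.nditer(X, order="K")
np.nditer(X.T)
# force C or Fortran order
np.nditer(X, order="C")
np.nditer(X, order="F")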

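Modifying array values
- set writeonly or readwrite
  - the iterator produces writable buffer arrays and copies them back to the original array when iteration finishes
  - the end of iteration must be signalled so that buffered data is copied back to the original array
    - use the with statement (context manager)
    - or call .close() manually when iteration is done
- the allocate flag lets the iterator allocate space for an empty operand
  - for a None entry in op, op_flags defaults to ["allocate", "writeonly"]
  with np.nditer(X, op_flags=["readwrite"]) as it:
      for x in it:
          x[...] = 0
Iterating over one-dimensional arrays instead of array scalars
- without buffering, the length of the returned chunks follows the memory layout and iteration order (in the worst case one run along the innermost dimension)
- setting buffered can enlarge the returned arrays
  - buffersize sets the buffer size and thereby the length of the returned arrays
  - the length is then determined by buffersize, independent of the array shape (the last chunk may be shorter)
    a = np.arange(30).reshape(5,6)
    for x in np.nditer(a, flags=["external_loop", "buffered"], buffersize=11):
        print(x, type(x))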

Tracking and retrieving indices

it = np.nditer(a, flags=["multi_index"])
while not it.finished:
	print(it[0], it.multi_index)
	it.iternext()

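Iterating with a specific data type
- the op_dtypes parameter sets the data type the iteration yields
- the "copy" or "buffered" flag must be set as well
for x in np.nditer(a, op_flags=["readonly", "copy"], op_dtypes=["complex128"]):
    print(np.sqrt(x), end=" ")
Letting the iterator allocate space
- the allocate flag allows the iterator to allocate space for an operand, i.e. allows empty operands
- if the initial values of the allocated space are used, initialize them before iterating (e.g. in a reduction)
def square(a, ret=None):
    with np.nditer([a, ret],
            op_flags=[["readonly"], ["writeonly", "allocate"]]
            ) as it:
        for x, y in it:
            y[...] = x**2
        # fetch the (possibly allocated) output before the iterator is closed
        return it.operands[1]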

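Outer product (Cartesian product) iteration

Set the op_axes parameter to give the position and order of the dimensions of each operand op

The iterator maps the iterator dimensions back to the dimensions of each operand
This resembles broadcasting by hand

a = np.arange(3)
b = np.arange(8).reshape(2,4)
# position and order of each operand's dimensions
it = np.nditer([a,b,None], flags=["external_loop"],
		op_axes=[[0,-1,-1], [-1,0,1],None])
# iterate to get the outer product
with it:
	for x,y,z in it:
		z[...] = x*y
	result = it.operands[2]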

Reduction iteration

Trigger: a writable operand has fewer elements than the iteration space
- "reduce_ok" must be set
- "readwrite" must be set rather than "writeonly", even if the loop body never reads the operand
- it follows that "no_broadcast" cannot be set

a = np.arange(24).reshape(2,3,4)
ret = np.array([0])
with np.nditer([a,ret], flags=["reduce_ok", "external_loop"],
		op_flags=[["readonly"], ["readwrite"]]) as it:
	for x,y in it:
		y[...] += x
# alternatively set the `allocate` flag as well and initialize inside the iterator
it = np.nditer([a, None], flags=["reduce_ok", "external_loop"],
		op_flags=[["readonly"], ["readwrite", "allocate"]],
		op_axes=[None, [0,1,-1]])
with it:
	# set the initial value
	it.operands[1][...] = 0
	for x, y in it:
		y[...] += x
	result = it.operands[1]

`nested_iters`

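nested_iters: nests nditer iterators by dimension

The iteration parameters resemble those of nditer

X = np.arange(12).reshape(2,3,2)
i, j = np.nested_iters(X, [[1],[0,2]], flags=["multi_index"])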
for x in i:
	print(i.multi_index)
	for y in j:
		print("", j.multi_index, y)

`flat` iterator

X.flat: returns an np.flatiter that iterates in C-contiguous style
- supports slicing and advanced indexing
- in effect a one-dimensional view of the array
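
A sketch of indexing and assigning through `flat`, together with `np.ndindex` from the routine table:

```python
import numpy as np

X = np.arange(6).reshape(2, 3)

fifth = X.flat[4]             # element at C-order position 4
middle = X.flat[1:4]          # slicing returns a 1-D copy
X.flat[[0, 5]] = -1           # assignment through flat writes into X

# ndindex yields every multi-index of the given shape in C order
indices = list(np.ndindex(*X.shape))

print(fifth, middle, X.tolist(), indices[-1])
```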

`np.ndenumerate`

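np.ndenumerate: multi-index iterator, yields (multi-index, value) tuples
for multi_idx, val in np.ndenumerate(X):
    pass

`np.broadcast`

np.broadcast: returns an iterator over the tuples resulting from broadcasting (several) arrays
- like a zip after broadcasting: the arrays are broadcast first, then corresponding elements are combined into the tuples the iterator yields
for item in np.broadcast([[1,2],[3,4]], [5,6]):
    pass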

Function	Desc
`np.busdaycalendar(weekmask,holidays)`	object storing the valid business days
`np.busday_offset(date,offset[,roll,weekmask,holidays,busdaycal,out])`	offset a date by business days
`np.is_busday(date[,weekmask,holidays,busdaycal,out])`	whether the date is a business day
`np.busday_count(begindates,enddates[,weekmask,holidays,busdaycal,out])`	number of business days between the dates
`np.datetime_as_string(arr[,unit,timezone,...])`	convert to an array of strings
`np.datetime_data(dtype,/)`	unit and step information of a date/time type
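
The business-day helpers with their default Monday-to-Friday week mask, as a sketch. The dates are arbitrary; 2020-05-23 is a Saturday.

```python
import numpy as np

saturday = np.datetime64("2020-05-23")

is_workday = np.is_busday(saturday)
next_workday = np.busday_offset(saturday, 0, roll="forward")   # roll to the next business day
workdays = np.busday_count("2020-05-18", "2020-05-25")         # Monday to Monday, end exclusive
unit_info = np.datetime_data(np.dtype("datetime64[h]"))        # (unit, step count)

print(is_workday, next_workday, workdays, unit_info)
```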
