Division test on GPU: fast inverse square root + Newton iteration #3

Open
shuxiong opened this issue Mar 11, 2014 · 1 comment

Comments

@shuxiong
Collaborator

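The code is on branch yesx.

For accuracy, see the test program: https://github.com/konjac/division/blob/yesx/yesx/test.cpp
It uses the fast inverse square root to get a close initial estimate, then refines it with Newton's iteration.
I tested it on a number of inputs myself:
Number of Newton iterations needed to converge: 2 for float, 3 for double.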
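
For readers who have not opened test.cpp, here is a minimal sketch of the general idea in plain C++. It is not the code from the yesx branch. The helper names `rsqrt_estimate` and `divide_newton` are hypothetical, and so are the choices to square the inverse-square-root estimate to seed the reciprocal and to restrict the divisor to positive normal floats. Each Newton step roughly squares the relative error of the reciprocal, so the number of steps needed depends on how good the initial estimate is.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Classic bit-level estimate of 1/sqrt(x) for positive, normal floats.
// The result is only a rough estimate (a few percent off).
static float rsqrt_estimate(float x) {
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);      // reinterpret the float's bits
    i = 0x5f3759dfu - (i >> 1);         // magic-constant initial guess
    float y;
    std::memcpy(&y, &i, sizeof y);
    return y;
}

// Hypothetical helper: approximates a / b for b > 0 without a divide.
// Assumption: the reciprocal is seeded with (1/sqrt(b))^2, then refined.
static float divide_newton(float a, float b, int iterations) {
    assert(b > 0.0f);
    float y = rsqrt_estimate(b);
    float r = y * y;                    // rough estimate of 1/b
    for (int k = 0; k < iterations; ++k) {
        r = r * (2.0f - b * r);         // Newton step for f(r) = 1/r - b
    }
    return a * r;
}
```

Calling `divide_newton(1.0f, 3.0f, 3)` should agree with `1.0f / 3.0f` to within about float rounding error. With fewer steps the result is correspondingly coarser, since this sketch does not polish the initial estimate first.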

For speed, see the benchmark program: https://github.com/konjac/division/blob/yesx/gputest-yesx/divisionflops.cu
time = 544.192322 ms

The baseline program for comparison is https://github.com/konjac/division/blob/yesx/gputest/divisionflops.cu
time = 748.940674 ms

ARCH = sm_21
Test environment
Device 0: "GeForce GT 630"
CUDA Driver Version / Runtime Version 5.5 / 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073283072 bytes)
( 2) Multiprocessors x ( 48) CUDA Cores/MP: 96 CUDA Cores
GPU Clock rate: 1620 MHz (1.62 GHz)
Memory Clock rate: 667 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 131072 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GT 630

@shuxiong
Collaborator Author

GeForce GT 630

time = 29.137247 ms, Gflops=0.14740
