
NVIDIA GeForce RTX 4090

WolframRhodium edited this page May 1, 2024 · 8 revisions

Ada, AD102, 16384 shaders, PCIe 4.0 x16

Thanks to @MysteryDove

Benchmark

vsmlrt v14.test4

  • processor clock @ 2520 MHz
  • driver 551.86
  • Windows 10 21H2 (19044.1415)
  • Python 3.11.3
  • vapoursynth-classic R57.A8
  • vapoursynth-plugin v0.96g3

TRT FP16

1920x1080 rgbs, CUDA graphs enabled, fp16, max_aux_streams=0

Measurements: FPS / Device Memory (MB)

general

| model | 1 stream | 2 streams | 3 streams |
| --- | --- | --- | --- |
| dpir gray | 22.05 / 1818.796 | 25.30 / 3111.114 | 25.33 / 4403.488 |
| dpir color | 18.30 / 1851.632 | 25.13 / 3176.808 | 25.17 / 4501.984 |
| waifu2x upconv_7_{anime_style_art_rgb, photo} | 20.45 / 2148.716 | 41.22 / 3867.240 | 61.21 / 5585.764 |
| waifu2x upresnet10 | 17.91 / 1716.588 | 34.53 / 2941.540 | 42.33 / 4166.492 |
| waifu2x cunet / cugan | 13.89 / 4391.292 | 25.74 / 8346.248 | 25.96 / 12301.202 |
| waifu2x swin_unet | 4.62 / 7436.692 | 5.43 / 14426.812 | 5.43 / 21412.840 |
| real-esrgan (v2/v3, xsx2) | 17.06 / 1087.844 | 33.41 / 1778.264 | 38.26 / 2468.684 |
| scunet gray | 5.29 / 3590.320 | 5.40 / 6678.768 | 5.40 / 9767.208 |
| scunet color | 5.13 / 3555.568 | 5.48 / 6611.308 | 5.47 / 9667.048 |
| swinir-s (2x, color) | 1.63 / 15897.048 | N/A | N/A |
| swinir-m* (2x, color, 720p) | 1.05 / 11305.268 | N/A | N/A |
| swinir-l* (4x, color, 720p) | 0.61 / 15391.316 | N/A | N/A |

*: swinir-m and swinir-l models exhibit overflow issues.
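
The table above shows that some models keep gaining throughput as streams are added while others level off after two. A minimal sketch of how that scaling could be quantified follows; the FPS values are copied from the table, while the `scaling` helper and the choice of models are illustrative assumptions, not part of vsmlrt.

```python
# FPS at 1, 2 and 3 streams, copied from the v14.test4 "general" table.
fps_by_model = {
    "dpir gray": [22.05, 25.30, 25.33],
    "waifu2x upconv_7": [20.45, 41.22, 61.21],
    "waifu2x cunet / cugan": [13.89, 25.74, 25.96],
    "real-esrgan (v2/v3, xsx2)": [17.06, 33.41, 38.26],
}


def scaling(fps):
    """Return throughput at each stream count relative to 1 stream (hypothetical helper)."""
    base = fps[0]
    return [round(value / base, 2) for value in fps]


for model, fps in fps_by_model.items():
    print(f"{model}: {scaling(fps)}")
```

A ratio close to the stream count suggests the GPU was not yet saturated at that setting; a ratio that stops growing suggests extra streams only cost memory.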

rife

v2, fp16 i/o

| version | 1 stream | 2 streams | 3 streams | 4 streams | 5 streams |
| --- | --- | --- | --- | --- | --- |
| v4.4-v4.5 | 136.92 / 778.432 | 273.80 / 1149.204 | 414.80 / 1522.028 | 553.70 / 1892.796 | 574.31 / 2263.568 |
| v4.6 | 136.01 / 800.960 | 275.26 / 1192.212 | 411.01 / 1585.516 | 544.30 / 1979.764 | 550.01 / 2368.020 |
| v4.7-v4.9 | 98.20 / 1302.724 | 195.78 / 2187.548 | 210.12 / 3074.420 | 210.45 / 3957.196 | 210.66 / 4844.068 |
| v4.10-v4.15 | 84.41 / 1595.592 | 160.93 / 2773.280 | 161.96 / 3953.020 | 162.04 / 5132.760 | 162.07 / 6310.448 |
| {v4.12, v4.13, v4.15, v4.16}_lite | 93.39 / 1333.444 | 187.32 / 2255.132 | 197.71 / 3178.872 | 198.01 / 4098.508 | 197.95 / 5022.248 |
| v4.14 lite | 81.83 / 1595.292 | 153.40 / 2779.424 | 154.19 / 3963.260 | 154.28 / 5149.140 | 154.30 / 6332.980 |
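
Since device memory grows with every stream while FPS eventually plateaus, one way to read the rife table is to look for the smallest stream count that already gets close to the best measured FPS. The sketch below does that for three rows copied from the table; the 5% tolerance and the `saturation_point` helper are assumptions made for illustration.

```python
# (FPS, device memory in MB) at 1..5 streams, copied from the v14.test4 rife table.
rife_v14_test4 = {
    "v4.4-v4.5": [(136.92, 778.432), (273.80, 1149.204), (414.80, 1522.028),
                  (553.70, 1892.796), (574.31, 2263.568)],
    "v4.7-v4.9": [(98.20, 1302.724), (195.78, 2187.548), (210.12, 3074.420),
                  (210.45, 3957.196), (210.66, 4844.068)],
    "v4.10-v4.15": [(84.41, 1595.592), (160.93, 2773.280), (161.96, 3953.020),
                    (162.04, 5132.760), (162.07, 6310.448)],
}


def saturation_point(rows, tolerance=0.05):
    """Return (streams, fps, memory_mb) for the smallest stream count whose FPS
    is within `tolerance` of the best FPS in `rows` (hypothetical helper)."""
    best = max(fps for fps, _ in rows)
    for streams, (fps, memory_mb) in enumerate(rows, start=1):
        if fps >= best * (1.0 - tolerance):
            return streams, fps, memory_mb


for version, rows in rife_v14_test4.items():
    streams, fps, memory_mb = saturation_point(rows)
    print(f"{version}: {streams} streams, {fps} FPS, {memory_mb} MB")
```

A tighter or looser tolerance shifts the answer, so the threshold should be chosen to match how much memory headroom matters for a given workload.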

vsmlrt v14.test3

  • processor clock @ 2520 MHz
  • driver 551.86
  • Windows 10 21H2 (19044.1415)
  • Python 3.11.3
  • vapoursynth-classic R57.A8
  • vapoursynth-plugin v0.96g3

TRT FP16

1920x1080 rgbs, CUDA graphs enabled, fp16

Measurements: FPS / Device Memory (MB)

general

| model | 1 stream | 2 streams | 3 streams |
| --- | --- | --- | --- |
| dpir gray | 21.93 / 1757.352 | 25.48 / 3049.696 | 25.31 / 4342.044 |
| dpir color | 18.24 / 1790.184 | 25.11 / 3115.360 | 25.22 / 4440.540 |
| waifu2x upconv_7_{anime_style_art_rgb, photo} | 19.58 / 2148.716 | 39.87 / 3867.240 | 59.94 / 5585.768 |
| waifu2x upresnet10 | 17.40 / 1655.144 | 34.22 / 2880.096 | 42.78 / 4105.048 |
| waifu2x cunet / cugan | 13.64 / 4391.292 | 25.09 / 8346.248 | 25.19 / 12301.208 |
| waifu2x swin_unet | 4.62 / 14989.772 | OOM | OOM |
| real-esrgan (v2/v3, xsx2) | 16.77 / 1136.996 | 33.99 / 1876.568 | 41.44 / 2616.140 |

rife

v2, fp16 i/o

| version | 1 stream | 2 streams | 3 streams | 4 streams | 5 streams |
| --- | --- | --- | --- | --- | --- |
| v4.4-v4.5 | 150.20 / 622.784 | 301.05 / 835.860 | 448.90 / 1053.024 | 615.84 / 1268.152 | 787.57 / 1481.224 |
| v4.6 | 147.63 / 624.832 | 294.53 / 837.904 | 452.26 / 1055.072 | 603.63 / 1270.200 | 764.31 / 1485.320 |
| v4.7-v4.9 | 132.06 / 747.712 | 268.63 / 1075.476 | 403.54 / 1405.284 | 494.98 / 1737.152 | 496.41 / 2064.908 |
| v4.10-v4.15 | 119.09 / 862.400 | 238.68 / 1304.852 | 346.98 / 1749.352 | 349.48 / 2195.904 | 349.80 / 2638.356 |
| {v4.12, v4.13, v4.15, v4.16}_lite | 123.72 / 782.528 | 250.81 / 1151.252 | 377.27 / 1522.020 | 403.14 / 1894.844 | 403.79 / 2263.568 |
| v4.14 lite | 117.97 / 839.872 | 234.67 / 1265.940 | 320.23 / 1696.104 | 321.88 / 2124.224 | 321.18 / 2552.340 |

vsmlrt v12

  • processor clock @ 2860 MHz
  • memory clock @ 1406.25 MHz
  • driver 527.56
  • Windows 11 21H2
  • VapourSynth R57

FP16

1920x1080 rgbs

Measurements: FPS / Device Memory (MB)

| model | ORT_CUDA 1 stream | TRT 1 stream | TRT 2 streams |
| --- | --- | --- | --- |
| dpir gray | 9.093 / 1845 | 23.95 / 975 | 24.527 / 1297 |
| dpir color | 8.688 / 1749 | 21.92 / 1073 | 24.212 / 1413 |
| waifu2x upconv7 | 20.15 / 5905 | 39.035 / 2354 | 53.406 / 4029 |
| waifu2x upresnet10 | 13.445 / 2814 | 29.411 / 2147 | 37.001 / 3439 |
| waifu2x cunet | 6.981 / 8532 | | |
| cugan | 6.869 / 8719 | 20.511 / 5490 | 24.018 / 9579 |
| realesrgan | 14.188 / 2346 | 28.518 / 2080 | 35.771 / 2970 |
| rife (1920x1088, model=44) | 95.52 / 1609 | 138.513 / 1319 | 208.345 / 1653 |
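
Each cell in these tables packs two numbers as `FPS / MB`, sometimes with and sometimes without spaces around the slash. As a rough sketch, the cells can be parsed and the two backends compared as below; `parse_cell` and `compare` are hypothetical helpers written for this page, and the sample cells are copied from the v12 table.

```python
import re

# Matches "23.95 / 975" as well as "150.20/622.784".
CELL = re.compile(r"^\s*([0-9.]+)\s*/\s*([0-9.]+)\s*$")


def parse_cell(cell):
    """Split an 'FPS / MB' cell into (fps, memory_mb) floats."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"not an 'FPS / MB' cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def compare(ort_cell, trt_cell):
    """Return (FPS ratio, memory ratio) of TRT relative to ORT_CUDA, rounded to 2 places."""
    ort_fps, ort_mb = parse_cell(ort_cell)
    trt_fps, trt_mb = parse_cell(trt_cell)
    return round(trt_fps / ort_fps, 2), round(trt_mb / ort_mb, 2)


# 1-stream cells from the v12 table: (ORT_CUDA, TRT).
rows = {
    "dpir gray": ("9.093 / 1845", "23.95 / 975"),
    "realesrgan": ("14.188 / 2346", "28.518 / 2080"),
    "rife (1920x1088, model=44)": ("95.52 / 1609", "138.513 / 1319"),
}

for model, (ort_cell, trt_cell) in rows.items():
    print(model, compare(ort_cell, trt_cell))
```

Cells such as `N/A`, `OOM`, or an empty entry deliberately raise `ValueError` here, so missing measurements are not silently treated as zero.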

vsmlrt v14.test2

  • driver 545.84
  • Windows 10 22H2
  • VapourSynth-classic R57.A8

Waifu2x.swin_unet_art

1920x1080 rgbs

PSNR is measured against the FP32 output on a private set of samples.

Measurements: FPS / Device Memory (MB)

| precision | TRT 1 stream | TRT 2 streams | PSNR (dB) |
| --- | --- | --- | --- |
| fp16 | 4.91 / 7399.8 | 5.40 / 14351 | 65.3 |
| bf16 | 4.70 / 7797.1 | 5.10 / 15142 | 53.7 |
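
The page does not say how the PSNR figures were computed, so the following is only a sketch of the standard definition, PSNR = 10 * log10(peak^2 / MSE), applied to a reduced-precision output and an FP32 reference. It assumes float samples with a peak value of 1.0 (as in an rgbs clip) and flat lists of sample values; both function names are illustrative.

```python
import math


def psnr(reference, test, peak=1.0):
    """PSNR in dB between two equal-length sequences of float samples.

    `peak` is the assumed maximum signal value (1.0 for float RGB).
    """
    if len(reference) != len(test):
        raise ValueError("inputs must have the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)


def rmse_from_psnr(psnr_db, peak=1.0):
    """Invert the PSNR formula to get the root-mean-square error it implies."""
    return peak * 10.0 ** (-psnr_db / 20.0)


# Toy data standing in for FP32 and reduced-precision output samples.
fp32_samples = [0.0, 0.25, 0.5, 1.0]
fp16_samples = [0.0, 0.2501, 0.4999, 0.9998]
print(f"{psnr(fp32_samples, fp16_samples):.1f} dB")
```

Under the same peak assumption, `rmse_from_psnr` turns a table entry back into an average per-sample error, which can be easier to compare across precisions than the logarithmic value.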