I see that FP4MM is used in this article. I have a small question. Does NVIDIA dequantize the A and B matrices to FP16 and then perform matrix multiplication for FP4MM at the hardware level, or does ...