I see that this article uses FP4MM, and I have a small question. At the hardware level, does NVIDIA dequantize the A and B matrices to FP16 and then perform the matrix multiplication, or does ...