News

An improved variant of the precise-integration time-domain (PITD) method is proposed to eliminate the inverse matrix calculation and optimize the storage burden with the help of sparse computation.
On-chip optical neural networks (ONNs) have recently emerged as an attractive hardware accelerator for deep learning applications, characterized by high computing density, low latency, and compact ...
Block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. Additionally, this repo includes codes for quantizing Pytorch bf16 matmul with fp8. - GitHub - luongth ...