News

Then we can quantize the scaled values to FP8 and perform low-precision matrix multiplication for a lower memory footprint and higher throughput. The result is accumulated in full-precision FP32, ...
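
The scale, quantize, multiply, accumulate sequence in the snippet above can be sketched in plain Python. This is a software simulation under stated assumptions, not the article's implementation: it assumes the E4M3 variant of FP8 (largest finite value 448) and a single per-tensor scale chosen from the maximum absolute value, neither of which the snippet specifies. The helper names `round_to_e4m3`, `to_f32`, `quantize`, and `fp8_matmul` are hypothetical.

```python
import math
import struct

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


def round_to_e4m3(x):
    """Round x to the nearest E4M3 value, saturating at +/-448 (NaN/inf not handled)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)          # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)          # -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)       # 3 mantissa bits give 8 steps per power of two
    return sign * min(round(a / step) * step, E4M3_MAX)


def to_f32(x):
    """Round a Python float (double) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def quantize(matrix):
    """Scale a matrix into the E4M3 range (assumed per-tensor scale) and round it."""
    amax = max(abs(v) for row in matrix for v in row)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [[round_to_e4m3(v * scale) for v in row] for row in matrix], scale


def fp8_matmul(a, b):
    """Multiply simulated-FP8 operands, accumulating in FP32, then undo the scales."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    out = []
    for row in qa:
        out_row = []
        for j in range(len(qb[0])):
            acc = 0.0
            for k, qv in enumerate(row):
                acc = to_f32(acc + to_f32(qv * qb[k][j]))  # FP32 accumulator
            out_row.append(acc / (sa * sb))                # remove both scales
        out.append(out_row)
    return out
```

Calling `fp8_matmul([[0.1, -0.5], [2.0, 0.25]], [[1.0, 3.0], [-0.75, 0.02]])` returns values close to the exact product, with the gap coming from rounding each operand to three mantissa bits. Dividing by the product of the two scales after accumulation is what lets the multiplication itself run entirely on the low-precision values.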
An improved variant of the precise-integration time-domain (PITD) method is proposed that eliminates the inverse matrix calculation and reduces the storage burden with the help of sparse computation.
On-chip optical neural networks (ONNs) have recently emerged as an attractive hardware accelerator for deep learning applications, characterized by high computing density, low latency, and compact ...