News

High-performance matrix multiplication remains a cornerstone of numerical computing, underpinning a wide array of applications from scientific simulations to machine learning.
Bibek Bhattarai details Intel's AMX, highlighting its role in accelerating deep learning on CPUs. He explains how AMX ...
By transforming operands into a Montgomery domain, these algorithms enable efficient modular multiplication and exponentiation, which are crucial for public-key cryptosystems.
In the past decade, however, the parallel processing of GPUs was also found to more efficiently run the matrix multiplication algorithms needed to power AI models. AI development has two key phases.
As a key operation in contemporary cryptosystems, modular multiplication occupies non-negligible latency and area. We first show optimizations of the k-term Karatsuba algorithm for AB/rk and ABmodrk ...
tritonBLAS: A Lightweight Triton-based General Matrix Multiplication (GEMM) Library Important This project is intended for research purposes only. Use it at your own risk and discretion. Triton is a ...
Block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. Additionally, this repo includes codes for quantizing Pytorch bf16 matmul with fp8. - GitHub - luongth ...
A matrix multiplication algorithm was applied to integrate scRNA-seq data of each sample with drug–gene interactions to generate the target-based enrichment profiles of compounds as input for the ...