News

This project implements a high-speed matrix-matrix multiplication module in C/C++, optimized with multi-threading, SIMD, and cache miss minimization. It supports large, configurable matrix sizes, ...
In intelligent connected vehicle applications, tasks such as path planning and health management involve numerous matrix operations, particularly matrix multiplication. Due to limited resources, ...
tritonBLAS: A Lightweight Triton-based General Matrix Multiplication (GEMM) Library. Important: This project is intended for research purposes only. Use it at your own risk and discretion. Triton is a ...
Neural network accelerators have been widely applied to edge devices for complex tasks such as object tracking and image recognition. Previous works have explored the quantization technologies in ...