News

We present a photonic memory pooling architecture and a co-designed optimization methodology for flexible allocation of compute, memory, and network resources.
When allocating blocks, if GPU0 has more free memory (e.g. 40 GB) than GPU1 (e.g. only 20 GB left for nano-vllm because other processes occupy the rest), the actual ...
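
As a rough illustration of the scenario in this item, the sketch below computes how many fixed-size cache blocks fit on each GPU given its free memory. The helper names, the 32 MiB block size, the use of GiB for the 40 and 20 figures, and the choice to size every GPU to the smallest count are all assumptions made for the example; they are not taken from nano-vllm.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def blocks_that_fit(free_bytes: int, block_bytes: int) -> int:
    """Return how many whole blocks of block_bytes fit in free_bytes."""
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    return max(free_bytes, 0) // block_bytes


def plan_blocks(free_bytes_per_gpu: list[int], block_bytes: int) -> tuple[list[int], int]:
    """Return the per-GPU block counts and the smallest count across GPUs.

    Using the smallest count everywhere is one conservative option
    (an assumption here), so no GPU is asked for more blocks than it can hold.
    """
    if not free_bytes_per_gpu:
        raise ValueError("need at least one GPU")
    per_gpu = [blocks_that_fit(free, block_bytes) for free in free_bytes_per_gpu]
    return per_gpu, min(per_gpu)


# Hypothetical block size; real sizes depend on model and cache layout.
BLOCK_BYTES = 32 * MIB

# GPU0 has 40 GiB free, GPU1 has 20 GiB free.
per_gpu, shared = plan_blocks([40 * GIB, 20 * GIB], BLOCK_BYTES)
print(per_gpu, shared)
```

With these assumed numbers the two GPUs can hold different block counts, which is why a per-GPU calculation and a shared calculation can disagree.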