Chinese AI research firm DeepSeek has released an experimental model, V3.2-exp, designed to significantly reduce inference costs in long-context operations. Announced on Hugging Face with supporting research posted to GitHub, the model introduces DeepSeek Sparse Attention, a system that uses a “lightning indexer” to rank which excerpts of a long context window matter most to a given query, paired with a “fine-grained token selection system” that picks the specific tokens loaded into the attention computation.
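To make the idea concrete, here is a minimal sketch of indexer-guided sparse attention, written against the public description rather than DeepSeek’s actual kernels: a cheap low-dimensional scoring pass stands in for the lightning indexer, and a per-query top-k cut stands in for the token selection stage. All names, dimensions, and the NumPy implementation are illustrative assumptions, not the released code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, q_idx, k_idx, top_k):
    """Attend over only the top_k context tokens that a cheap indexer
    ranks highest for each query, instead of the full context.

    q, k, v      : (n, d) full-width query/key/value vectors
    q_idx, k_idx : (n, d_small) low-dimensional indexer projections
                   (hypothetical stand-in for the "lightning indexer")
    top_k        : number of tokens each query actually attends to
    """
    n, d = q.shape
    # 1. Cheap indexer pass: score every (query, token) pair in the
    #    small d_small space; this is the part meant to stay fast
    #    even when the context is very long.
    index_scores = q_idx @ k_idx.T                  # (n, n)
    # 2. Fine-grained token selection: keep only the top_k
    #    highest-scoring context tokens per query.
    top = np.argpartition(-index_scores, top_k - 1, axis=1)[:, :top_k]
    # 3. Full-precision attention restricted to the selected tokens,
    #    so the expensive O(n^2) step shrinks to O(n * top_k).
    out = np.empty_like(q)
    for i in range(n):
        sel = top[i]
        scores = (q[i] @ k[sel].T) / np.sqrt(d)     # (top_k,)
        out[i] = softmax(scores) @ v[sel]
    return out

# Toy usage: a 1,024-token context where each query attends to 64 tokens.
rng = np.random.default_rng(0)
n, d, d_small, top_k = 1024, 64, 16, 64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
q_idx, k_idx = (rng.standard_normal((n, d_small)) for _ in range(2))
print(sparse_attention(q, k, v, q_idx, k_idx, top_k).shape)  # (1024, 64)
```

The cost argument follows from step 3: full attention scales with the square of the context length, while the restricted pass scales with context length times top_k, which is where the long-context savings would come from under these assumptions.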
Early testing suggests API call costs could be roughly halved in extended-context scenarios, a meaningful reduction in the expense of serving pre-trained transformer models. Because the weights are openly released, third parties can validate those claims independently. Building on its earlier R1 model, DeepSeek is positioning Sparse Attention as a practical efficiency advance that could influence how AI is deployed globally.




