
xziayro

AI & ML interests

None yet

Recent Activity

liked a model about 8 hours ago
reaperdoesntknow/Structure-Over-Scale
reacted to OzTianlu's post with 🔥 about 8 hours ago
https://github.com/lizixi-0x2F/March

I just released March, an open-source high-performance KV cache sharing library for LLM inference that uses Trie-based prefix deduplication.

When you run LLM services, you often see thousands of requests sharing the same system prompt and conversation history. But traditional KV cache systems store each sequence separately, duplicating the exact same data over and over again. Pure waste.

March uses a Trie structure to automatically detect and reuse identical token prefixes. Instead of storing [system_prompt + history] 1000 times, it's stored once. Everyone shares it.

- 80-97% memory reduction in prefix-heavy workloads (tested on SmolLM2-135M with 500 multi-turn conversations)
- Zero-copy queries: returns direct pointers into the memory pool, no expensive memcpy on the hot path
- Predictable memory usage: fixed-size page pool with O(L) complexity
- Trade-off: slightly slower than dict O(1) lookup, but the memory savings are worth it in production
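The Trie-based prefix deduplication described in the post can be illustrated with a small sketch. This is not March's actual API or implementation: the names `TrieNode`, `PrefixKVCache`, `insert`, and `match_prefix` are hypothetical, and the `pool` list holds plain token ids as a stand-in for real key/value tensors. It only shows the idea that sequences sharing a token prefix share the same storage slots, and that a lookup walks at most L nodes for a sequence of length L.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    slot: int  # index into the shared pool (hypothetical layout)
    children: dict[int, TrieNode] = field(default_factory=dict)


class PrefixKVCache:
    """Toy trie that keeps one pool slot per unique token prefix."""

    def __init__(self) -> None:
        self.root = TrieNode(slot=-1)  # the root represents the empty prefix
        self.pool: list[int] = []      # stand-in for per-token KV entries

    def insert(self, tokens: list[int]) -> list[int]:
        """Insert a sequence and return the pool slot used for each token."""
        node = self.root
        slots = []
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                # Unseen prefix: allocate exactly one new slot for it.
                self.pool.append(tok)
                child = TrieNode(slot=len(self.pool) - 1)
                node.children[tok] = child
            slots.append(child.slot)
            node = child
        return slots

    def match_prefix(self, tokens: list[int]) -> list[int]:
        """Return slots of the longest cached prefix, visiting at most len(tokens) nodes."""
        node = self.root
        slots = []
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            slots.append(child.slot)
            node = child
        return slots


# Two requests that share a four-token "system prompt".
system_prompt = [1, 2, 3, 4]
cache = PrefixKVCache()
slots_a = cache.insert(system_prompt + [10, 11])
slots_b = cache.insert(system_prompt + [20])

# The shared prefix maps to the same slots in both requests.
print(slots_a[:4] == slots_b[:4])
# Slots actually stored versus slots a per-sequence store would need.
print(len(cache.pool), len(slots_a) + len(slots_b))
```

In this toy run the two requests need 7 slots instead of the 11 a per-sequence store would use. The saving grows with the number of requests that share a prefix, which is the effect the post reports for prefix-heavy workloads. Each lookup walks one node per token rather than doing a single hash lookup, which matches the trade-off the post mentions.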
upvoted an article about 8 hours ago
Mastering Long Contexts in LLMs with KVPress

Organizations

Runware

models 0

None public yet

datasets 0

None public yet