Skymizer’s HTX301 Wants to Put 700B AI Models on One PCIe Card
Skymizer, a Taiwan-based AI hardware and software company, has revealed the HTX301, a PCIe AI accelerator card built for running large language models on-premises instead of depending on massive cloud GPU clusters.
The big claim here is properly wild: Skymizer says the HTX301 can handle inference for models up to 700 billion parameters on a single PCIe card. For companies, labs, and even serious regional AI builders, that could be a big deal if the numbers hold up in real-world testing.
A local AI card with 384GB memory
The HTX301 is designed as a PCIe add-in card, so physically it looks closer to the kind of accelerator you would slot into a server than a giant external AI rack. According to Skymizer, each board carries six HTX301 chips and up to 384GB of memory.
Interestingly, the card does not rely on expensive memory types like HBM, GDDR6, GDDR7, or LPDDR5X. Instead, Skymizer is using standard LPDDR4 and LPDDR5 DRAM. That sounds less flashy, but the company’s pitch is that its architecture is tuned around lower bandwidth needs and smarter compression rather than brute-forcing everything with the most expensive memory stack possible.
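To see why compression matters so much to the 700B claim, here is a rough back-of-envelope sketch in Python. It uses only the two figures Skymizer has given (700 billion parameters, 384GB per card) and assumes decimal gigabytes and that weights are the only thing in memory, which is optimistic because the KV cache also needs space. The function names are ours, purely for illustration.

```python
# Back-of-envelope: how many bits per weight fit on one card?
PARAMS = 700e9          # 700 billion parameters (Skymizer's claim)
CARD_MEMORY_GB = 384    # maximum memory per card (Skymizer's claim)

def bits_per_param(params: float, memory_gb: float) -> float:
    """Average bits available per parameter if weights filled all memory."""
    total_bits = memory_gb * 1e9 * 8   # assumption: decimal GB
    return total_bits / params

def weights_gb(params: float, bits: float) -> float:
    """Memory needed to hold the weights at a given precision, in GB."""
    return params * bits / 8 / 1e9

budget = bits_per_param(PARAMS, CARD_MEMORY_GB)
print(f"Budget: {budget:.2f} bits per parameter")
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weights_gb(PARAMS, bits):.0f} GB")
```

In other words, a 700B model only fits if weights average roughly 4.4 bits or less. At 16-bit or 8-bit precision the weights alone would need 1,400GB or 700GB, so the claim depends on aggressive quantisation or compression.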
The chip is built on an older 28nm process, which makes the performance claims even more notable. Skymizer says the HTX301 can hit 30 tokens per second using 0.5 TOPS of compute and 100GB/s of memory bandwidth. Its Octa-Core LPU is also claimed to reach 240 tokens per second in Llama2 7B prefill, while multi-chip configurations can scale that to 1,200 tokens per second for the same model.
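The 30 tokens per second at 100GB/s figure can be sanity-checked with a simple rule of thumb: when generating tokens, an accelerator typically has to read the model weights once per token, so memory bandwidth caps the speed. The sketch below is our own illustration, not Skymizer's method. Skymizer has not said which model the 30 tokens per second figure applies to, so the 3.5GB example (a 7B model at 4 bits per weight) is purely an assumption.

```python
# Rule-of-thumb model: token generation is limited by memory bandwidth,
# because the weights are streamed from memory once per generated token.

def gb_per_token(bandwidth_gbs: float, tokens_per_sec: float) -> float:
    """Data that can be read per token at a given bandwidth and speed."""
    return bandwidth_gbs / tokens_per_sec

def max_tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on generation speed if every token reads all weights."""
    return bandwidth_gbs / model_gb

# Skymizer's claimed figures: 30 tokens/s at 100 GB/s.
implied = gb_per_token(100, 30)
print(f"Implied read budget: {implied:.2f} GB per token")

# Assumption for illustration: a 7B model at 4 bits per weight is 3.5 GB.
ASSUMED_MODEL_GB = 7e9 * 4 / 8 / 1e9
ceiling = max_tokens_per_sec(100, ASSUMED_MODEL_GB)
print(f"Ceiling for a {ASSUMED_MODEL_GB:.1f} GB model: {ceiling:.1f} tokens/s")
```

The implied budget is about 3.3GB per token, which is in the same ballpark as a heavily compressed 7B model. That is consistent with the pitch that compression, not raw bandwidth, is doing the heavy lifting, but it is only a rough estimate until independent benchmarks appear.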
Why this matters for Malaysia and SEA
For Malaysia and Southeast Asia, this kind of hardware is worth watching because AI infrastructure is becoming a serious cost problem. Not every startup, university lab, game studio, or local enterprise can afford to rent premium cloud GPU capacity forever. And if you are dealing with private business data, customer records, government workloads, or internal tools, sending everything to the cloud is not always ideal.
That is where on-prem AI gets interesting. Skymizer is positioning the HTX301 around data sovereignty, predictable latency, and fixed infrastructure cost. In plain English: keep the AI model in your own server room, know your monthly cost, and avoid depending fully on cloud providers.
For SEA game studios, esports analytics teams, localisation companies, and content platforms, local AI inference could eventually mean cheaper internal chatbots, translation tools, moderation systems, data analysis, and player-support automation. We are not saying every Malaysian company is suddenly going to buy one card and run a 700B model tomorrow, but this direction is important.
The 240W power claim is the spicy part
The headline spec is power. Skymizer says the HTX301 card runs at 240W, which is less than half the 600W class of major PCIe AI accelerators cited in the original report, including NVIDIA’s RTX PRO 6000 Blackwell and AMD’s Instinct MI350P.
For Malaysia, power draw is not just a spec-sheet flex. Lower wattage means easier cooling, lower electricity cost, and less headache for smaller server setups. Anyone who has built or maintained serious PC hardware here knows heat is the enemy, bro. Our climate does not forgive badly cooled systems.
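To put the wattage gap into numbers, here is a small sketch comparing a 240W card against a 600W card running around the clock for a year. The 24/7 duty cycle and the tariff are our assumptions: `TARIFF_PER_KWH` is a placeholder value in whatever currency you pay in, not a real Malaysian rate, and the sums ignore cooling overhead and the rest of the server.

```python
# Annual energy use of one accelerator card running 24/7.
HOURS_PER_YEAR = 24 * 365

def annual_kwh(watts: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy in kWh for a card drawing `watts` continuously."""
    return watts / 1000 * hours

def annual_cost(watts: float, tariff_per_kwh: float) -> float:
    """Electricity cost for one year at a flat tariff."""
    return annual_kwh(watts) * tariff_per_kwh

TARIFF_PER_KWH = 0.50  # hypothetical placeholder, replace with your own rate

htx301 = annual_kwh(240)      # Skymizer's claimed card power
gpu_class = annual_kwh(600)   # the 600W class used for comparison
saved = gpu_class - htx301

print(f"240W card: {htx301:.1f} kWh per year")
print(f"600W card: {gpu_class:.1f} kWh per year")
print(f"Difference: {saved:.1f} kWh, about {saved * TARIFF_PER_KWH:.2f} per card")
```

Per card, the gap works out to roughly 3,150 kWh a year before cooling is counted. In a hot climate the real difference is likely larger, since every watt the card draws also has to be removed as heat.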
Skymizer also says its compression methods help reduce memory pressure. Its weight compression reportedly performs 9% to 17.8% better than the open-source llama.cpp, while KV cache compression is said to keep perplexity loss low, ranging from under 0.06% to 3.52%.
Still waiting for real proof
For now, this is still an on-paper announcement. Skymizer plans to preview the HTX301 at Computex, where the real question will be whether the company’s claims survive closer inspection.
If it works as advertised, the HTX301 could make serious local AI deployments more realistic for smaller companies that cannot justify huge GPU clusters. If not, it is still a sign of where the AI hardware race is heading: less “who has the biggest data centre” and more “who can run powerful models efficiently in a normal server.”
Either way, this is one to keep an eye on, especially for SEA businesses trying to build AI without burning cloud money every month.
Source: Wccftech Gaming


