With reported 3x speed gains and limited degradation in output quality, the method targets one of the biggest pain points in production AI systems: latency at scale.
The shift from training-focused to inference-focused economics is fundamentally restructuring cloud computing and forcing ...
Researchers from the University of Maryland, Lawrence Livermore National Laboratory, Columbia University, and TogetherAI have developed a training technique that triples LLM inference speed without auxiliary models or infrastructure ...
AI users and developers can now measure the amount of electricity various AI models consume to complete tasks with an ...
These speed gains are substantial. At 256K context lengths, Qwen 3.5 decodes 19 times faster than Qwen3-Max and 7.2 times ...
Taalas has launched an AI accelerator that puts the entire AI model into silicon, delivering 1-2 orders of magnitude greater ...
Mirai raised a $10 million seed to improve how AI models run on devices like smartphones and laptops.
Vultr, the world’s largest privately held cloud computing platform, today announced the launch of Vultr Cloud Inference. This new serverless platform ...
Upstart's 5th-gen RDU aims to undercut Nvidia's B200 on speed and cost. AI infrastructure company SambaNova has raised $350 ...
Machine learning, task automation and robotics are already widely used in business. These and other AI technologies are about to multiply, and we look at how organizations can best take advantage of ...
Nvidia noted that cost per token fell from 20 cents on the older Hopper platform to 10 cents on Blackwell. Moving to ...
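Taken at face value, that Hopper-to-Blackwell figure is a straight 2x reduction in serving cost. Here is a minimal sketch of the arithmetic, assuming the cited 20-cent and 10-cent figures are cents per million tokens (the unit is not stated above) and using a purely hypothetical daily token volume:

```python
# Worked example of the Hopper -> Blackwell cost comparison.
# Assumption: the cited 20-cent and 10-cent figures are cents per
# million tokens; the snippet does not state the unit. The daily
# volume below is hypothetical, chosen only for illustration.

HOPPER_CENTS_PER_M_TOKENS = 20
BLACKWELL_CENTS_PER_M_TOKENS = 10
DAILY_TOKENS = 5_000_000_000  # hypothetical: 5B tokens served per day

def daily_cost_usd(cents_per_m_tokens: int, tokens: int) -> float:
    """Dollar cost of serving `tokens` at a given cents-per-million-tokens rate."""
    return (tokens / 1_000_000) * cents_per_m_tokens / 100

hopper = daily_cost_usd(HOPPER_CENTS_PER_M_TOKENS, DAILY_TOKENS)
blackwell = daily_cost_usd(BLACKWELL_CENTS_PER_M_TOKENS, DAILY_TOKENS)

print(f"Hopper:    ${hopper:,.2f}/day")        # $1,000.00/day
print(f"Blackwell: ${blackwell:,.2f}/day")     # $500.00/day
print(f"Reduction: {hopper / blackwell:.0f}x")  # 2x
```

Whatever the actual unit, the ratio is what matters: halving the per-token price halves the serving bill at any fixed volume.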