We are excited to announce the launch of Tensormesh Beta 2, a complete redesign of our platform focused on simplicity, speed, and visibility. After listening to feedback from our beta community, we rebuilt the experience from the ground up to make deploying and managing LLMs easier than ever.
Ready to try it? Access the platform at app.tensormesh.ai and join our Slack community to chat with the team directly.
We have completely reimagined the Tensormesh interface with user experience as our top priority. The new design eliminates unnecessary scrolling and complexity—you can now deploy an LLM with a single click.

Why this matters:
The deployment process is now intuitive and fast. Select your GPU provider, choose your model, configure advanced settings if needed, and deploy. No more navigating through multiple screens or complex configuration steps.
Your command center has received a major upgrade. The new Overview Dashboard features Quick Actions, including a conversational chatbot with a built-in test interface that simulates concurrent usage through batching.

In practice:
You can now generate synthetic cache hit rates before committing to a full deployment. This allows you to validate performance expectations and optimize your configuration upfront. All deployment and management tasks are now accessible in a single click from the dashboard.
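To make the idea of a synthetic cache hit rate concrete, here is a back-of-envelope sketch (illustrative only, not Tensormesh's actual implementation): it assumes a batch of concurrent requests that share a common prompt prefix, such as the same system prompt, and counts the prefix tokens that can be served from cache once the first request has warmed it.

```python
def synthetic_hit_rate(prefix_tokens: int, unique_tokens: list[int]) -> float:
    """Estimate a KV-cache hit rate for a batch of requests that share
    a common prompt prefix (e.g. an identical system prompt).

    prefix_tokens: tokens shared by every request in the batch.
    unique_tokens: per-request token counts that cannot be cached.
    """
    n = len(unique_tokens)
    total = n * prefix_tokens + sum(unique_tokens)
    # The first request computes the prefix; the remaining n-1 hit the cache.
    cached = (n - 1) * prefix_tokens if n > 1 else 0
    return cached / total if total else 0.0

# 8 concurrent requests, a 500-token shared prefix, ~100 unique tokens each
rate = synthetic_hit_rate(500, [100] * 8)
print(f"{rate:.1%}")  # roughly 73% of tokens served from cache
```

Even this toy model shows why batching matters: the more concurrent requests share a prefix, the closer the hit rate climbs toward the prefix's share of total tokens.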
Deploying the latest models just got dramatically faster. We have pre-loaded support for trending LLMs so you can spin them up instantly without waiting for downloads:
Qwen3 Family:
Mistral:

The benefit:
No more waiting for model downloads. The latest trending models are ready for immediate deployment, letting you experiment and iterate faster.
We have introduced a new deployment option for teams that need dedicated GPU capacity at predictable pricing.
What does this mean for you?
Reserved deployments allow you to lock in dedicated GPUs at a discounted rate with a time commitment. This is ideal for production workloads where you need guaranteed capacity and want to optimize costs over time.
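The cost trade-off is simple arithmetic. The snippet below uses entirely hypothetical numbers (the $2.50/hr rate and 30% discount are illustrative placeholders, not Tensormesh pricing) to show how a time commitment translates into savings:

```python
def reserved_savings(on_demand_rate: float, discount: float,
                     hours: int) -> tuple[float, float]:
    """Compare on-demand vs. reserved GPU cost over a commitment window.

    All figures are illustrative placeholders, not actual pricing.
    """
    on_demand = on_demand_rate * hours
    reserved = on_demand * (1 - discount)
    return on_demand, reserved

# Hypothetical: $2.50/hr GPU, 30% reserved discount, 30-day commitment
od, res = reserved_savings(2.50, 0.30, 24 * 30)
print(f"on-demand ${od:,.0f} vs reserved ${res:,.0f}")
```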
Visibility is essential for optimization. The new Deployed Model Dashboard provides a comprehensive view of your deployment, including ready-to-use curl samples and two new critical metrics:
GPU Compute Utilization: This metric shows exactly how hard your GPU hardware is working. Monitoring GPU utilization helps you right-size your deployments and identify opportunities to increase efficiency or scale capacity.
KV Cache Usage Ratio: This measures how effectively your deployment is utilizing the KV cache. A higher ratio indicates better cache efficiency, which directly translates into cost savings and lower latency.
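As a rough mental model, a cache usage ratio of this kind can be read as the fraction of prompt tokens whose key/value tensors were reused from cache rather than recomputed. The definition below is one plausible formulation for illustration; the dashboard's exact formula may differ.

```python
def kv_cache_usage_ratio(cached_tokens: int, total_tokens: int) -> float:
    """Illustrative definition: tokens whose key/value tensors were
    reused from cache, divided by all prompt tokens processed.
    (The dashboard's exact formula may differ.)"""
    if total_tokens == 0:
        return 0.0
    return cached_tokens / total_tokens

# e.g. 42,000 of 60,000 prompt tokens reused from cache
print(f"{kv_cache_usage_ratio(42_000, 60_000):.0%}")  # → 70%
```

Under this reading, a rising ratio means less redundant attention computation per request, which is where the cost and latency savings come from.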
Why this matters:
These metrics give you the observability needed to understand and manage your deployed models effectively. You can now make data-driven decisions about scaling, optimization, and resource allocation.
The new User Management hub puts your spending and savings in one place:
The result:
You now have complete visibility into your AI infrastructure costs and can see exactly how much you are saving through Tensormesh's KV Cache optimization. Reaching the team is now just one click away.
We are not slowing down. Here is what is on our roadmap:
Infrastructure Expansion:
New Model Support:
Account Updates
To improve platform security and service quality, we've made a few changes to how accounts work:
To explore these new features, visit your Tensormesh dashboard.
Are there features you would like us to add to our product?
Feel free to reach out to us via: