Introducing Tensormesh Beta 2: One-Click LLM Deployment, New UI & Real-Time Cost Savings

We are excited to announce the launch of Tensormesh Beta 2, a complete redesign of our platform focused on simplicity, speed, and visibility. After listening to feedback from our beta community, we rebuilt the experience from the ground up to make deploying and managing LLMs easier than ever.

Ready to try it? Access the platform at app.tensormesh.ai and join our Slack community to chat with the team directly.

1. New User Interface with One-Click Deployment

We have completely reimagined the Tensormesh interface with user experience as our top priority. The new design eliminates unnecessary scrolling and complexity—you can now deploy an LLM with a single click.

Why this matters:

The deployment process is now intuitive and fast. Select your GPU provider, choose your model, configure advanced settings if needed, and deploy. No more navigating through multiple screens or complex configuration steps.

2. New Overview Dashboard with Test Interface

Your command center has received a major upgrade. The new Overview Dashboard features Quick Actions including a conversational chatbot with a test interface that simulates concurrent usage through batching.

In practice:

You can now generate synthetic cache hit rates before committing to a full deployment. This allows you to validate performance expectations and optimize your configuration upfront. All deployment and management tasks are now accessible in a single click from the dashboard.

3. New Pre-Loaded LLM Support

Deploying the latest models just got dramatically faster. We have pre-loaded support for trending LLMs so you can spin them up instantly without waiting for downloads:

Qwen3 Family:

  • Qwen3-30B
  • Qwen3-235B
  • Qwen3-Coder-30B-A3B-Instruct

Mistral:

  • Devstral-2-123B-Instruct-2512

The benefit:

No more waiting for model downloads. The latest trending models are ready for immediate deployment, letting you experiment and iterate faster.

4. Reserved Deployment Support

We have introduced a new deployment option for teams that need dedicated GPU capacity at predictable pricing.

What does this mean for you?

Reserved deployments allow you to lock in dedicated GPUs at a discounted rate with a time commitment. This is ideal for production workloads where you need guaranteed capacity and want to optimize costs over time.

5. Enhanced Deployed Model Dashboard

Visibility is essential for optimization. The new Deployed Model Dashboard provides an extensive view of your deployment information including ready-to-use curl samples and two new critical metrics:

GPU Compute Utilization This metric shows exactly how hard your GPU hardware is working. Monitoring GPU utilization helps you right-size your deployments and identify opportunities to increase efficiency or scale capacity.

KV Cache Usage Ratio This measures how effectively your deployment is utilizing the KV cache. A higher ratio indicates better cache efficiency, which directly correlates with cost savings and improved latency.

Why this matters:

These metrics give you the observability needed to understand and manage your deployed models effectively. You can now make data-driven decisions about scaling, optimization, and resource allocation.

6. User Management & Cost Tracking

The new User Management hub puts your spending and savings in one place:

  • Spending breakdown by day, week, and month
  • Cost savings tracker based on KV Cache Hit utilization
  • Simple contact options via form or live Slack channel
  • In-app notifications to keep you informed

The result:

You now have complete visibility into your AI infrastructure costs and can see exactly how much you are saving through Tensormesh's KV Cache optimization. Reaching the team is now just one click away.

Coming Soon

We are not slowing down. Here is what is on our roadmap:

Infrastructure Expansion:

  • Serverless Deployment for pay-per-request pricing
  • Autoscaling from 0 to 8 replicas
  • Multi-provider support: AWS, Google Cloud, Azure, and CoreWeave
  • HGX B200 GPU support for next-generation performance

New Model Support:

  • DeepSeek 3.2
  • Moonshotai Kimi-K2-Instruct
  • GLM-4.7-Flash (zai-org)

Account Updates

To improve platform security and service quality, we've made a few changes to how accounts work:

  • Credit card required for deployment: A valid credit card is now required to deploy models.
  • X-User-Id header mandatory: All API requests must include the X-User-Id header.
  • $100 credit refresh: Users who reach zero balance can receive a new $100 credit after completing a short feedback survey.

Try Tensormesh v2 Today

To explore these new features, visit your Tensormesh dashboard.

Are there features you would like us to add to our product?

Feel free to reach out to us via:

Recent Blog Posts

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.Lorem ipsum dolor sit amet.

Name

Position
February 18, 2026

Agent Skills Caching with CacheBlend: Achieving 85% Cache Hit Rates for LLM Agents

Kuntai Du
Chief Scientist, Co-Founder

Read article

February 11, 2026

Beyond Prefix Caching: How Non-Prefix Caching Achieves 25x Better Hit Rates for AI Agents

Kuntai Du
Chief Scientist, Co-Founder

Read article

February 4, 2026

The Open Source Revolution: Why Open-Weight AI Models Are Redefining the Future

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

January 28, 2026

LMCache's Production-Ready P2P Architecture: Powers Tensormesh's 5-10x Cost Reduction

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

January 21, 2026

The Document Reprocessing Problem: How LLMs Waste 93% of Your GPU Budget

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

January 15, 2026

Building Tensormesh: A conversation with the CEO (Junchen Jiang)

Junchen Jiang
CEO, Co-Founder

Read article

January 7, 2026

The Hidden Metric That's Destroying Your AI Agent's Performance & Budget

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

December 17, 2025

LMCache ROI Calculator: When KV Cache Storage Reduces AI Inference Costs

Sandro Mazziotta
Head of Product Management

Read article

December 10, 2025

AI Inference Costs in 2025: The $255B Market's Energy Crisis and Path to Sustainable Scaling

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

December 3, 2025

New Hugging Face Integration: Access 300,000+ AI Models with Real-Time Performance Monitoring

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

November 26, 2025

The AI Inference Throughput Challenge: Scaling LLM Applications Efficiently

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

November 19, 2025

Solving AI Inference Latency: How Slow Response Times Cost You Millions in Revenue

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

November 13, 2025

GPU Cost Crisis: How Model Memory Caching Cuts AI Inference Costs Up to 10×

Bryan Bamford
Marketing, Enterprise and Partnerships

Read article

October 23, 2025

Tensormesh Emerges From Stealth to Slash AI Inference Costs and Latency by up to 10x

Junchen Jiang
CEO, Co-Founder

Read article

October 21, 2025

Comparing LLM Serving Stacks: Introduction to Tensormesh Benchmark

Samuel Shen
Software Engineer

Read article