Here is the list of features that we intend to deliver progressively up to our v1:
During the beta phase, we will not charge you anything beyond the cost of the GPUs you rent through us. Once the beta has ended, we intend to charge customers for our value add: we will charge based on the savings our customers realize through our caching technique. This model ensures that while you enjoy more GPU power for the same spend, we also have an incentive to keep improving our techniques. During the beta, you will be shown what those savings are, but you will not be charged for them.

The formula we will use for pricing once the product reaches v1 is as follows:

GPU = Number of GPU hours consumed
GPH = Price per GPU hour for the chosen provider
EST = Estimated saving based on the cache hit rate reported

Pricing: (GPU * GPH) + (GPU * GPH * EST * 0.3)
Baseline: (GPU * GPH) + (GPU * GPH * EST)

where Baseline is the estimated cost if you served the same workload yourself, renting the GPU servers directly from the cloud.

Example:
GPU = 100h
GPH = $2
EST = 60%
Customer pays: 200 + ( 200 * 0.6 * 0.3 ) = $236
Baseline: 200 + ( 200 * 0.6 ) = $320
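The pricing and baseline formulas above can be sketched in a few lines of Python. This is an illustrative calculation only; the function names are ours, not part of any Tensormesh API.

```python
def pricing(gpu_hours: float, gph: float, est: float) -> float:
    """Amount the customer pays: raw GPU cost plus 30% of the estimated savings."""
    base_cost = gpu_hours * gph
    return base_cost + base_cost * est * 0.3

def baseline(gpu_hours: float, gph: float, est: float) -> float:
    """Estimated cost of serving the same workload yourself, without caching."""
    base_cost = gpu_hours * gph
    return base_cost + base_cost * est

# Worked example from the text: 100 GPU hours at $2/hour, 60% estimated saving.
print(pricing(100, 2, 0.6))   # 236.0
print(baseline(100, 2, 0.6))  # 320.0
```

With the example numbers, the customer pays $236 instead of a $320 baseline, i.e. keeps 70% of the $120 saved by caching.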
We estimate the cost saving based on the cache hit rate. Every time the cache is hit, it is counted as GPU time saved.
We consider the cache to be hit when cache stored outside of the GPU VRAM is pulled back into the GPU.
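As a rough sketch of how a reported cache hit rate could map to the EST figure used in the pricing formula, each hit counts as GPU time saved, so EST is the fraction of lookups served from cache. The counter names and accounting here are illustrative assumptions, not Tensormesh's actual implementation.

```python
def estimated_saving(cache_hits: int, total_lookups: int) -> float:
    """EST: fraction of GPU time saved, counting each cache hit as time saved."""
    if total_lookups == 0:
        return 0.0  # no traffic yet, so no measurable saving
    return cache_hits / total_lookups

# e.g. 60 hits out of 100 lookups gives the 60% EST used in the example above
print(estimated_saving(60, 100))  # 0.6
```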
We’re gradually onboarding new users in batches to ensure the smoothest possible experience as we scale. You’ll receive an email as soon as it’s your turn to join.
Please let us know on Discord or via the feedback tool which GPU provider you would like to see added.
We plan to offer an on-prem version of Tensormesh shortly after our v1 is released.
The first beta version does not do this, but it is on our roadmap to deliver it before our v1.
Yes, Tensormesh includes cache-aware routing.
No, at this time Tensormesh is a full inference-stack experience. You can still use LMCache with its supported list of inference servers, but that is outside our product offering.
This functionality will be added to the beta before we reach the final release.
Tensormesh v1 will be available on-prem. Feel free to contact us for more details.