Hello folks,
We recently published a blog where we benchmarked three 7-billion-parameter LLMs 🚀 — Llama 2, Mistral, and Gemma — across 6 different inference engines (vLLM, TensorRT-LLM, DeepSpeed-MII, CTranslate2, TGI, and Triton Inference Server + vLLM).
📊 This blog can help you identify the right model for your use case and understand which inference engine will give each model the best throughput.
🔗 Check out the full report here:
https://www.inferless.com/learn/exploring-llms-speed-benchmarks-independent-analysis
For any discussion, just drop me a message 🙂