Suman Portfolio

LLM at Edge: LLM inference locally with Llama CPP and Gemma 9B model

Architecture

Key features

Local LLM inference with Gemma 9B (GGUF) — no cloud API dependency
Model downloaded locally and offline deployment
Simple API to interact with the model
Configurable generation parameters — max tokens, temperature, and top_pli>