How browser-based local LLMs work
Browser AI is powerful, but it needs careful UX: manual model loading, visible progress states, fallback messages, and claims that are worded as device-dependent.
The simple explanation
A browser-local LLM downloads model files, caches them in the browser, and runs inference on the user's device. WebGPU can accelerate the computation when both the browser and the hardware support it.
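The support check mentioned above can be sketched as follows. This is a minimal sketch, not the Bluesky Labs implementation: the names detectWebGpu, NavigatorLike, GpuLike and WebGpuStatus are hypothetical, and the small structural types stand in for the real WebGPU typings so the example stays self-contained. It relies only on navigator.gpu and requestAdapter(), which resolves to null when no suitable adapter is found.

```typescript
// Hypothetical structural types so the sketch does not need DOM/WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = "available" | "no-adapter" | "unsupported";

// Reports whether WebGPU acceleration looks usable on this device.
async function detectWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  // The browser does not expose the WebGPU API at all.
  if (!nav.gpu) return "unsupported";
  try {
    // The API exists, but the hardware or driver may still not provide an adapter.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "available" : "no-adapter";
  } catch {
    // Treat an unexpected rejection the same as a missing adapter.
    return "no-adapter";
  }
}
```

In a page this would be called as detectWebGpu(navigator as NavigatorLike), and the result shown to the user before any download is offered.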
What users must know
The first run can be slow, and the model download can be large. Some mobile devices may fail to load or run a model at all. The prompt is not sent to Bluesky Labs servers during a local run, but model files may come from external model hosts.
UX requirements
Never auto-download a model on page load. Show WebGPU support status before offering a download. Provide progress, error, and reset states, plus a no-model fallback such as a rule-based prompt checklist.
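One way to make these requirements hard to violate is to model the loader as a small state machine. The sketch below is an illustration under stated assumptions, not the shipped code: the names LoaderState, LoaderEvent and nextLoaderState are hypothetical, and it assumes that a device without WebGPU is routed to the no-model fallback.

```typescript
type LoaderState =
  | { kind: "idle" }                       // nothing downloaded; waiting for the user
  | { kind: "unsupported" }                // assumed: no WebGPU, show the fallback checklist
  | { kind: "loading"; progress: number }  // progress is a fraction from 0 to 1
  | { kind: "ready" }
  | { kind: "error"; message: string };

type LoaderEvent =
  | { type: "SUPPORT_CHECKED"; webgpu: boolean }
  | { type: "USER_CLICKED_LOAD" }
  | { type: "PROGRESS"; progress: number }
  | { type: "LOADED" }
  | { type: "FAILED"; message: string }
  | { type: "RESET" };

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Pure transition function: events that are not valid in the current state are ignored.
function nextLoaderState(state: LoaderState, event: LoaderEvent): LoaderState {
  switch (event.type) {
    case "SUPPORT_CHECKED":
      return state.kind === "idle" && !event.webgpu ? { kind: "unsupported" } : state;
    case "USER_CLICKED_LOAD":
      // The only way into "loading" is an explicit click, never page load.
      return state.kind === "idle" || state.kind === "error"
        ? { kind: "loading", progress: 0 }
        : state;
    case "PROGRESS":
      return state.kind === "loading"
        ? { kind: "loading", progress: clamp01(event.progress) }
        : state;
    case "LOADED":
      return state.kind === "loading" ? { kind: "ready" } : state;
    case "FAILED":
      return state.kind === "loading" ? { kind: "error", message: event.message } : state;
    case "RESET":
      // Reset cannot make an unsupported device supported.
      return state.kind === "unsupported" ? state : { kind: "idle" };
  }
}
```

Because the transition function is pure, each rule can be unit-tested without a browser, and the UI only has to render whatever state it is given.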
Safe public language
Use terms such as "local mode", "browser cache", "device-dependent", "estimated", and "not sent to our server". Avoid claims such as "100% private", "guaranteed output quality", "offline forever", or "runs every model".
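A simple check can catch the phrases to avoid before copy ships. The sketch below is only an illustration: findRiskyClaims and AVOID_PHRASES are hypothetical names, and the list contains just the four phrases named above. It is a case-insensitive substring match, so it will not catch paraphrases.

```typescript
// Phrases taken from the guidance above; extend the list as needed.
const AVOID_PHRASES: string[] = [
  "100% private",
  "guaranteed output quality",
  "offline forever",
  "runs every model",
];

// Returns the avoid-list phrases that appear in the given copy (case-insensitive).
function findRiskyClaims(copy: string): string[] {
  const lower = copy.toLowerCase();
  return AVOID_PHRASES.filter((phrase) => lower.includes(phrase));
}
```

For example, findRiskyClaims("Our tool is 100% Private and works offline forever.") returns ["100% private", "offline forever"], while copy that sticks to "local mode" and "device-dependent" returns an empty array.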
Bluesky implementation
The Local Prompt Tester starts as a manual WebLLM MVP with one small recommended model path, streaming output, local metrics and share-card export.
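The local metrics mentioned above could be derived from a few timestamps recorded while output streams in. This is a rough sketch under assumptions, not the tester's actual code: RunTiming, RunMetrics and summarizeRun are hypothetical names, and the throughput figure is an estimate that should be labeled as such in the UI.

```typescript
interface RunTiming {
  startedAtMs: number;     // when the user submitted the prompt
  firstTokenAtMs: number;  // when the first streamed chunk arrived
  finishedAtMs: number;    // when streaming ended
  outputTokens: number;    // number of generated tokens (or chunks, as a proxy)
}

interface RunMetrics {
  timeToFirstTokenMs: number;
  tokensPerSecond: number; // estimated decode throughput
}

// Summarizes one local run; returns 0 tokens/s when no decode time was measured.
function summarizeRun(t: RunTiming): RunMetrics {
  const decodeMs = t.finishedAtMs - t.firstTokenAtMs;
  return {
    timeToFirstTokenMs: t.firstTokenAtMs - t.startedAtMs,
    tokensPerSecond: decodeMs > 0 ? (t.outputTokens / decodeMs) * 1000 : 0,
  };
}
```

A run that starts at 0 ms, produces its first token at 500 ms, and finishes at 2500 ms with 40 tokens gives a time to first token of 500 ms and an estimated 20 tokens per second. These numbers describe one device and one run, so they fit the "device-dependent" and "estimated" wording recommended above.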
Model storage & deletion
Model files are cached inside your browser's site storage (Cache Storage, IndexedDB, or OPFS) for bluesky-labs.com. They are not saved to your normal Downloads folder. To delete them, clear site data for bluesky-labs.com in your browser settings, or in Chromium-based browsers use DevTools → Application → Storage → Clear site data.
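A reset button can also remove cached model files from inside the page. The sketch below is illustrative and makes several assumptions: clearModelCaches and CacheStorageLike are hypothetical names, the "webllm" prefix is an assumed cache naming scheme that should be verified in DevTools, and only Cache Storage is covered (IndexedDB and OPFS entries would need separate handling). It uses only keys() and delete(), which the browser's global caches object provides.

```typescript
// Hypothetical structural type matching the parts of CacheStorage used here.
interface CacheStorageLike {
  keys(): Promise<string[]>;
  delete(name: string): Promise<boolean>;
}

// Deletes every cache whose name starts with the prefix; returns the names removed.
async function clearModelCaches(
  store: CacheStorageLike,
  prefix: string,
): Promise<string[]> {
  const names = await store.keys();
  const targets = names.filter((name) => name.startsWith(prefix));
  // delete() resolves to true only when the cache existed and was removed.
  const results = await Promise.all(targets.map((name) => store.delete(name)));
  return targets.filter((_, i) => results[i]);
}
```

In a page this would be called as clearModelCaches(caches, "webllm") from the reset handler, after which the loader returns to its idle state and the model must be downloaded again.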
Editorial note
This guide is an implementation-oriented overview, not a benchmark guarantee. Browser-local AI behavior varies with the browser, GPU, memory, model, cache state, and network conditions. Keep public claims conservative and test on real devices before launch.