Installing LM Studio
LM Studio is free and available for Windows, macOS, and Linux. Here's how to install it:
- Go to lmstudio.ai
- Click "Download" and choose your system
- Launch the downloaded installer
- Follow the installation wizard (accept default settings)
- On first launch, LM Studio will automatically download its components
💡 Tip: On macOS, you may need to authorize the application in System Preferences > Security & Privacy after first launch.
GPU Configuration
For optimal performance, configure GPU acceleration. This step is crucial for generation speed.
Windows with NVIDIA
- Open LM Studio and go to the Settings tab (⚙️)
- Click Hardware in the left menu
- Under "GPU Acceleration", select cuBLAS (NVIDIA)
- Enable "GPU offload" and set the slider to 100%
- Verify that your GPU appears in the list
macOS (Apple Silicon)
- Go to Settings > Hardware
- Select Metal as GPU backend
- Enable "GPU offload" โ all unified RAM will be used
⚠️ Warning: If you don't have a compatible GPU, LM Studio will use the CPU. It will still work, but generation will be roughly 5-10x slower.
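If you're on Windows or Linux with an NVIDIA card, you can also double-check that the driver actually sees your GPU before troubleshooting inside LM Studio. A minimal sketch; it only assumes the nvidia-smi tool that ships with the NVIDIA driver and is not an LM Studio feature:

```python
# Quick sanity check: does the NVIDIA driver see your GPU, and how much VRAM does it have?
# Assumes nvidia-smi (it ships with the NVIDIA driver) is on your PATH; not an LM Studio feature.
import subprocess

try:
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    for line in result.stdout.strip().splitlines():
        print("Detected GPU:", line)  # e.g. "NVIDIA GeForce RTX 4070, 12282 MiB"
except (FileNotFoundError, subprocess.CalledProcessError):
    print("No NVIDIA GPU detected; LM Studio will fall back to the CPU.")
```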
Download your first model
LM Studio has a built-in model browser. Here's how to find and download an LLM:
- Click on the Search tab (🔍) on the left
- In the search bar, type a model name (e.g. "Qwen 3", "Llama 3.3")
- Filter by the size recommended for your RAM (see the memory-estimate sketch below):
- 8GB RAM → 3-7B models
- 16GB RAM → 7-13B models
- 32GB+ RAM → 30B+ models
- Click on a model (e.g. "Qwen3-8B-Q5_K_M.gguf")
- Click Download; the model will be saved automatically
💡 Quick alternative: Use LocalClaw to get a personalized recommendation with a direct download link!
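The RAM recommendations above come down to simple arithmetic: a quantized GGUF model needs roughly parameter count × bits per weight / 8 bytes, plus headroom for the context. Here is a rough back-of-the-envelope sketch; the 20% overhead and the effective bits-per-weight values are assumptions, not exact figures:

```python
# Back-of-the-envelope RAM estimate for a quantized GGUF model.
# The 20% overhead (context, KV cache) and the bits-per-weight values are rough assumptions.
def estimate_model_ram_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

print(f"8B  @ Q5 (~5.0 bits): ~{estimate_model_ram_gb(8, 5.0):.1f} GB")   # ~6.0 GB, fits in 16GB RAM
print(f"8B  @ Q8 (~8.0 bits): ~{estimate_model_ram_gb(8, 8.0):.1f} GB")   # ~9.6 GB
print(f"13B @ Q4 (~4.5 bits): ~{estimate_model_ram_gb(13, 4.5):.1f} GB")  # ~8.8 GB
```

If the estimate lands close to your total RAM, step down one model size or one quantization level.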
Launch chat and configure the model
Once the model is downloaded, it's time to chat with your local AI:
- Click on the Chat tab (💬) on the left
- At the top right, click Select a model to load
- Choose the downloaded model from the list
- Wait for loading (progress bar)
- Once loaded, the chat window becomes active
Important parameters to know
- Temperature (0-2): Model creativity. 0.7 = balanced, 1.2+ = more creative/risky
- Context Length: Conversation memory length (default: 4096 tokens)
- Max Tokens: Maximum length of responses
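To see what the Temperature setting above actually does, here is a simplified illustration: the model's scores for candidate tokens are divided by the temperature before being turned into probabilities. Real samplers also apply top-k/top-p filtering, which this sketch ignores:

```python
# Simplified illustration of what Temperature does: candidate-token scores (logits)
# are divided by the temperature before being turned into probabilities.
# Real samplers also apply top-k / top-p filtering on top of this.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    print(t, [round(p, 2) for p in softmax_with_temperature(logits, t)])
# 0.2 -> [0.99, 0.01, 0.0]  (nearly deterministic)
# 0.7 -> [0.74, 0.18, 0.09] (balanced)
# 1.5 -> [0.53, 0.27, 0.2]  (flatter, more creative/risky)
```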
Your first prompts
Test these prompts to evaluate your installation:
Conversation test:
Explain the theory of relativity to me like I'm 10 years old
Code test:
Write a Python function that calculates the Fibonacci sequence up to n
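For reference, one possible correct answer to the code test looks like this; the function name is just illustrative, and any equivalent implementation from the model is fine:

```python
# One acceptable answer: return the Fibonacci numbers up to (and including) n.
def fibonacci_up_to(n: int) -> list[int]:
    sequence = []
    a, b = 0, 1
    while a <= n:
        sequence.append(a)
        a, b = b, a + b
    return sequence

print(fibonacci_up_to(50))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```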
Reasoning test:
A train leaves Paris at 100km/h, another leaves Lyon at 120km/h...
💡 Performance indicator: Look at the "tokens/s" counter at the bottom of the screen. Above 20 tok/s feels fluid; 50+ tok/s is very responsive.
Advanced features
Once comfortable, explore these powerful features:
Server Mode (local API)
LM Studio can serve your model as an OpenAI-compatible API:
- Go to the Developer tab (👨‍💻)
- Click Start Server
- Your model is now accessible at http://localhost:1234
- Use this URL in any OpenAI-compatible application
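For example, you can call the server from Python with the official openai client. The model name below is a placeholder for whichever model you loaded, and the API key can be any non-empty string since the local server doesn't check it:

```python
# Minimal sketch: call LM Studio's local server through its OpenAI-compatible API.
# Assumes `pip install openai`, the server started in the Developer tab, and a model already loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default server address
    api_key="lm-studio",                  # placeholder; the local server doesn't check it
)

response = client.chat.completions.create(
    model="qwen3-8b",  # replace with the identifier of the model you actually loaded
    messages=[{"role": "user", "content": "Explain the theory of relativity to me like I'm 10 years old"}],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```

The same endpoint also works with curl or any other OpenAI-compatible SDK or application.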
RAG: Chat with your documents
LM Studio 0.3+ allows you to "converse" with your files:
- In the Chat tab, click the file attachment icon
- Select a PDF or text file
- The model will respond based on the document content
System Prompt
Customize model behavior with a "system prompt":
- In chat settings, find "System Prompt"
- Example:
You are a Python programming expert. Respond concisely and technically.
- This will influence the entire conversation
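The same idea carries over to the local API: the system prompt is simply a message with the role "system" sent before the user's messages. A quick sketch, reusing the placeholder model name from the server-mode example:

```python
# API equivalent of the System Prompt: a "system" role message sent before the user's messages.
# Reuses the placeholder model name from the server-mode sketch above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen3-8b",  # replace with the model you loaded
    messages=[
        {"role": "system", "content": "You are a Python programming expert. Respond concisely and technically."},
        {"role": "user", "content": "How do I read a JSON file?"},
    ],
)
print(response.choices[0].message.content)
```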
Troubleshooting common issues
โ "Out of memory" / Crashes
The model is too large for your RAM. Switch to a smaller version (7B instead of 13B) or a stronger quantization (Q4 instead of Q8).
🐌 Very slow generation
Check that the GPU is properly configured in Settings > Hardware. If you don't have a compatible GPU, slow generation on the CPU is expected.
📥 Model won't download
Check your internet connection and available disk space. Try another model to isolate the problem.
Conclusion
Congratulations! You now have a completely local, private, and functional generative AI. No subscription, no data sent to the cloud, just you and your machine.
To go further:
- Test different models to find the one that matches your usage
- Experiment with parameters (temperature, context length)
- Explore the local API to integrate your LLMs into other tools
- Join the community on Reddit r/LocalLLaMA for the latest news
And don't forget: LocalClaw is here to help you choose the perfect model every time you want to explore a new LLM!