LLaVA


Demo Environment

Development board: Jetson Orin series motherboard

SSD: 128 GB

Tutorial application scope: whether a board can run a given model depends on the system's available memory; the user's own environment and programs running in the background may cause the model to fail to run.

| Motherboard model | Run directly with Ollama | Run with Open WebUI |
| --- | --- | --- |
| Jetson Orin NX 16GB | ✓ | ✓ |
| Jetson Orin NX 8GB | ✓ | ✓ |
| Jetson Orin Nano 8GB | ✓ | × |
| Jetson Orin Nano 4GB | × | × |
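Available memory is the deciding factor, so it is worth checking how much memory is free before pulling a model. A minimal check using a standard Linux command (not part of the original tutorial):

```bash
# Show total, used, and available system memory in human-readable units
free -h
```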

LLaVA (Large Language and Vision Assistant) is a multimodal model that aims at general-purpose vision and language understanding by combining a visual encoder with a large language model.

1. Model scale

| Model | Parameters |
| --- | --- |
| LLaVA | 7B |
| LLaVA | 13B |
| LLaVA | 34B |

2. Pull LLaVA

The pull command downloads the model from the Ollama model library:

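A minimal sketch of the pull step. The default llava tag corresponds to the 7B model on the Ollama library page; the 13B and 34B variants are selected by explicit tags:

```bash
# Pull the default LLaVA model (7B) from the Ollama model library
ollama pull llava

# Larger variants are available under explicit tags
ollama pull llava:13b
ollama pull llava:34b
```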

3. Use LLaVA

Use LLaVA to identify local image content.

3.1. Run LLaVA

If the model has not been pulled yet, the run command will automatically download the LLaVA 7B model and then start it:
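A minimal sketch of starting an interactive session, assuming the default llava tag:

```bash
# Start an interactive chat session; downloads llava (7B) first if it is not present
ollama run llava
```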

3.2. Have a conversation

How long it takes to answer a question depends on the hardware configuration, so be patient!

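The screenshot of the exchange is replaced here by a sketch. With multimodal models such as LLaVA, Ollama loads a local image when its file path appears in the prompt; the path below is a hypothetical example, and the reply will vary:

```
>>> Describe this picture: /home/nvidia/test.jpg
```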

3.3. End the conversation

Use the Ctrl+D shortcut or type /bye to end the conversation!


4. Memory optimization

Local models have very high memory requirements. If the model fails to run, you can follow the steps below to close the graphical interface and run the model from the command line, freeing memory for the model.
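The specific commands are not included in the source text; on an Ubuntu-based Jetson system, a common way to boot without the desktop is to switch the default systemd target (an assumed equivalent, not taken from the source):

```bash
# Boot to the command line (no graphical desktop) on the next restart
sudo systemctl set-default multi-user.target
```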

After running the command, restart the system for it to take effect, then connect to the board over SSH to run the model.
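To return to the desktop later, the default target can be switched back (again assuming a standard Ubuntu/systemd setup):

```bash
# Boot back into the graphical desktop on the next restart
sudo systemctl set-default graphical.target
```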

After running the command, restart the system for it to take effect and the desktop mode will be restored.

References

Ollama

Official website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

LLaVA

GitHub: https://github.com/haotian-liu/LLaVA

Ollama corresponding model: https://ollama.com/library/llava