LLaVA

Demonstration environment

Development board: Raspberry Pi 5 (8GB RAM)

SD (TF) card: 64GB (16GB or more; the larger the capacity, the more models you can try)

LLaVA (Large Language and Vision Assistant) is a multimodal model that aims at general-purpose visual and language understanding by combining a vision encoder with a large language model.

Model scale

Model   Parameters
LLaVA   7B
LLaVA   13B
LLaVA   34B

Get LLaVA

Use the pull command to download the model from the Ollama model library.
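For example, the following commands pull the default model; the 13B and 34B tags follow the naming on the Ollama library page and are shown only for illustration (they may be too large for an 8GB board):

```
# Pull the default LLaVA model (7B)
ollama pull llava

# Optionally pull a larger variant by tag
ollama pull llava:13b
ollama pull llava:34b
```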

(Screenshot: output of pulling the LLaVA model)

Use LLaVA

Use LLaVA to recognize local image content.

Run LLaVA

If the LLaVA model is not already on the system, it will be pulled automatically (7B by default) and then run.
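A minimal example, assuming the model name llava as listed in the Ollama library:

```
# Starts an interactive chat; downloads the model first if it is not already local
ollama run llava
```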

Dialogue

How long the model takes to reply depends on the hardware configuration, so please be patient.
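To have LLaVA describe a local image, include the image's file path in your question at the interactive >>> prompt; the path below is only a placeholder:

```
>>> Describe this image. /home/pi/Pictures/test.jpg
```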

(Screenshot: example dialogue with LLaVA)

End conversation

You can end the conversation with the shortcut Ctrl+d or by typing /bye.
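For example, typing the /bye command at the prompt ends the session:

```
>>> /bye
```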

(Screenshot: ending the conversation)

Reference material

Ollama

Website: https://ollama.com/

GitHub: https://github.com/ollama/ollama

LLaVA

GitHub: https://github.com/haotian-liu/LLaVA

Ollama model: https://ollama.com/library/llava