want to push the LLM models and other things to the client side as much as possible, just rough notes and findings on where we currently are,

React Native app:

https://github.com/a-ghorbani/pocketpal-ai, it's using https://github.com/mybigday/llama.rn

  • they are deploying small language models that run fully on-device (rough llama.rn sketch below)
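From memory of the llama.rn README, usage looks roughly like this; the model path, prompt, and sampling params are placeholders, not anything PocketPal actually ships:

```ts
import { initLlama } from 'llama.rn'

// load a small GGUF model downloaded by the app (placeholder path)
const context = await initLlama({
  model: 'file:///data/models/llama-3.2-1b-instruct-q4_k_m.gguf',
  n_ctx: 2048,
  n_gpu_layers: 99, // > 0 enables Metal on iOS
})

const result = await context.completion(
  {
    prompt: 'User: Hello!\nAssistant:',
    n_predict: 128,
    stop: ['User:'],
  },
  (data) => {
    // streaming callback, fires once per generated token
    console.log(data.token)
  },
)
console.log('final:', result.text)
```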

web: https://github.com/mlc-ai/web-llm downloads the model and caches it in the browser, but the size for Llama 3.2 1B Instruct is around 733 MB
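web-llm exposes an OpenAI-style API; a minimal sketch, assuming the Llama 3.2 1B model ID from their prebuilt list:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// first call downloads the weights, later page loads hit the browser cache
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (p) => console.log(p.text),
});

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0]?.message.content);
```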

this is interesting: Universal LLM Deployment Engine with ML Compilation https://github.com/mlc-ai/mlc-llm

https://llm.mlc.ai/

https://blog.mlc.ai/2023/05/01/bringing-accelerated-llm-to-consumer-hardware

https://blog.mlc.ai/2023/05/08/bringing-hardware-accelerated-language-models-to-android-devices

there is a course too, these people are smart:

https://mlc.ai/summer22/schedule https://mlc.ai/

goldmine: https://www.youtube.com/@mlc-ai2867/videos

I don't know why, but my heart is saying: rewrite this in Rust lmao

ONNX runtimes are also a good option: https://onnxruntime.ai/docs/tutorials/mobile/
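For the web target, onnxruntime-web looks roughly like this (onnxruntime-react-native exposes a similar InferenceSession API); the model file and the input name "input" are placeholders that have to match whatever the model was exported with:

```ts
import * as ort from "onnxruntime-web";

// placeholder model file
const session = await ort.InferenceSession.create("./model.onnx", {
  executionProviders: ["wasm"], // "webgpu" where available
});

const input = new ort.Tensor(
  "float32",
  new Float32Array(1 * 3 * 224 * 224), // zeros, just to show the call shape
  [1, 3, 224, 224],
);

// feeds are keyed by the model's input names
const outputs = await session.run({ input });
console.log(Object.keys(outputs));
```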

client-side vector DB:

https://objectbox.io/the-first-on-device-vector-database-objectbox-4-0/

https://github.com/yusufhilmi/client-vector-search https://clientvectorsearch.com/

client-side embedding: https://medium.com/@robert.lukoshko/the-ultimate-guide-to-embeddings-in-frontend-development-e4211a06bb13
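At small scale, client-side vector search is just brute-force cosine similarity over an in-memory array, which is presumably close to what client-vector-search does under the hood; a dependency-free sketch:

```ts
type Doc = { id: string; text: string; embedding: number[] };

// cosine similarity between two equal-length vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// brute-force top-k: score every doc, sort descending, slice
function search(query: number[], docs: Doc[], k = 5): { doc: Doc; score: number }[] {
  return docs
    .map((doc) => ({ doc, score: cosine(query, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Brute force stops scaling somewhere around tens of thousands of vectors; that's where the actual on-device DBs above (with indexes like HNSW) come in.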

this seems more promising: https://rxdb.info/articles/javascript-vector-database.html

pgvecto.rs (a Postgres extension, so server side really): https://github.com/tensorchord/pgvecto.rs