want to push LLM models and other things to the client side as much as possible. Rough notes and findings on where we currently are:
react native app:
https://github.com/a-ghorbani/pocketpal-ai, which uses https://github.com/mybigday/llama.rn under the hood
- they run small language models fully on-device (rough usage sketch below)
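rough sketch of what driving llama.rn looks like. Hedge: the names (initLlama, completion, release) are from llama.rn's README as I remember it and may drift between versions; the GGUF path is made up:

```ts
// Minimal sketch, assuming llama.rn's initLlama/completion API.
import { initLlama } from 'llama.rn';

async function runLocalLLM() {
  // Hypothetical path: a small quantized GGUF model bundled with or downloaded by the app.
  const context = await initLlama({
    model: 'file:///data/local/models/llama-3.2-1b-instruct-q4_k_m.gguf',
    n_ctx: 2048,       // context window
    n_gpu_layers: 99,  // offload as many layers as the device GPU allows
  });

  const result = await context.completion(
    {
      prompt: 'Summarize why on-device inference matters:',
      n_predict: 128,
      temperature: 0.7,
    },
    (data) => {
      // streaming callback: fires once per generated token
      console.log(data.token);
    },
  );

  console.log('full text:', result.text);
  await context.release(); // free the native llama.cpp context
}
```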
web: https://github.com/mlc-ai/web-llm downloads the model and caches it in the browser, but the size for Llama 3.2 1B Instruct is around 733 MB
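quick sketch of the web-llm flow (CreateMLCEngine plus an OpenAI-style chat API; the model id is my guess at the prebuilt Llama 3.2 1B entry, verify against their current model list):

```ts
// Runs entirely client side via WebGPU once the weights are cached.
import { CreateMLCEngine } from '@mlc-ai/web-llm';

async function main() {
  const engine = await CreateMLCEngine('Llama-3.2-1B-Instruct-q4f16_1-MLC', {
    // First call downloads the ~733 MB of weights; later loads hit the browser cache.
    initProgressCallback: (p) => console.log(p.text),
  });

  // OpenAI-style chat completion, no server involved.
  const reply = await engine.chat.completions.create({
    messages: [{ role: 'user', content: 'Hello from the browser!' }],
  });
  console.log(reply.choices[0].message.content);
}

main();
```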
this is interesting: MLC LLM, a universal LLM deployment engine with ML compilation: https://github.com/mlc-ai/mlc-llm
https://blog.mlc.ai/2023/05/01/bringing-accelerated-llm-to-consumer-hardware
https://blog.mlc.ai/2023/05/08/bringing-hardware-accelerated-language-models-to-android-devices
there's also a course; these people are smart:
https://mlc.ai/summer22/schedule https://mlc.ai/
goldmine: https://www.youtube.com/@mlc-ai2867/videos
I don't know why, but my heart says: rewrite this in Rust lmao
ONNX Runtime is also a good option: https://onnxruntime.ai/docs/tutorials/mobile/ (sketch below)
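sketch with onnxruntime-react-native (real package; the input name and shape here are hypothetical and must match whatever model you actually export):

```ts
// Load an ONNX model and run a single inference on-device.
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

async function runOnnx(modelPath: string) {
  const session = await InferenceSession.create(modelPath);

  // Dummy input: batch of 1, 4 float features; 'input' is a placeholder name.
  const input = new Tensor('float32', Float32Array.from([0.1, 0.2, 0.3, 0.4]), [1, 4]);

  const results = await session.run({ input });
  console.log('output names:', session.outputNames);
  console.log('first output:', results[session.outputNames[0]].data);
}
```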
client side vector db:
- https://objectbox.io/the-first-on-device-vector-database-objectbox-4-0/
- https://github.com/yusufhilmi/client-vector-search / https://clientvectorsearch.com/
- client-side embedding: https://medium.com/@robert.lukoshko/the-ultimate-guide-to-embeddings-in-frontend-development-e4211a06bb13 (see the sketch after this list)
- this seems more promising: https://rxdb.info/articles/javascript-vector-database.html
- pgvecto.rs (a Postgres extension in Rust, so really server-side rather than client-side): https://github.com/tensorchord/pgvecto.rs
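library-agnostic sketch of the whole client-side pattern: embed in the browser (assuming transformers.js here, which none of the links above mandate) plus brute-force cosine search in memory. A real app would persist the vectors to IndexedDB or one of the DBs above:

```ts
// Embed texts on-device, then rank by cosine similarity against a query.
import { pipeline } from '@xenova/transformers';

type Doc = { id: string; text: string; vec: Float32Array };

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function demo() {
  // Small embedding model that runs fully in the browser/WebView.
  const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  const texts = ['client-side vector search', 'server-side Postgres', 'on-device LLMs'];

  const docs: Doc[] = [];
  for (const [i, text] of texts.entries()) {
    const out = await embed(text, { pooling: 'mean', normalize: true });
    docs.push({ id: String(i), text, vec: out.data as Float32Array });
  }

  const q = await embed('running models in the browser', { pooling: 'mean', normalize: true });
  const ranked = docs
    .map((d) => ({ ...d, score: cosine(q.data as Float32Array, d.vec) }))
    .sort((a, b) => b.score - a.score);

  console.log(ranked.map((r) => `${r.score.toFixed(3)} ${r.text}`));
}

demo();
```

brute force is fine at this scale; the DBs above only start to matter once the corpus is too big to scan per query.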