The Python library for simple chatbot creation that I am developing has a strange problem when running on #GoogleColab: on my own computers and laptops, the model is loaded directly into RAM as soon as the prompt() function is executed, but on Google Colab it looks as if the model does not load at all for about 10 minutes (suspiciously low RAM usage). After that time the expected amount of RAM is in use (as if the entire model had finally been loaded), and shortly afterwards the text is generated.
The library is written in Rust and uses llama.cpp and rustformers' llm.
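One theory I'm considering (not confirmed): llama.cpp memory-maps model files by default, so pages are only faulted into RAM as they are actually touched, and on Colab's comparatively slow virtual disk that could explain the long stretch of low RAM usage followed by a sudden jump. A minimal sketch to test this, assuming the model is a single file at a hypothetical path like /content/model.bin, is to read the file sequentially once before the first prompt() call, which should warm the Linux page cache:

```python
# Diagnostic sketch, assuming the slowdown comes from mmap pages being
# faulted in lazily from Colab's slow virtual disk. MODEL_PATH is a
# hypothetical location; substitute the real model file path.
MODEL_PATH = "/content/model.bin"

def warm_page_cache(path: str, chunk_mb: int = 64) -> None:
    """Sequentially read the whole file so the kernel caches it in RAM."""
    chunk = chunk_mb * 1024 * 1024
    with open(path, "rb") as f:
        while f.read(chunk):
            pass

warm_page_cache(MODEL_PATH)  # run once before the first prompt() call
```

If prompt() starts generating quickly after this warm-up, that would point at mmap plus slow storage rather than a bug in the library itself.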
If anyone knows how to solve this, I will be very grateful.
Link to issue:
https://github.com/Hukasx0/ai-companion-py/issues/6
Link to the repository:
https://github.com/Hukasx0/ai-companion-py