Cost Optimization
Learn how Onuro automatically optimizes your AI costs
Automatic Cost Optimization
We neither profit from your usage fees nor lose money on them, so our goal is simply for you to get the most value out of your subscription while keeping your costs low.
Onuro takes several steps to automatically optimize your AI costs, while we do our best to maintain high-quality responses:
1. Context Caching
We use context caching to reduce the cost of multi-turn conversations. Cached input can be billed at a discount of up to 90%, which significantly reduces the cost of long conversations and keeps agentic behavior within a manageable price range.
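As a rough illustration of why caching matters, the sketch below compares the input-token cost of a multi-turn conversation with and without a cached-read discount. The per-token price, the 90% discount, and the token counts are illustrative assumptions for the example, not Onuro's actual billing.

```typescript
// Illustrative only: assumed prices and token counts, not Onuro's actual billing.
const PRICE_PER_INPUT_TOKEN = 3 / 1_000_000; // $3 per million input tokens (assumed)
const CACHED_READ_DISCOUNT = 0.9;            // cached tokens billed at a 90% discount (assumed)

// Each turn re-sends the whole conversation so far as input.
// `turnTokens` is the number of new tokens added per turn (assumed).
function conversationInputCost(turns: number, turnTokens: number, cachingEnabled: boolean): number {
  let contextTokens = 0;
  let cost = 0;
  for (let turn = 0; turn < turns; turn++) {
    const cachedTokens = cachingEnabled ? contextTokens : 0; // previously seen prefix
    const freshTokens = contextTokens - cachedTokens + turnTokens;
    cost += cachedTokens * PRICE_PER_INPUT_TOKEN * (1 - CACHED_READ_DISCOUNT);
    cost += freshTokens * PRICE_PER_INPUT_TOKEN;
    contextTokens += turnTokens;
  }
  return cost;
}

console.log(`without caching: $${conversationInputCost(20, 2_000, false).toFixed(2)}`);
console.log(`with caching:    $${conversationInputCost(20, 2_000, true).toFixed(2)}`);
```

With these assumed numbers, a 20-turn conversation costs a fraction of what it would without caching, because only the newly added tokens are billed at the full rate on each turn.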
2. Sliding Window Message Truncation
Because every message in the context must be reprocessed each time the AI generates a response, we intentionally do not keep your entire conversation in the context. Instead, we keep at most the last 5 back-and-forth turns of the conversation and discard the earlier ones.
From our analysis, this is the best balance between cost and performance: we do not discard so much context that the AI cannot respond properly, but we also do not keep so much context that most of it is stale and irrelevant. A simplified sketch of this windowing is shown below.
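The sketch below shows one way such a sliding window can work. The `Message` shape and the windowing function are assumptions made for illustration, not Onuro's internal data model.

```typescript
// A minimal sketch of sliding-window truncation (illustrative, not Onuro's internals).
type Message = { role: "user" | "assistant"; content: string };

const MAX_TURNS = 5; // keep at most the last 5 back-and-forth turns

// Keep the last N turns (a user message plus the assistant reply) and discard everything older.
function slidingWindow(messages: Message[], maxTurns: number = MAX_TURNS): Message[] {
  let turns = 0;
  let start = 0;
  // Walk backwards; each user message closes out one turn.
  for (let i = messages.length - 1; i >= 0; i--) {
    start = i;
    if (messages[i].role === "user" && ++turns === maxTurns) {
      break;
    }
  }
  return messages.slice(start);
}

// Example: a 6-turn conversation is trimmed to its last 5 turns (10 messages).
const history: Message[] = [];
for (let i = 1; i <= 6; i++) {
  history.push({ role: "user", content: `question ${i}` });
  history.push({ role: "assistant", content: `answer ${i}` });
}
console.log(slidingWindow(history).length); // 10
```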
3. Dynamic Content Truncation
The final cost optimization mechanism we use is truncating dynamic content, such as your code files. Often the AI only needs one state of a file at a time, or no longer needs to see the file's state at all. With this in mind, when we can infer that a piece of context is no longer needed, we discard it from the conversation. This also doubles as a privacy measure: the state of your files is NEVER stored long term, because we discard it immediately after use.
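As a rough sketch of the idea (not Onuro's actual pipeline), the example below keeps only the most recent snapshot of each file attached to the conversation and discards superseded states:

```typescript
// Illustrative sketch: keep only the latest snapshot of each attached file
// and drop superseded states before sending the conversation to the model.
type FileSnapshot = { path: string; content: string; attachedAtTurn: number };

function truncateStaleSnapshots(snapshots: FileSnapshot[]): FileSnapshot[] {
  const latestTurnForPath = new Map<string, number>();
  for (const snap of snapshots) {
    const latest = latestTurnForPath.get(snap.path) ?? -1;
    if (snap.attachedAtTurn > latest) {
      latestTurnForPath.set(snap.path, snap.attachedAtTurn);
    }
  }
  // Only the most recent state of each file survives; older states are discarded
  // (and, as described above, never persisted).
  return snapshots.filter(
    (snap) => snap.attachedAtTurn === latestTurnForPath.get(snap.path)
  );
}
```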
What You Can Do to Further Optimize Your Costs
Even with all of these optimizations, heavy or careless usage can still rack up costs. To further optimize your costs, we recommend the following:
1. Only Give Relevant Context
When selecting files, only provide the context needed for the current task. This keeps costs low and improves the quality of the response: while these AI models CAN process large amounts of context, doing so also tends to reduce output quality.
2. Regenerate Instead of Re-prompt
When the AI's response falls short, or you realize you did not give it the proper context before sending your message, you have two options:
- Send a follow-up message to the AI
- Regenerate the response (optionally with a refined prompt)
When you come across this scenario, we recommend altering your message if needed and regenerating the response. A follow-up message keeps both your original prompt and the unhelpful reply in the context, so you pay to reprocess them on every later turn, whereas regenerating replaces them entirely.
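To see why this matters for cost, here is a back-of-the-envelope comparison using assumed token counts (illustrative numbers, not real measurements): a follow-up keeps the failed exchange in every later request, while regenerating replaces it.

```typescript
// Back-of-the-envelope comparison with assumed token counts (illustrative only).
const promptTokens = 1_500;        // your original message plus attached context (assumed)
const failedReplyTokens = 800;     // the unhelpful response (assumed)
const refinedPromptTokens = 1_600; // your refined message (assumed)

// Option A: send a follow-up. The failed exchange stays in the context, so the next
// request re-sends the original prompt, the failed reply, and the new follow-up.
const followUpInput = promptTokens + failedReplyTokens + refinedPromptTokens;

// Option B: regenerate with a refined prompt. The failed exchange is replaced,
// so the next request only contains the refined prompt.
const regenerateInput = refinedPromptTokens;

console.log({ followUpInput, regenerateInput }); // { followUpInput: 3900, regenerateInput: 1600 }
```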
3. Use a Cheaper AI Model
We optimize for performance before optimizing for cost, so we give paid users a high-performing AI model as the default. If you prefer to optimize for cost, you can select a cheaper (or free!) model from the dropdown menu, which can significantly reduce your costs. You can identify which models are cheaper by using our tooltips in chat.
4. Create New Chats for New Tasks
When you start a task that is completely unrelated to anything in your current conversation, it's best to start a new chat, since none of the previous context matters. Like point 1, this helps keep costs low and improves the quality of the response.