Automatic Cost Optimization

We do not profit from your usage fees, nor do we lose money on them. Even so, we want you to get the most value out of your subscription while keeping your costs low.

Onuro takes many steps to automatically optimize your AI costs, while doing our best to maintain high-quality responses:

1. Context Caching

We use context caching to reduce costs for multi-turn conversations. With cached input at times costing up to 90% less than fresh input, this significantly reduces the cost of multi-turn conversations and keeps agentic behavior in a manageable price range.

We currently use context caching only for Anthropic and OpenAI models.
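To see why caching matters, here is a rough back-of-the-envelope sketch. The per-token price and the 0.1x cached-read multiplier are illustrative assumptions, not Onuro's or any provider's actual rates:

```python
# Illustrative sketch: how a cached prompt prefix reduces input cost.
# The price and the 0.1x cached-read multiplier are assumptions for
# illustration only, not Onuro's or any provider's actual rates.

def input_cost(cached_tokens: int, fresh_tokens: int,
               price_per_token: float = 3e-6,
               cached_multiplier: float = 0.1) -> float:
    """Cost of one request: the cached prefix is billed at a discount."""
    return (cached_tokens * price_per_token * cached_multiplier
            + fresh_tokens * price_per_token)

# A 10-turn conversation where each turn adds ~1,000 tokens of context.
turn_tokens = 1_000
without_cache = sum(input_cost(0, t * turn_tokens) for t in range(1, 11))
with_cache = sum(input_cost((t - 1) * turn_tokens, turn_tokens)
                 for t in range(1, 11))

print(f"without caching: ${without_cache:.4f}")
print(f"with caching:    ${with_cache:.4f}")
```

The longer the conversation runs, the larger the cached prefix grows relative to the fresh input, so the savings compound as the conversation continues.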

2. Sliding Window Message Truncation

Because every message in the context must be reprocessed each time the AI generates a response, we intentionally do not keep all messages from your current conversation in the context. Instead, we keep at most the last 5 back-and-forth turns of the conversation and discard the earlier ones.

From our analysis, this is the optimal balance between cost and performance. This way we do not discard so much context that the AI cannot respond properly, but we also do not keep so much context that much of it is stale and meaningless.
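A minimal sketch of the idea, assuming one turn is a user message followed by an assistant reply (the message shape is an illustrative assumption, not Onuro's internal format):

```python
# Minimal sketch of sliding-window truncation. Assumes one "turn" is a
# user message followed by an assistant message; the message shape is
# an illustrative assumption, not Onuro's internal format.

def truncate_to_last_turns(messages: list[dict], max_turns: int = 5) -> list[dict]:
    """Keep at most the last `max_turns` user/assistant turns."""
    # Each turn contributes two messages (user + assistant).
    return messages[-(max_turns * 2):]

# 8 turns of conversation -> only the last 5 survive.
history = []
for i in range(8):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

window = truncate_to_last_turns(history)
print(len(window))              # 10 messages = 5 turns
print(window[0]["content"])     # "question 3"
```

A real implementation would also pin the system prompt and any other fixed instructions outside the window so they are never discarded.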

3. Dynamic Content Truncation

The final cost optimization mechanism we use is truncating dynamic content, such as your code files. Often the AI only needs one state of a file at a time, or no longer needs to see a file's state at all. With this in mind, when we can infer that a piece of context is no longer needed, we discard it from the conversation. This also doubles as a privacy measure: the state of your files is NEVER stored long term, as we discard it immediately after use.
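A hypothetical sketch of the idea: when the same file appears in the context more than once, keep only its most recent state. The message shape and the `drop_stale_file_states` helper are illustrative, not Onuro's actual implementation:

```python
# Hypothetical sketch of dynamic content truncation: when a file appears
# in the conversation more than once, keep only its most recent state.
# The message shape ("file" entries with a path) is an assumption.

def drop_stale_file_states(messages: list[dict]) -> list[dict]:
    """Remove all but the last occurrence of each file's contents."""
    seen_paths: set[str] = set()
    kept: list[dict] = []
    # Walk backwards so the newest state of each file is the one we keep.
    for msg in reversed(messages):
        path = msg.get("file")
        if path is not None:
            if path in seen_paths:
                continue  # an older, superseded state -> discard
            seen_paths.add(path)
        kept.append(msg)
    kept.reverse()
    return kept

context = [
    {"file": "main.py", "content": "v1"},
    {"role": "assistant", "content": "edited main.py"},
    {"file": "main.py", "content": "v2"},
]
pruned = drop_stale_file_states(context)
print(pruned)  # v1 is gone; only the assistant message and v2 remain
```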

What You Can Do to Further Optimize Your Costs

Even with all of our optimizations, heavy and/or reckless usage can still rack up costs. To further optimize your costs, we recommend:

1. Only Give Relevant Context

When selecting files, we recommend giving only the amount of context needed for the current task. This keeps costs low and improves response quality. While these AI models CAN process large amounts of context, doing so also tends to reduce output quality.

2. Regenerate Instead of Re-prompt

When the AI does not respond adequately, or you realize you did not give proper context before sending your message, you have two options:

  1. Send a follow-up message to the AI
  2. Regenerate the response (optionally with a refined prompt)

When you come across this scenario, we recommend altering your message if needed and regenerating the response. Regenerating replaces the unhelpful exchange instead of keeping it in the context, so you avoid paying to reprocess it on every later turn.
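To see why regenerating tends to be cheaper, here is a back-of-the-envelope sketch; all token counts below are made-up assumptions:

```python
# Illustrative arithmetic: a follow-up keeps the failed exchange in the
# context, so every later turn pays to reprocess it; regenerating
# replaces it instead. All token counts are made-up for illustration.

failed_exchange = 2_000   # tokens: the bad answer plus your correction
later_turns = 5           # turns remaining in the conversation

# Extra input tokens reprocessed across the rest of the conversation:
followup_overhead = failed_exchange * later_turns
regenerate_overhead = 0   # the failed response is simply replaced

print(followup_overhead - regenerate_overhead)  # 10000 extra tokens
```

The sliding window eventually drops the failed exchange, so the real overhead is bounded, but regenerating avoids it entirely.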

3. Use a Cheaper AI Model

We optimize for performance before optimizing for cost. Because of this, we give paid users a high-performing AI model as the default. If you prefer to optimize for cost, you can select a cheaper (or free!) model from the dropdown menu, which can significantly reduce your costs. You can identify which models are cheaper by using our tooltips in chat.

4. Create New Chats for New Tasks

When you start a new task that is completely unrelated to anything in your current conversation, we recommend starting a new chat, since none of the previous context matters. Like point 1, this both keeps costs low and improves response quality.