Name: AuraByt Inc.
Price range: $$

Question 1

Does our data ever leave our infrastructure?

Accepted Answer

No. The whole point of this build is that the model, the document index, and the inference all run on hardware you control, either on-prem or in a private cloud tenancy you own. Prompts, retrieved documents, and embeddings stay inside your network boundary. There are no calls to OpenAI, Anthropic, or any other hosted API at inference time.

Question 2

Which models do you deploy, and are they any good?

Accepted Answer

Open-weight models: Llama, Mistral, and Qwen, sized to your hardware and your task. For grounded question-answering over your own documents, a mid-size open model with good RAG usually does the job. For open-ended reasoning at the frontier, a hosted API is still ahead, and we will tell you plainly if your use case is one of those rather than sell you a local box that underperforms.

Question 3

What does this cost to run?

Accepted Answer

Honest answer: it depends on hardware, model size, and how many people hit it. Self-hosting trades a per-token API bill for fixed costs in GPUs, power, and maintenance, so it makes sense at steady volume or when compliance rules out the cloud, and it makes less sense for light or spiky usage. We write up the trade-off for your specific case before you commit. We do not publish a flat price because a number with no context would be misleading.
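As a rough sketch of that trade-off: the break-even point is the monthly token volume at which a per-token API bill equals the fixed monthly cost of self-hosting. The Python below assumes a single blended API price and a single fixed monthly figure covering amortized GPUs, power, and maintenance. The function names and every number are hypothetical placeholders for illustration, not quotes or measurements.

```python
def breakeven_tokens_per_month(fixed_monthly_cost: float,
                               api_price_per_million_tokens: float) -> float:
    """Monthly token volume at which the API bill equals the fixed self-hosting cost."""
    if api_price_per_million_tokens <= 0:
        raise ValueError("API price must be positive")
    return fixed_monthly_cost / api_price_per_million_tokens * 1_000_000


def cheaper_option(monthly_tokens: float,
                   fixed_monthly_cost: float,
                   api_price_per_million_tokens: float) -> str:
    """Compare the two monthly bills at a given steady volume."""
    api_bill = monthly_tokens / 1_000_000 * api_price_per_million_tokens
    return "self-hosted" if fixed_monthly_cost < api_bill else "hosted API"


# Hypothetical inputs: 2,000 per month in fixed costs, 10 per million tokens on an API.
print(breakeven_tokens_per_month(2000.0, 10.0))
print(cheaper_option(500_000_000, 2000.0, 10.0))  # steady, heavy usage
print(cheaper_option(50_000_000, 2000.0, 10.0))   # light usage
```

A real comparison would also account for spiky load, since hardware sized for peak demand sits idle the rest of the time, and for cases where compliance rules out the hosted option regardless of price.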

Question 4

Is this compatible with PHIPA and similar compliance regimes?

Accepted Answer

Keeping patient or client data on infrastructure you control, with access controls and audit logs, lines up with how regulated operators are expected to handle sensitive records under PHIPA and similar regimes. We build the technical controls: data stays in your boundary, access is role-based, and queries are logged. We are developers, not your lawyers or your privacy officer, so the final compliance sign-off is theirs to give.
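A minimal sketch of how role-based access and query logging can fit together, in Python. The role names, collection names, and the `authorize_and_log` helper are hypothetical and shown only to illustrate the idea. This is not the deployed implementation, and a real system would take roles from your identity provider and write to tamper-resistant log storage.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical mapping of roles to the document collections they may query.
ROLE_PERMISSIONS = {
    "clinician": {"patient_records"},
    "billing": {"invoices"},
}

audit_log = logging.getLogger("audit")


def authorize_and_log(user: str, role: str, collection: str, query: str) -> bool:
    """Check role-based access and record every attempt, allowed or denied."""
    allowed = collection in ROLE_PERMISSIONS.get(role, set())
    audit_log.info(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "collection": collection,
        "query": query,
        "allowed": allowed,
    }))
    return allowed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    authorize_and_log("user-1", "clinician", "patient_records", "latest lab results")
    authorize_and_log("user-2", "billing", "patient_records", "latest lab results")
```

Denied attempts are logged as well as allowed ones, because an audit trail that records only successful queries hides exactly the events a privacy officer is likely to ask about.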

Question 5

Can you maintain it after it goes live?

Accepted Answer

Yes, on a monthly retainer. It covers model updates, security patches on the host, index refreshes as your documents change, and tuning as usage grows. You can also take the whole thing in-house: you own the deployment, the configuration, and the runbooks, and nothing is locked behind our infrastructure.

Local AI for teams that can't ship data out.

Your data stays put.

Open-weight models, your hardware

RAG over your documents

Wired into systems you run

Audit and access logs

Regulated operators, not everyone.

Ontario clinics

Firms and in-house

Regulated finance

When you do and don't need this.

The thinking behind it.

Why local AI matters

When local LLMs win

Self-hosted LLM TCO

PHIPA-compliant AI
