Probably, an AI reliability startup, has raised $9 million in seed funding from Andreessen Horowitz to tackle one of the most persistent problems in large language models: hallucinations and factual errors that escape detection before reaching end users.
Founded by Peter Elias, the company is targeting the kind of 99.99% accuracy common in deterministic software systems but rarely achieved with AI. Its first product is a data science tool that generates answers from complex datasets, each accompanied by a citation and full audit trail. The core innovation is what Elias describes as a validator harness: the LLM’s initial outputs are checked against a deterministic system that rejects any result inconsistent with the underlying dataset. The model has also been trained in conjunction with that validator, so that it optimises for speed and accuracy simultaneously.
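The company has not published implementation details, so the following is only a minimal sketch of the general pattern the description suggests: a deterministic check that recomputes a proposed answer from the dataset and rejects anything that does not match. The dataset, the `Candidate` type, and the `validate` and `first_valid` functions are all hypothetical names invented for illustration, and the model's proposals are stubbed as a plain list rather than produced by an LLM.

```python
from dataclasses import dataclass

# Hypothetical toy dataset; a real system would query a database or data frame.
DATASET = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 50},
]


@dataclass(frozen=True)
class Candidate:
    """A model-proposed answer: a total plus the row indices it cites."""
    region: str
    total: int
    cited_rows: tuple


def validate(candidate, dataset):
    """Deterministically recompute the answer and accept only an exact match."""
    # Rows that should back the claim, derived from the data, not from the model.
    expected_rows = tuple(
        i for i, row in enumerate(dataset) if row["region"] == candidate.region
    )
    # The citation must point at exactly those rows.
    if tuple(sorted(candidate.cited_rows)) != expected_rows:
        return False
    # The claimed figure must equal the recomputed figure.
    recomputed = sum(dataset[i]["revenue"] for i in expected_rows)
    return recomputed == candidate.total


def first_valid(candidates, dataset):
    """Return the first candidate that passes, plus a log of every check made."""
    audit_trail = []
    for candidate in candidates:
        accepted = validate(candidate, dataset)
        audit_trail.append((candidate, accepted))
        if accepted:
            return candidate, audit_trail
    return None, audit_trail


# Stubbed model output: a wrong total first, then a correct one.
proposals = [
    Candidate("north", 200, (0, 2)),
    Candidate("north", 170, (0, 2)),
]
answer, trail = first_valid(proposals, DATASET)
```

In this toy version the first proposal is rejected because its total disagrees with the recomputed sum, and the second is accepted. The `trail` list stands in for the audit trail mentioned above. How the real product trains its model alongside the validator is not covered by this sketch.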
The approach also has a notable commercial advantage. Because the harness constrains outputs so tightly, the system can run on AI models significantly smaller than frontier equivalents, specifically models four capability classes below the leading offerings. That allows deployment on local hardware rather than data centre infrastructure and dramatically cuts token costs.

Elias argued that the major AI laboratories have little incentive to solve hallucinations at this level, since their revenue scales with the number of corrections and retries a model requires. Probably’s architecture inverts that logic, and Elias said the same engine could extend beyond data science into accounting, medical services, and any other precision-sensitive domain.