What Is Cosine’s Genie? A Look at Cosine’s Attempt to Build ‘The Best AI Software Engineer in The World’

Insider Brief

  • Cosine’s Genie, launched in August 2024, is an AI model built to code, debug, and solve software problems autonomously.
  • According to Cosine, Genie scored 30% on SWE-Bench, well ahead of Amazon’s Q and Factory’s Code Droid, which scored about 19%.
  • Because it is trained on a specialized dataset that mirrors human reasoning, Genie tackles full workflows; Cosine pitches it as a digital colleague for software engineers rather than a code generator.

Cosine, a company founded to advance AI in software engineering, is breaking new ground with its state-of-the-art model, Genie. Released in August 2024, Genie is more than another code generation tool: according to Cosine’s leadership, it represents a fundamental shift in how AI tackles coding, debugging, and software development as a whole.

In a post on the company’s blog, co-founder and CEO Alistair Pullen describes Genie as the “best AI software engineer in the world.” On SWE-Bench, an industry-standard benchmark, Genie scored 30%, far ahead of competitors such as Amazon’s Q and Factory’s Code Droid, which scored about 19%. According to the post, this result is more than a statistic: it reflects a deeper change in AI’s ability to perform human-like engineering tasks autonomously.

What Is Cosine?

Cosine builds an AI-driven software engineering platform that seeks to go beyond the typical “copilot” model of AI programming tools. A “copilot” model is an AI tool that assists a human user by providing suggestions, guidance, or partial solutions during tasks, rather than performing tasks autonomously.

Cosine’s Genie, on the other hand, is designed to perform end-to-end programming tasks with minimal human intervention, positioning it as a digital colleague rather than an assistant. The company’s roots can be traced back to 2022 when Pullen first experimented with OpenAI’s early models. From those initial experiments emerged the concept of an AI capable of not just writing code but thinking through problems like a human engineer.

According to Cosine’s blog, this ambition wasn’t fully realized until the technology could support the vision. Early models faced limitations such as small context windows and low token limits, which made it challenging for the AI to handle complex tasks. However, these challenges merely highlighted the untapped potential of using large language models (LLMs) in software engineering. Pullen’s early realization that AI could become a transformative tool led to the creation of Genie, which takes full advantage of advancements in LLM capabilities.

What Does Genie Do?

Cosine, through its Genie model, tackles a wide range of programming tasks that go beyond the conventional use of AI in software development. While many AI models can assist in code generation or provide suggestions during programming, Genie is designed to solve bugs, build new features, refactor existing code, and handle everything in between. What makes Genie stand out, according to the Cosine blog, is its ability to work autonomously, taking on tasks in a way that mimics a human engineer’s approach.

This level of autonomy allows Genie to not only assist developers but, in many cases, take on entire tasks without needing human intervention. As Pullen explained, Genie was trained to act like a software engineer rather than just generate random snippets of code. This training sets it apart from competitors that rely on generic LLMs and merely prompt them with code-related tasks.

The ability to handle full workflows independently is a key distinction between Genie and other tools in the market. Instead of piecing together various fragments of solutions, Genie approaches software tasks as a cohesive whole, drawing on its training to reason through the steps needed to solve problems. As Pullen emphasizes, this human-like reasoning is central to Cosine’s vision.

How Genie Works

Genie’s success is built on a different approach to AI training. Rather than relying solely on general-purpose models like GPT-4, Cosine developed a specialized dataset that mimics the step-by-step decision-making of human software engineers. According to the company blog, this dataset is a critical innovation in how AI handles coding tasks. Cosine uses development activity from real software engineers to build an information lineage that mirrors human reasoning, which lets Genie approach problems incrementally, building on each step as it progresses.
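
Cosine has not published the format of this dataset. Purely as an illustration of what “step-by-step decision-making” data could look like, the Python sketch below records one solved task as an ordered list of actions, each with the engineer’s reason for taking it, and turns the record into (context so far, next step) training pairs. The `Step` and `Trajectory` classes, the action kinds, and the example task are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One action taken by an engineer, plus the stated reason for it (hypothetical schema)."""
    kind: str       # e.g. "search", "open_file", "edit", "run_tests"
    detail: str     # what was searched, opened, changed, or run
    rationale: str  # why the engineer took this step

@dataclass
class Trajectory:
    """An ordered record of how one task was solved, from issue to fix."""
    task: str
    steps: list[Step] = field(default_factory=list)

    def training_pairs(self) -> list[tuple[str, str]]:
        """Turn the trajectory into (context so far, next step) pairs, so a model
        learns to predict each decision from everything that came before it."""
        pairs = []
        context = f"TASK: {self.task}"
        for step in self.steps:
            target = f"{step.kind}: {step.detail} | because: {step.rationale}"
            pairs.append((context, target))
            context += "\n" + target  # the chosen step becomes context for the next one
        return pairs

traj = Trajectory(
    task="Fix off-by-one error in pagination",
    steps=[
        Step("search", "def paginate", "locate the function named in the bug report"),
        Step("open_file", "app/pagination.py", "read the implementation before changing it"),
        Step("edit", "range(start, end + 1) -> range(start, end)", "end index is exclusive"),
        Step("run_tests", "pytest tests/test_pagination.py", "confirm the fix"),
    ],
)

for context, target in traj.training_pairs():
    print(target)
```

Keeping the rationale next to each action is the point of the sketch: the model is trained on why a step was taken, not only on the final code.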

According to Cosine, this attention to data quality has been central to Genie’s breakthrough performance. The company avoided the trap of prompting generic models, a method that often results in shallow, incomplete solutions. Instead, by training their model on real-world software engineering data, Cosine has created an AI that doesn’t just generate code but understands the logic behind it.

One of the significant challenges faced by AI models in the software engineering space is context length. Early models had limited ability to retain information across long workflows, leading to fragmented and incomplete solutions. However, Genie’s design allows it to utilize much larger context windows, significantly improving its capacity to handle complex tasks in a continuous, logical manner.

Cosine’s blog emphasizes the importance of maintaining high-quality data. According to Pullen, while factors like hyperparameters and model architecture matter, the quality and fidelity of the training data are paramount. Genie’s performance owes much to the detailed and refined nature of its training data, which captures how human engineers implicitly approach problems. This focus on high-quality data enables the model to solve software challenges in a way that closely mirrors the logic and reasoning of a human developer.

A Look at Genie’s Architecture

When developing Genie, Cosine faced significant architectural challenges because of limited context window size, according to the technical report. Early iterations of Genie relied on models with relatively short context windows (the amount of information a model can take in at once) of 16k to 32k tokens, which capped how much the model could process in a single pass. Even at this stage, Genie was trained on datasets exceeding 100 million tokens, but despite early successes, its performance was restricted by how much information it could retain at once.

This changed when Cosine gained access to long-context OpenAI models, which allowed them to train Genie on billions of tokens. This expansion in context length enabled the model to process and represent significantly more information, eliminating the need for compression or chunking techniques that had previously limited performance. The data mix for this training run was carefully curated to focus on the most relevant programming languages for Cosine’s users, though the company aims to match the true distribution of languages on the internet in future iterations.
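
To see why a 16k to 32k window forced chunking, the sketch below splits a long token sequence into overlapping windows that each fit a model’s context. It is a generic sliding-window chunker, not Cosine’s code, and the window sizes (including an illustrative 128k window standing in for a long-context model) are assumptions. Any dependency that spans two chunks beyond the overlap is invisible to the model in a single pass, which is the limitation longer context windows remove.

```python
def chunk_tokens(tokens: list[int], window: int, overlap: int) -> list[list[int]]:
    """Split a token sequence into windows of at most `window` tokens.
    Consecutive windows share `overlap` tokens so some context carries across boundaries."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    stride = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):  # this chunk reaches the end of the input
            break
        start += stride
    return chunks

# A hypothetical 100,000-token input, e.g. a large slice of a codebase.
tokens = list(range(100_000))
short = chunk_tokens(tokens, window=16_000, overlap=1_000)   # short-context model
long_ctx = chunk_tokens(tokens, window=128_000, overlap=0)  # illustrative long-context model
print(len(short), len(long_ctx))  # 7 1
```

With the 16k window the hypothetical input needs seven separate passes; with the larger window it fits in one, so no information has to be split or compressed.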

Another key aspect of Genie’s architecture is its ‘agentic’ nature, according to the technical report. Cosine designed Genie to respond logically to the tasks it encountered rather than simply execute isolated commands. One of the biggest challenges was teaching the model what prerequisite information it needs before working in an unfamiliar codebase. To avoid hallucinations and keep Genie’s solutions consistent with the structure and logic of the existing code, Cosine trained the model to gather contextual information before making changes, a critical element that distinguishes Genie from traditional large language models.
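
One way to picture the “gather context first” behavior is as a rule enforced by a tool layer: an edit is accepted only if the agent has already read the file, and only if the text it wants to replace actually exists there. The report describes this as behavior Genie was trained to exhibit, so the `Workspace` class and `UnreadFileError` below are a hypothetical illustration of the principle, not Cosine’s mechanism.

```python
class UnreadFileError(Exception):
    """Raised when an edit targets a file the agent has not read yet."""

class Workspace:
    """Toy file store that only accepts edits grounded in code the agent has actually read."""

    def __init__(self, files: dict[str, str]):
        self.files = dict(files)
        self.read_paths: set[str] = set()

    def read(self, path: str) -> str:
        self.read_paths.add(path)  # this file's contents are now in the agent's context
        return self.files[path]

    def edit(self, path: str, old: str, new: str) -> None:
        if path not in self.read_paths:
            raise UnreadFileError(f"read {path} before editing it")
        if old not in self.files[path]:
            # The edit assumes code that is not there: a likely hallucination.
            raise ValueError(f"{old!r} not found in {path}")
        self.files[path] = self.files[path].replace(old, new, 1)

ws = Workspace({"util.py": "def add(a, b):\n    return a - b\n"})

try:
    ws.edit("util.py", "a - b", "a + b")  # rejected: the file has not been read yet
except UnreadFileError as err:
    print("rejected:", err)

ws.read("util.py")                        # gather context first
ws.edit("util.py", "a - b", "a + b")      # now accepted
print(ws.files["util.py"])
```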

Genie’s agentic loop is streamlined, consisting of four core processes: planning, retrieval, code writing, and code running. These processes are common across AI coding tools, but according to Cosine, Genie’s edge comes from performing each step as a human engineer would rather than following the patterns of a base LLM.
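
The four-step loop can be sketched as follows. Each callable stands in for a model or tool call, and the toy versions (a “model” whose first attempt is wrong, and a runner that checks `square(3) == 9`) are hypothetical; Cosine has not published Genie’s loop. The point is the structure: plan, retrieve context, write code, run it, and feed any failure back into the next plan.

```python
from typing import Callable, Optional

def agent_loop(
    task: str,
    plan: Callable[[str, list[str]], str],
    retrieve: Callable[[str], str],
    write: Callable[[str, str], str],
    run: Callable[[str], tuple[bool, str]],
    max_iters: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Repeat plan -> retrieve -> write -> run until the code passes or the budget is spent.
    Returns the passing code (or None) and the failure messages seen along the way."""
    feedback: list[str] = []
    for _ in range(max_iters):
        step = plan(task, feedback)       # decide the next step, given earlier failures
        context = retrieve(step)          # gather relevant code/tests before writing
        candidate = write(step, context)  # propose code
        ok, output = run(candidate)       # execute it and check the result
        if ok:
            return candidate, feedback
        feedback.append(output)           # failures inform the next plan
    return None, feedback

# --- Hypothetical toy components standing in for model and tool calls ---

def toy_plan(task: str, feedback: list[str]) -> str:
    return f"{task} (attempt {len(feedback) + 1})"

def toy_retrieve(step: str) -> str:
    return "test: square(3) == 9"

def toy_write(step: str, context: str) -> str:
    # A stand-in "model" whose first attempt is wrong; it ignores the context for simplicity.
    if step.endswith("(attempt 1)"):
        return "def square(x):\n    return x * 2\n"
    return "def square(x):\n    return x * x\n"

def toy_run(code: str) -> tuple[bool, str]:
    namespace: dict = {}
    exec(code, namespace)  # toy only; a real agent would run candidate code in a sandbox
    result = namespace["square"](3)
    return result == 9, f"square(3) returned {result}, expected 9"

code, feedback = agent_loop("implement square(x)", toy_plan, toy_retrieve, toy_write, toy_run)
print(feedback)  # ['square(3) returned 6, expected 9']
print(code)
```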

Cosine also built a self-improvement mechanism into Genie’s training, which significantly enhanced its capabilities. Initially, most of the training data showed code in a ‘perfect’ final state, which limited the model’s ability to recover from mistakes. By generating synthetic data with earlier versions of Genie, the team injected examples of failure into the training pipeline and taught the model how to correct itself. Each new iteration of Genie made fewer mistakes, so its synthetic data needed less correction, and the model became more robust with every cycle.
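
A minimal sketch of the self-correction idea, assuming a log of steps from an earlier model version in which each step is marked as passing or failing verification: every “fail, then fix” pair becomes a training example that maps the bad action and its error to the corrected action. The `Attempt` record, the example log, and the `parse_config` calls in it are hypothetical, and `failure_rate` is just one simple way to track how much correction a model version’s output needs.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    """One step produced by an earlier model version, and whether it passed verification."""
    action: str
    passed: bool
    error: str = ""

def build_recovery_examples(attempts: list[Attempt]) -> list[dict[str, str]]:
    """Turn each 'failed step followed by a passing step' into a self-correction example."""
    examples = []
    for prev, nxt in zip(attempts, attempts[1:]):
        if not prev.passed and nxt.passed:
            examples.append({
                "prompt": f"Previous action: {prev.action}\nError: {prev.error}\nNext action:",
                "completion": nxt.action,
            })
    return examples

def failure_rate(attempts: list[Attempt]) -> float:
    """Fraction of steps that failed: a rough measure of how much correction the data needs."""
    return sum(not a.passed for a in attempts) / len(attempts) if attempts else 0.0

# Hypothetical log from an earlier model version working on one task.
log = [
    Attempt("open_file('app/config.py')", passed=True),
    Attempt("parse_config('app.yml')", passed=False, error="FileNotFoundError: app.yml"),
    Attempt("parse_config('config/app.yml')", passed=True),
    Attempt("run_tests()", passed=True),
]

examples = build_recovery_examples(log)
print(examples[0]["completion"])  # parse_config('config/app.yml')
print(failure_rate(log))          # 0.25
```

Because the failing step stays in the prompt, the model is trained on an imperfect state and the fix that follows it, which data showing only finished code cannot provide.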

Genie can write software in many of the most popular programming languages; the technical report lists at least 15, including:

  • JavaScript
  • Python
  • TypeScript
  • TSX
  • Java
  • C#
  • C++
  • C
  • Rust
  • Scala
  • Kotlin
  • Swift
  • Golang
  • PHP
  • Ruby

What Are Genie’s Advantages?

Cosine’s Genie seems to hold a distinct advantage over its competitors, primarily due to its unique training approach and autonomous capabilities. According to the company blog, most existing AI tools focus on code generation by prompting general-purpose models like GPT-4, which struggle with specialized tasks like software engineering. In contrast, Cosine has invested in a curated dataset that accurately reflects the cognitive processes of software engineers, allowing Genie to perform at a far higher level.

For instance, according to Cosine, GPT-4 scored just 1.31% on SWE-Bench, even with agentic loops, while Genie scored 30%. That gap suggests Genie can handle complex tasks that would overwhelm general-purpose models. Cosine also says its dataset is not tied to any one model and can be ported to new foundation models, allowing the company to stay ahead as AI technology evolves.
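
SWE-Bench scores are the percentage of benchmark task instances (real GitHub issues) that a system resolves, meaning its patch makes the associated tests pass. For a sense of scale, the sketch below converts the percentages quoted in this article into approximate issue counts, assuming the full SWE-Bench test set of 2,294 instances; the article does not say which split each figure comes from, so treat the counts as rough.

```python
FULL_TEST_SET = 2_294  # instances in the full SWE-Bench test set (assumed split for all scores)

# Percent of instances resolved, as quoted in this article.
SCORES = {
    "Genie": 30.0,
    "Amazon Q / Code Droid": 19.0,
    "GPT-4": 1.31,
}

def approx_resolved(pct: float, total: int = FULL_TEST_SET) -> int:
    """Convert a percentage score into an approximate count of resolved issues."""
    return round(pct / 100 * total)

for name, pct in SCORES.items():
    print(f"{name}: {pct}% of {FULL_TEST_SET} ~ {approx_resolved(pct)} issues")

ratio = SCORES["Genie"] / SCORES["Amazon Q / Code Droid"]
print(f"Genie resolves about {ratio:.2f}x as many issues as a 19% system")
```

On that assumption, 30% is roughly 688 issues, against roughly 436 at 19% and about 30 at 1.31%.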

One of Cosine’s major innovations is iterative self-improvement. Each round of training builds on what earlier versions of Genie got wrong, so the model learns from its mistakes and refines its approach with every iteration. This self-improving loop, as described in the Cosine blog, is central to Genie’s ability to handle tasks at a scale and complexity that other models cannot.

Future Directions for Cosine and AI Coding

Cosine’s Genie is just the beginning of a broader vision for AI in software development. As outlined by Pullen, the company’s long-term goal is to build a family of models that can handle a wide range of tasks, from simple coding jobs to highly complex, large-scale projects. The flexibility of Cosine’s approach lets it adapt its dataset to new foundation models as they emerge, keeping Genie at the cutting edge of AI technology.

According to Cosine, the future of AI coding lies in creating specialized models that can interact seamlessly with human teams. The company envisions a world where AI engineers are as common as human engineers, able to jump into codebases and solve problems faster than their human counterparts. Pullen sees this as a revolutionary shift, one that could dramatically reduce the time and resources needed to develop software.

Beyond software engineering, Cosine has broader ambitions. The company believes that the techniques it has pioneered, capturing human reasoning and decision-making in AI, can be applied to other industries. While software engineering is the most intuitive starting point, the potential applications are vast, and Cosine is already exploring how its AI models can be used in fields like design, research, and beyond.