Cleanlab Raises $25M Series A to Automatically Increase the Value and Accuracy of the World’s Enterprise Data Used by AI, ML, and Analytics Solutions

Insider Brief

Cleanlab has secured $25 million in Series A funding co-led by Menlo Ventures and TQ Ventures, with participation from Bain Capital Ventures and Databricks Ventures, bringing the total funding to $30 million.
The company provides an automated data curation solution that adds smart metadata, transforming messy data into useful inputs for enterprise analytics, LLM, and AI decisions. Cleanlab’s AI algorithms were developed by founders from MIT and are utilized by over 10% of Fortune 500 companies.
Cleanlab Studio, the company’s flagship platform, has introduced new features and a Trustworthy Language Model (TLM) that provides reliable LLM outputs and adds a trustworthiness score to these outputs. The TLM is available for beta testing on Cleanlab Studio’s website.

PRESS RELEASE — SAN FRANCISCO /October 10, 2023— (BUSINESS WIRE) — Cleanlab, the company behind the automated data curation solution used to increase the dollar value of every data point in enterprise artificial intelligence (AI), large language model (LLM), and analytics solutions, has secured $25 million in Series A funding. This financing round was co-led by Menlo Ventures and TQ Ventures; Menlo Ventures’ Matt Murphy and TQ’s Schuster Tanger will join the board. Existing investor Bain Capital Ventures (BCV) and new investor Databricks Ventures participated in this funding round, which brings Cleanlab’s total funding to $30 million.

Cleanlab helps drive profitability. For today’s businesses, revenue is directly tied to data-driven analytics decisions and generative AI solutions. Bad data costs the U.S. alone over $3 trillion per year [1], and enterprises spend 80 percent of their time manually improving data quality [2]. Cleanlab is the first enterprise solution that reliably adds smart metadata automatically, removing the vast majority of that work and turning messy, real-world data into useful inputs for various models. This process increases the reliability and profit margin of enterprise analytics, LLM, and AI decisions. Cleanlab also automatically identifies the majority of a dataset that contains no issues, which raises the profit margins of enterprise pipelines by avoiding expensive data quality and annotation work on that data.

Cleanlab’s novel AI algorithms were developed in-house by the founders, all of whom are PhDs in Computer Science from MIT and published researchers. The team’s proprietary approach to automated data curation builds upon the “confident learning” field created by the Cleanlab team, enabling them to pioneer an enterprise-ready product.
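
For readers unfamiliar with confident learning, the core thresholding idea can be sketched in a few lines of Python. This is a simplified illustration only, not Cleanlab Studio or the cleanlab library's API: the function name `find_likely_label_issues` and the toy data are hypothetical, and the sketch assumes every class appears at least once among the given labels and that the predicted probabilities come from a model evaluated on held-out data.

```python
import numpy as np

def find_likely_label_issues(labels, pred_probs):
    """Flag examples whose given label disagrees with a confidently predicted class.

    Hypothetical helper for illustration; assumes every class occurs in `labels`.
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n_classes = pred_probs.shape[1]

    # Per-class threshold: the model's average confidence in class j
    # over the examples that were labeled j.
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(n_classes)]
    )

    # An example "confidently" belongs to class j if its probability for j
    # reaches that class's threshold.
    above = pred_probs >= thresholds

    # Among the classes that clear their threshold, take the most probable one;
    # use -1 when no class clears its threshold.
    masked = np.where(above, pred_probs, -np.inf)
    confident_class = np.where(above.any(axis=1), masked.argmax(axis=1), -1)

    # Likely label issue: a confident class exists and differs from the given label.
    return (confident_class != -1) & (confident_class != labels)

# Toy data: the third example is labeled 0 but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
issues = find_likely_label_issues(labels, pred_probs)
print(np.flatnonzero(issues))  # indices of examples flagged for review
```

Using a separate threshold per class, rather than one global cutoff, keeps the check from over-flagging classes on which the model is systematically less confident.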

Today, over 10% of Fortune 500 companies (including AWS, JPMorgan Chase, Google, Oracle, and Walmart) and a variety of innovative startups (like ByteDance, HuggingFace, and Databricks) use Cleanlab to find and fix problems in sizable structured and unstructured visual, text, and tabular datasets. Whether you are building an LLM for enterprise, tagging intents in chatbot text data, or labeling objects in visual navigation data, Cleanlab increases the dollar value of every data point in your dataset by automatically analyzing and correcting outliers, ambiguous data, and mislabeled data.

The company is also announcing that its flagship automated data curation platform, Cleanlab Studio, has launched several new features that address unreliable LLM outputs. Cleanlab’s Trustworthy Language Model (TLM) produces high-quality outputs comparable to those of ChatGPT, Falcon, and similar LLMs, and it adds a trustworthiness score to every output. Cleanlab Studio identifies and fixes issues in all types of datasets, including text, image, and tabular data. TLM extends Cleanlab Studio’s capabilities by adding intelligent metadata that helps automate reliability and quality assurance for systems that rely on LLM outputs, synthetic data, and generated content. Cleanlab’s Trustworthy Language Model is available to try in Beta today with Cleanlab Studio at cleanlab.ai.

“After working with companies like Microsoft and Tesla to get their AI-driven products to function better and helping MIT and Harvard detect cheating, it became clear that mislabeled and poorly curated data was the core issue behind these challenges,” said Cleanlab Co-Founder and CEO Curtis Northcutt. “It’s the culmination of over a decade of work to introduce Cleanlab Studio, which reimagines what AI and analytics can do for people and enterprises now that we can automate data curation and reliability.”

“While most of the investment in generative AI is chasing the biggest, baddest, and best model, the reality is that there is a massive complementary opportunity that can shave billions off those efforts and lead to a better outcome. That is Cleanlab,” said Matt Murphy, Partner at Menlo Ventures. “Cleanlab’s amazing team of ML researchers and practitioners has built a data curation platform that fundamentally improves models via better, cleaner data.”

“We are thrilled to partner with Curtis, Jonas and Anish, the eminent authorities on data-centric AI,” said Schuster Tanger, Co-Managing Partner of TQ Ventures. “They have developed a solution to a large and pressing problem for enterprises across almost all industries: namely, ambiguous and wrongly labeled data. In addition to an exceptional team and superior technology, Cleanlab also has real world results from customers that point to Cleanlab’s effectiveness around percent accuracy improvement, percent reduction in labeled transactions required to train models, and dollar reduction in enterprise costs.”

“Cleanlab is well-designed, scalable, and theoretically grounded: It accurately finds data errors, even on well-known and established datasets,” said Patrick Violette, Senior Software Engineer at Google. “After using it for a successful project at Google, Cleanlab is now one of my go-to libraries for dataset cleanup.”

About Cleanlab

Pioneered at MIT and trusted by hundreds of top organizations, Cleanlab turns unreliable data into reliable models and insights by automatically finding and fixing errors in both structured and unstructured datasets, such as visual, text, and tabular data. Based in San Francisco, Cleanlab was founded in 2021 by three PhDs in Computer Science from MIT.

About Menlo Ventures:

Menlo Ventures is a venture capital firm that strives to have a positive impact on everything we do. That’s why we support businesses, including Anthropic, Carta, Chime, Harness, Poshmark, Pillpack, Pinecone, Roku, Rover, Uber, and Warby Parker, that are reimagining life and work for the better. Over 47 years, we’ve grown a portfolio that includes more than 80 public companies, over 165 mergers and acquisitions, and currently have $5.6 billion under management. We invest at every stage and in every sector, with expertise in Consumer, Enterprise, and Healthcare. From developing market strategies to creating communities, we provide real impact where entrepreneurs need it most. When we’re in, we’re ALL IN. www.menlovc.com @MenloVentures

About TQ Ventures:

Based in New York City, TQ Ventures is a venture capital firm led by Schuster Tanger and Andrew Marks. The firm is generally agnostic on industry vertical and geography, and instead prioritizes partnering with extraordinary founders across the software complex (B2B and B2C). Across our more than 80 global investments, we believe the differentiated support and networks we provide our founders have fueled our reputation and, in turn, our performance record. Founded in 2018, TQ has approximately $1 billion under management and is currently investing out of its third fund. www.tqventures.com

[1] Harvard Business Review. (2016, September). “Bad Data Costs the U.S. $3 Trillion Per Year.” Harvard Business Review. https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year

[2] Ng, A. (2021, March 24). “80 percent of our work is data preparation.” The Batch: AI at Work (Issue 84). deeplearning.ai. https://www.deeplearning.ai/the-batch/issue-84/