Inactiva Labs
PUBLISHED POSITIVE RESULTS · NEGATIVE DATA

We generate the other 94%.

94% of drug failure data is never published.

Inactiva Labs runs automated robotic assays to produce the negative pharmaceutical data that AI models need but no one collects.

THE PROBLEM

The missing majority of pharmaceutical data.

Publication bias silently distorts drug development. Negative results vanish, billions are wasted, and only a fraction of candidates survive — all because most of the evidence never makes it to the table.

6%
of negative results ever get published
Source: Turner et al., NEJM 2008
$1.4B
average cost of a failed drug trial
Source: Bian et al., ScienceDirect 2025
1 in 10
drugs that enter trials ever get approved
Source: Congressional Budget Office

Publication bias breakdown

Positive results published: 94%
Negative results published: 6%

Built on peer-reviewed research from NEJM, ScienceDirect, and the Congressional Budget Office

The Data Gap

A 58% performance gap.
Billions left on the table.

MCC (Matthews Correlation Coefficient) measures how well a model's predictions agree with actual drug outcomes, on a scale from −1 to +1, and it stays informative when one class is rare. With balanced training data, MCC rises sharply.

Based on published benchmarks comparing biased vs. balanced training datasets.
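As a minimal sketch of why MCC is used here instead of plain accuracy, the snippet below computes MCC from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not Inactiva results.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty,
    # because MCC is undefined in that case.
    return numerator / denominator if denominator else 0.0


# Hypothetical test set: 90 soluble compounds, 10 insoluble compounds.
# A model that labels every compound "soluble" never finds a negative.
always_soluble_accuracy = (90 + 0) / 100
always_soluble_mcc = mcc(tp=90, tn=0, fp=10, fn=0)

# A hypothetical model that does recognise most insoluble compounds.
balanced_model_mcc = mcc(tp=85, tn=8, fp=2, fn=5)
```

The always-soluble model looks strong on accuracy yet scores an MCC of zero, which is the kind of blind spot a training set without negatives can produce.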

Current AI Drug Prediction: 0.48–0.55

Current AI drug prediction accuracy

vs
With Inactiva's Balanced Data: 0.75+

Projected accuracy with balanced negative data

Live Science

We ran the experiment. Here's the data.

Phase 1 complete: a computational degradation study using real NCATS kinetic solubility data and the SC2 blind benchmark. No synthetic data. No assumptions.

85% MCC drop when very insoluble compounds are removed from training data (200 models, 20 seeds)
0.19→0.03 MCC degradation measured on SC2 benchmark (Llinas et al. 2020, 132 compounds)
200 models trained: 10 removal levels × 20 random seeds, NCATS training, SC2 test
Figure 1 — Degradation Curve · Training: NCATS · Test: SC2 (Llinas et al. 2020)
March 2026
Degradation curve showing MCC drops 75% as very insoluble compounds are removed from training data

Blue line (primary): MCC at logS < −5 threshold. As very insoluble compounds are systematically removed from training data, model accuracy on the SC2 benchmark drops from 0.20 to 0.05, a 75% collapse. The gray dashed line at ~65% represents AqSolDB's actual imbalance level, where current state-of-the-art models operate. Inactiva's data moves models left on this curve.
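The removal procedure behind the curve can be sketched roughly as follows. This is an illustration under stated assumptions, not the published pipeline: the function names, the evenly spaced removal levels, and the toy compounds are hypothetical, and the model training step (GradientBoosting on Morgan fingerprints) is left out.

```python
import random

LOGS_THRESHOLD = -5.0  # "very insoluble" cutoff named in the figure caption


def remove_insoluble(train, fraction, seed):
    """Drop a random `fraction` of very insoluble compounds from `train`.

    `train` is a list of (compound_id, logS) pairs. The seed makes each
    removal reproducible, so one level can be repeated with many seeds.
    """
    rng = random.Random(seed)
    insoluble_idx = [i for i, (_, logs) in enumerate(train) if logs < LOGS_THRESHOLD]
    n_drop = round(fraction * len(insoluble_idx))
    drop = set(rng.sample(insoluble_idx, n_drop))
    return [row for i, row in enumerate(train) if i not in drop]


def experiment_grid(n_levels=10, n_seeds=20):
    """All (fraction, seed) pairs. Evenly spaced levels are an assumption."""
    fractions = [level / (n_levels - 1) for level in range(n_levels)]
    return [(fraction, seed) for fraction in fractions for seed in range(n_seeds)]


def relative_drop(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Toy training set (hypothetical): logS runs from -7.0 to -1.5 in steps of 0.5,
# so four of the twelve compounds fall below the -5 threshold.
toy_train = [(f"cpd{i}", -7.0 + 0.5 * i) for i in range(12)]
half_removed = remove_insoluble(toy_train, fraction=0.5, seed=0)
all_removed = remove_insoluble(toy_train, fraction=1.0, seed=0)
runs = experiment_grid()

# Arithmetic from the caption: MCC falling from 0.20 to 0.05.
caption_drop = relative_drop(0.20, 0.05)
```

Each (fraction, seed) pair would correspond to one trained model, which gives 10 × 20 = 200 runs, and the caption's 0.20 to 0.05 change works out to a relative drop of about 0.75.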

NCATS AID 1645848
SC2 Benchmark · Llinas et al. 2020
GradientBoosting + Morgan FP
200 models · 10 conditions × 20 seeds
Open source · github.com/Inactiva-Labs
The Solution

The world's first negative
pharmaceutical data foundry.

High-throughput robotic foundry generating standardized, validated negative data — starting with our primary focus:

01 Primary Focus

Solubility

Compounds that fail dissolution — the most common early-stage failure mode. Our robotic assays screen thousands of compounds for solubility, generating the negative data that AI models need.

Know which compounds dissolve before wasting months in the lab.

02

Permeability

Coming Soon

Compounds that cannot cross biological membranes — critical for oral bioavailability.

Predict oral drug viability early.

03

Stability

Coming Soon

Compounds that degrade under physiological conditions before reaching their target.

Eliminate unstable candidates before costly trials.

04

Toxicity

Coming Soon

The most expensive failure mode. Structured negative toxicity data no public dataset contains.

Catch toxic compounds before they reach patients.

Ready to move forward? Partner with us.

Get In Touch
THE SCIENCE

From robot to dataset in hours, not months.

End-to-end automated pipeline — robot to ML-ready dataset.

01

Custom Robot

Custom-built 3D-printed liquid handler (<$600 to build) designed for high-throughput screening.

02

PEG Stress Test

Automated stress test pushes proteins across hundreds of conditions per run to find their breaking points.

03

Optical Measurement

Real-time light-based measurement detects if a protein clumps — a key failure signal.

04

Data Logged

Every result — pass or fail — is logged into clean, standardized datasets ready for machine learning.
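As a rough sketch of what logging every result, pass or fail, into an ML-ready table could look like: the record fields, identifiers, and values below are hypothetical and are not Inactiva's actual schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AssayResult:
    """One row per measurement. Field names are illustrative assumptions."""
    compound_id: str
    condition: str
    signal: float   # optical reading, arbitrary units
    passed: bool    # False rows are the negative data that usually goes unrecorded


def write_results(results, handle):
    """Write all results, failures included, as CSV with a fixed header."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(AssayResult)])
    writer.writeheader()
    for result in results:
        writer.writerow(asdict(result))


# Hypothetical readings for two samples under the same condition.
results = [
    AssayResult("cpd-001", "PEG 10%", 0.04, True),
    AssayResult("cpd-002", "PEG 10%", 0.91, False),
]
buffer = io.StringIO()
write_results(results, buffer)
csv_text = buffer.getvalue()
```

Keeping failed rows in the same table as passing rows, with an explicit label column, is what lets a model train on both classes without any extra cleanup step.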

Structural Moat

A cost advantage
competitors can't match.

Commercial Arms: $150k+
  • Locked proprietary software
  • Expensive service contracts
  • Months to reconfigure an assay
vs
Inactiva Custom Robots: <$3k
  • Unified Python orchestration
  • Push code to change protocol
  • Full fleet for price of one arm
65%

Lower operating costs vs United States

10x

Robot fleet for the price of one arm

0

Direct competitors in negative pharma data

Get in Touch

Let's talk.

Whether you're an investor, pharma partner, or researcher — we'd love to hear from you.