bioTCAD: AI Component Analysis and Expansion Landscape

Erudio Bio, Inc. — Confidential Internal Document

Analysis of AI components latent within the bioTCAD provisional filing, and a structured brainstorm of AI-driven expansion vectors.
Document prepared by Sunghee Yun based on provisional patent on bioTCAD, June 10, 2026

Overview

bioTCAD (Measurement-Anchored Computational Simulation for Computer-Aided Design and Optimization of Molecules) is, at its core, an AI-integrated platform whose artificial intelligence components span the full design-synthesize-measure-recalibrate loop. The following analysis is organized into five zones: (1) AI already embedded within the current provisional disclosure; (2) AI components that would feed upstream into bioTCAD; (3) AI models derivable from bioTCAD as a data-generation platform; (4) expansions of bioTCAD’s molecular and therapeutic scope; and (5) longer-horizon, frontier directions.

Zone 1 — AI Already Latent in the Provisional Disclosure

A careful reading of the claims and detailed description reveals that bioTCAD already constitutes a multi-layer AI system. The following components are explicitly asserted or clearly implied within the current filing.

Deep Learning Structure Prediction (Stage 1)

Claims 25 and 27 explicitly invoke AlphaFold3 and the RoseTTAFold family (RoseTTAFold, RoseTTAFold2, RoseTTAFold3) as the structural initialization stage of the pipeline. Critically, the disclosure identifies a systematic bias in top-ranked predicted models — over-prediction of secondary structure content relative to experiment — and proposes mitigation through multi-model ensemble carry-forward. This bias characterization and its correction constitute a learned heuristic that could be formalized into an ensemble weighting model, representing a refinement layer on top of existing structure prediction infrastructure.

Machine-Learning Force-Field Potentials

The definitions section explicitly enumerates “a machine-learning potential or surrogate model” as a valid model type subject to measurement-anchored calibration. This means the claims already cover the application of learned interatomic potentials — such as ANI, NequIP, or MACE — within the bioTCAD calibration loop. The combination of a data-driven force field with per-molecule experimental anchoring is a substantively novel pairing not present in the existing literature.

Simulation-Based Inference (SBI)

Claim 26 explicitly names simulation-based inference as a parameter calibration strategy. SBI — encompassing neural posterior estimation and likelihood-free inference — enables probabilistic, uncertainty-aware parameter fitting from simulation outputs without requiring an explicit likelihood function. Its inclusion elevates bioTCAD from a point-estimate calibration engine to a full Bayesian inference framework, with implications for uncertainty quantification throughout the design pipeline.

Contact-Map Classifier

Claims 13 through 15 describe a residue–residue contact probability map computed over simulation trajectories, with bind/no-bind classification based on the number, location, and stability of contacts. This two-dimensional contact fingerprint is a natural input representation for a convolutional neural network (CNN) classifier — a connection the disclosure leaves implicit but which is technically straightforward to instantiate given sufficient labeled data.

Labeled Simulation Data Factory

Claim 33 explicitly states: “further comprising outputting labeled simulation data and using the labeled simulation data to train a machine-learning model.” This is arguably the most strategically significant latent AI component in the filing. bioTCAD is not merely a molecular design tool; it is a proprietary, measurement-validated training data generation platform. Every calibration run produces ground-truth-anchored molecular trajectories — structurally labeled, affinity-labeled, and thermodynamically decomposed — that are otherwise inaccessible from public databases.

Candidate Ranking and Selection Engine

Claim 34 describes application of the method across a plurality of candidate molecules, with ranking by predicted property and selection of top candidates for synthesis. This constitutes a learned scoring and ranking system whose performance improves as the calibrated dataset grows — an asset with compounding value over time.

Zone 2 — AI Components to Feed INTO bioTCAD

The following AI components are not present in the current disclosure but represent natural upstream integrations that would substantively enhance bioTCAD’s throughput, accuracy, and autonomy.

Generative Sequence Design as an Upstream Proposal Engine

Generative models such as RFdiffusion (a diffusion-based backbone generator) and ProteinMPNN (a message-passing neural network sequence designer) can produce thousands of candidate binder backbones and sequences against a defined target. bioTCAD is ideally positioned as the high-fidelity, measurement-anchored downstream filter in this pipeline: generative AI proposes at scale, and bioTCAD validates with physics grounded in experimental measurement. This combination — generative proposal followed by calibrated simulation screening — represents a kill chain that no existing competitor has fully assembled.

Inverse-Problem Neural Network for CD Spectrum Interpretation

The current workflow requires manual comparison of CD-derived secondary structure fractions against DSSP-computed simulation fractions. A neural network trained on paired (CD spectrum, known secondary structure) datasets could directly infer secondary structure composition with calibrated uncertainty bounds from raw spectral data, eliminating an intermediate manual step and accelerating the calibration loop. This constitutes an inverse-problem solver for the CD measurement stage.

Active Learning Oracle for Experimental Design

Rather than selecting the next candidate molecule heuristically, a Bayesian optimization engine could identify the most informationally valuable next experiment — the molecule whose synthesis and measurement would maximally reduce uncertainty in the calibrated model. This transforms bioTCAD’s closed design loop into an autonomous, self-directing experimental system, minimizing the number of wet-lab cycles required to converge on an optimized candidate.

Graph Neural Network for Force-Field Selection

Claim 11 identifies the selection between CHARMM-type and AMBER-type force fields as a meaningful design decision with molecule-dependent outcomes. A graph neural network trained on the growing library of bioTCAD-calibrated molecules could learn to predict which force-field family will calibrate most accurately for a given molecular scaffold, eliminating a significant source of human judgment from the workflow and reducing calibration iteration cycles.

Zone 3 — AI Models Derivable FROM bioTCAD

bioTCAD’s most durable competitive advantage may lie not in the simulation engine itself, but in the proprietary dataset it generates at scale. The following AI models become trainable as that dataset accumulates.

Per-Molecule Foundation Model

Each bioTCAD calibration run produces a measurement-anchored molecular dynamics trajectory — a data artifact of a quality and specificity that does not exist in any public database. Aggregating hundreds of such trajectories creates a corpus suitable for fine-tuning a molecular foundation model (e.g., ESM-3 for peptides and proteins, or MolFormer for small molecules) on experimentally validated, physics-grounded representations. The resulting model would understand molecular behavior through the lens of wet-lab-validated simulation — a fundamentally stronger foundation than models trained on noisy, heterogeneous public data.

Affinity Surrogate Model

Once a sufficient number of (contact map, experimentally measured K_d) pairs have been accumulated through bioTCAD runs, a fast ML surrogate model can be trained to predict binding affinity directly from contact map representations. Such a surrogate would operate in milliseconds rather than the hours required for full MD trajectories, enabling virtual screening of candidate libraries orders of magnitude larger than those accessible to full simulation. The critical differentiator is that this surrogate is trained exclusively on calibrated, measurement-anchored data — not the publicly available, experimentally heterogeneous datasets used by competing approaches.

Parameter Recommender Network

The calibration parameters derived for each molecule — including Lennard-Jones epsilon scaling factors and the empirical correction coefficients a and b in Equation 1 — vary systematically with molecular scaffold and composition. A sequence-to-parameter predictor trained on the accumulated calibration library could initialize new calibration runs from a learned prior, dramatically reducing the number of simulation iterations required to converge. This is directly analogous to the technology computer-aided design (TCAD) parameter libraries used in semiconductor manufacturing, where process parameters are transferred across similar device geometries.

Thermodynamic Decomposition Model

Claim 32 explicitly describes the decoupling of enthalpic (ΔH) and entropic (ΔS) contributions to binding through strategic residue substitution. A model trained on many calibrated thermodynamic decompositions could learn to predict the thermodynamic signature of a molecular scaffold — identifying whether a given chemical series is enthalpy-dominated or entropy-dominated, and recommending the substitution strategy most likely to improve the relevant term. This encodes medicinal chemistry intuition that typically requires years of empirical experience.

Zone 4 — Expanding bioTCAD’s Molecular and Therapeutic Scope

The claims are deliberately broad with respect to molecule type and measured property. The following expansions are fully within the scope of the current disclosure and represent addressable market extensions.

Small-Molecule bioTCAD

The definitions section explicitly includes small molecules as a covered molecule type. Anchoring MD simulations to ADMET measurements — absorption, distribution, metabolism, excretion, and toxicity assays, including permeability (PAMPA, Caco-2) and solubility measurements — would extend bioTCAD into the dominant pharmaceutical drug class. The calibration framework is identical; only the experimental anchoring modality changes.

Antibody and Nanobody bioTCAD

Complementarity-determining region (CDR) loops of antibodies and nanobodies present a well-defined calibration target: structural measurements of CDR loop conformation (via NMR or crystallography) can anchor force-field parameters governing loop flexibility, and SPR or BLI measurements can anchor affinity correction. The resulting calibrated models would enable rational CDR optimization without exhaustive combinatorial library screening.

Nucleic Acid bioTCAD

RNA aptamers and siRNA therapeutic candidates are amenable to the same measurement-anchored calibration framework using structural observables (CD, NMR) and binding affinity measurements. As RNA therapeutics represent a rapidly growing drug class, this extension addresses a significant unmet computational need.

Selectivity Panel bioTCAD

Running bioTCAD against not a single target but a curated panel — the intended therapeutic target plus its closest structural homologs — and calibrating each independently would yield a selectivity map: a quantitative, simulation-derived prediction of off-target binding risk. This output has direct value for both lead optimization and regulatory submission, constituting a computational selectivity safety assessment.

VSA × bioTCAD Vertical Integration

This is the most strategically significant expansion opportunity specific to Erudio Bio. The VSA platform generates dynamic force spectroscopy (DFS) data via single-molecule bond rupture measurements. Section IX of the disclosure explicitly claims DFS — interpreted through a Bell–Evans model — as a valid affinity-anchoring modality for bioTCAD. This means VSA’s primary output directly feeds bioTCAD’s calibration input, creating a vertically integrated, proprietary measurement-to-simulation pipeline that no external competitor can replicate. The DFS chip under development becomes not merely a diagnostic device but a molecular design input instrument.

Proteolytic Stability bioTCAD

Proteolytic degradation is a primary failure mode for peptide therapeutics in vivo. Calibrating MD simulations to protease stability assay data (enzymatic half-life measurements) would enable in silico prediction of proteolytic stability and rational design of degradation-resistant peptide candidates — a property currently addressed only empirically in the drug development process.

Material Surface and Biosensor bioTCAD

The definitions section includes “material surface” as a valid binding target. Applied to the functionalized tip surfaces of VSA biosensor chips, bioTCAD’s calibration framework could inform the rational design of surface chemistries that maximize specific binding signal and minimize non-specific adsorption — closing a design loop between Erudio Bio’s diagnostic and therapeutic platforms.

Multi-Target Allosteric Design

Extending bioTCAD to simulate a molecule simultaneously against multiple conformational states of a target — including allosteric sites and cryptic pockets — would enable the design of molecules that modulate protein function through mechanisms beyond competitive active-site binding, accessing a broader and less competed target landscape.

Zone 5 — Frontier Directions

The following directions represent longer-horizon opportunities that extend materially beyond the current disclosure but are grounded in its technical foundations.

Closed-Loop Autonomous Drug Design Foundry

Equipping a large language model (LLM) agent with bioTCAD as a callable computational tool, a synthetic chemistry procurement API, and the VSA measurement pipeline would enable fully autonomous closed-loop drug discovery: the agent proposes a candidate, invokes bioTCAD to compute a calibrated affinity prediction, triggers synthesis if the prediction clears a defined threshold, receives experimental measurement data, recalibrates, and iterates. The LLM contributes chemical intuition and literature synthesis; bioTCAD contributes measurement-grounded physics; VSA contributes rapid per-molecule affinity data. No comparable end-to-end autonomous system currently exists in the field.

Molecular Digital Twin

Each bioTCAD calibration run constitutes an instance of a molecule’s digital twin — a simulation artifact personalized to that specific molecule’s experimentally observed behavior. As additional measurements are incorporated, the twin’s predictive accuracy increases. This framing — borrowed from industrial manufacturing and IoT — positions bioTCAD not as a generic simulation tool but as a platform for creating validated virtual models of individual molecules, with direct implications for regulatory submissions, investor communications, and partnership negotiations.

Causal and Counterfactual AI for Molecular Design

The majority of machine learning models in drug discovery are correlational — they identify statistical associations between molecular features and biological activity without capturing causal mechanisms. bioTCAD’s experimental intervention structure — systematically varying parameters and observing simulated outcomes — is inherently causal. A causal model trained on bioTCAD runs could answer rigorously counterfactual questions: if residue 4 were substituted with alanine, how would ΔH and ΔS change independently? This level of mechanistic reasoning is currently inaccessible to standard ML approaches.

Cross-Molecule Transfer Learning and Universal Calibration Prior

Each molecule’s calibrated parameter set constitutes a data point in a high-dimensional parameter space. A model that learns the mapping from molecular topology and composition to calibration parameters is constructing a universal prior for measurement-anchored simulation. New molecules would initialize calibration from this learned prior rather than from generic force-field defaults, converging to accurate calibration with fewer experimental measurements. This represents compounding return on every calibration run ever performed — a data asset that appreciates with scale.

Regulatory AI Co-Pilot

For regulatory submissions under MFDS (Korea) and FDA frameworks, bioTCAD simulation outputs anchored to experimental measurements could constitute admissible computational pharmacology evidence, provided appropriate validation documentation is supplied. An AI co-pilot trained on regulatory guidance documents and bioTCAD output formats could automate the translation of calibrated simulation results into regulatory submission language — a capability with standalone commercial value for pharmaceutical clients operating across multiple regulatory jurisdictions.

Summary

The table below maps each AI component to its zone and strategic significance.

Zone	Component	Strategic Significance
1	DL structure prediction (AlphaFold3 / RoseTTAFold3)	Already claimed; pipeline entry point
1	ML force-field potentials	Novel combination with calibration loop
1	Simulation-based inference	Bayesian uncertainty quantification
1	Contact-map classifier	Natural CNN application
1	Labeled data factory	Proprietary training data moat
1	Candidate ranking engine	Compounding dataset value
2	Generative sequence design filter	Scale + fidelity kill chain
2	Inverse-problem CD neural net	Accelerated calibration loop
2	Active learning oracle	Autonomous experimental design
2	GNN force-field selector	Reduced human judgment dependency
3	Per-molecule foundation model	Proprietary data → model moat
3	Affinity surrogate model	Million-compound virtual screening
3	Parameter recommender	Reduced calibration cost per molecule
3	Thermodynamic decomposition model	Encoded medicinal chemistry intuition
4	Small-molecule bioTCAD	Dominant drug class expansion
4	Antibody / nanobody bioTCAD	Biologics market entry
4	VSA × bioTCAD vertical integration	Unique Erudio Bio moat
4	Selectivity panel bioTCAD	Regulatory safety asset
4	Proteolytic stability bioTCAD	In vivo relevance
5	Autonomous drug design foundry	Full closed-loop autonomy
5	Molecular digital twin	Regulatory + investor narrative
5	Causal / counterfactual AI	Mechanistic reasoning beyond correlation
5	Universal calibration prior	Compounding ROI on all past runs
5	Regulatory AI co-pilot	Standalone commercial product

Document prepared for internal strategic use. All content derived from the bioTCAD provisional patent disclosure and proprietary analysis.
Confidential — Erudio Bio, Inc.