A showcase OntoCodex GPT built with this framework is available at: https://chatgpt.com/g/g-6734e1cd43008190b7245d756c8e75ef-ontocodex
OntoCodex is a multi-agent, LLM-enabled framework for semi-automated ontology enrichment. It integrates large language models, biomedical ontologies, curated knowledge bases, and standard terminologies to support scalable, reproducible, and human-in-the-loop ontology development for real-world biomedical and clinical research.
This repository provides the core framework, agent design, and example workflows for enriching OWL/RDF ontologies with new concepts, relationships, provenance, and standardized codes (e.g., ICD, SNOMED CT, RxNorm, LOINC).
Key features:
- Multi-agent architecture for ontology enrichment:
  - Decision Agent
  - Ontology Reader Agent
  - Knowledge Base Agent
  - Terminology Agent
  - Script Generation Agent
- Ontology-aware parsing of user-provided OWL/TTL ontologies
- Automated concept extraction from trusted biomedical sources
- Terminology normalization to standard vocabularies (ICD, SNOMED CT, RxNorm, ATC, LOINC)
- Provenance annotation and confidence scoring
- Executable Python code generation for OWL-compliant ontology updates
- Human-in-the-loop design for expert review and validation
- Extensible and modular, suitable for new domains and data sources
Ontology enrichment is essential for semantic interoperability but remains labor-intensive and error-prone. OntoCodex addresses this challenge by:
- Reducing manual curation burden
- Improving consistency of concept definitions and mappings
- Supporting downstream analytics using EHR, claims, and clinical trial data
- Enabling rapid iteration and reuse across biomedical domains
OntoCodex follows a coordinated, agent-based workflow:
- Decision Agent: interprets enrichment goals and orchestrates agent interactions.
- Ontology Reader Agent: parses OWL/TTL ontologies to identify existing classes, properties, and gaps.
- Knowledge Base Agent: retrieves candidate concepts and relationships from curated sources such as:
  - MedlinePlus (API / cached TTL)
  - PubMed
  - NIH / CDC / ADA / ACC / AHA resources
  - Disease Ontology (DOID)
- Terminology Agent: normalizes extracted concepts to standard vocabularies using CSV, OWL, or TTL datasets.
- Script Generation Agent: generates executable Python scripts that update ontologies with new classes, axioms, annotations, and provenance.
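The coordination pattern above can be sketched as a small orchestration function. This is a minimal illustration, not OntoCodex's actual API: the agent roles come from this README, but the `Candidate` dataclass, the `decision_agent` function, and the idea of wiring agents as plain callables are hypothetical. Gap detection here is a simple case-insensitive label comparison, a deliberate simplification.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Candidate:
    """A concept proposed for addition to the ontology (hypothetical structure)."""
    label: str
    parent: str
    source: str
    codes: dict[str, str] = field(default_factory=dict)  # filled by the terminology step


def decision_agent(
    goal: str,
    read_ontology: Callable[[], set[str]],
    retrieve: Callable[[str], list[Candidate]],
    normalize: Callable[[Candidate], Candidate],
    generate_script: Callable[[list[Candidate]], str],
) -> str:
    """Run one enrichment goal through the other agents (illustrative only)."""
    # Ontology Reader Agent: labels of classes that already exist.
    existing = {label.casefold() for label in read_ontology()}
    # Knowledge Base Agent: candidate concepts retrieved for the goal.
    candidates = retrieve(goal)
    # Gap detection: keep candidates whose label is not already in the ontology.
    gaps = [c for c in candidates if c.label.casefold() not in existing]
    # Terminology Agent: attach standard codes to each new concept.
    normalized = [normalize(c) for c in gaps]
    # Script Generation Agent: turn the new concepts into an update script.
    return generate_script(normalized)


if __name__ == "__main__":
    # Stub agents standing in for LLM- and API-backed implementations.
    script = decision_agent(
        goal="diabetes complications",
        read_ontology=lambda: {"Diabetes Mellitus", "Diabetic Retinopathy"},
        retrieve=lambda goal: [
            Candidate("Diabetic Retinopathy", "DiabetesComplication", "MedlinePlus"),
            Candidate("Diabetic Neuropathy", "DiabetesComplication", "MedlinePlus"),
        ],
        normalize=lambda c: c,
        generate_script=lambda cs: "\n".join(f"# add class: {c.label}" for c in cs),
    )
    print(script)  # prints "# add class: Diabetic Neuropathy"
```

Passing each agent in as a callable keeps the orchestrator independent of any particular LLM or data source, so individual agents can be swapped or stubbed out for testing.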
Supported terminologies:
- ICD-9 / ICD-10
- SNOMED CT
- RxNorm
- ATC
- LOINC
- DOID
- Custom CSV / OWL / TTL vocabularies
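As a rough illustration of CSV-based normalization, the sketch below builds a lookup table from a vocabulary file and maps extracted concept labels to `(system, code)` pairs. It assumes a hypothetical CSV layout with `label,system,code` columns and uses exact matching after light text normalization; the README does not specify OntoCodex's matching strategy, and production pipelines often add synonym expansion or fuzzy matching. The two ICD-10 rows are a toy table for demonstration only.

```python
import csv
import io
import re


def normalize_label(text: str) -> str:
    """Collapse punctuation/whitespace runs and case-fold so surface variants match."""
    return re.sub(r"[\W_]+", " ", text).strip().casefold()


def load_lookup(csv_text: str) -> dict[str, list[tuple[str, str]]]:
    """Build {normalized label: [(system, code), ...]} from CSV rows.

    Assumes hypothetical columns: label, system, code.
    """
    lookup: dict[str, list[tuple[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = normalize_label(row["label"])
        lookup.setdefault(key, []).append((row["system"], row["code"]))
    return lookup


def map_concept(label: str, lookup: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Return all (system, code) pairs for an extracted concept; empty if unmapped."""
    return lookup.get(normalize_label(label), [])


# Toy vocabulary; a synonym row (T2DM) maps to the same code as its preferred term.
TOY_CSV = """label,system,code
Type 2 diabetes mellitus,ICD-10,E11
T2DM,ICD-10,E11
Essential hypertension,ICD-10,I10
"""

if __name__ == "__main__":
    lookup = load_lookup(TOY_CSV)
    print(map_concept("type-2 Diabetes  Mellitus", lookup))  # [('ICD-10', 'E11')]
    print(map_concept("Gout", lookup))  # []
```

Returning an empty list for unmapped concepts, rather than guessing, leaves those concepts visible for expert review.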
For provenance and reproducibility, OntoCodex supports:
- Source attribution for extracted concepts
- Timestamped annotations
- Confidence scores (LLM-estimated or heuristic-based)
- Reproducible enrichment scripts
This design supports auditability and regulatory-aligned research workflows.
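The README does not show a generated script, so the following is only a sketch of what provenance-annotated output might look like. It writes a new class as Turtle text using the standard library alone; a real generated script would more likely use an RDF library such as rdflib or owlready2. `rdfs:label`, `rdfs:subClassOf`, `dcterms:source`, and `prov:generatedAtTime` are standard W3C/DCMI terms, while the `http://example.org/onto#` namespace and the `ex:confidenceScore` property are hypothetical, since neither OWL nor PROV-O defines a standard confidence property.

```python
from datetime import datetime, timezone
from typing import Optional

PREFIXES = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/onto#> .
"""


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes, and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def class_with_provenance(
    local_name: str,
    label: str,
    parent: str,
    source_url: str,
    confidence: float,
    when: Optional[datetime] = None,
) -> str:
    """Emit one new OWL class with source, timestamp, and confidence annotations."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    when = when or datetime.now(timezone.utc)  # timestamp the enrichment run
    return (
        f"ex:{local_name} a owl:Class ;\n"
        f"    rdfs:subClassOf ex:{parent} ;\n"
        f"    rdfs:label {turtle_literal(label)} ;\n"
        f"    dcterms:source <{source_url}> ;\n"
        f'    prov:generatedAtTime "{when.isoformat()}"^^xsd:dateTime ;\n'
        f'    ex:confidenceScore "{confidence:.2f}"^^xsd:decimal .\n'  # hypothetical property
    )


if __name__ == "__main__":
    ttl = PREFIXES + "\n" + class_with_provenance(
        "DiabeticNeuropathy",
        "Diabetic neuropathy",
        "DiabetesComplication",
        "https://example.org/placeholder-source",  # placeholder, not a real citation
        0.87,
        when=datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc),
    )
    print(ttl)
```

Passing an explicit `when` makes the output byte-for-byte reproducible, which matters when enrichment scripts are re-run for audit.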
Example application areas:
- Multiple Chronic Conditions (MCC)
- Alzheimer’s disease and related dementias (ADRD)
- Opioid and substance use research
- Clinical trial Common Data Elements (CDEs)
- EHR and claims-based phenotyping
Limitations:
- Outputs require expert review before production use
- LLM-generated content may reflect source bias or incompleteness
- Not intended for direct clinical decision-making
This project is released under the MIT License. See LICENSE for details.
Contributions are welcome!
- Fork the repository
- Create a feature branch
- Add tests and documentation
- Submit a pull request
If you use OntoCodex in your research, please cite:
Feng J, et al. OntoCodex: A Multi-Agent Biomedical Ontology Enrichment Framework. (Manuscript under review).
Tao AI Group
Mayo Clinic, AI&Informatics
For questions or collaboration inquiries, please open an issue or contact the maintainers via GitHub.
OntoCodex is a research framework designed to augment, not replace, domain expertise.
MIT © 2025 Jingna Feng