Cardiosim-Tox: An Interpretable Multitask Deep Learning QSAR Platform with Multi-Modal Feature Fusion for hERG, Cav1.2, and Nav1.5 Blockade Risk and Potency Prediction

Cardiosim-Tox is an interpretable multitask deep learning QSAR platform for predicting cardiac ion-channel blockade risk and potency across three major cardiac ion channels: hERG, Cav1.2, and Nav1.5.

This repository contains the tutorial notebooks used for molecular descriptor calculation, chemical diversity analysis, applicability domain analysis, model training, and model evaluation for binary classification and regression tasks.

Study Overview

Drug-induced cardiotoxicity remains one of the major causes of drug attrition and post-market withdrawal. A key mechanism underlying this toxicity is the blockade of cardiac ion channels, particularly hERG, Cav1.2, and Nav1.5, which are closely associated with cardiac electrophysiological disturbances.

Cardiosim-Tox Framework

Overall framework of the Cardiosim-Tox platform for multi-modal feature fusion, multitask learning, and prediction of cardiac ion-channel blockade risk and potency.

To support early-stage cardiotoxicity screening, this study develops Cardiosim-Tox, a multi-modal and multitask QSAR platform that predicts:

Blockade risk
Binary classification of compounds as blockers or non-blockers.
Blockade potency
Regression prediction of pIC₅₀ values.

The platform integrates multiple molecular representations, including:

Molecular fingerprints
Mordred molecular descriptors
Graph-based molecular encodings

The model is designed under a multitask learning framework, allowing shared learning across multiple cardiac ion-channel endpoints. This enables the model to capture common molecular patterns associated with ion-channel blockade while maintaining endpoint-specific prediction capacity.

Main Features

Multitask deep learning for hERG, Cav1.2, and Nav1.5 prediction
Binary classification for ion-channel blockade risk
Regression modeling for pIC₅₀ potency prediction
Multi-modal molecular feature fusion
Molecular descriptor calculation workflow
Chemical diversity analysis using UMAP
Applicability domain analysis
External validation and performance evaluation
Interpretable modeling framework supporting QSAR-based cardiotoxicity screening

Repository Contents

This repository includes the following tutorial notebooks:

Notebook	Description
`Biner_Descriptor Calculation.ipynb`	Calculates molecular descriptors and prepares features for binary classification tasks.
`Chemical Diversity Test_UMAP.ipynb`	Performs chemical space visualization and diversity analysis using UMAP.
`Training_MTL_Binary Classification.ipynb`	Trains multitask learning models for binary ion-channel blockade classification.
`Evaluation_MTL_Binary Classification.ipynb`	Evaluates trained multitask classification models.
`Training_MTL_Regression.ipynb`	Trains multitask learning models for pIC₅₀ regression prediction.
`Evaluation_MTL_Regression.ipynb`	Evaluates trained multitask regression models.
`Aplicability Domain_Analysis.ipynb`	Performs applicability domain analysis for model reliability assessment.

Note: The notebook name Aplicability Domain_Analysis.ipynb follows the current file name in this repository. For consistency, it may be renamed to Applicability Domain_Analysis.ipynb.

Dataset

The development and testing datasets used in this study are available on Zenodo:

Please download the dataset from the Zenodo record before running the notebooks. After downloading, place the dataset files in the appropriate data directory used by each notebook, or update the file paths inside the notebooks according to your local folder structure.

The dataset includes data used for model development and testing of the Cardiosim-Tox platform for hERG, Cav1.2, and Nav1.5 blockade risk and potency prediction.

Recommended Workflow

The notebooks should be run in the following order:

1. Descriptor Calculation

Run:

Biner_Descriptor Calculation.ipynb

This notebook prepares molecular descriptors and molecular features used for model development. It may include preprocessing of SMILES strings, descriptor calculation, feature cleaning, and dataset preparation.

2. Chemical Diversity Analysis

Run:

Chemical Diversity Test_UMAP.ipynb

This notebook analyzes the chemical diversity of the dataset and visualizes the molecular chemical space using UMAP. This step helps evaluate whether training, validation, and external test compounds occupy similar or distinct regions of chemical space.

3. Train Multitask Binary Classification Model

Run:

Training_MTL_Binary Classification.ipynb

This notebook trains the multitask deep learning model for binary classification of ion-channel blockade risk across hERG, Cav1.2, and Nav1.5.

The output of this notebook may include trained model files, training history, loss curves, and classification-related outputs.

4. Evaluate Multitask Binary Classification Model

Run:

Evaluation_MTL_Binary Classification.ipynb

This notebook evaluates the trained binary classification model using metrics such as:

Accuracy
Sensitivity
Specificity
Precision
F1-score
ROC-AUC
PR-AUC

The evaluation is performed for each cardiac ion-channel endpoint.

5. Train Multitask Regression Model

Run:

Training_MTL_Regression.ipynb

This notebook trains the multitask deep learning model for potency prediction using pIC₅₀ values.

The model learns regression tasks for cardiac ion-channel blockade potency prediction.

6. Evaluate Multitask Regression Model

Run:

Evaluation_MTL_Regression.ipynb

This notebook evaluates regression model performance using metrics such as:

R²
RMSE
MAE

7. Applicability Domain Analysis

Run:

Aplicability Domain_Analysis.ipynb

This notebook evaluates whether query compounds fall within the applicability domain of the trained QSAR models. Applicability domain analysis is important for assessing the reliability of model predictions, especially for external or structurally novel compounds.

Installation

1. Clone the Repository

git clone https://github.com/YOUR_USERNAME/Cardiosim-Tox.git
cd Cardiosim-Tox

Replace YOUR_USERNAME with your GitHub username or organization name.

2. Create a Python Environment

Using conda:

conda create -n cardiosimtox python=3.10
conda activate cardiosimtox

Or using venv:

python -m venv cardiosimtox_env
cardiosimtox_env\Scripts\activate

For macOS/Linux:

source cardiosimtox_env/bin/activate

3. Install Required Packages

Install the main dependencies:

pip install numpy pandas scipy scikit-learn matplotlib seaborn tqdm jupyter notebook
pip install rdkit mordred
pip install torch torchvision torchaudio
pip install torch-geometric
pip install shap umap-learn

Depending on your system and CUDA version, PyTorch and PyTorch Geometric installation may require specific commands. Please refer to the official installation pages if GPU support is needed.

4. Launch Jupyter Notebook

jupyter notebook

Then open and run the notebooks according to the recommended workflow.

Input Data

The notebooks require curated molecular datasets containing molecular structures and endpoint labels or pIC₅₀ values.

Typical input columns may include:

Column	Description
`SMILES`	Molecular structure in SMILES format
`hERG`	Binary label or pIC₅₀ value for hERG
`Cav1.2`	Binary label or pIC₅₀ value for Cav1.2
`Nav1.5`	Binary label or pIC₅₀ value for Nav1.5

The exact column names should be adjusted according to the dataset format used in each notebook.

Output Files

Depending on the notebook, the workflow may generate:

Processed descriptor files
Molecular fingerprint matrices
Training and validation datasets
Trained deep learning model files
Model performance tables
ROC and PR curves
Regression scatter plots
UMAP chemical space visualizations
Applicability domain results
Prediction outputs for external compounds

Model Tasks

Cardiosim-Tox supports two main prediction tasks.

Binary Classification

The binary classification task predicts whether a compound is likely to block a given cardiac ion channel.

Endpoints:

hERG blocker / non-blocker
Cav1.2 blocker / non-blocker
Nav1.5 blocker / non-blocker

Regression

The regression task predicts ion-channel blockade potency as pIC₅₀.

Endpoints:

hERG pIC₅₀
Cav1.2 pIC₅₀
Nav1.5 pIC₅₀

Applicability Domain

Applicability domain analysis is included to assess whether a compound is located within the chemical space covered by the training data.

Predictions for compounds outside the applicability domain should be interpreted carefully, as model reliability may decrease for structurally distant molecules.

Citation

If you use this repository, model, or workflow in your research, please cite:

Nursyafi FS, Kang B, Eom J, Izza RN, Fauzan AL, Putri NH, Hanum UL, Olivia Y, Rose SA, Lim KM.
Cardiosim-Tox: An Interpretable Multitask Deep Learning QSAR Platform with Multi-Modal Feature Fusion for hERG, Cav1.2, and Nav1.5 Blockade Risk and Potency Prediction.

Once the article is published, please update this section with the final journal citation and DOI.

Authors

Fauzan Syarif Nursyafi
Kang-Byunggyu
Eom-Junhyeok
Rahmafatin Nurul Izza
Abdul Latif Fauzan
Nadilla Hafani Putri
Ulfa Latifa Hanum
Yossi Olivia
Sanjida Afrin Rose
Ki Moo Lim

Corresponding Author

Ki Moo Lim
Department of Biomedical Engineering
Kumoh National Institute of Technology
Gumi, Republic of Korea

Email: kmlim@kumoh.ac.kr

Disclaimer

Cardiosim-Tox is intended for research purposes only. Model predictions should not be used as the sole basis for clinical, regulatory, or safety-critical decisions. Experimental validation is recommended for compounds of interest.

Keywords

cardiotoxicity, QSAR, multitask learning, deep learning, hERG, Cav1.2, Nav1.5, ion-channel blockade, molecular descriptors, molecular fingerprints, graph neural network, applicability domain, computational toxicology

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
SHAP Analysis v2		SHAP Analysis v2
Tutorial		Tutorial
Cardiosim-Tox Framework.png		Cardiosim-Tox Framework.png
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Cardiosim-Tox: An Interpretable Multitask Deep Learning QSAR Platform with Multi-Modal Feature Fusion for hERG, Cav1.2, and Nav1.5 Blockade Risk and Potency Prediction

Study Overview

Cardiosim-Tox Framework

Main Features

Repository Contents

Dataset

Recommended Workflow

1. Descriptor Calculation

2. Chemical Diversity Analysis

3. Train Multitask Binary Classification Model

4. Evaluate Multitask Binary Classification Model

5. Train Multitask Regression Model

6. Evaluate Multitask Regression Model

7. Applicability Domain Analysis

Installation

1. Clone the Repository

2. Create a Python Environment

3. Install Required Packages

4. Launch Jupyter Notebook

Input Data

Output Files

Model Tasks

Binary Classification

Regression

Applicability Domain

Citation

Authors

Corresponding Author

Disclaimer

Keywords

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages