OctopusNet is a modular neural network that learns without global backpropagation. Four independent modules process the same image at different resolutions, each trained locally with Hinton's Forward-Forward algorithm, and a central coordinator aggregates their outputs via attention. The result: 64.34% on CIFAR-10 with zero global gradients between modules, and a resilience floor of 61.12% when any single module fails.
The design is inspired by the octopus nervous system, where ~2/3 of neurons live in the arms and compute locally before sending signals to the brain. Each module here is an arm.
Undergraduate thesis: Erick Arriola Aguillón, 2026.
Centralized networks are fragile. When any component fails, the system collapses.
| Model | Normal accuracy | Critical module fails | Two modules fail | Degradation (single failure) |
|---|---|---|---|---|
| CNN (backprop) | 90.96% | 10.00% (random chance) | N/A | -80.96 pts |
| OctopusNet (FF) | 52.50% | 41.72% | ~30% | -10.78 pts |
| OctopusNet + Channel Grouping (A18b) | 64.17% | 41.47% | 22.32% | -22.70 pts |
| OctopusNet + CG + Module Dropout (A6b) | 64.34% | 61.12% | 52.87% | -3.22 pts |
FF standard had one catastrophic failure point: losing M1 dropped accuracy to 13.89%, near random chance. Channel grouping eliminates that. Module Dropout goes further: every single-module failure stays above 61%, and even with two modules dead simultaneously the system holds above 52%. The floor is structural, not lucky.
Module Dropout costs nothing in normal accuracy (64.34% vs 64.17%) and adds +19.65 points of resilience floor. It is now the default training mode.
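The mechanism behind Module Dropout can be sketched as masking entire module outputs during training, so the coordinator learns not to depend on any single module. A minimal sketch (function and variable names here are illustrative, not the repo's API):

```python
import torch

def module_dropout(module_outputs, p=0.5, training=True):
    """Randomly zero out whole module outputs during training.

    module_outputs: list of N tensors, each of shape (batch, features).
    """
    if not training or p == 0.0:
        return module_outputs
    n = len(module_outputs)
    # Drop each module independently with probability p,
    # but always keep at least one module alive.
    keep = torch.rand(n) >= p
    if not keep.any():
        keep[torch.randint(n, (1,))] = True
    # Zeroing a module at train time is the same masking that
    # simulates a dead module (failed sensor) at evaluation time.
    return [out * float(k) for out, k in zip(module_outputs, keep)]
```

Applying the same mask deterministically to one module at evaluation reproduces the single-failure scenario measured in the table above.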
This matters for robotics, IoT, autonomous vehicles, and embedded systems where a sensor can fail at any time.
OctopusNet is a neural network that learns without global backprop. Instead of one big network trained end-to-end, it uses N independent processing modules (any differentiable architecture) that each learn locally using Hinton's Forward-Forward algorithm. A central coordinator aggregates their outputs via attention. Current implementation uses CNNs with heterogeneous kernel sizes.
Inspired loosely by the octopus nervous system, where ~2/3 of neurons live in the arms and process information locally before sending signals to the brain.
Key features: multiscale input (each module sees a different resolution), Fourier label overlay (labels encoded as frequency patterns instead of pixel patches), and two training modes: standard backprop coordinator or fully local SFF.
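The Fourier label overlay (following Codellaro et al.) can be sketched as adding a class-specific spatial frequency pattern to the whole image instead of writing the label into a pixel patch. The particular frequency mapping below is an illustrative assumption, not the repo's exact encoding:

```python
import torch

def fourier_label_overlay(images, labels, num_classes=10, amplitude=0.2):
    """Encode each label as a 2D sinusoidal pattern added to the image.

    images: (batch, C, H, W) float tensor; labels: (batch,) long tensor.
    Each class maps to a distinct spatial frequency pair, so the label
    information is spread across the entire image.
    """
    b, c, h, w = images.shape
    ys = torch.arange(h).float().unsqueeze(1)  # (H, 1)
    xs = torch.arange(w).float().unsqueeze(0)  # (1, W)
    overlays = []
    for lbl in labels.tolist():
        # Class-dependent horizontal/vertical frequencies (illustrative).
        fx = 1 + lbl % num_classes
        fy = 1 + lbl // 2
        pattern = torch.sin(2 * torch.pi * fx * xs / w) * \
                  torch.cos(2 * torch.pi * fy * ys / h)  # (H, W)
        overlays.append(pattern)
    overlay = torch.stack(overlays).unsqueeze(1)  # (batch, 1, H, W)
    return images + amplitude * overlay           # broadcast over channels
```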
Each module learns to distinguish positive samples (image + correct label overlay) from negative samples (image + wrong label) using a local goodness score. No gradients flow between modules.
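One local training step can be sketched as follows. The goodness score and softplus loss form follow Hinton's Forward-Forward paper; the function and argument names are illustrative, not the repo's training loop:

```python
import torch
import torch.nn.functional as F

def ff_local_step(module, optimizer, pos_batch, neg_batch, theta=2.0):
    """One Forward-Forward update for a single module, no global backprop.

    Goodness = mean squared activation. The loss pushes goodness above
    the threshold theta for positive samples (image + correct label
    overlay) and below it for negatives (image + wrong label).
    """
    optimizer.zero_grad()
    g_pos = module(pos_batch).pow(2).mean(dim=1)  # (batch,)
    g_neg = module(neg_batch).pow(2).mean(dim=1)
    # Softplus form of the FF objective: low when g_pos > theta > g_neg.
    loss = (F.softplus(theta - g_pos) + F.softplus(g_neg - theta)).mean()
    loss.backward()   # gradients stay inside this one module
    optimizer.step()
    return loss.item()
```

Each module runs this step independently on its own view of the batch; the coordinator never sends gradients back.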
| Mode | Accuracy | Epochs | Notes |
|---|---|---|---|
| FF modules + backprop coordinator | 52.75% | 100 | Standard mode |
| FF modules + SFF local coordinator | 53.16% | 100 | 100% local learning |
| Simple ensemble average (SFF) | 53.59% | 100 | Best fully local result |
| Channel Grouping + coordinator | 64.17% | 30 | A18b |
| Channel Grouping + Module Dropout | 64.34% | 30 | Best overall (A6b); floor 61.12% |
Each module specializes in different classes:
| | airplane | auto | bird | cat | deer | dog | frog | horse | ship | truck |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 54% | 48% | 47% | 37% | 52% | 58% | 60% | 55% | 51% | 44% |
| M2 | 52% | 65% | 46% | 38% | 54% | 51% | 54% | 55% | 64% | 56% |
| M3 | 53% | 55% | 50% | 41% | 57% | 55% | 61% | 58% | 55% | 50% |
| M4 | 53% | 58% | 47% | 39% | 53% | 53% | 57% | 55% | 60% | 62% |
Coordinator competition mechanisms compared in ablation A10:
| Mechanism | Accuracy | Tradeoff |
|---|---|---|
| Soft attention | 43.72% | Best for N=4 modules |
| Top-K (K=2) | 42.32% | Good for N>>4 |
| Gumbel-softmax | 39.33% | Hard selection, needs more modules |
| Top-K (K=1) | 38.09% | Too sparse for small N |
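The winning soft-attention aggregation can be sketched as a learned relevance score per module, softmax-normalized into mixture weights. This is a sketch under assumed shapes, not the repo's Coordinator class:

```python
import torch
import torch.nn as nn

class SoftAttentionCoordinator(nn.Module):
    """Aggregate N module embeddings with learned soft attention.

    Unlike Top-K or Gumbel-softmax, every module contributes a
    nonzero weight, which works best for small N (here N=4).
    """
    def __init__(self, dim, num_classes=10):
        super().__init__()
        self.score = nn.Linear(dim, 1)            # per-module relevance score
        self.classify = nn.Linear(dim, num_classes)

    def forward(self, module_embs):
        # module_embs: (batch, N, dim)
        scores = self.score(module_embs).squeeze(-1)   # (batch, N)
        weights = torch.softmax(scores, dim=-1)        # soft, all nonzero
        fused = (weights.unsqueeze(-1) * module_embs).sum(dim=1)  # (batch, dim)
        return self.classify(fused), weights
```

Top-K selection would instead keep only the K largest scores and renormalize, which is too sparse when only four modules are available.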
```
python train.py --channel_grouping --module_dropout 0.5 --epochs 30
```

64.34% accuracy, single-failure floor 61.12%. Channel grouping (Ortiz Torres et al.) + Module Dropout.
Standard mode:

```
python train.py --dataset cifar10 --epochs 50
```

Fully local SFF mode:

```
python train.py --use_sff --dataset cifar10 --epochs 50
```

In SFF mode, an AuxClassifier attaches to each module's feature map and a LogitCoordinator learns attention over their logits. No global backprop anywhere.
- `--dataset` cifar10 | cifar100 | mnist (default: cifar10)
- `--epochs` int (default: 50)
- `--batch_size` int (default: 128)
- `--bottleneck` int (default: 64)
- `--use_sff` flag, 100% local SFF mode
- `--channel_grouping` flag, CGCNNModule (A18b/A6b)
- `--module_dropout` float, module dropout probability (0.5 = A6b)
- `--no_multiscale` flag, disable multiscale input
- `--seed` int (default: 42)
- `--device` cuda | cpu (auto-detected)
```python
from config import OctopusNetConfig
from octopusnet import OctopusNet
from train import train

config = OctopusNetConfig(
    dataset="cifar10",
    epochs=50,
    device="cuda"
)

model, history = train(config)                # standard mode
model, history = train(config, use_sff=True)  # 100% local
```

Upload OctopusNet_Colab.ipynb to Colab and run cells. Includes all experiments, visualizations, and ablations.
| File | Description |
|---|---|
| `config.py` | Model hyperparameters |
| `modules.py` | CNN modules + ModuleDecoder |
| `nerve_ring.py` | Cross-attention lateral communication |
| `coordinator.py` | Coordinator + AuxClassifier + LogitCoordinator |
| `octopusnet.py` | Full model |
| `data.py` | Dataset loaders |
| `train.py` | Training loop (standard + SFF) |
| `experiments.py` | Ablation experiments |
| `OctopusNet_Colab.ipynb` | Interactive notebook |
| ID | What | Key Finding |
|---|---|---|
| A1 | Number of modules (2, 4, 8, 16) | 4 modules optimal |
| A2 | Bottleneck size (8β128) | 64 best accuracy/size tradeoff |
| A6 | Module resilience (FF) | Floor 41.72%, one catastrophic point at 13.89% |
| A7 | With/without feedback | Feedback adds ~0.5% |
| A8 | With/without nerve ring | Nerve ring adds ~1% |
| A9 | Homogeneous vs heterogeneous | Heterogeneous kernels help |
| A10 | GWT competition mechanism | Soft attention wins for N=4 |
| A15b | SFF local coordinator | 53.16%: best fully local mode |
| A18b | Channel grouping (Ortiz Torres) | 64.17%: eliminates catastrophic failures, floor 41.47% |
| A6b | Channel grouping + Module Dropout | 64.34%: floor jumps to 61.12% (+19.65 pts vs A18b), no accuracy cost |
Forward-Forward
- Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations
- Krotov & Hopfield (2023). Training CNNs with the Forward-Forward Algorithm. arXiv:2312.14924
- Krutsylo (2025). Scalable Forward-Forward (SFF). arXiv:2501.03176: basis for SFF local mode
- Ortiz Torres et al. (2025). On Advancements of the Forward-Forward Algorithm. arXiv:2504.21662: 84.7% CIFAR-10, channel grouping technique
- ASGE (2025). Adaptive Spatial Goodness Encoding. arXiv:2509.12394
- SCFF (2025). Self-Contrastive Forward-Forward. Nature Communications: 98.70% MNIST, 80.75% CIFAR-10
- Codellaro et al. (2025). Training CNNs with Forward-Forward: Fourier spatial label encoding. Scientific Reports: basis for Fourier label overlay
Global Workspace & Coordination
- Goyal et al. (ICLR 2022). Coordination Among Neural Modules Through a Shared Global Workspace
- Baars, B. (1988). A Cognitive Theory of Consciousness: original GWT theory
Octopus Neuroscience
- Sumbre, G. et al.: Autonomous arm movements in octopus
- Gutnick, T. et al.: Information flow between brain and arms in octopus
- Hochner, B. (2012). An Embodied View of Octopus Neurobiology. Current Biology
If you use OctopusNet in your research:
```bibtex
@misc{octopusnet2026,
  author    = {Arriola Aguill\'{o}n, Erick},
  title     = {OctopusNet: Bio-inspired Distributed Neural Architecture},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/ErickUser1/OctopusNet}
}
```

MIT
