
πŸ™ OctopusNet

OctopusNet is a modular neural network that learns without global backpropagation. Four independent modules process the same image at different resolutions, each trained locally with Hinton's Forward-Forward algorithm, and a central coordinator aggregates their outputs via attention. The result: 64.34% on CIFAR-10 with zero global gradients between modules β€” and a resilience floor of 61.12% when any single module fails.

The design is inspired by the octopus nervous system, where ~2/3 of neurons live in the arms and compute locally before sending signals to the brain. Each module here is an arm.

Undergraduate thesis by Erick Arriola Aguillón, 2026.

Why OctopusNet?

Centralized networks are fragile. When any component fails, the system collapses.

| Model | Normal accuracy | Critical module fails | Two modules fail | Degradation |
|---|---|---|---|---|
| CNN (backprop) | 90.96% | 10.00% (random chance) | — | −80.96 pts |
| OctopusNet (FF) | 52.50% | 41.72% | ~30% | −10.78 pts |
| OctopusNet + Channel Grouping (A18b) | 64.17% | 41.47% | 22.32% | −22.70 pts |
| OctopusNet + CG + Module Dropout (A6b) | 64.34% | 61.12% | 52.87% | −3.22 pts |

Standard FF had one catastrophic failure point: losing M1 dropped accuracy to 13.89%, near random chance. Channel grouping eliminates that weak point. Module Dropout goes further: every single-module failure stays above 61%, and even with two modules dead simultaneously the system holds above 52%. The floor is structural, not luck.

Module Dropout costs nothing in normal accuracy (64.34% vs 64.17%) and raises the resilience floor by 19.65 points. It is now the default training mode.
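
Mechanically, Module Dropout zeroes each module's output with some probability before aggregation during training, so the coordinator learns never to rely on any single arm. A minimal sketch of the idea (the function name and the keep-at-least-one rule are assumptions, not necessarily what train.py does):

```python
import torch

def module_dropout(outputs, p=0.5, training=True):
    # outputs: list of per-module embeddings, each of shape (B, D).
    # With probability p, replace a module's output with zeros,
    # simulating that module's failure at training time.
    if not training or p <= 0:
        return outputs
    keep = torch.rand(len(outputs)) >= p
    if not keep.any():  # assumption: never drop every module at once
        keep[torch.randint(len(outputs), (1,))] = True
    return [o if k else torch.zeros_like(o) for o, k in zip(outputs, keep)]
```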

This matters for robotics, IoT, autonomous vehicles, and embedded systems where a sensor can fail at any time.


What is this?

OctopusNet is a neural network that learns without global backprop. Instead of one big network trained end-to-end, it uses N independent processing modules (any differentiable architecture) that each learn locally using Hinton's Forward-Forward algorithm. A central coordinator aggregates their outputs via attention. Current implementation uses CNNs with heterogeneous kernel sizes.

Key features: multiscale input (each module sees a different resolution), Fourier label overlay (labels encoded as frequency patterns instead of pixel patches), and two training modes: standard backprop coordinator or fully local SFF.
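
As a rough illustration of the Fourier overlay: instead of writing the label into a patch of pixels, a class-dependent wave is added across the whole image. The class-to-frequency mapping and amplitude below are made up for the sketch; the actual encoding follows Codellaro et al.

```python
import torch

def fourier_label_overlay(img, label, amp=0.2):
    # img: (C, H, W) tensor; label: integer class index.
    # Hypothetical mapping: one extra horizontal cycle per class index.
    _, _, W = img.shape
    x = torch.arange(W, dtype=torch.float32)
    wave = amp * torch.sin(2 * torch.pi * (label + 1) * x / W)
    return img + wave.view(1, 1, W)  # broadcast over channels and rows
```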


Architecture

[Figure: OctopusNet architecture diagram]

Each module learns to distinguish positive samples (image + correct label overlay) from negative samples (image + wrong label) using a local goodness score. No gradients flow between modules.
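
In code, the per-module objective looks roughly like this. The squared-activation goodness and threshold follow Hinton's formulation; the softplus margin form is one common variant and may not match train.py exactly:

```python
import torch
import torch.nn.functional as F

def goodness(h):
    # Mean squared activation of a conv feature map (B, C, H, W).
    return h.pow(2).mean(dim=(1, 2, 3))

def ff_loss(h_pos, h_neg, theta=2.0):
    # Push goodness above theta for positives (correct label overlay)
    # and below theta for negatives (wrong label overlay).
    return (F.softplus(theta - goodness(h_pos)) +
            F.softplus(goodness(h_neg) - theta)).mean()
```

Each module minimizes this loss on its own activations only, which is why no gradients ever cross module boundaries.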


Results (CIFAR-10)

| Mode | Accuracy | Epochs | Notes |
|---|---|---|---|
| FF modules + backprop coordinator | 52.75% | 100 | Standard mode |
| FF modules + SFF local coordinator | 53.16% | 100 | 100% local learning |
| Simple ensemble average (SFF) | 53.59% | 100 | Best fully local result |
| Channel Grouping + coordinator | 64.17% | 30 | A18b |
| Channel Grouping + Module Dropout | 64.34% | 30 | Best overall (A6b); floor 61.12% |

Module specialization (A15b)

Each module specializes in different classes:

| | airplane | auto | bird | cat | deer | dog | frog | horse | ship | truck |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 54% | 48% | 47% | 37% | 52% | 58% | 60% | 55% | 51% | 44% |
| M2 | 52% | 65% | 46% | 38% | 54% | 51% | 54% | 55% | 64% | 56% |
| M3 | 53% | 55% | 50% | 41% | 57% | 55% | 61% | 58% | 55% | 50% |
| M4 | 53% | 58% | 47% | 39% | 53% | 53% | 57% | 55% | 60% | 62% |

GWT (Global Workspace Theory) Competition (A10)

| Mechanism | Accuracy | Tradeoff |
|---|---|---|
| Soft attention | 43.72% | Best for N=4 modules |
| Top-K (K=2) | 42.32% | Good for N>>4 |
| Gumbel-softmax | 39.33% | Hard selection, needs more modules |
| Top-K (K=1) | 38.09% | Too sparse for small N |
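
The difference between these mechanisms is just how the per-module weights are computed: soft attention keeps all N weights nonzero, while Top-K zeroes all but the k best. A minimal Top-K gate, with assumed shapes:

```python
import torch

def topk_gate(scores, k=2):
    # scores: (B, N), one relevance score per module.
    # Soft attention would simply be torch.softmax(scores, dim=-1);
    # here only the top k modules get (renormalized) nonzero weight.
    vals, idx = scores.topk(k, dim=-1)
    gate = torch.zeros_like(scores)
    gate.scatter_(-1, idx, torch.softmax(vals, dim=-1))
    return gate
```

With only N=4 modules, hard selection throws away too much signal, which matches soft attention winning the table above.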

Training Modes

A6b mode: best overall (recommended)

```bash
python train.py --channel_grouping --module_dropout 0.5 --epochs 30
```

64.34% accuracy, single-failure floor 61.12%. Channel grouping (Ortiz Torres et al.) + Module Dropout.
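
One common reading of the channel grouping technique (the repo's CGCNNModule may differ in detail): partition each module's feature channels into one group per class and use per-group goodness as class scores, so the evidence behind a prediction is spread across groups instead of concentrated in one fragile place.

```python
import torch

def grouped_goodness(h, num_classes=10):
    # h: (B, C, H, W) with C divisible by num_classes.
    # Per-group mean squared activation doubles as a class score.
    B, C, H, W = h.shape
    groups = h.view(B, num_classes, C // num_classes, H, W)
    return groups.pow(2).mean(dim=(2, 3, 4))  # (B, num_classes)
```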

Standard mode (FF + backprop coordinator)

```bash
python train.py --dataset cifar10 --epochs 50
```

SFF mode: 100% local learning

```bash
python train.py --use_sff --dataset cifar10 --epochs 50
```

In SFF mode, an AuxClassifier attaches to each module's feature map and a LogitCoordinator learns attention over their logits. No global backprop anywhere.
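
A minimal sketch of what such a logit-level coordinator could look like; the class and layer names here are assumptions, and coordinator.py holds the real implementation:

```python
import torch
import torch.nn as nn

class TinyLogitCoordinator(nn.Module):
    # Scores each module from its own logits, softmaxes the scores
    # into attention weights, and returns the weighted sum of logits.
    def __init__(self, num_classes):
        super().__init__()
        self.score = nn.Linear(num_classes, 1)

    def forward(self, logits):
        # logits: (B, M, num_classes), assumed detached so the
        # coordinator's gradients never reach the modules.
        attn = torch.softmax(self.score(logits), dim=1)  # (B, M, 1)
        return (attn * logits).sum(dim=1)                # (B, num_classes)
```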

Options

```text
--dataset          cifar10 | cifar100 | mnist  (default: cifar10)
--epochs           int                          (default: 50)
--batch_size       int                          (default: 128)
--bottleneck       int                          (default: 64)
--use_sff          flag                         100% local SFF mode
--channel_grouping flag                         CGCNNModule (A18b/A6b)
--module_dropout   float                        module dropout prob (0.5 = A6b)
--no_multiscale    flag                         disable multiscale input
--seed             int                          (default: 42)
--device           cuda | cpu                   (auto-detected)
```
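
The multiscale input that --no_multiscale disables amounts to resizing the same batch once per module. A sketch, with illustrative (not the repo's actual) per-module resolutions:

```python
import torch.nn.functional as F

def multiscale_views(x, sizes=(32, 24, 16, 8)):
    # x: (B, C, H, W). One resized view per module; the four sizes
    # here are assumptions for the sketch.
    return [F.interpolate(x, size=(s, s), mode="bilinear",
                          align_corners=False) for s in sizes]
```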

Quick Start

```python
from config import OctopusNetConfig
from octopusnet import OctopusNet
from train import train

config = OctopusNetConfig(
    dataset="cifar10",
    epochs=50,
    device="cuda",
)

model, history = train(config)                 # standard mode
model, history = train(config, use_sff=True)   # 100% local
```

Google Colab

Upload OctopusNet_Colab.ipynb to Colab and run the cells. It includes all experiments, visualizations, and ablations.


File Structure

| File | Description |
|---|---|
| `config.py` | Model hyperparameters |
| `modules.py` | CNN modules + ModuleDecoder |
| `nerve_ring.py` | Cross-attention lateral communication |
| `coordinator.py` | Coordinator + AuxClassifier + LogitCoordinator |
| `octopusnet.py` | Full model |
| `data.py` | Dataset loaders |
| `train.py` | Training loop (standard + SFF) |
| `experiments.py` | Ablation experiments |
| `OctopusNet_Colab.ipynb` | Interactive notebook |

Ablations

| ID | What | Key finding |
|---|---|---|
| A1 | Number of modules (2, 4, 8, 16) | 4 modules optimal |
| A2 | Bottleneck size (8–128) | 64 best accuracy/size tradeoff |
| A6 | Module resilience (FF) | Floor 41.72%; one catastrophic point (M1) at 13.89% |
| A7 | With/without feedback | Feedback adds ~0.5% |
| A8 | With/without nerve ring | Nerve ring adds ~1% |
| A9 | Homogeneous vs heterogeneous kernels | Heterogeneous kernels help |
| A10 | GWT competition mechanism | Soft attention wins for N=4 |
| A15b | SFF local coordinator | 53.16%; best fully local mode |
| A18b | Channel grouping (Ortiz Torres) | 64.17%; eliminates the catastrophic point, floor 41.47% |
| A6b | Channel grouping + Module Dropout | 64.34%; floor jumps to 61.12% (+19.65 pts vs A18b, no accuracy cost) |

References

Forward-Forward

  • Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations.
  • Krotov & Hopfield (2023). Training CNNs with the Forward-Forward Algorithm. arXiv:2312.14924.
  • Krutsylo (2025). Scalable Forward-Forward (SFF). arXiv:2501.03176. Basis for the SFF local mode.
  • Ortiz Torres et al. (2025). On Advancements of the Forward-Forward Algorithm. arXiv:2504.21662. 84.7% CIFAR-10; channel grouping technique.
  • ASGE (2025). Adaptive Spatial Goodness Encoding. arXiv:2509.12394.
  • SCFF (2025). Self-Contrastive Forward-Forward. Nature Communications. 98.70% MNIST, 80.75% CIFAR-10.
  • Codellaro et al. (2025). Training CNNs with Forward-Forward: Fourier spatial label encoding. Scientific Reports. Basis for the Fourier label overlay.

Octopus Neuroscience

  • Sumbre, G. et al.: Autonomous arm movements in octopus
  • Gutnick, T. et al.: Information flow between brain and arms in octopus
  • Hochner, B. (2012). An Embodied View of Octopus Neurobiology. Current Biology

Cite

If you use OctopusNet in your research:

```bibtex
@misc{octopusnet2026,
  author    = {Arriola Aguill\'{o}n, Erick},
  title     = {OctopusNet: Bio-inspired Distributed Neural Architecture},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/ErickUser1/OctopusNet}
}
```

License

MIT
