SOFTWARE STACK · ONNX-TO-BINARY COMPILER

7 stages. ONNX in. Binary out.

The NeuraEdge compiler transforms standard ONNX models into optimized binary instructions for the 256-MAC systolic array. Every stage is implemented in Python, fully readable, and extensible for custom operators.

15 Python modules · 14 operators · ResNet-18 verified · INT8 cosine sim: 0.911 · 95.9% peak utilisation

COMPILER PIPELINE

From model to silicon in 7 stages.

ONNX Parser → Graph Optimizer → Quantizer → Tensor Tiler → Systolic Scheduler → Descriptor Generator → Binary Generator
Stage 1

ONNX Parser

Loads .onnx model, validates graph topology, extracts operator nodes and tensor shapes
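As an illustration of the topology check, the sketch below orders nodes with Kahn's algorithm and rejects graphs that have undefined tensors, duplicate producers, or cycles. It is a minimal stand-in, not the code in onnx_parser.py: it uses a hypothetical `(name, inputs, outputs)` tuple instead of ONNX `NodeProto` objects so it runs without the onnx package, and `topo_order` is an illustrative name.

```python
from collections import deque

# Hypothetical minimal node form: (node name, input tensor names, output tensor names).
Node = tuple[str, list[str], list[str]]


def topo_order(nodes: list[Node], graph_inputs: set[str]) -> list[str]:
    """Return node names in a valid execution order, or raise ValueError.

    graph_inputs should contain model inputs and initializer (weight) names,
    i.e. every tensor that no node produces.
    """
    producer: dict[str, str] = {}
    for name, _, outs in nodes:
        for t in outs:
            if t in producer:
                raise ValueError(f"tensor {t!r} produced by {producer[t]!r} and {name!r}")
            producer[t] = name

    # Count unresolved dependencies per node and record who consumes each node.
    indegree = {name: 0 for name, _, _ in nodes}
    consumers: dict[str, list[str]] = {name: [] for name, _, _ in nodes}
    for name, ins, _ in nodes:
        for t in ins:
            if t in graph_inputs:
                continue
            if t not in producer:
                raise ValueError(f"node {name!r} reads undefined tensor {t!r}")
            indegree[name] += 1
            consumers[producer[t]].append(name)

    # Kahn's algorithm: repeatedly emit nodes whose dependencies are all satisfied.
    ready = deque(n for n, d in indegree.items() if d == 0)
    order: list[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in consumers[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


if __name__ == "__main__":
    graph = [("conv", ["x", "w"], ["c"]), ("relu", ["c"], ["r"]), ("pool", ["r"], ["y"])]
    print(topo_order(graph, {"x", "w"}))
```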

Stage 2

Graph Optimizer

Fuses consecutive operators, eliminates dead nodes, constant-folds where possible
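A standard example of the fusion and constant folding this stage describes is folding a BatchNorm into the preceding convolution, so inference runs one operator instead of two. This is a sketch of that textbook transform, not necessarily what graph_optimizer.py does; kernels are represented as hypothetical per-output-channel flat lists to keep it dependency-free.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the conv.

    weight: weight[oc] is the flattened kernel of output channel oc.
    bias, gamma, beta, mean, var: one value per output channel.
    Returns (new_weight, new_bias) such that conv'(x) equals BN(conv(x)).
    """
    new_w, new_b = [], []
    for oc, kernel in enumerate(weight):
        scale = gamma[oc] / math.sqrt(var[oc] + eps)
        new_w.append([k * scale for k in kernel])               # W' = W * scale
        new_b.append((bias[oc] - mean[oc]) * scale + beta[oc])  # b' = (b - mean) * scale + beta
    return new_w, new_b
```

After folding, the BatchNorm node can be deleted from the graph, which is also why constant folding and dead-node elimination usually run together.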

Stage 3

Quantizer

INT8 quantization with per-tensor calibration. Limits accuracy loss by mapping each tensor's calibrated min/max range onto the INT8 range
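Min/max calibration maps the observed range of each tensor onto the INT8 codes using one scale and one zero point per tensor. The sketch assumes asymmetric (zero-point) quantization to [-128, 127] and widens the range to include 0.0 so that zero is exactly representable; the page does not say whether the quantizer is symmetric or asymmetric, and the function names are illustrative.

```python
def calibrate_minmax(values, qmin=-128, qmax=127):
    """Per-tensor asymmetric calibration: map [min, max] onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # include 0.0 so it maps to an exact integer code
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    # Python's round() is round-half-to-even; a hardware quantizer may round differently.
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    return [(q - zero_point) * scale for q in codes]
```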

Stage 4

Tensor Tiler

Splits large tensors into 2×2 tile-sized chunks that fit PE-local SRAM (512B per PE)
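The tiling idea can be sketched under stated assumptions: one INT8 operand tile per PE, a square power-of-two tile no larger than the 512 B PE SRAM, and edge tiles allowed to be smaller. The page does not specify how tile dimensions are actually chosen, so `tile_shape` and `tiles` are hypothetical helpers.

```python
PE_SRAM_BYTES = 512  # per-PE local SRAM, from the stage description


def tile_shape(elem_bytes=1, budget=PE_SRAM_BYTES):
    """Largest power-of-two square tile whose bytes fit the budget (assumption)."""
    side = 1
    while (2 * side) ** 2 * elem_bytes <= budget:
        side *= 2
    return side, side


def tiles(rows, cols, tile_rows, tile_cols):
    """Yield (r0, r1, c0, c1) bounds covering a rows x cols tensor.

    Tiles are emitted row-major; tiles on the bottom and right edges may be smaller.
    """
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            yield r0, min(r0 + tile_rows, rows), c0, min(c0 + tile_cols, cols)
```

With INT8 data this picks 16 × 16 tiles (256 B), so a 100 × 70 tensor becomes 7 × 5 = 35 tiles.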

Stage 5

Systolic Scheduler

Generates execution schedule for 256-MAC systolic array. Handles data dependencies and tile ordering
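One common way to schedule a GEMM on a systolic array is output-stationary: each output tile stays in the array while slices of the reduction dimension stream through. The sketch assumes the 256 MACs form a 16 × 16 grid and use this dataflow; neither is stated on the page, and `Step`, `schedule_gemm`, and `utilisation` are illustrative names. The utilisation helper shows why edge padding lowers utilisation; it is not how the 95.9% figure was derived.

```python
from dataclasses import dataclass

ARRAY_DIM = 16  # assumption: 256 MACs arranged as a 16 x 16 grid


@dataclass(frozen=True)
class Step:
    m0: int           # output tile row offset
    n0: int           # output tile column offset
    k0: int           # reduction slice offset
    accumulate: bool  # False on the first K slice (clear accumulators), True afterwards


def schedule_gemm(M, N, K, dim=ARRAY_DIM):
    """Output-stationary schedule for C[M, N] = A[M, K] @ B[K, N].

    Each (m0, n0) output tile stays resident while K slices stream through, so the
    only data dependency is that slice k0 must follow slice k0 - dim of the same tile.
    """
    steps = []
    for m0 in range(0, M, dim):
        for n0 in range(0, N, dim):
            for k0 in range(0, K, dim):
                steps.append(Step(m0, n0, k0, accumulate=k0 > 0))
    return steps


def utilisation(M, N, K, dim=ARRAY_DIM):
    """Fraction of issued MAC slots doing useful work; edge tiles are padded to full size."""
    issued = len(schedule_gemm(M, N, K, dim)) * dim ** 3
    return (M * N * K) / issued
```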

Stage 6

Descriptor Generator

Produces hardware instruction descriptors: DMA commands, PE configurations, NoC routing tables
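Descriptors are typically fixed-size binary records that the hardware walks as a linked list. The layout below (opcode, flags, length, source address, destination address, next pointer, in 16 little-endian bytes) and the opcode value are hypothetical, not taken from the NeuraEdge ISA; the sketch only shows how such a record can be packed and validated with Python's `struct` module.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: u8 opcode, u8 flags, u16 length, u32 src, u32 dst, u32 next = 16 bytes.
DMA_FORMAT = "<BBHIII"
OP_DMA = 0x01  # hypothetical opcode value


@dataclass
class DmaDescriptor:
    src_addr: int
    dst_addr: int
    length: int         # bytes to transfer; must fit in 16 bits in this layout
    next_desc: int = 0  # address of the next descriptor; 0 ends the chain
    flags: int = 0

    def pack(self) -> bytes:
        return struct.pack(DMA_FORMAT, OP_DMA, self.flags, self.length,
                           self.src_addr, self.dst_addr, self.next_desc)

    @classmethod
    def unpack(cls, raw: bytes) -> "DmaDescriptor":
        op, flags, length, src, dst, nxt = struct.unpack(DMA_FORMAT, raw)
        if op != OP_DMA:
            raise ValueError(f"not a DMA descriptor (opcode {op:#x})")
        return cls(src, dst, length, nxt, flags)
```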

Stage 7

Binary Generator

Emits final .npu binary with instruction stream, weight blobs, and activation buffers
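A container like the .npu file can be laid out as a header, a section table, and aligned payloads. The magic value, section IDs, and 16-byte alignment below are hypothetical, since the actual .npu format is not documented here; the sketch shows the offset bookkeeping and a reader that round-trips it.

```python
import struct

MAGIC = b"NPU0"  # hypothetical magic value
SECTION_IDS = {"instructions": 1, "weights": 2, "activations": 3}  # hypothetical IDs
HEADER_FMT = "<4sHH"  # magic, version, section count (8 bytes)
ENTRY_FMT = "<III"    # section id, offset, size (12 bytes each)


def write_container(sections: dict[str, bytes], version: int = 1) -> bytes:
    """Header, then section table, then payloads, each payload aligned to 16 bytes."""
    header = struct.pack(HEADER_FMT, MAGIC, version, len(sections))
    offset = len(header) + struct.calcsize(ENTRY_FMT) * len(sections)
    table, payload = b"", b""
    for name, data in sections.items():
        pad = (-offset) % 16  # zero padding up to the next 16-byte boundary
        payload += b"\x00" * pad
        offset += pad
        table += struct.pack(ENTRY_FMT, SECTION_IDS[name], offset, len(data))
        payload += data
        offset += len(data)
    return header + table + payload


def read_container(blob: bytes) -> dict[int, bytes]:
    magic, _version, count = struct.unpack_from(HEADER_FMT, blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    base = struct.calcsize(HEADER_FMT)
    out = {}
    for i in range(count):
        sid, off, size = struct.unpack_from(ENTRY_FMT, blob, base + struct.calcsize(ENTRY_FMT) * i)
        out[sid] = blob[off:off + size]
    return out
```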

SUPPORTED OPERATORS

14 operators. Categorized by function.

Operator        Category
Conv2D          Compute
MatMul          Compute
GEMM            Compute
ReLU            Activation
MaxPool         Pooling
Elemwise Add    Elementwise
Concat          Memory
Reshape         Memory
Flatten         Memory
Softmax         Activation
BatchNorm       Normalization
GlobalAvgPool   Pooling
Identity        Passthrough
ReduceMean      Reduction

Additional operators are planned on the v2.0 roadmap. Adding custom operators is documented in the compiler architecture guide.

BENCHMARK RESULTS

Verified models. Simulated performance.

All benchmarks are simulation-derived on the SKY130A process node using gate-level simulation with SPEF-extracted parasitics. No silicon measurement exists.

Model        Latency   Energy        INT8 Similarity  Tiles   Binary
ResNet-18    190.9 ms  3.59 mJ       0.911            25,864  404 KB
DS-CNN       1.20 ms   22.5 µJ       0.934            1,247   28 KB
MobileNetV2  391 ms    not reported  0.887            41,203  612 KB

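The INT8 similarity figures compare INT8 inference against the FP32 reference using cosine similarity. A minimal version of that metric is sketched below; which tensor is compared (for example the flattened final logits) is an assumption, since the page does not say.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

In use, `a` would be the FP32 model output and `b` the dequantized INT8 output for the same input; a score of 0.911 means the two output vectors are close in direction, not that 91.1% of predictions match.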

OUTPUT FORMATS

Four output formats. One pipeline.

.bin

Raw binary instruction stream + weight blobs

.hex

Intel HEX format for firmware flashing

.c array

C header + source for embedded integration

firmware header + source

Platform-specific firmware wrapper with driver API
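Intel HEX is a published text format: each record is `:`, a byte count, a 16-bit address, a record type, the data, and a checksum equal to the two's complement of the sum of the preceding bytes. The sketch converts a raw .bin payload into that form using data (00), extended linear address (04), and end-of-file (01) records. It is an illustration, not the compiler's exporter; the load address `base` defaults to 0 here, which is an assumption about the target memory map.

```python
def intel_hex(data: bytes, base: int = 0, width: int = 16) -> str:
    """Encode bytes as Intel HEX text (record types 00, 04 and 01)."""

    def record(rtype: int, addr16: int, payload: bytes) -> str:
        body = bytes([len(payload), (addr16 >> 8) & 0xFF, addr16 & 0xFF, rtype]) + payload
        checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
        return ":" + (body + bytes([checksum])).hex().upper()

    lines = []
    upper = None
    i = 0
    while i < len(data):
        addr = base + i
        if addr >> 16 != upper:
            # Emit an extended linear address record whenever the upper 16 bits change.
            upper = addr >> 16
            lines.append(record(0x04, 0, upper.to_bytes(2, "big")))
        # Never let one data record straddle a 64 KiB boundary.
        n = min(width, len(data) - i, 0x10000 - (addr & 0xFFFF))
        lines.append(record(0x00, addr & 0xFFFF, data[i:i + n]))
        i += n
    lines.append(record(0x01, 0, b""))  # end-of-file record
    return "\n".join(lines) + "\n"
```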

OPERATOR COVERAGE

Operator coverage across common ONNX model families.

This matrix shows which models compile today and which need v1.1 or v2.0 compiler support, so there are no surprises after purchase.

Operator          v1.0  v1.1  v2.0  Models requiring it
Conv2D                              All CNNs
MatMul / GEMM                       Transformers, BERT
ReLU / ReLU6                        MobileNet, ResNet
MaxPool                             ResNet, EfficientNet
GlobalAvgPool                       MobileNetV2
Concat / Reshape                    YOLO variants
BatchNorm                           Most CNNs
DepthwiseConv2D                     MobileNetV2, EfficientNet
TransposeConv                       YOLO decoder, GANs
LayerNorm                           BERT-tiny, transformers
GELU / SiLU                         EfficientNet, YOLOv8
Attention (MHSA)                    BERT, ViT
Slice / Gather                      YOLO, detection heads
Slice / GatherYOLO, detection heads

COMPILER FREQUENCY MIGRATION

v2.0 compiler update required for TSMC 40nm.

v1.0 compiler targets: 50 MHz clock, SKY130A SRAM latencies

The v2.0 update for TSMC 40nm (400 MHz) includes:

hardware_config_tsmc40nm.json: updated clock period, SRAM read and write latencies, pipeline cycle counts

— Regression suite: cosine similarity re-verification for all supported models at 400 MHz timing

— ResNet-18 baseline: 0.911 INT8/FP32 cosine similarity to be re-verified post-migration

No RTL changes are required for the frequency migration; only the compiler configuration file is updated.
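The cycle counts have to change for an arithmetic reason: an SRAM latency measured in nanoseconds spans more clock cycles as the period shrinks from 20 ns (50 MHz) to 2.5 ns (400 MHz). The sketch below derives cycle counts for a configuration dict; the field names and the 4.0 ns / 3.0 ns SRAM latencies are placeholders, not values from hardware_config_tsmc40nm.json or from either process's SRAM data.

```python
import json
import math


def latency_cycles(latency_ns: float, clock_mhz: float) -> int:
    """Whole cycles needed to cover a latency: ceil(latency / clock period)."""
    period_ns = 1000.0 / clock_mhz
    # Small tolerance so exact multiples (e.g. 5.0 ns at 2.5 ns) are not rounded up.
    return math.ceil(latency_ns / period_ns - 1e-9)


def hardware_config(clock_mhz: float, sram_read_ns: float, sram_write_ns: float) -> dict:
    """Hypothetical config fields; the real JSON schema is not shown on this page."""
    return {
        "clock_period_ns": 1000.0 / clock_mhz,
        "sram_read_latency_cycles": latency_cycles(sram_read_ns, clock_mhz),
        "sram_write_latency_cycles": latency_cycles(sram_write_ns, clock_mhz),
    }


if __name__ == "__main__":
    # Placeholder latencies: 1 read cycle at 50 MHz becomes 2 read cycles at 400 MHz.
    for mhz in (50, 400):
        print(mhz, json.dumps(hardware_config(mhz, sram_read_ns=4.0, sram_write_ns=3.0)))
```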

FULL SOFTWARE DELIVERY PACKAGE

32 files. Compiler, drivers, SDK, examples.

Every file in this package is a competitive advantage over vendors who deliver encrypted RTL with no software.

Compiler (14 Python modules)

pipeline.py · onnx_parser.py · quantizer.py · tiler.py · scheduler.py · desc_gen.py · binary_gen.py · noc_mapper.py · graph_optimizer.py · memory_planners.py · perf_model.py · equivalence_checker.py · bit_accurate_model.py · isa.py

Firmware / drivers (14 files)

Bare-metal driver: npu_driver.c / npu_driver.h
Hardware Abstraction Layer (HAL)
CSR register access library
Linux kernel driver: neuraedge_drv.c
Linux IOCTL interface: neuraedge_ioctl.c / .h
Linux power management: neuraedge_pm.c
Device tree source: .dts
Linker script
Startup code
CMake build system

SDK + Example

SDK: neuraedge.h / neuraedge.c / neuraedge.py
Example: resnet18_demo.py, an end-to-end inference demo

Evaluate the compiler yourself.

Schedule a technical review and walk through the compiler pipeline live. We will compile your model on screen and show you the binary output.