SOFTWARE STACK · ONNX-TO-BINARY COMPILER

7 stages. ONNX in. Binary out.

The NeuraEdge compiler transforms standard ONNX models into optimized binary instructions for the 256-MAC systolic array. Every stage is written in readable Python and can be extended with custom operators.

15 Python modules · 14 operators · ResNet-18 verified · INT8 cosine sim: 0.911 · 95.9% peak utilisation

COMPILER PIPELINE

From model to silicon in 7 stages.

ONNX Parser→Graph Optimizer→Quantizer→Tensor Tiler→Systolic Scheduler→Descriptor Generator→Binary Generator

Stage 1

ONNX Parser

Loads .onnx model, validates graph topology, extracts operator nodes and tensor shapes
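Here is a minimal sketch of the topology check. It assumes the graph has already been loaded (for example with `onnx.load` and `onnx.checker.check_model`) and converted into a hypothetical lightweight `Node` record. ONNX requires a graph's node list to be topologically sorted, so a single forward pass can confirm two things: every input is defined before it is used, and each tensor has exactly one producer.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX NodeProto."""
    name: str
    op_type: str
    inputs: list[str]
    outputs: list[str]


def validate_topology(nodes, graph_inputs, initializers):
    """Return every tensor name the graph defines, or raise ValueError.

    Assumes `nodes` is in file order, which ONNX requires to be topological.
    """
    defined = set(graph_inputs) | set(initializers)
    for node in nodes:
        for name in node.inputs:
            # An empty string marks an omitted optional input in ONNX.
            if name and name not in defined:
                raise ValueError(f"{node.name}: input '{name}' used before definition")
        for name in node.outputs:
            if name in defined:
                raise ValueError(f"{node.name}: tensor '{name}' has multiple producers")
            defined.add(name)
    return defined


nodes = [
    Node("conv0", "Conv", ["x", "w"], ["y"]),
    Node("relu0", "Relu", ["y"], ["z"]),
]
print(sorted(validate_topology(nodes, graph_inputs=["x"], initializers=["w"])))
```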

Stage 2

Graph Optimizer

Fuses consecutive operators, eliminates dead nodes, constant-folds where possible
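The sketch below shows two of these passes on the same kind of hypothetical `Node` record. The first is dead-node elimination, which walks backwards from the graph outputs. The second fuses a `Conv` into the `Relu` that follows it when that ReLU is the Conv's only consumer. The fused op name `ConvRelu` is illustrative and is not the compiler's actual IR.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX NodeProto."""
    name: str
    op_type: str
    inputs: list
    outputs: list


def eliminate_dead_nodes(nodes, graph_outputs):
    """Keep only nodes whose outputs are (transitively) needed by a graph output."""
    needed = set(graph_outputs)
    kept = []
    for node in reversed(nodes):  # nodes are topologically sorted
        if any(out in needed for out in node.outputs):
            kept.append(node)
            needed.update(node.inputs)
    return list(reversed(kept))


def fuse_conv_relu(nodes, graph_outputs):
    """Fold a Relu into the preceding Conv when the Conv output feeds only that Relu."""
    consumers = {}
    for node in nodes:
        for name in node.inputs:
            consumers.setdefault(name, []).append(node)
    fused, absorbed = [], set()
    for node in nodes:
        if node.name in absorbed:
            continue  # this Relu was already folded into its Conv
        users = consumers.get(node.outputs[0], []) if node.outputs else []
        if (node.op_type == "Conv" and len(users) == 1
                and users[0].op_type == "Relu"
                and node.outputs[0] not in graph_outputs):
            relu = users[0]
            fused.append(Node(f"{node.name}_relu", "ConvRelu", node.inputs, relu.outputs))
            absorbed.add(relu.name)
        else:
            fused.append(node)
    return fused


graph = [
    Node("c0", "Conv", ["x", "w"], ["y"]),
    Node("r0", "Relu", ["y"], ["z"]),
    Node("dbg", "Identity", ["x"], ["unused"]),  # contributes nothing to the output
]
live = eliminate_dead_nodes(graph, ["z"])
print([n.op_type for n in fuse_conv_relu(live, ["z"])])
```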

Stage 3

Quantizer

INT8 quantization with per-tensor calibration. Maps each tensor's observed min/max range onto the INT8 range to limit accuracy loss
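Per-tensor min/max calibration can be sketched as an affine mapping of the observed range onto [-128, 127]. This sketch assumes asymmetric quantization with a zero point. The page does not say whether the compiler uses a symmetric or an asymmetric mapping.

```python
def calibrate_minmax(values):
    """Per-tensor affine INT8 parameters (scale, zero_point) from observed min/max."""
    lo = min(min(values), 0.0)  # widen the range to include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))  # lo maps to -128, hi maps to 127
    return scale, zero_point


def quantize(values, scale, zero_point):
    """Round to the nearest INT8 code and clamp to [-128, 127]."""
    return [max(-128, min(127, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    return [(q - zero_point) * scale for q in codes]


activations = [-1.0, 0.0, 0.5, 2.0]
scale, zp = calibrate_minmax(activations)
codes = quantize(activations, scale, zp)
print(scale, zp, codes)
```

Widening the range to include 0 keeps zero exactly representable. This matters for zero padding and ReLU outputs.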

Stage 4

Tensor Tiler

Splits large tensors into 2×2 tile-sized chunks that fit in PE-local SRAM (512 B per PE)
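Below is a sketch of the splitting step and the SRAM fit check. The tile shape, the element size, and the number of buffers resident per PE are all assumptions for illustration. Only the 512 B per-PE budget comes from the description above.

```python
PE_SRAM_BYTES = 512  # per-PE local SRAM, from the stage description


def tile_ranges(dim, tile):
    """Split [0, dim) into consecutive (start, stop) ranges of at most `tile` elements."""
    return [(start, min(start + tile, dim)) for start in range(0, dim, tile)]


def tile_2d(rows, cols, tile_rows, tile_cols):
    """All (row_range, col_range) pairs covering a rows x cols tensor; edge tiles may be smaller."""
    return [(r, c) for r in tile_ranges(rows, tile_rows) for c in tile_ranges(cols, tile_cols)]


def fits_pe_sram(tile_rows, tile_cols, bytes_per_elem=1, buffers=1, sram_bytes=PE_SRAM_BYTES):
    """True if `buffers` tiles of this shape fit in one PE's local SRAM."""
    return tile_rows * tile_cols * bytes_per_elem * buffers <= sram_bytes


print(tile_ranges(5, 2))
print(len(tile_2d(5, 4, 2, 2)), "tiles")
print(fits_pe_sram(16, 16, buffers=2))  # two INT8 16x16 tiles = 512 B
```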

Stage 5

Systolic Scheduler

Generates execution schedule for 256-MAC systolic array. Handles data dependencies and tile ordering
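One way to express tile dependencies is as a graph passed to Python's standard `graphlib.TopologicalSorter`. This sketch assumes an output-stationary tiled matrix multiply. Partial sums for each output tile accumulate in K order, and a hypothetical `store` task waits for the last partial sum. Tasks in the same wave have no mutual dependencies and could be issued together.

```python
from graphlib import TopologicalSorter


def build_matmul_deps(m_tiles, n_tiles, k_tiles):
    """Map each task to the set of tasks it must wait for (k_tiles >= 1)."""
    deps = {}
    for i in range(m_tiles):
        for j in range(n_tiles):
            for k in range(k_tiles):
                # Output tile (i, j) accumulates partial sums in K order.
                deps[("mac", i, j, k)] = {("mac", i, j, k - 1)} if k else set()
            # Write back only after the final partial sum is complete.
            deps[("store", i, j)] = {("mac", i, j, k_tiles - 1)}
    return deps


def schedule_waves(deps):
    """Group tasks into waves whose dependencies are all satisfied by earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the dependencies are cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in schedule_waves(build_matmul_deps(1, 2, 2)):
    print(wave)
```

For a 1×2 output-tile grid with two K tiles, this gives three waves. The first holds both first partial sums, the second holds both second partial sums, and the third holds both stores.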

Stage 6

Descriptor Generator

Produces hardware instruction descriptors: DMA commands, PE configurations, NoC routing tables
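Descriptors are typically fixed-width binary records. The layout below is a 12-byte little-endian DMA descriptor with opcode, flags, length, source, and destination fields. That layout and the opcode value are hypothetical. They are chosen only to show packing and round-tripping with Python's `struct` module.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: u8 opcode, u8 flags, u16 length, u32 src, u32 dst (little-endian).
DMA_FORMAT = "<BBHII"
OP_DMA_LOAD = 0x01  # hypothetical opcode value


@dataclass
class DmaDescriptor:
    opcode: int
    flags: int
    length: int    # bytes to transfer
    src_addr: int
    dst_addr: int

    def pack(self) -> bytes:
        """Serialize to the fixed-width record the hardware would consume."""
        return struct.pack(DMA_FORMAT, self.opcode, self.flags, self.length,
                           self.src_addr, self.dst_addr)

    @classmethod
    def unpack(cls, raw: bytes) -> "DmaDescriptor":
        return cls(*struct.unpack(DMA_FORMAT, raw))


desc = DmaDescriptor(OP_DMA_LOAD, flags=0, length=512, src_addr=0x1000, dst_addr=0x2000_0000)
raw = desc.pack()
print(len(raw), raw.hex())
```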

Stage 7

Binary Generator

Emits final .npu binary with instruction stream, weight blobs, and activation buffers
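The sketch below shows how such a container might be assembled: a header, then a section table of (kind, offset, size) entries, then the section payloads back to back. The magic number, field widths, and section IDs are hypothetical. The page does not document the .npu layout.

```python
import struct

MAGIC = b"NPU\x00"   # hypothetical magic number
HEADER = "<4sHH"     # magic, format version, section count
ENTRY = "<III"       # section kind, byte offset from start of file, byte size
SECTION_KINDS = {"instructions": 1, "weights": 2, "activations": 3}  # hypothetical IDs


def emit_binary(sections, version=1):
    """Concatenate named sections behind a header and a section table."""
    offset = struct.calcsize(HEADER) + struct.calcsize(ENTRY) * len(sections)
    entries, payloads = [], []
    for name, data in sections.items():
        entries.append(struct.pack(ENTRY, SECTION_KINDS[name], offset, len(data)))
        payloads.append(data)
        offset += len(data)
    header = struct.pack(HEADER, MAGIC, version, len(sections))
    return header + b"".join(entries) + b"".join(payloads)


def read_section_table(blob):
    """Parse the header and return a list of (kind, offset, size) tuples."""
    magic, _version, count = struct.unpack_from(HEADER, blob, 0)
    if magic != MAGIC:
        raise ValueError("not a .npu image")
    base, step = struct.calcsize(HEADER), struct.calcsize(ENTRY)
    return [struct.unpack_from(ENTRY, blob, base + i * step) for i in range(count)]


image = emit_binary({"instructions": b"\x01\x02", "weights": b"\xaa" * 4})
print(len(image), read_section_table(image))
```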

SUPPORTED OPERATORS

14 operators. Categorized by function.

Operator	Category
Conv2D	Compute
MatMul	Compute
GEMM	Compute
ReLU	Activation
MaxPool	Pooling
Elemwise Add	Elementwise
Concat	Memory
Reshape	Memory
Flatten	Memory
Softmax	Activation
BatchNorm	Normalization
GlobalAvgPool	Pooling
Identity	Passthrough
ReduceMean	Reduction

Additional operators are planned for the v1.1 and v2.0 releases (see Operator Coverage). Extending the compiler with custom operators is documented in the compiler architecture guide.
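Custom-operator extension is commonly built around a registry that maps an operator type to a lowering function. The decorator, the registry, and the instruction mnemonics below are hypothetical and only illustrate the pattern. The actual extension interface is the one documented in the compiler architecture guide.

```python
# Hypothetical registry: op_type -> (category, lowering function).
LOWERINGS = {}


def register_op(op_type, category):
    """Decorator that registers a lowering function for one operator type."""
    def decorator(fn):
        if op_type in LOWERINGS:
            raise ValueError(f"{op_type} is already registered")
        LOWERINGS[op_type] = (category, fn)
        return fn
    return decorator


@register_op("Identity", "Passthrough")
def lower_identity(node_inputs):
    # A passthrough emits no instructions; its output aliases its input buffer.
    return []


@register_op("HardSigmoid", "Activation")
def lower_hard_sigmoid(node_inputs):
    # Hypothetical mnemonics: y = clamp(0.2 * x + 0.5, 0, 1).
    x = node_inputs[0]
    return [("ELEMWISE_MUL_ADD", x, 0.2, 0.5), ("CLAMP", x, 0.0, 1.0)]


def lower(op_type, node_inputs):
    """Dispatch to the registered lowering, or fail loudly for unknown ops."""
    if op_type not in LOWERINGS:
        raise NotImplementedError(f"no lowering registered for {op_type}")
    _category, fn = LOWERINGS[op_type]
    return fn(node_inputs)


print(lower("HardSigmoid", ["act0"]))
```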

BENCHMARK RESULTS

Verified models. Measured performance.

All benchmarks are simulation-derived on the SKY130A process node using gate-level simulation with SPEF-extracted parasitics. No silicon measurement exists.

Model	Latency	Energy	INT8/FP32 cosine similarity	Tiles	Binary size
ResNet-18	190.9 ms	3.59 mJ	0.911	25,864	404 KB
DS-CNN	1.20 ms	22.5 µJ	0.934	1,247	28 KB
MobileNetV2	391 ms	—	0.887	41,203	612 KB
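The similarity column compares INT8 outputs with FP32 reference outputs using cosine similarity, where 1.0 means the two outputs point in the same direction. The sketch below computes that metric. It also derives average power from the table, assuming the Energy column is energy per inference. Under that assumption, ResNet-18 and DS-CNN both work out to roughly 18.8 mW.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two flattened output tensors (e.g. INT8 vs FP32 logits)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def average_power_mw(energy_j, latency_s):
    """Average power in milliwatts, assuming energy is measured per inference."""
    return energy_j / latency_s * 1e3


# Latency and energy figures from the benchmark table.
print(f"ResNet-18: {average_power_mw(3.59e-3, 190.9e-3):.2f} mW")
print(f"DS-CNN:    {average_power_mw(22.5e-6, 1.20e-3):.2f} mW")
```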


OUTPUT FORMATS

Four output formats. One pipeline.

.bin

Raw binary instruction stream + weight blobs

.hex

Intel HEX format for firmware flashing

.c array

C header + source for embedded integration

firmware header + source

Platform-specific firmware wrapper with driver API
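Intel HEX is a public record format. Each line consists of `:`, a byte count, a 16-bit address, a record type, the data, and a two's-complement checksum. The sketch below emits data records (type 00), Extended Linear Address records (type 04) for images larger than 64 KB, and the end-of-file record (type 01). It also emits a simplified single-file C array. The function names and the array symbol are hypothetical.

```python
def _hex_record(rec_type, address, data):
    """One Intel HEX record; checksum makes all record bytes sum to 0 mod 256."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, rec_type]) + data
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(blob, base_addr=0, width=16):
    # Alignment keeps every data record inside a single 64 KB segment.
    if base_addr % width or 0x10000 % width:
        raise ValueError("base_addr and width must align to 64 KB segments")
    lines, upper = [], None
    for offset in range(0, len(blob), width):
        addr = base_addr + offset
        if addr >> 16 != upper:  # new 64 KB segment: Extended Linear Address record
            upper = addr >> 16
            lines.append(_hex_record(0x04, 0, upper.to_bytes(2, "big")))
        lines.append(_hex_record(0x00, addr & 0xFFFF, blob[offset:offset + width]))
    lines.append(_hex_record(0x01, 0, b""))  # end-of-file record
    return "\n".join(lines) + "\n"


def to_c_array(blob, name="npu_image"):
    """Emit a C definition of the image plus its length (single file, no header)."""
    rows = [", ".join(f"0x{b:02x}" for b in blob[i:i + 12]) for i in range(0, len(blob), 12)]
    body = ",\n    ".join(rows)
    return (f"const unsigned char {name}[{len(blob)}] = {{\n    {body}\n}};\n"
            f"const unsigned int {name}_len = {len(blob)};\n")


print(to_intel_hex(b"\x01\x02"), end="")
print(to_c_array(b"\x01\x02"), end="")
```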

OPERATOR COVERAGE

Operator coverage across common ONNX model families.

This matrix shows which models compile with the v1.0 compiler today and which require v1.1 or v2.0 support, so there are no surprises after purchase.

Operator	v1.0	v1.1	v2.0	Models requiring it
Conv2D	✓	—	—	All CNNs
MatMul / GEMM	✓	—	—	Transformers, BERT
ReLU / ReLU6	✓	—	—	MobileNet, ResNet
MaxPool	✓	—	—	ResNet, EfficientNet
GlobalAvgPool	✓	—	—	MobileNetV2
Concat / Reshape	✓	—	—	YOLO variants
BatchNorm	✓	—	—	Most CNNs
DepthwiseConv2D	—	✓	✓	MobileNetV2, EfficientNet
TransposeConv	—	—	✓	YOLO decoder, GANs
LayerNorm	—	—	✓	BERT-tiny, transformers
GELU / SiLU	—	—	✓	EfficientNet, YOLOv8
Attention (MHSA)	—	—	✓	BERT, ViT
Slice / Gather	—	—	✓	YOLO, detection heads
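If each ✓ is read as the earliest release that supports an operator, the matrix can be turned into a quick pre-purchase check. The operator names are transcribed from the table. They are not ONNX `op_type` strings, which use names such as `Conv`, `Gemm`, and `Relu`. Operators absent from the matrix are reported rather than guessed.

```python
# Earliest supporting release per operator, transcribed from the coverage matrix.
FIRST_RELEASE = {
    "Conv2D": "v1.0", "MatMul": "v1.0", "GEMM": "v1.0", "ReLU": "v1.0", "ReLU6": "v1.0",
    "MaxPool": "v1.0", "GlobalAvgPool": "v1.0", "Concat": "v1.0", "Reshape": "v1.0",
    "BatchNorm": "v1.0", "DepthwiseConv2D": "v1.1", "TransposeConv": "v2.0",
    "LayerNorm": "v2.0", "GELU": "v2.0", "SiLU": "v2.0", "Attention": "v2.0",
    "Slice": "v2.0", "Gather": "v2.0",
}
RELEASE_ORDER = ["v1.0", "v1.1", "v2.0"]


def required_release(model_ops):
    """Lowest compiler release covering every operator in `model_ops`."""
    unknown = sorted(set(model_ops) - set(FIRST_RELEASE))
    if unknown:
        raise KeyError(f"not in the coverage matrix: {unknown}")
    return max((FIRST_RELEASE[op] for op in model_ops),
               key=RELEASE_ORDER.index, default="v1.0")


print(required_release(["Conv2D", "DepthwiseConv2D", "ReLU6", "GlobalAvgPool"]))
```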

COMPILER FREQUENCY MIGRATION

v2.0 compiler update required for TSMC 40nm.

The v1.0 compiler targets 50 MHz operation with SKY130A SRAM latencies.

Migrating to TSMC 40nm (400 MHz) requires the v2.0 compiler update:

— hardware_config_tsmc40nm.json: updated clock period, SRAM read latency, write latency, pipeline cycle counts

— Regression suite: cosine similarity re-verification for all supported models at 400 MHz timing

— ResNet-18 baseline: 0.911 INT8/FP32 cosine similarity to be re-verified post-migration

No RTL changes are required for the frequency migration; only the compiler configuration file changes.
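A configuration change is enough because SRAM latencies are fixed in physical time by the process, while the compiler schedules in clock cycles. Raising the clock therefore makes the same latency span more cycles. The JSON keys and latency values below are hypothetical placeholders, not the contents of hardware_config_tsmc40nm.json.

```python
import json


def cycles(latency_ps, clock_mhz):
    """Whole clock cycles needed to cover a latency given in picoseconds."""
    period_ps = 1_000_000 // clock_mhz   # exact for 50 MHz and 400 MHz
    return -(-latency_ps // period_ps)   # integer ceiling division


def derive_timing(config):
    """Turn physical latencies from a config dict into cycle counts for the scheduler."""
    mhz = config["clock_mhz"]
    return {
        "period_ps": 1_000_000 // mhz,
        "sram_read_cycles": cycles(config["sram_read_latency_ps"], mhz),
        "sram_write_cycles": cycles(config["sram_write_latency_ps"], mhz),
    }


# Illustrative placeholder values only; real SRAM latencies are not given on this page.
sky130 = json.loads('{"clock_mhz": 50, "sram_read_latency_ps": 6000, "sram_write_latency_ps": 4000}')
tsmc40 = json.loads('{"clock_mhz": 400, "sram_read_latency_ps": 3000, "sram_write_latency_ps": 2000}')
print(derive_timing(sky130))
print(derive_timing(tsmc40))
```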

FULL SOFTWARE DELIVERY PACKAGE

32 files. Compiler, drivers, SDK, examples.

Each of these files is a competitive advantage over vendors that deliver encrypted RTL with no software.

Compiler (14 Python modules)

pipeline.py · onnx_parser.py · quantizer.py · tiler.py · scheduler.py · desc_gen.py · binary_gen.py · noc_mapper.py · graph_optimizer.py · memory_planners.py · perf_model.py · equivalence_checker.py · bit_accurate_model.py · isa.py

Firmware / drivers (14 files)

Component	Files
Bare-metal driver	npu_driver.c / npu_driver.h
Hardware Abstraction Layer	HAL
CSR register access library	—
Linux kernel driver	neuraedge_drv.c
Linux IOCTL interface	neuraedge_ioctl.c / .h
Linux power management	neuraedge_pm.c
Device tree source	.dts
Linker script	—
Startup code	—
CMake build system	—

SDK + Example

SDK	neuraedge.h / neuraedge.c / neuraedge.py
Example	resnet18_demo.py (end-to-end inference demo)

Evaluate the compiler yourself.

Schedule a technical review and walk through the compiler pipeline live. We will compile your model on screen and show you the binary output.
