| Title: |
Where to Split? A Pareto-Front Analysis of DNN Partitioning for Edge Inference |
| Authors: |
Masud, Adiba; Foley, Nicholas; Rajarajan, Pragathi Durga; Lama, Palden |
| Publication Year: |
2026 |
| Subject Terms: |
Distributed, Parallel, and Cluster Computing |
| Description: |
The deployment of deep neural networks (DNNs) on resource-constrained edge devices is frequently hindered by their significant computational and memory requirements. While partitioning and distributing a DNN across multiple devices is a well-established strategy to mitigate this challenge, prior research has largely focused on single-objective optimization, such as minimizing latency or maximizing throughput. This paper challenges that view by reframing DNN partitioning as a multi-objective optimization problem. We argue that in real-world scenarios, a complex trade-off between latency and throughput exists, which is further complicated by network variability. To address this, we introduce ParetoPipe, an open-source framework that leverages Pareto front analysis to systematically identify optimal partitioning strategies that balance these competing objectives. Our contributions are threefold: we benchmark pipeline-partitioned inference on a heterogeneous testbed of Raspberry Pis and a GPU-equipped edge server; we identify Pareto-optimal points to analyze the latency-throughput trade-off under varying network conditions; and we release a flexible, open-source framework to facilitate distributed inference and benchmarking. This toolchain features dual communication backends, PyTorch RPC and a custom lightweight implementation, to minimize overhead and support broad experimentation. |
| Document Type: |
Working Paper |
| Access URL: |
http://arxiv.org/abs/2601.08025 |
| Accession Number: |
edsarx.2601.08025 |
| Database: |
arXiv |
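
The abstract frames partitioning as a two-objective problem: lower latency and higher throughput. As a rough illustration of what identifying Pareto-optimal points could look like, the sketch below filters a set of candidate split points down to those that no other candidate beats on both objectives. It is not taken from ParetoPipe; the `SplitResult` type, the function names, and every number in `measurements` are hypothetical and chosen only to make the example run.

```python
from typing import List, NamedTuple


class SplitResult(NamedTuple):
    """One benchmarked partition (hypothetical record layout)."""
    split_point: int        # assumed: index of the layer where the model is cut
    latency_ms: float       # end-to-end latency, lower is better
    throughput_ips: float   # inferences per second, higher is better


def dominates(a: SplitResult, b: SplitResult) -> bool:
    """True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one."""
    no_worse = (a.latency_ms <= b.latency_ms
                and a.throughput_ips >= b.throughput_ips)
    strictly_better = (a.latency_ms < b.latency_ms
                       or a.throughput_ips > b.throughput_ips)
    return no_worse and strictly_better


def pareto_front(results: List[SplitResult]) -> List[SplitResult]:
    """Return the non-dominated results, sorted by latency."""
    front = [r for r in results
             if not any(dominates(other, r) for other in results)]
    return sorted(front, key=lambda r: r.latency_ms)


# Made-up measurements for illustration only; not results from the paper.
measurements = [
    SplitResult(split_point=2, latency_ms=120.0, throughput_ips=9.0),
    SplitResult(split_point=5, latency_ms=95.0, throughput_ips=8.0),
    SplitResult(split_point=8, latency_ms=140.0, throughput_ips=12.0),
    SplitResult(split_point=11, latency_ms=150.0, throughput_ips=11.0),
    SplitResult(split_point=14, latency_ms=100.0, throughput_ips=7.5),
]

front = pareto_front(measurements)
for r in front:
    print(r.split_point, r.latency_ms, r.throughput_ips)
```

With these made-up numbers, split points 11 and 14 drop out because another candidate is better on both axes, while 5, 2, and 8 remain as distinct latency-throughput compromises. The pairwise check is quadratic in the number of candidates, which is usually acceptable when the candidates are the split points of a single network.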