Given that the objective behind any AI initiative is to improve productivity, it is somewhat ironic that performance is so often undermined by a neglected element of the stack. When companies build their own AI systems, the focus typically falls on models, data, GPUs and software frameworks, yet one critical component of successful deployment is routinely overlooked: the network itself.
AI workloads behave very differently from traditional enterprise applications. Distributed training generates intense east-west traffic between servers, not just north-south traffic between users and data centres. Large volumes of data move continuously between GPUs and nodes, making ultra-low and, crucially, deterministic latency essential. Even small amounts of jitter can slow synchronisation and extend training times because distributed AI workloads rely on tightly coordinated communication between nodes. This means that delays in even a single data exchange can hold up the entire process and impact AI’s performance and the end-user’s quality of experience.
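To make that concrete, here is a minimal sketch, in Python with purely illustrative numbers, of how a synchronous training step is gated by its slowest participant: if gradient exchanges nominally take 2 ms but a small fraction of them are delayed by network jitter, every node waits for the late one before the next step can begin.

```python
import random

# Toy model of synchronous distributed training: a step finishes only when the
# slowest node's gradient exchange finishes, so jitter on any one node stalls all.
NODES = 64
STEPS = 1_000
BASE_EXCHANGE_MS = 2.0      # nominal exchange time (illustrative assumption)
JITTER_MS = 0.5             # extra delay when jitter strikes (illustrative)
JITTER_PROBABILITY = 0.01   # 1% of individual exchanges are delayed

random.seed(42)
total_ms = 0.0
for _ in range(STEPS):
    per_node = [
        BASE_EXCHANGE_MS + (JITTER_MS if random.random() < JITTER_PROBABILITY else 0.0)
        for _ in range(NODES)
    ]
    total_ms += max(per_node)  # the whole step waits for the slowest node

ideal_ms = STEPS * BASE_EXCHANGE_MS
print(f"ideal run time:  {ideal_ms:.0f} ms")
print(f"with jitter:     {total_ms:.0f} ms (+{100 * (total_ms / ideal_ms - 1):.1f}%)")
```

Under these assumptions, jitter that touches only around one percent of individual exchanges stretches the overall run time by roughly ten percent, because any single late node holds up all sixty-four.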
Throughput is equally important. AI clusters often run links at sustained high utilisation, where even minor packet loss can trigger retransmissions and degrade performance. Lossless or near-lossless behaviour, intelligent congestion management and carefully designed spine–leaf architectures become fundamental rather than optional.
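To put rough numbers on how sensitive throughput is to loss, the back-of-the-envelope sketch below applies the well-known Mathis steady-state TCP approximation. The MSS, round-trip time and loss rates are illustrative assumptions rather than measurements, and RDMA transports such as RoCE follow different mechanics, but the broad sensitivity to loss is the point.

```python
from math import sqrt

# Mathis et al. steady-state TCP throughput approximation:
#   throughput <= (MSS / RTT) * (C / sqrt(p)),  with C ~= 1.22
# Values below are illustrative for an intra-data-centre path, not measurements.
MSS_BYTES = 1460        # typical Ethernet MSS
RTT_S = 100e-6          # assumed 100 microsecond round-trip time
C = 1.22

for loss_rate in (1e-6, 1e-5, 1e-4, 1e-3):
    throughput_bps = (MSS_BYTES * 8 / RTT_S) * (C / sqrt(loss_rate))
    print(f"loss {loss_rate:.0e} -> throughput ceiling ~{throughput_bps / 1e9:,.1f} Gbit/s")
```

With these assumed figures, a loss rate of just 0.01% caps a single flow at roughly 14 Gbit/s, a small fraction of a 100 or 400 Gbit/s link, which is why lossless or near-lossless behaviour matters so much.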
There are also several lesser-known network characteristics that can influence the success of AI deployments. From subtle traffic behaviours within high-performance compute clusters, to timing and synchronisation considerations, and the way large datasets move across infrastructure, these factors are often only discovered once organisations begin scaling their AI systems. For example:
Timing and synchronisation
Many AI training environments rely on precise timing across nodes, meaning clock synchronisation protocols such as PTP (Precision Time Protocol) become critical.
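The arithmetic PTP uses to estimate a node's clock offset is straightforward, and the sketch below works through the standard two-way exchange with made-up nanosecond timestamps.

```python
# PTP two-way exchange: the master sends Sync at t1, the slave receives it at t2,
# the slave sends Delay_Req at t3, and the master receives it at t4.
# Assuming a symmetric path, the slave's offset and the one-way delay are:
#   offset = ((t2 - t1) - (t4 - t3)) / 2
#   delay  = ((t2 - t1) + (t4 - t3)) / 2
# Timestamps below are illustrative nanosecond values, not captured data.

def ptp_offset_and_delay(t1_ns: int, t2_ns: int, t3_ns: int, t4_ns: int):
    forward = t2_ns - t1_ns   # master -> slave: offset plus path delay
    reverse = t4_ns - t3_ns   # slave -> master: path delay minus offset
    offset_ns = (forward - reverse) / 2
    delay_ns = (forward + reverse) / 2
    return offset_ns, delay_ns

offset, delay = ptp_offset_and_delay(
    t1_ns=1_000_000, t2_ns=1_001_500, t3_ns=1_050_000, t4_ns=1_050_700
)
print(f"estimated offset: {offset:.0f} ns, path delay: {delay:.0f} ns")
```

Because the formula assumes the forward and reverse paths are symmetric, queuing jitter or asymmetry on either leg feeds directly into the offset estimate, another reason deterministic latency matters.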
Microburst traffic
AI clusters often produce microbursts of traffic when gradient updates are exchanged between nodes. These bursts can exceed switch buffer capacity even when average utilisation appears safe, as the sketch after the incast example below illustrates.
Incast patterns
During training phases, multiple workers may simultaneously send updates to a single parameter server or aggregation node, creating incast congestion that can overwhelm switch queues.
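A toy calculation, with purely illustrative figures, shows how the microburst and incast behaviours described above bite even on a link that looks nearly idle: thirty-two workers each pushing a one-megabyte gradient shard towards a single aggregation node at the same moment.

```python
# Toy model of an incast microburst: N workers each push a gradient shard towards
# one aggregation node at roughly the same moment. All figures are illustrative.
WORKERS = 32
SHARD_BYTES = 1 * 1024 * 1024        # 1 MiB per worker per exchange (assumed)
EGRESS_GBPS = 100                    # egress link towards the aggregation node
BURST_WINDOW_S = 1e-3                # the shards all arrive within ~1 ms
EXCHANGE_INTERVAL_S = 100e-3         # an exchange every 100 ms (assumed)
BUFFER_BYTES = 16 * 1024 * 1024      # buffer available to this port (assumed)

arriving = WORKERS * SHARD_BYTES
drained_during_burst = EGRESS_GBPS * 1e9 / 8 * BURST_WINDOW_S
peak_queue = max(0.0, arriving - drained_during_burst)
avg_util = (arriving * 8) / (EGRESS_GBPS * 1e9 * EXCHANGE_INTERVAL_S)

print(f"average egress utilisation: {avg_util:.1%}")
print(f"peak queue during burst:    {peak_queue / 1e6:.1f} MB "
      f"(buffer: {BUFFER_BYTES / 1e6:.1f} MB)")
print("drops likely" if peak_queue > BUFFER_BYTES else "fits in buffer")
```

In this example the egress port averages under three percent utilisation, yet the synchronised burst briefly needs more than 20 MB of buffering, exceeding the assumed per-port buffer, so packets are dropped, retransmitted and the whole exchange slows down.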
Because of this, AI environments require dedicated network testing and validation processes. Simply applying traditional performance checks is rarely sufficient because networks need to be evaluated under the types of traffic patterns and load conditions that AI workloads actually create.
This includes validating precise timing and synchronisation across nodes to minimise jitter, analysing how the network handles sudden microbursts of high-intensity traffic that can overwhelm buffers, and stress-testing incast scenarios where many nodes transmit simultaneously to a single point, potentially triggering congestion.
Beyond this, companies must also consider how networks perform under sustained high utilisation, how they respond to even minimal packet loss, whether bandwidth is consistently and fairly distributed across highly parallel workloads, and how resilient the network remains during failures or disruption. Together, these factors highlight that validating AI networks requires a far broader and more specialised approach than traditional testing alone. In short, the network is no longer just supporting infrastructure – it is a key enabler of AI system performance and of a productive quality of experience for your users.
These are some of the issues network engineers encounter when AI workloads begin to scale. At Future Networks LIVE on 28th April, we’ll be exploring these behaviours in more detail, including how organisations are testing and validating their infrastructure.
Click here for the full agenda and registration: www.redhelix.com/media/future-networks-live-agenda/