SATAnet

Unified communications and storage

In 2006, Intesym developed SATAnet. Through a ubiquitous single interface and single protocol, SATAnet unifies both remote storage and high-performance networking between multiple computers.

  • Compatible with any system which can use Serial ATA hard discs (not just PCs).
  • Recognised by BIOSes as a standard SATA disc.
  • Recognised and used by operating systems as a standard SATA disc.
  • Install, boot, and use any OS (inc. Microsoft™ Windows™) transparently over the network.
  • Manage network-wide storage in a centralised way.
  • Implement transparent distributed RAID for speed and fault tolerance.
  • Run multicomputer/clustering software directly over the same interface.
  • Packet-based communication with circuit-switching option.
  • Support for TCP/IP and MPI under Linux.
  • Configurable “in-flight processing” — transform and manipulate data packets as they pass through the network.
  • Algorithm control — let the network co-ordinate the synchronisations and scheduling of your algorithm.
  • “Intelligent storage” allows arbitrarily powerful storage facilities, including self-modifying files and even mounting the entire world-wide web as if it were stored on a local disc.

Serial ATA is now a ubiquitous interface for local disc drives, i.e. for connecting drives directly to a motherboard. However, there are times when having a local disc is undesirable, in particular because local discs generate substantial noise and heat.

Now, the invention of SATAnet allows remote storage to be accessed exactly as if it were a local disc. No changes are needed to the BIOS or operating system, nor is there any need for additional drivers. This gives local disc speeds with remote storage benefits.

Through inventive use of the Serial ATA protocol, SATAnet also provides complete networking capabilities between two or more computers. This is most notably useful in High-Performance Computing clusters, which benefit greatly from high-bandwidth, low-latency communication combined with remote storage and wide hardware and operating-system compatibility.

Approximate comparisons with other interconnects

Product          Bandwidth (MByte/s)   Latency (µs)
Myrinet 2000     247 / 495             3.2 / 2.6
Myri-10G         1200                  2
Quadrics QsNet   900                   1
Infiniband       156                   1.3 ~ 2.6
GbE              125                   60 ~ 100
10GbE            1250                  21
SATAnet (I)      150 ~ 300             1

SATAnet bandwidths will increase as newer, faster Serial ATA standards arise. 450MByte/s and 600MByte/s bandwidths are planned. In addition, multiple ports per node can be utilised, allowing greater per-node bandwidth limited only by the motherboard chipset capabilities and/or PCI/PCI-E SATA expansion cards.
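The multi-port scaling described above can be sketched as simple arithmetic. This is only an idealised illustration: the function name and the optional host limit parameter are hypothetical, and real throughput would depend on the chipset or expansion card in use.

```python
def aggregate_bandwidth(per_port_mbyte_s, ports, host_limit_mbyte_s=None):
    """Ideal per-node bandwidth when several SATA ports are used in parallel.

    host_limit_mbyte_s is a hypothetical cap standing in for the chipset or
    PCI/PCI-E expansion card limit mentioned in the text.
    """
    if ports < 1:
        raise ValueError("a node needs at least one port")
    total = per_port_mbyte_s * ports
    if host_limit_mbyte_s is not None:
        total = min(total, host_limit_mbyte_s)
    return total


# Four ports at the upper SATAnet (I) figure of 300 MByte/s.
print(aggregate_bandwidth(300, 4))        # uncapped
print(aggregate_bandwidth(300, 4, 1000))  # capped by an assumed host limit
```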

Distributed storage

  • Allows abstraction of storage.
  • Mount any data (e.g. the world-wide web) as a normal disc and filesystem.
  • Manipulate data with “intelligent storage” (e.g. encryption).
  • Share memory amongst nodes.

SATAnet makes it possible to provide complete abstraction of any data storage and present it through the SATA interface. For example, the entire world-wide web could be made available as an apparent disc and mounted as a sub-directory, e.g. as /mnt/www with websites accessed as local files. A particularly interesting possibility opened up by this technology is that of bootable websites, whereby a PC can boot up directly from a software vendor’s website over the internet as if from a local disc, revolutionising the distribution, security, and licensing models of operating systems and applications.

Similarly, “intelligent storage” can be implemented, e.g. transparent encryption and compression, RAID, on-site/off-site backups, etc.

Other benefits include: large (and fast) swap files on discless nodes; secure swap files — the underlying storage medium is abstract and not contained within the node; memory can be easily shared amongst nodes by mapping it to a block of sectors or file.
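The idea of sharing memory by mapping it to a file can be illustrated with an ordinary memory-mapped file. The sketch below is an analogy only: it uses a temporary local file in place of a SATAnet-backed block of sectors, and two mappings in one process in place of two nodes. The function name is hypothetical.

```python
import mmap
import os
import tempfile


def demo_shared_region(size=4096):
    """Write through one mapping of a file and read through another."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)        # give the backing file a fixed size
        writer = mmap.mmap(fd, size)  # stand-in for the first node's view
        reader = mmap.mmap(fd, size)  # stand-in for the second node's view
        writer[0:5] = b"hello"
        seen = bytes(reader[0:5])
        writer.close()
        reader.close()
        return seen
    finally:
        os.close(fd)
        os.remove(path)


print(demo_shared_region())
```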

In-flight processing

  • Relieves load from the CPU.
  • Run-time configurable transformations.

Communication packets can be passed through programmable transformation units within the SATAnet router. This allows the data within a packet to be processed, either to change, filter, or expand the data within the packet or to compute additional data such as checksums.
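As a software analogy for such transformation units, the sketch below passes a packet payload through a configurable chain of stages: one that filters the data and one that computes an additional checksum. The stage names and the pipeline function are hypothetical and do not describe the router's actual programming interface.

```python
import zlib


def drop_zero_bytes(payload: bytes) -> bytes:
    """Filter stage: remove all zero bytes from the payload."""
    return payload.replace(b"\x00", b"")


def append_crc32(payload: bytes) -> bytes:
    """Compute stage: append a 4-byte big-endian CRC-32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def run_pipeline(payload: bytes, stages) -> bytes:
    """Apply each transformation stage in order, as configured at run time."""
    for stage in stages:
        payload = stage(payload)
    return payload


packet = run_pipeline(b"a\x00b", [drop_zero_bytes, append_crc32])
print(packet[:-4], packet[-4:].hex())
```

Because the chain is just a list, stages can be added, removed, or reordered at run time without changing the code that sends or receives packets.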

Algorithmic control

  • Relieves load from the CPU.
  • Reduces network traffic.
  • Removes coding from the algorithm implementation.

Some algorithms require many synchronisations, either to ensure that data is ready before execution starts or because it is desirable to run parts in lock-step. Normally these are implemented with software semaphores and synchronisation primitives, with much communication passing between nodes. SATAnet can reduce the costs and complexities incurred by such algorithms by co-ordinating the synchronisations and execution scheduling itself, removing the overhead of software control.
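For comparison, the conventional software approach to lock-step execution looks roughly like the sketch below, which uses threads in one process as stand-ins for nodes and a barrier as the synchronisation primitive. It illustrates the kind of code that network-level co-ordination is meant to remove; it is not SATAnet code, and the function name is hypothetical.

```python
import threading


def run_lockstep(nodes: int, steps: int):
    """Run `steps` phases on `nodes` workers, synchronising after each phase."""
    barrier = threading.Barrier(nodes)
    log = []
    log_lock = threading.Lock()

    def worker(node_id: int):
        for step in range(steps):
            with log_lock:
                log.append((step, node_id))  # the "work" for this phase
            barrier.wait()                   # wait until every node has finished it

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_lockstep(nodes=4, steps=3)
print([step for step, _ in log])
```

Every entry for one phase is logged before any entry for the next, because no worker passes the barrier until all of them have reached it.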

Default mode

  • Operates as a normal local SATA disc.

The SATAnet router appears as a normal local hard disc to each node it is connected to. If the router has a physical disc installed then the nodes see this disc; otherwise the router appears as a blank disc.

Packet-switched mode

  • High-bandwidth network communication.
  • Co-exists with storage and filesystems.

This is the most flexible networking mode. Data is encapsulated into packets and sent from one node to another using normal SATA protocols. Specifically, reading or writing a packet is simply a matter of reading or writing the relevant number of sectors at the relevant location.
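The mapping from packets to sector transfers can be sketched as follows. Everything specific here is an assumption made for illustration: the 4-byte length header, the choice of logical block address, and the use of an in-memory buffer in place of the SATA device. The source does not describe the actual packet layout.

```python
import io
import struct

SECTOR_SIZE = 512  # bytes per sector on the SATA interface


def sectors_needed(n_bytes: int) -> int:
    """Number of whole sectors required to hold n_bytes."""
    return -(-n_bytes // SECTOR_SIZE)


def write_packet(dev, lba: int, payload: bytes) -> int:
    """Write a length-prefixed packet starting at sector `lba`; return sectors used."""
    framed = struct.pack("<I", len(payload)) + payload
    count = sectors_needed(len(framed))
    dev.seek(lba * SECTOR_SIZE)
    dev.write(framed.ljust(count * SECTOR_SIZE, b"\x00"))  # pad to a sector boundary
    return count


def read_packet(dev, lba: int) -> bytes:
    """Read back a packet written by write_packet from sector `lba`."""
    dev.seek(lba * SECTOR_SIZE)
    first = dev.read(SECTOR_SIZE)
    (length,) = struct.unpack_from("<I", first)
    count = sectors_needed(4 + length)
    rest = dev.read((count - 1) * SECTOR_SIZE)
    return (first + rest)[4:4 + length]


dev = io.BytesIO()  # stand-in for a block device such as /dev/sdb
used = write_packet(dev, lba=8, payload=b"x" * 600)
print(used, len(read_packet(dev, lba=8)))
```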

Network communication and storage access can co-exist and be freely intermixed over the same SATA interfaces, and the networking does not interfere with filesystem operation.

Circuit-switched mode

  • Ultra-low latency transfers between applications.

This mode is intended for short periods of use between two or more nodes when ultra-low latency communication is desirable. The contents of a variable in a userspace program on one node can be transferred to a variable in a userspace program on another node in under 1µs.

Whilst in this mode, a path through the SATAnet is reserved and so general communication and storage access by other nodes will have temporarily higher latencies.

Distributed storage mode

  • Allows abstraction of storage.
  • Mount any data (e.g. the world-wide web) as a normal disc and filesystem.
  • Manipulate data with “intelligent storage” (e.g. encryption).

This mode co-exists with the other modes and allows SATA commands to be transferred elsewhere for processing, effectively allowing a processing node (normally a SATA host) to behave as a SATA device.

This is a very flexible mode as it allows any data storage to be completely abstracted and presented through the SATA interface. For example, the entire world-wide web could be made available and mounted as a sub-directory, e.g. as /mnt/www with websites accessed as local files.

Similarly, “intelligent storage” can be implemented, e.g. transparent encryption and compression, RAID, on-site/off-site backups, etc.

Other benefits include: large (and fast) swap files on discless nodes; secure swap files — the underlying storage medium is abstract and not contained within the node; memory can be easily shared amongst nodes by mapping it to a block of sectors or file.

Networking Characteristics

Bandwidth

  • Can determine the dataset sizes for batch-style parallelism, since the time to transfer a dataset is effectively a latency
  • Can determine the processing speed for stream-based parallelism
  • Low bandwidth is acceptable with highly-independent parallelism
  • High bandwidth is beneficial for highly-dependent parallelism

A node can be connected to one or more SATAnet routers and to one or more ports per router. Using multiple ports multiplies the available bandwidth and also lowers the average latency.

The following table lists bandwidths for packet transfers of varying sizes on a SATA-I interface.

Packet size (bytes)   Bandwidth (MByte/s)
512                   8.5
1024                  11.9
2048                  21.9
4096                  36.9
8192                  56.6
16384                 72.2
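The figures in this table can be turned into approximate per-packet transfer times. The sketch below assumes 1 MByte = 10^6 bytes, so that 1 MByte/s equals 1 byte/µs; the function and variable names are hypothetical.

```python
# Packet size in bytes -> measured bandwidth in MByte/s (values from the table).
BANDWIDTH_TABLE = {
    512: 8.5,
    1024: 11.9,
    2048: 21.9,
    4096: 36.9,
    8192: 56.6,
    16384: 72.2,
}


def transfer_time_us(size_bytes: int, bandwidth_mbyte_s: float) -> float:
    """Time per packet in µs, assuming 1 MByte/s == 1 byte/µs."""
    return size_bytes / bandwidth_mbyte_s


for size, bandwidth in BANDWIDTH_TABLE.items():
    print(f"{size:6d} bytes: {transfer_time_us(size, bandwidth):6.1f} µs per packet")
```

Under that assumption a 512-byte packet takes roughly 60µs, which is in line with the driver-to-host latency listed under packet-switched latencies. It suggests that small packets are dominated by fixed per-transfer cost, consistent with the note that set-up latencies can be offset by merging packets.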

Latency

  • Can determine the granularity of the parallelism
  • Low latency is important in synchronous or poorly-parallel applications
  • High latency is tolerable in asynchronous and highly-parallel applications
  • Set-up latencies can be offset by merging packets
  • In core-rich nodes one core can be allocated to communications to minimise blocking

Packet-switched latencies

The following table lists timings for a packet to transfer through various parts of a Linux-based system.

Metric                            Symbol   Time (µs)
Application to driver             Ttad     2
Driver to SATA host               Ttdh     60
Host to host via SATAnet switch   Tnet     0.2
SATA host to driver               Trhd     60
Driver to application             Trda     2

Transmission on one node is asynchronous to reception on another node, and so the latencies overlap. In practice this means that the overall latency is lower than the simple summation of each part of the transfer. Since the latencies are roughly symmetrical for both transmission and reception, the typical overall latency is Ttad + Ttdh + Tnet = 62.2µs, or approximately 62µs.
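The difference between the simple summation and the overlapped figure can be checked with a few lines of arithmetic. The variable names below are hypothetical spellings of the table symbols.

```python
# Packet-switched timings from the table, in microseconds.
T_TAD = 2.0    # application to driver (transmit)
T_TDH = 60.0   # driver to SATA host (transmit)
T_NET = 0.2    # host to host via SATAnet switch
T_RHD = 60.0   # SATA host to driver (receive)
T_RDA = 2.0    # driver to application (receive)

# Simple summation of every stage, as if nothing overlapped.
serial_sum_us = T_TAD + T_TDH + T_NET + T_RHD + T_RDA

# Typical overall latency quoted in the text, with the receive side
# overlapping the transmit side.
overlapped_us = T_TAD + T_TDH + T_NET

print(f"simple sum: {serial_sum_us:.1f} µs, overlapped: {overlapped_us:.1f} µs")
```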

Circuit-switched latencies

The following table lists the approximate timings for a 4-byte word to transfer through various parts of a Linux-based system.

Metric                            Symbol   Time (µs)
Application to SATA host          Ttah     0.3
Host to host via SATAnet switch   Tnet     0.2
SATA host to application          Trha     0.3

Total end-to-end latency is simply Ttah + Tnet + Trha = 0.8µs. The bandwidth achieved for back-to-back transfers of 4-byte words is around 8 ~ 12 MByte/s.
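These two figures can be related with a short calculation. The sketch assumes 1 MByte = 10^6 bytes, and the names are hypothetical. The reading that successive words overlap in flight is an inference from the numbers, not something the source states.

```python
WORD_BYTES = 4

# Circuit-switched timings from the table, in microseconds.
T_TAH, T_NET, T_RHA = 0.3, 0.2, 0.3
end_to_end_us = T_TAH + T_NET + T_RHA


def word_interval_us(bandwidth_mbyte_s: float) -> float:
    """Time between successive 4-byte words at a given sustained bandwidth."""
    return WORD_BYTES / bandwidth_mbyte_s  # 1 MByte/s == 1 byte/µs


# Bandwidth if each word had to complete before the next one started.
unpipelined_mbyte_s = WORD_BYTES / end_to_end_us

print(f"end-to-end latency: {end_to_end_us:.1f} µs")
print(f"one word at a time: {unpipelined_mbyte_s:.1f} MByte/s")
print(f"interval at  8 MByte/s: {word_interval_us(8):.2f} µs")
print(f"interval at 12 MByte/s: {word_interval_us(12):.2f} µs")
```

The quoted 8 ~ 12 MByte/s corresponds to a new word roughly every 0.33 to 0.5µs, which is shorter than the 0.8µs end-to-end latency.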

Storage Characteristics

The following three commands were run under Linux over a SATA-I interface, giving the results shown.

# hdparm -Tt /dev/sdb
/dev/sdb:
 Timing cached reads:   2256 MB in 2.00 seconds = 1127.62 MB/sec
 Timing buffered disk reads:  178 MB in 3.01 seconds = 59.04 MB/sec

# dd if=/dev/sdb of=/dev/null bs=4096 count=4096
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 0.242752 seconds, 69.1 MB/s

# dd if=/dev/zero of=/dev/sdb bs=4096 count=4096
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 0.262695 seconds, 63.9 MB/s
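The throughput figures reported by dd can be reproduced from the byte counts and timings above. The helper name is hypothetical, and the calculation assumes that dd reports MB as 10^6 bytes.

```python
def throughput_mb_s(n_bytes: int, seconds: float) -> float:
    """Throughput in MB/s, taking 1 MB as 10**6 bytes."""
    return n_bytes / seconds / 1e6


total_bytes = 4096 * 4096  # bs=4096, count=4096

print(f"read:  {throughput_mb_s(total_bytes, 0.242752):.1f} MB/s")
print(f"write: {throughput_mb_s(total_bytes, 0.262695):.1f} MB/s")
```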