Segmented file transfer

From HandWiki

Segmented file transfer (also known as multisource file transfer or swarming file transfer) is a software method intended to improve file download speed. It works by simultaneously downloading different portions of a computer file, from multiple servers or from a single server, and then recombining the parts into the single requested file. Most download manager applications work this way.
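The split, fetch, and reassemble cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The function names are hypothetical. The "sources" are in-memory byte strings standing in for mirror servers. A real client would send each server an HTTP `Range` header like the one `range_header` builds.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, parts: int) -> list:
    """Split bytes [0, size) into `parts` contiguous ranges with inclusive ends."""
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((start, start + length - 1))  # inclusive end, as in HTTP Range
        start += length
    return ranges


def range_header(start: int, end: int) -> str:
    # Value of the Range header a client would send to request one segment.
    return f"bytes={start}-{end}"


def segmented_download(sources: list, size: int, parts: int) -> bytes:
    """Fetch each segment concurrently from a (simulated) source, then reassemble."""
    ranges = split_ranges(size, parts)

    def fetch(job):
        index, (start, end) = job
        source = sources[index % len(sources)]  # round-robin over the sources
        return index, source[start:end + 1]     # stand-in for one Range request

    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = dict(pool.map(fetch, enumerate(ranges)))
    # Join the segments in file order, whatever order they completed in.
    return b"".join(results[i] for i in range(len(ranges)))


if __name__ == "__main__":
    original = bytes(range(256)) * 40               # 10,240-byte stand-in file
    mirrors = [original, original, original]        # three identical sources
    print(split_ranges(10, 3))                      # [(0, 3), (4, 6), (7, 9)]
    print(range_header(*split_ranges(len(original), 4)[1]))  # bytes=2560-5119
    assert segmented_download(mirrors, len(original), 4) == original
```

Because every segment is keyed by its index, completion order does not matter. The same structure works for one server (several ranges from one source) and for several servers.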

History

Segmented downloading probably originated with NASA and the magnetic-tape-based file systems used on spacecraft served by the Deep Space Network, such as those of the Voyager program. However, from the 1960s to the 1980s most[which?] mainframe computer users experimented extensively with uploading, downloading, and synchronizing data over bandwidth-restricted telecommunications links, so the early origins of segmented downloading are not historically clear.

Some NASA missions are understood to use some kind of segmented downloading technique, for either file formats or data streams.

Swarmcast was the first significant peer-to-peer (P2P) content delivery system to implement a form of segmented downloading. The program and protocol were invented and developed in 1999 by Justin Chapweske and sold to OpenCola, which released the software under the GPL.

Many of the terms used in segmented downloading technology originated with Swarmcast; BitTorrent is the only other significant contributor to the terminology in use.[citation needed]

Network implications

In this animation, the coloured bars beneath each client represent individual pieces of the file. After the initial pieces are transferred from the seed, pieces are passed individually from client to client, so the original seeder only needs to send out one copy of the file for all the clients to receive a copy.

Most IP networks are designed for users to download more than they upload, usually with an expected download-to-upload ratio of 3:1 or more.

When used by only 20% of an ISP's user base, segmented downloading can disrupt the ISP's network to the point of requiring substantial reprogramming of routers and a rethink of network design.

  • Traditional web object caching technology (like the Squid proxy) is of no use here.
  • Universal adoption of IPv6 would not help either: it would let every user have a fixed IP address, but fixed IP addresses do not fully solve the routing table problems associated with segmented downloading.
  • In a typical downloading configuration, a single user may be in contact with 10 to 30 ephemeral users per file, scattered across the global Internet.
  • IP router tables can become bloated with routes to these ephemeral users, slowing down table lookups.
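The upstream load described in this section can be made concrete with a toy swarm simulation. The model is an assumption and does not describe any real protocol. The seed injects each piece exactly once, as in the animation above. Peers then copy whatever they lack from one another. The simulation counts how much each side uploads.

```python
def simulate_swarm(num_peers: int, num_pieces: int) -> dict:
    """Count piece uploads by the seed and by peers until every peer has the file.

    Simplified model (an assumption, not any real protocol): in each round the
    seed sends one piece that no peer has yet, then every peer copies each
    piece it lacks from any other peer that held it at the start of the round.
    """
    have = [set() for _ in range(num_peers)]
    seed_uploads = 0
    peer_uploads = 0
    next_piece = 0
    while any(len(pieces) < num_pieces for pieces in have):
        if next_piece < num_pieces:
            # The seed injects a fresh piece into the swarm exactly once.
            have[next_piece % num_peers].add(next_piece)
            seed_uploads += 1
            next_piece += 1
        snapshot = [set(pieces) for pieces in have]
        for i in range(num_peers):
            for j in range(num_peers):
                if i != j:
                    missing = snapshot[j] - have[i]
                    have[i] |= missing          # peer j uploads these pieces to peer i
                    peer_uploads += len(missing)
    return {"seed": seed_uploads, "peers": peer_uploads,
            "downloaded": num_peers * num_pieces}


stats = simulate_swarm(num_peers=4, num_pieces=8)
print(stats)  # {'seed': 8, 'peers': 24, 'downloaded': 32}
print(stats["downloaded"] / stats["peers"])  # peers' aggregate download:upload ratio
```

The seed uploads one copy (8 pieces). The peers together upload (N − 1)/N of everything they download. Their aggregate download-to-upload ratio therefore tends toward 1:1 as the swarm grows, which is far from the 3:1 that the network was provisioned for.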

Network advantages

  • Large files can be made available efficiently to many other users by someone who does not have large upload bandwidth.
  • Routes to the more obscure parts of the Internet can become established across most of the Internet; this is especially true for dial-up users.
  • Segmented downloading saves some transmission capacity: if a transfer fails, only the incomplete segments need to be fetched again, so the number of lost or redundant megabytes is small compared with losing a prolonged HTTP or FTP download.
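A rough calculation shows the size of this saving. Two assumptions apply. First, the plain transfer is taken as one that cannot be resumed, which is a worst case, since many HTTP and FTP servers do support resuming. Second, segments are fetched one after another over the failed connection, and only the segment in flight is lost. The function names are hypothetical.

```python
MiB = 1024 * 1024


def wasted_single_stream(bytes_received: int) -> int:
    # Assumption: the plain HTTP/FTP transfer cannot be resumed, so every byte
    # received before the failure has to be downloaded again.
    return bytes_received


def wasted_segmented(bytes_received: int, segment_size: int) -> int:
    # Assumption: completed segments are kept, and only the partially received
    # segment that was in flight when the failure happened is re-requested.
    return bytes_received % segment_size


received = 701 * MiB  # connection drops 701 MiB into a large download
print(wasted_single_stream(received) // MiB)        # 701
print(wasted_segmented(received, 4 * MiB) // MiB)   # 1
```

The segmented loss is bounded by the segment size and does not depend on the size of the file.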

Most ISPs have learned to cope with segmented downloading technology, but coping has meant the mandatory deployment of TCP/IP traffic shaping technology.[citation needed]

Limitations

Segmented downloading technology cannot solve every downloading problem: there are mathematical constraints on its effectiveness.

In a group of users with insufficient upload bandwidth, where demand is higher than supply, segmented downloading cannot deliver data faster than the group's combined upload capacity allows. It can, however, handle traffic peaks very well, and to some degree it lets uploaders upload "more often" to make better use of their connections.
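One way to state the constraint precisely is the fluid-model lower bound on the time needed to distribute a file of F bits to N peers. This bound appears in standard networking textbooks such as Kurose and Ross. Let the server upload rate be u_s, the peer upload rates be u_i, and the slowest peer's download rate be d_min. Then no scheme can beat max(F/u_s, F/d_min, N·F/(u_s + Σu_i)). The sketch below evaluates the bound. The example rates are invented for illustration, and the model ignores protocol overhead and peer churn.

```python
def min_distribution_time(file_bits: float, server_up: float,
                          peer_up: list, peer_down: list) -> float:
    """Lower bound (seconds) on delivering one file to every peer.

    Fluid-model bound: max(F/u_s, F/d_min, N*F/(u_s + sum(u_i))).
    Rates are in bits per second; overhead and churn are ignored.
    """
    n = len(peer_up)
    return max(
        file_bits / server_up,                       # the server must send one full copy
        file_bits / min(peer_down),                  # the slowest downloader needs the whole file
        n * file_bits / (server_up + sum(peer_up)),  # N copies limited by total upload capacity
    )


F = 8e9  # a 1 GB file, in bits
t = min_distribution_time(F, server_up=10e6,
                          peer_up=[1e6] * 100, peer_down=[20e6] * 100)
print(round(t))  # 7273
```

Here the first two terms are only 800 s and 400 s. The third term dominates: when demand exceeds the group's total upload supply, adding download bandwidth does not help.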

Data integrity issues

  • Very simple implementations of segmented downloading can result in varying levels of file corruption, because there is often no way of knowing whether all sources are actually uploading segments of the same file.
  • These data corruption problems have led most programs that use segmented downloading to adopt some sort of checksum or hash algorithm to ensure file integrity (the file is received intact) and uniqueness (no pieces of other, similar files are mixed in).
  • MD5 and SHA-1 hashes are preferred in most segmented download protocols, but CRC-64-ECMA would suffice in most cases of accidental corruption; where only MPEG files are being sent, CRC-32-MPEG would also be acceptable. A CRC detects accidental errors but does not protect against a source that deliberately sends altered data.
  • In the future, most segmented downloading technologies will probably use layered hashes and checksums, such as WHIRLPOOL, SHA-256, SHA-512, and CRC-64-ECMA (for individual segments), to strengthen data integrity guarantees. MD5 and SHA-1 have been shown to be cryptographically weak with respect to protecting data integrity.[citation needed]
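Per-segment hashing can be sketched with SHA-256, one of the hashes named above, using Python's standard `hashlib`. The example makes three assumptions. The publisher distributes an ordered list of per-segment digests plus a whole-file digest, which is similar in spirit to BitTorrent's piece hashes. The segment size is tiny so the example stays readable. The function names are hypothetical. A segment that arrives from a source serving a different file fails its check and is discarded.

```python
import hashlib

SEGMENT_SIZE = 4  # hypothetical, deliberately tiny so the example stays readable


def segment_digests(data: bytes, segment_size: int = SEGMENT_SIZE) -> list:
    """Publisher side: SHA-256 digest of each segment, in order."""
    return [hashlib.sha256(data[i:i + segment_size]).hexdigest()
            for i in range(0, len(data), segment_size)]


def assemble(received, expected: list, file_digest: str):
    """Downloader side: keep only segments whose digest matches, then check the whole file.

    `received` is an iterable of (index, payload) pairs from any mix of sources.
    Returns the file bytes, or None if segments are missing or the final check fails.
    """
    segments = {}
    for index, payload in received:
        if index in segments or not 0 <= index < len(expected):
            continue  # duplicate or out-of-range segment
        if hashlib.sha256(payload).hexdigest() == expected[index]:
            segments[index] = payload  # verified; a bad copy is simply discarded
    if len(segments) != len(expected):
        return None
    data = b"".join(segments[i] for i in range(len(expected)))
    return data if hashlib.sha256(data).hexdigest() == file_digest else None


original = b"segmented transfer"
digests = segment_digests(original)
whole = hashlib.sha256(original).hexdigest()
# One source serves a segment from a different file; a later source sends the right one.
received = [(1, b"XXXX"), (0, b"segm"), (1, b"ente"), (2, b"d tr"), (3, b"ansf"), (4, b"er")]
assert assemble(received, digests, whole) == original
assert assemble(received[:-1], digests, whole) is None  # segment 4 still missing
```

Checking each segment lets the downloader reject a bad segment and re-request it individually. The final whole-file digest then confirms that the assembled result is the intended file.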

Segmented uploading

In BitTorrent and other distributed file transfer protocols there is no difference between uploading and downloading (clients do both) and no meaningful distinction between client and server (each peer acts as both). Nevertheless, some dedicated segmented uploading technologies do exist.

Space-segment telecommunication systems are the only widely known cases where segmented uploading technologies have emerged, mainly because of limited bandwidth and other space-segment constraints.

  • CCSDS software uploading protocols are capable of segmented uploading, but currently deployed systems have not needed to use the protocol in its most BitTorrent-like mode.
  • Satellite direct-to-home (DTH) subscription systems deployed in Europe and North America have upgraded software on customer devices by sending only a few bytes at a time (about 2 KB or less) over a long period. These segmented upload approaches are generally proprietary and tied to the smart card security and subscription mechanism.
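Because the real DTH schemes are proprietary, the sketch below does not reproduce any of them. It is a hypothetical illustration of the general trickle idea, built on an assumed broadcast "data carousel" pattern. The broadcaster repeats numbered segments of about 2 KB in an endless cycle. A receiver that is often switched off or misses transmissions keeps whatever segments arrive and finishes once it holds every index. The names, the segment size, and the drop pattern are all assumptions.

```python
import itertools

SEGMENT_SIZE = 2048  # assumed reading of "about 2 KB or less" per transmission


def make_segments(image: bytes, segment_size: int = SEGMENT_SIZE) -> list:
    """Broadcaster side: tag each segment with its index and the segment count."""
    total = (len(image) + segment_size - 1) // segment_size
    return [(i, total, image[i * segment_size:(i + 1) * segment_size])
            for i in range(total)]


class TrickleReceiver:
    """Receiver side: collect segments whenever they happen to arrive."""

    def __init__(self):
        self.parts = {}
        self.total = None

    def receive(self, index: int, total: int, payload: bytes) -> bool:
        self.total = total
        self.parts.setdefault(index, payload)  # repeats from later cycles are ignored
        return len(self.parts) == self.total

    def image(self) -> bytes:
        return b"".join(self.parts[i] for i in range(self.total))


def simulate(image: bytes, drop_every: int = 3, max_slots: int = 10_000):
    """Replay the segments in an endless cycle (a carousel); the receiver misses
    every `drop_every`-th transmission. Returns (slot at completion or None, receiver)."""
    rx = TrickleReceiver()
    slots = enumerate(itertools.cycle(make_segments(image)))
    for slot, (index, total, payload) in itertools.islice(slots, max_slots):
        if slot % drop_every == 0:
            continue  # receiver switched off or signal lost for this transmission
        if rx.receive(index, total, payload):
            return slot, rx
    return None, rx


image = bytes(i % 251 for i in range(10_000))  # 5 segments of at most 2048 bytes
slot, rx = simulate(image)
print(slot)  # 8
assert rx.image() == image
```

With this fixed drop pattern, an image whose segment count is a multiple of `drop_every` would always lose the same segments and never complete. That is why `max_slots` bounds the loop. Real transmissions lose data less regularly.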

Among direct-to-home TV systems, only SkyTV (UK) and DirecTV (USA) have possibly been linked to having the capability to use segmented uploading to outwit "hackers", or to having done so in the past. However, one can assume that any modern MPEG-2 DVB DTH mass-subscriber system can accept software upgrades trickled to it at a rate of 8 kb/day or less.
