Privacy in file sharing networks

Peer-to-peer (P2P) file sharing systems such as Gnutella, KaZaA, and eDonkey/eMule have become extremely popular in recent years, with an estimated user population in the millions. An academic research paper analyzed the Gnutella and eMule protocols and found weaknesses in both; many of the issues found in these networks are fundamental and are probably common to other P2P networks.[1] Users of file sharing networks such as eMule and Gnutella are subject to monitoring of their activity. Clients may be tracked by IP address, DNS name, the software version they use, the files they share, the queries they initiate, and the queries they answer.[1] Clients may also share private files with the network without noticing, because of inappropriate settings.[2]

Much is known about the network structure, routing schemes, performance load, and fault tolerance of P2P systems in general.[3] The eMule protocol offers its users little privacy, even though it is a P2P protocol and is therefore supposed to be decentralized.[4]

The Gnutella and eMule protocols

The eMule protocol

eMule is one of the clients that implement the eDonkey network. The eMule protocol consists of more than 75 types of messages. When an eMule client connects to the network, it first obtains a list of known eMule servers, which is available on the Internet. Although there are millions of eMule clients, there is only a small number of servers.[5][6] The client connects to a server over a TCP connection that stays open for as long as the client is connected to the network. Upon connecting, the client sends the server a list of its shared files, from which the server builds a database of the files that reside on this client.[7] The server also returns a list of other known servers and an ID for the client, which is a unique client identifier within the system. The server can only generate query replies for clients that are directly connected to it. A download is performed by dividing the file into parts and requesting a part from each client.
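
The part-based download described above can be sketched as follows. This is only an illustration, not the eMule scheduler: the part size, the round-robin assignment, and the names split_into_parts and assign_parts are assumptions made for the example.

```python
def split_into_parts(file_size, part_size):
    """Return (start, end) byte ranges covering a file of file_size bytes.

    part_size is a hypothetical parameter; the real protocol fixes its own
    part size, which this sketch does not try to reproduce.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (start, min(start + part_size, file_size))
        for start in range(0, file_size, part_size)
    ]


def assign_parts(parts, sources):
    """Spread the parts over the clients that hold the file, round-robin.

    Round-robin is an assumption chosen for simplicity.
    """
    if not sources:
        raise ValueError("at least one source is required")
    plan = {source: [] for source in sources}
    for i, part in enumerate(parts):
        plan[sources[i % len(sources)]].append(part)
    return plan
```

For a 25-byte file, 10-byte parts, and two sources, the first source is asked for the first and third parts and the second source for the middle part.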

The Gnutella protocol

Gnutella protocol v0.4

In Gnutella protocol v0.4 all nodes are identical, and every node may choose to connect to any other.[8] The protocol consists of five message types. Query messages are used for file search and rely on a flooding mechanism: each node that receives a query forwards it on all of its adjacent links in the graph.[9] A node that receives a query and has the appropriate file replies with a query hit message. A hop count field in the header limits the message lifetime. Ping and Pong messages are used for detecting new nodes that can be linked to. The actual file download is performed by opening a TCP connection and using the HTTP GET mechanism.[10][11]
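
The flooding mechanism with a hop limit can be sketched as a breadth-first walk over the overlay graph. This is a simplified model rather than the wire protocol: the graph dictionary, the function name flood_query, and the ttl parameter are assumptions for illustration, and the seen set stands in for whatever duplicate detection a real node performs.

```python
from collections import deque


def flood_query(graph, origin, has_file, ttl):
    """Return the set of nodes that would reply with a query hit.

    graph    -- dict mapping each node to a list of its neighbours
    origin   -- node that initiates the query
    has_file -- set of nodes holding a matching file
    ttl      -- number of hops the query is allowed to travel
    """
    seen = {origin}                 # nodes that already received the query
    hits = set()
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node != origin and node in has_file:
            hits.add(node)          # this node answers with a query hit
        if remaining == 0:
            continue                # hop budget exhausted, stop forwarding
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return hits
```

With a small hop limit, a node that holds the file but sits too far from the initiator never sees the query, which is how the hop count bounds the message lifetime.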

Gnutella protocol v0.6

Gnutella protocol v0.6 includes several modifications. A node has one of two operational modes: "leaf node" or "ultrapeer". Each node initially starts in leaf node mode, in which it can only connect to ultrapeers. A leaf node sends its query to an ultrapeer, which forwards the query and waits for the replies. When a node has enough bandwidth and uptime, it may become an ultrapeer.[citation needed] Ultrapeers request that their clients send a list of the files they share. If a query arrives with a search string that matches one of the files in the leaves, the ultrapeer replies, pointing to the specific leaf.[citation needed]
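
The ultrapeer's role as an index for its leaves can be sketched as below. The class name Ultrapeer, its methods, and the matching rule (every query word must appear in the file name, case-insensitively) are assumptions for illustration and are not taken from the protocol specification.

```python
class Ultrapeer:
    """Toy model of an ultrapeer that answers queries on behalf of leaves."""

    def __init__(self):
        # leaf address -> list of file names the leaf reported
        self.index = {}

    def register_leaf(self, leaf, files):
        """Store the shared-file list sent by a leaf."""
        self.index[leaf] = list(files)

    def answer(self, search):
        """Return (leaf, file name) pairs matching the search string."""
        words = search.lower().split()
        return [
            (leaf, name)
            for leaf, files in self.index.items()
            for name in files
            if all(word in name.lower() for word in words)
        ]
```

The sketch also shows why this design matters for privacy: the ultrapeer holds a complete picture of what each of its leaves shares.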

Tracking initiators and responders
Tracking a single node

Many Gnutella clients have an HTTP monitor feature. With this feature, a client that receives an empty HTTP request responds with information about itself. Research shows that a simple crawler connected to the Gnutella network can obtain, from an initial entry point, a list of the IP addresses connected to that entry point.[citation needed] The crawler can then go on to query those addresses for further IP addresses. One academic study performed the following experiment: at NYU, a regular Gnucleus client was connected to the Gnutella network as a leaf node, listening on the distinctive TCP port 44121. At the Hebrew University of Jerusalem, a crawler ran looking for a client listening on port 44121. In less than 15 minutes, the crawler found the IP address of the NYU Gnucleus client with the unique port.[citation needed]
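
The crawl described above amounts to a breadth-first search over neighbour lists. The sketch below runs against an in-memory overlay instead of a real network: crawl_for_port, SAMPLE_OVERLAY, and sample_neighbours are hypothetical names, the addresses come from documentation-only ranges, and only the port number 44121 is taken from the experiment.

```python
from collections import deque


def crawl_for_port(neighbours_of, entry_point, target_port):
    """Walk the overlay from entry_point until a node listens on target_port.

    neighbours_of -- callable returning the (ip, port) pairs connected to a
                     node; it stands in for asking a live node for its peers
    Returns the matching (ip, port) pair, or None if none is reachable.
    """
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        node = queue.popleft()
        if node[1] == target_port:
            return node
        for peer in neighbours_of(node):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return None


# Invented overlay used only to exercise the function.
SAMPLE_OVERLAY = {
    ("192.0.2.1", 6346): [("192.0.2.2", 6346), ("192.0.2.3", 6346)],
    ("192.0.2.2", 6346): [("192.0.2.1", 6346), ("198.51.100.7", 44121)],
    ("192.0.2.3", 6346): [("192.0.2.1", 6346)],
}


def sample_neighbours(node):
    """Look up a node's neighbours in the invented overlay."""
    return SAMPLE_OVERLAY.get(node, [])
```

Calling crawl_for_port(sample_neighbours, ("192.0.2.1", 6346), 44121) walks two hops and returns the one node with the distinctive port.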

IP address harvesting
Tracking nodes by GUID creation

Gnucleus on Windows uses the Ethernet MAC address as the six lower bytes of the GUID. Therefore, Windows clients reveal their MAC address when sending queries.[12]
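
Recovering the address from an observed GUID is then a matter of slicing bytes. The sketch assumes that the "six lower bytes" are the last six bytes of the 16-byte GUID as it appears in a message; that byte order, the function name mac_from_guid, and the sample GUID are assumptions for illustration.

```python
def mac_from_guid(guid: bytes) -> str:
    """Format the last six bytes of a 16-byte GUID as a MAC address.

    Assumes the MAC occupies the final six bytes; a client that stores the
    GUID in another byte order would need a different slice.
    """
    if len(guid) != 16:
        raise ValueError("a GUID is expected to be 16 bytes long")
    return ":".join(f"{byte:02x}" for byte in guid[-6:])


# Invented GUID: ten arbitrary bytes followed by a made-up MAC address.
sample_guid = bytes.fromhex("00112233445566778899" + "0a1b2c3d4e5f")
```

Here mac_from_guid(sample_guid) yields 0a:1b:2c:3d:4e:5f, so every query carrying such a GUID identifies the sender's network card.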


Collecting miscellaneous information about users
Tracking users by partial information
Tracking users by queries
Usage of hash functions

SHA-1 hashes refer to the SHA-1 of files, not of search strings.

Half of the search queries are strings, and half are the output of a hash function (SHA-1) applied to the string. Although the hash function is intended to improve privacy, academic research has shown that the query content can easily be exposed by a dictionary attack: collaborating ultrapeers can gradually collect common search strings, calculate their hash values, and store them in a dictionary. When a hashed query arrives, each collaborating ultrapeer can check it against the dictionary and expose the original string.[13]
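
The dictionary attack described in this paragraph can be sketched with Python's standard hashlib module. The class QueryDictionary, its method names, and the choice of UTF-8 encoding before hashing are assumptions for illustration; how a real client normalizes a search string before hashing is not stated in the text.

```python
import hashlib


def sha1_hex(text):
    """SHA-1 of a search string, hex-encoded (UTF-8 encoding is assumed)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class QueryDictionary:
    """Dictionary that collaborating ultrapeers could build over time."""

    def __init__(self):
        self.by_hash = {}  # hex digest -> original search string

    def observe_plain(self, search):
        """Record a search string that was seen in the clear."""
        self.by_hash[sha1_hex(search)] = search

    def reveal(self, hashed_query):
        """Return the original string for a hashed query, or None."""
        return self.by_hash.get(hashed_query)
```

Because the hash is unsalted and popular search strings repeat, a hashed query is exposed as soon as the same string has been observed once in plain form.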

Measures


See also

  • Gnutella2, a reworked network based on Gnutella
  • Bitzi, an open content file catalog integrated with some Gnutella clients
  • Torrent poisoning

References

  1. Bickson, Danny; Malkhi, Dahlia (2004). "A Study of Privacy in File Sharing Networks". http://leibniz.cs.huji.ac.il/tr/631.ps. 
  2. Liu, Bingshuang; Liu, Zhaoyang; Zhang, Jianyu; Wei, Tao; Zou, Wei (2012-10-15). "How many eyes are spying on your shared folders?". Proceedings of the 2012 ACM workshop on Privacy in the electronic society. WPES '12. Raleigh, North Carolina, USA: Association for Computing Machinery. pp. 109–116. doi:10.1145/2381966.2381982. ISBN 978-1-4503-1663-7. https://doi.org/10.1145/2381966.2381982. 
  3. Eng Keong Lua; Jon Crowcroft. "A Survey and Comparison of Peer-to-Peer Overlay Network Schemes". IEEE Communications Surveys & Tutorials 7 (2): 72–93. 
  4. Silva, Pedro Moreira da (19 June 2017). "Mistrustful P2P: Deterministic privacy-preserving P2P file sharing model to hide user content interests in untrusted peer-to-peer networks". Computer Networks 120: 87–104. doi:10.1016/j.comnet.2017.04.005. http://repositorio.inesctec.pt/handle/123456789/4318. 
  5. "Top Project Listings". https://sourceforge.net/top/. 
  6. "Safe Server List for eMule. Generated: September 17 2021 18:28:20 UTC+3". http://www.emule-security.org/serverlist/. 
  7. Yoram Kulbak and Danny Bickson. "The eMule protocol specification". EMule Project. 
  8. "privacy in file sharing" (in en). https://inba.info/privacy-in-file-sharing_578ff20cb6d87fba528b4600.html. 
  9. Yingwu Zhu; Yiming Hu (2006-12-01). "Enhancing Search Performance on Gnutella-Like P2P Systems". IEEE Transactions on Parallel and Distributed Systems 17 (12): 1482–1495. doi:10.1109/tpds.2006.173. ISSN 1045-9219. http://dx.doi.org/10.1109/tpds.2006.173. 
  10. "Gnutella Protocol Development". https://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html. 
  11. "Tornado Cash". https://tornado.community. 
  12. Courtney, Kylan. (2012). Information and internet privacy handbook. Murdock, Keon. (1st ed.). Delhi [India]: College Publishing House. ISBN 978-81-323-1280-2. OCLC 789644329. 
  13. Zink, Thomas (October 2020). "Analysis and Efficient Classification of P2P File Sharing Traffic". Universität Konstanz. https://www.researchgate.net/publication/271823708. 

Further reading