Blue Whale Clustered file system

Blue Whale Clustered file system (BWFS) is a shared disk file system (also called a clustered file system, shared storage file system, or SAN file system) made by Tianjin Zhongke Blue Whale Information Technologies Company in China.


BWFS enables simultaneous file access across heterogeneous platforms and high-performance file creation, storage, and sharing. BWFS is installed on hosts that are connected to the same disk array in a storage area network (SAN). Client systems are not required to run the same operating system to access a shared file system containing BWFS data. As of January 2010, the operating systems with available client software are Microsoft Windows, Linux, and Mac OS X.

BWFS can turn multiple Fibre Channel (FC) or iSCSI disk arrays into a storage cluster that supports parallel processing across multiple servers, provides a high-performance, extensible file-sharing service, and sustains multi-machine workflows or applications in a cluster environment.

BWFS uses direct data access. Clients transfer shared file data directly to and from the FC or iSCSI disk array over the SAN, bypassing any file server or NAS head, which takes full advantage of the high bandwidth of the SAN. BWFS thereby increases a system's capacity for concurrent file processing without changes to the front-end application environment or the back-end SAN.

BWFS supports a redundant metadata controller (MDC) configuration, which provides performance and high availability and, combined with the SAN infrastructure, gives enterprise-level storage reliability and data security.

Data access process

BWFS supports heterogeneous operating system platforms, allowing multiple servers to concurrently access the same set of disks and files regardless of their native file systems. Currently, BWFS supports a variety of enterprise-class Linux platforms as well as Windows 2000, Windows XP, and Windows 2003. BWFS has a different client program for each operating system. Each client recognizes the BWFS shared file system, provides access to it, and presents it consistently across operating systems, so that I/O requests are handled properly.

When multiple servers concurrently access the same file system, a mechanism is needed to prevent two servers from writing to the same disk location. It must also ensure that a server reading a file does not see inconsistent content while another server is updating that file. In BWFS, this coordination is provided by the metadata controller (MDC).
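The coordination rule described above can be illustrated with a toy lock table that allows either many readers or a single writer per file. This is a minimal sketch under stated assumptions, not the BWFS lock protocol: the class `LockTable`, its methods, and the client identifiers are all hypothetical names chosen for illustration.

```python
class LockTable:
    """Toy per-file lock table: many readers or one writer.

    Hypothetical illustration only; the real MDC protocol is not
    described in this article.
    """

    def __init__(self):
        self._readers = {}  # path -> set of client ids holding a read lock
        self._writer = {}   # path -> client id holding the write lock

    def acquire_read(self, path, client):
        # A read is refused while a different client is writing the file.
        if self._writer.get(path) not in (None, client):
            return False
        self._readers.setdefault(path, set()).add(client)
        return True

    def acquire_write(self, path, client):
        # A write needs exclusive access: no other writer, no other readers.
        other_readers = self._readers.get(path, set()) - {client}
        if other_readers or self._writer.get(path) not in (None, client):
            return False
        self._writer[path] = client
        return True

    def release(self, path, client):
        # Drop whatever locks this client holds on the file.
        self._readers.get(path, set()).discard(client)
        if self._writer.get(path) == client:
            del self._writer[path]
```

In this sketch a server that fails to acquire a lock simply receives `False`; a real controller would more likely queue the request or grant a time-limited lease.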

The MDC coordinates server access to the BWFS file system and sits outside the read and write path of file data. A client communicates with the MDC over a separate IP link to obtain file locations and data block allocation information, and then reads and writes the disk directly at block level over the SAN. This design is called an "out-of-band" or "asymmetric" architecture.

The data access process can be broken down as follows:

  1. An application program issues a write request.
  2. The BWFS client sends an operation request to the MDC over the LAN.
  3. The MDC processes the request and replies to the client over the LAN with the disk blocks the data can be written to.
  4. The BWFS client writes the data directly to the file system at line speed.
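The four steps above can be sketched in a few lines of Python. This is an illustrative model only: `MetadataController`, `Client`, the 4-byte block size, and the dictionary standing in for the shared disk array are assumptions made for the example, not BWFS interfaces.

```python
class MetadataController:
    """Stands in for the MDC: hands out block numbers, never sees file data."""

    def __init__(self, total_blocks):
        self._free = list(range(total_blocks))
        self.extents = {}  # path -> list of block numbers, in file order

    def allocate(self, path, n_blocks):
        if n_blocks > len(self._free):
            raise OSError("no space left on shared volume")
        blocks, self._free = self._free[:n_blocks], self._free[n_blocks:]
        self.extents.setdefault(path, []).extend(blocks)
        return blocks


class Client:
    BLOCK_SIZE = 4  # tiny block size so the example is easy to follow

    def __init__(self, mdc, disk):
        self.mdc = mdc    # control path (the LAN in the text)
        self.disk = disk  # data path (the SAN in the text)

    def write(self, path, data):
        # Steps 1-2: the application's write becomes a request to the MDC.
        n_blocks = -(-len(data) // self.BLOCK_SIZE)  # ceiling division
        # Step 3: the MDC replies with the blocks to use.
        blocks = self.mdc.allocate(path, n_blocks)
        # Step 4: the client writes the blocks itself, bypassing the MDC.
        for i, block in enumerate(blocks):
            start = i * self.BLOCK_SIZE
            self.disk[block] = data[start:start + self.BLOCK_SIZE]
        return blocks

    def read(self, path):
        # Look up block locations via the MDC, then read the disk directly.
        return b"".join(self.disk[b] for b in self.mdc.extents[path])


# Two clients share one MDC and one "disk array".
disk = {}
mdc = MetadataController(total_blocks=8)
server_a, server_b = Client(mdc, disk), Client(mdc, disk)
server_a.write("/f", b"hello world")
print(server_b.read("/f"))
```

The point of the sketch is that file contents only ever pass between `Client` and `disk`; the controller exchanges block numbers alone, which is what keeps it out of the data path.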

BWFS is designed for SAN environments, allowing a large number of servers or workstations connected to an FC SAN or IP SAN (iSCSI) to directly access the same file system. Over FC, BWFS can use one or more FC links to access disk resources, so the I/O performance of a single server can be scaled from somewhat more than 100 MB/s to several GB/s simply by adding FC HBA cards.

The overall performance of a system depends not only on the host and network but also on the disks that make up the file system. A BWFS file system can therefore be built from LUNs on multiple disk arrays. This is equivalent to another RAID layer across the disk arrays, which makes the most of their combined performance.
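One way to picture a RAID-like layer across several LUNs is round-robin striping, where consecutive stripes of a file system's logical blocks are placed on successive LUNs. The article does not say which layout BWFS uses, so the sketch below assumes simple striping; the function `locate` and its parameters are hypothetical.

```python
def locate(logical_block, n_luns, stripe_blocks=1):
    """Map a logical block number to (LUN index, block offset within that LUN).

    Assumes plain round-robin striping with `stripe_blocks` blocks per stripe.
    """
    stripe_index = logical_block // stripe_blocks
    lun = stripe_index % n_luns
    offset = (stripe_index // n_luns) * stripe_blocks + logical_block % stripe_blocks
    return lun, offset


# With three LUNs and one block per stripe, consecutive blocks rotate
# across the arrays, so a sequential transfer keeps all three busy.
for block in range(6):
    print(block, locate(block, n_luns=3))
```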

Another performance factor to consider is the location of metadata. A file consists of actual data and metadata. Actual data is the content of the file, while metadata includes file attributes, permissions, and so on. When a file is created, modified, or deleted, its metadata must also be modified, so processing a file involves accessing both file data and metadata. Large files are usually read and written sequentially, whereas accessing metadata requires moving the disk head to another location. A disk performs much better on sequential access than on random access. If data and metadata are stored on the same disk (as in most file systems), large-file access becomes more random and read and write performance drops. For this reason, BWFS stores metadata on a separate disk or volume, so that sequential file reads and writes are kept apart from random metadata access. The two do not interfere with each other, which provides as much I/O bandwidth as possible.
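The effect of mixing the two access patterns can be illustrated with a deliberately crude model that counts how far a disk head travels. All numbers here are made up for the example (five data blocks at positions 0 to 4, metadata at position 1000, one metadata update per data block); the helper `head_travel` is hypothetical and ignores rotation, caching, and queuing.

```python
def head_travel(positions):
    """Total distance the head moves when visiting positions in order."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


data_blocks = [0, 1, 2, 3, 4]
metadata_pos = 1000

# Same disk: every data write is followed by a trip to the metadata area.
shared = []
for block in data_blocks:
    shared += [block, metadata_pos]

# Separate disks: each head only sees its own access pattern.
data_only = data_blocks
metadata_only = [metadata_pos] * len(data_blocks)

print("shared disk:   ", head_travel(shared))
print("data disk:     ", head_travel(data_only))
print("metadata disk: ", head_travel(metadata_only))
```

Under these assumptions the shared disk's head travels 8980 positions, while the dedicated data disk's head travels 4 and the metadata disk's head does not move at all.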

In addition, once data and metadata are separated, they can be processed independently on different hosts, and metadata traffic does not consume bandwidth on the data channel. This improves the concurrency of data and metadata operations and further enhances file system performance.


A 2006 Gartner publication said:

"BWFS, an Internet Protocol (IP) cluster file system (CFS), has moved beyond the research lab and into the commercialization stage, and has now been successfully deployed in various industries including the energy, automotive, military and the media sectors. Its success demonstrates the strengths of China's research institutes in the technology realm, despite their relative lack of commercial experience and investment resources compared to many Western technology providers. Although CFSs are not yet prevalent in the mainstream storage market, for some users who need very high input/output I/O performance — especially leading-edge applications such as oil and gas, biotech and computer-aided design (CAD) — BWFS offers a good price/performance solution. Users should also consider BWFS if looking for a lower-priced CFS. Users that need a more commercialized solution — or that like to have a more “out of box” interface — should consider other vendors such as Panasas, Isilon and Ibrix rather than BWFS."[1]

BWFS was developed at the National Research Centers for High Performance Computers of the Chinese Academy of Sciences. In 2007, FalconStor announced a joint venture to sell the software.[2] The joint venture was named Tianjin Zhongke Blue Whale Information Technologies Company, located in Tianjin, China.[3] Venture capital firm VantagePoint Capital also made an investment.[4] It was announced that BWFS would be used for video from a satellite intended to cover the 2008 Summer Olympics.[5]

Further reading

  • Zhenhan Liu, Xiaoxuan Meng, Lu Xu. Lock management in blue whale file system. In Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human (ICIS 2009)
  • Zhenhan Liu, Xiaoxuan Meng, Lu Xu (2009). "Performance Optimization under Small Files Intensive Workloads in BWFS". International Conference on Parallel and Distributed Computing, Applications and Technologies: 154–159. doi:10.1109/PDCAT.2009.60. 
  • Liu Shi, Jingliang Zhang, Lu Xu (2010). "Client Based Data Isolation of Blue Whale File System in Non-linear Edit Field". Proceedings of 12th IEEE International Conference on High Performance Computing and Communications: 49–54. doi:10.1109/HPCC.2010.39. 
  • "A Storage Slab Allocator for Disk Storage Management in File System". NAS '09, 2009.
  • Lu Xu, Hongyuan Ma, Zhenjun Liu, Huan Zhang, Shuo Feng, Xiaoming Han, "Experiences with Hierarchical Storage Management Support in Blue Whale File System," pdcat, pp. 369–374, 2010 International Conference on Parallel and Distributed Computing, Applications and Technologies, 2010
