Can anyone explain the term "commodity hardware"?

HDFS Study Notes (1): HDFS Design
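As we all know, besides its computation component, MapReduce, Hadoop also includes a distributed file system, HDFS, short for Hadoop Distributed Filesystem.
"Hadoop: The Definitive Guide" describes HDFS in a single sentence: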
HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
There are three key phrases here: very large files, streaming data access, and commodity hardware. Let's go through them one by one.
① Very large files
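How large is "very large" in Hadoop? Applications that run on HDFS work with large data sets. A typical file in HDFS is gigabytes to terabytes in size. Today quite a few companies, Taobao for example, store more than a petabyte of data in HDFS.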
② Streaming data access
③ Commodity hardware
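① Low-latency data access
HDFS is not well suited to applications that need low-latency access (in the tens of milliseconds), because it is designed for high-throughput data access, and that comes at the cost of some latency. For applications with low-latency requirements, HBase is a better choice. HBase's slogan is "Use Apache HBase when you need random, realtime read/write access to your Big Data".
② Lots of small files
Another problem is that the number of map tasks is determined by the number of splits, so processing a large number of small files with MapReduce produces too many map tasks, and the overhead of scheduling and starting those tasks lengthens the job. Processing many small files is much slower than processing large files of the same total size. For example, take 10000M of input: with 1M splits there are 10000 map tasks and a great deal of task management overhead; with 100M splits there are only 100 map tasks, each map task does more work, and the management overhead is far smaller.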
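The arithmetic in this example can be checked with a short Python sketch. It treats the input as one splittable stream, and the function names and the one-second fixed cost per task are assumptions made for illustration, not measurements from a real cluster.

```python
def num_map_tasks(total_mb: int, split_mb: int) -> int:
    """Number of splits (and so map tasks) for total_mb of input, rounding up."""
    if total_mb < 0 or split_mb <= 0:
        raise ValueError("total_mb must be >= 0 and split_mb must be > 0")
    return -(-total_mb // split_mb)  # ceiling division on integers


def total_task_overhead_s(total_mb: int, split_mb: int,
                          per_task_overhead_s: float = 1.0) -> float:
    """Fixed per-task cost summed over all map tasks.

    per_task_overhead_s is a hypothetical figure, not a measured value.
    """
    return num_map_tasks(total_mb, split_mb) * per_task_overhead_s


if __name__ == "__main__":
    for split_mb in (1, 100):
        tasks = num_map_tasks(10000, split_mb)
        overhead = total_task_overhead_s(10000, split_mb)
        print(f"split={split_mb}M: {tasks} map tasks, "
              f"about {overhead:.0f}s of fixed overhead")
```

Under this simple model the fixed overhead grows linearly with the number of tasks, which is why larger splits help when the per-task cost is not negligible.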
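For the first problem, recent versions of Hadoop already offer a solution, HDFS Federation, which will be covered in detail in a separate post.
③ Multiple writers, arbitrary file modifications
Hadoop currently supports only a single writer per file; concurrent writes by multiple users are not supported. Data can be added at the end of a file with the append operation, but a file cannot be modified at arbitrary positions. The third edition of "Hadoop: The Definitive Guide" says there is no support for this yet: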
Files in HDFS may be written to by a single writer. Writes are always made at the end of the file. There is no support for multiple writers or for modifications at arbitrary offsets in the file. (These might be supported in the future, but they are likely to be relatively inefficient.)
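The write model in this quotation can be illustrated with a toy in-memory class. This is not the HDFS client API; `AppendOnlyFile` and its methods are hypothetical names used only to show the three rules: one writer at a time, writes land at the end of the file, and no writes at arbitrary offsets.

```python
class AppendOnlyFile:
    """Toy model of a file with a single writer and append-only writes."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._writer = None  # client currently allowed to write, if any

    def open_for_append(self, client: str) -> None:
        """Grant write access to client, unless another writer holds it."""
        if self._writer is not None:
            raise PermissionError(f"already being written by {self._writer}")
        self._writer = client

    def append(self, client: str, payload: bytes) -> int:
        """Add payload at the end of the file and return the new length."""
        if self._writer != client:
            raise PermissionError(f"{client} is not the current writer")
        self._data.extend(payload)
        return len(self._data)

    def write_at(self, client: str, offset: int, payload: bytes) -> None:
        """Writes at arbitrary offsets are rejected in this model."""
        raise NotImplementedError("only appends are supported")

    def close(self, client: str) -> None:
        """Release write access so that another client can append later."""
        if self._writer == client:
            self._writer = None

    def read(self) -> bytes:
        return bytes(self._data)
```

A second client can append only after the first one closes the file, and any attempt to call `write_at` fails regardless of who holds write access.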
From Wikipedia, the free encyclopedia
Commodity computing, or commodity cluster computing, is the use of large numbers of already-available computing components for parallel computing, to get the greatest amount of useful computation at low cost. It is computing done in commodity computers as opposed to high-cost superminicomputers or boutique computers. Commodity computers are computer systems manufactured by multiple vendors, incorporating components based on open standards. Such systems are said to be based on commodity components, since the standardization process promotes lower costs and less differentiation among vendors' products. Standardization and decreased differentiation lower the switching or exit cost from any given vendor, increasing purchasers' leverage and preventing lock-in. A governing principle of commodity computing is that it is preferable to have more low-performance, low-cost hardware working in parallel (scalar computing) (e.g. AMD x86 CISC) than to have fewer high-performance, high-cost hardware items (e.g. IBM POWER7 or Sun-Oracle's SPARC RISC). At some point, the number of discrete systems in a cluster will be greater than the mean time between failures (MTBF) for any hardware platform, no matter how reliable, so fault tolerance must be built into the controlling software. Purchases should be optimized on cost-per-unit-of-performance, not just absolute performance-per-CPU at any cost.
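The MTBF remark and the purchasing rule can both be made concrete with a little arithmetic. The sketch below assumes independent node failures at a constant rate, so a cluster of N nodes, each with a given MTBF, sees some node fail roughly every MTBF/N hours. All figures and function names are hypothetical.

```python
def cluster_hours_between_failures(node_mtbf_hours: float, nodes: int) -> float:
    """Expected hours between failures anywhere in the cluster.

    Assumes nodes fail independently at a constant rate (a simplification).
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return node_mtbf_hours / nodes


def cost_per_unit_performance(price: float, performance: float) -> float:
    """Price divided by delivered performance; lower is better."""
    if performance <= 0:
        raise ValueError("performance must be positive")
    return price / performance


if __name__ == "__main__":
    # Hypothetical node MTBF of 50,000 hours.
    for nodes in (1, 100, 10_000):
        hours = cluster_hours_between_failures(50_000, nodes)
        print(f"{nodes} nodes: a failure about every {hours:g} hours")

    # Hypothetical prices and relative performance figures.
    commodity = cost_per_unit_performance(price=2_000, performance=1.0)
    high_end = cost_per_unit_performance(price=50_000, performance=10.0)
    print(f"commodity: {commodity:g} per unit, high-end: {high_end:g} per unit")
```

With these made-up numbers a 10,000-node cluster expects a failure every few hours even though each node is individually reliable, which is why the controlling software has to tolerate failures rather than rely on the hardware.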
The first computers were large, expensive and proprietary. The move towards commodity computing began when DEC introduced the PDP-8 in 1965. This was a computer that was relatively small and inexpensive enough that a department could purchase one without convening a meeting of the board of directors. The entire minicomputer industry sprang up to supply the demand for 'small' computers like the PDP-8. Unfortunately, each of the many different brands of minicomputers had to stand on its own because there was no software and very little hardware compatibility between the brands.
When the first general purpose microprocessor was introduced in 1974, it immediately began chipping away at the low end of the computer market, replacing embedded minicomputers in many industrial devices.
This process accelerated in 1977 with the introduction of the first commodity-like microcomputer, the Apple II. With the development of the VisiCalc application in 1979, microcomputers broke out of the factory and began entering office suites in large quantities, but still through the back door.
The IBM PC was introduced in 1981 and immediately began displacing Apple IIs in the corporate world, but commodity computing as we know it today truly began when Compaq developed the first true IBM PC compatible. More and more PC-compatible microcomputers began coming into big companies through the front door, and commodity computing was well established.
During the 1980s microcomputers began displacing larger computers in a serious way. At first, price was the key justification, but by the late 1980s and early 1990s, VLSI semiconductor technology had evolved to the point where microprocessor performance began to eclipse the performance of discrete logic designs. These traditional designs were limited by speed-of-light delay issues inherent in any CPU larger than a single chip, and performance alone began driving the success of microprocessor-based systems.
By the mid-1990s, nearly all computers made were based on microprocessors, and the majority of general purpose microprocessors were implementations of the x86 instruction set architecture. Although there was a time when every traditional computer manufacturer had its own proprietary micro-based designs, there are only a few manufacturers of non-commodity computer systems today.
Today, there are fewer and fewer general business computing requirements that cannot be met with off-the-shelf commodity computers. It is likely that the low-end of the supermicrocomputer genre will continue to be pushed upward by increasingly powerful commodity microcomputers.




