Anthill: 一种基于MapReduce的分布式DBMS

本文介绍了一种名为Anthill的新型分布式数据库系统,它结合了共享无状态MPP架构数据库的优点与MapReduce模型,显著提高了可扩展性。通过使用MonetDB作为底层数据库,Anthill在读取性能上优于Hadoop和HadoopDB,并且通过引入完整的查询执行引擎,能够更智能地处理多样化数据,生成更优查询计划。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

MapReduce is a parallel computing model proposed by Google for large data sets, it’s proved to have high availability, good scalability and fault tolerance, and has been widely used in recent years. However, there is a voice from traditional database community, D. J. DeWitt et al argue that MapReduce loses the performance and efficiency from DBMS, and it’s a step backwards of data analysis techniques. After that, HadoopDB article presents a new type of MapReduce based distributed database implementation, but it has the following disadvantages: (1) It doesn’t add any query execution engine for Hadoop client, HadoopDB is only a pilot project; (2) HadoopDB uses PostgreSQL as its underlying database, where redundancies and backups which Hadoop’s underlying file system HDFS provides is short of, thus it loses Hadoop’s original high availability; (3) There is not a complete partition mechanism, its tables are manually partitioned, this is not practical; (4) Table joining in HadoopDB is assumed in the ideal state, that two tables’ partitions are on the same node, but it’s not the case in the real world.
This paper overcomes the above problems, our implementation named Anthill can keep both advantages of shared-nothing MPP architecture databases and the MapReduce. Experiments show that: (1) Anthill provides better scalability than MPP databases, it can deploy more than 100 nodes, even 500 nodes; (2) Since Anthill uses the column-oriented database MonetDB as its underlying database, where I/O is effectively reduced, it achieves better performance than Hadoop and HadoopDB; (3) Anthill adds a complete query execution engine with parser, optimizer and planner, thus it can more intelligently handle the diversity of data and produce better query plans compared to HadoopDB. In short, Anthill is a new type of distributed database system with high commercial values.

Keywords: Distributed Database; Anthill; MapReduce; Hadoop

[flash=425,355]http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=anthill-100511101418-phpapp01&stripped_title=anthill-a-distributed-dbms-based-on-mapreduce[/flash]
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值