The New HDFS Features in Apache Hadoop v2

Speaker:        Dr. Tsz-Wo Nicholas Sze
                Member of Technical Staff at Hortonworks

Title:          "The New HDFS Features in Apache Hadoop v2"

Date:           Monday, 10 Feb 2014

Time:           4:00pm - 5:00pm

Venue:          Lecture Theater F (near lifts 25/26), HKUST

Abstract:

Apache Hadoop v2.2.0, the GA release of Hadoop v2, offers several
significant HDFS improvements, including a new append pipeline, federation,
NameNode HA, snapshots, wire compatibility, an NFS interface, and further
performance improvements.  In this talk, we first give a brief
introduction to Hadoop and then discuss some of these new features in
detail.  The append feature is added to HDFS, and the write pipeline is
improved dramatically for better durability, visibility and consistency
guarantees.  Federation uses multiple independent NameNodes and namespaces
in order to scale the name service horizontally.  NameNode HA addresses
the problem of the NameNode being a single point of failure in an HDFS
cluster.  Snapshots are read-only point-in-time copies of the file system
for supporting "time travel" in big data.  We also describe some of the
development that is underway for the next release and some future work.


*******************
Biography:

Dr. Tsz-Wo Nicholas Sze is a Member of Technical Staff at Hortonworks and
also a Member of the Project Management Committee at Apache Hadoop.  His
interests include distributed computing, algorithms and mathematical
analysis.  He started contributing to Hadoop in 2007.  Two of his recent
Hadoop contributions were HDFS Snapshots and WebHDFS.  He set a new world
record for the computation of Pi using Hadoop on Yahoo's clusters in
2010.  He received his Ph.D. degree in Computer Science from the
University of Maryland, College Park in 2007, and his M.Phil. and B.Eng.
degrees from the Hong Kong University of Science and Technology in 2001
and 1999, respectively.