EEL 6938 (FEEDS): Data Intensive Computing and Clouds (Fall 2012)


Instructor: Dr. Jun Wang, HEC 320, 823-0449 (office)


Course Objective and Description:

Using large-scale computing systems to solve data-intensive real-world problems has become indispensable in many scientific and engineering disciplines. This course provides a broad introduction to the fundamentals of data-intensive computing and its enabling system architectures, such as MapReduce and cloud computing and storage, with a focus on system architecture, middleware and building blocks, programming models, algorithm design, and application development. Selected scientific applications will be used as case studies.

Prerequisite: introduction to programming or data structures/algorithms, computer architecture, or instructor approval.

Required textbook:

Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Kai Hwang et al., Morgan Kaufmann, October 2011, ISBN: 9780123858801.

Reference textbooks:

  • Computer Architecture: A Quantitative Approach (5th Edition), Hennessy and Patterson, Morgan Kaufmann, September 2011.
  • Hadoop: The Definitive Guide (2nd Edition), Tom White, O’Reilly Media, 2010.

Other References: 

  • Many recent papers in leading conferences/journals will be discussed.
  • Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, 2010. (PDF version available online)
  • Programming Amazon EC2, Jurg van Vliet and Flavia Paganelli, O’Reilly Media, 2011.
  • The Grid: Blueprint for a New Computing Infrastructure (2nd Edition), Ian Foster, Carl Kesselman, Morgan Kaufmann/Elsevier, 2004.
  • The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey, Stewart Tansley, and Kristine Tolle, Microsoft Research, 2009. (PDF version available online)

Course Outline (tentative):

  1. Introduction to and Overview of Data Centers and Clouds
  2. Data Parallel Programming Models
  3. Introduction to Hadoop
  4. MapReduce Runtime Management
  5. Algorithm Design and Implementation in MapReduce
  6. Consistency and Coordination
  7. Key-Value Structured Storage
  8. Enhancements to Hadoop/MapReduce
  9. Distributed File and Storage Systems
  10. Case Study

Grading Policies:

  • Class participation and contribution: 5%
  • Homework assignments, reading summaries, and paper presentation: 45%
    • Programming Assignments (10%)
    • Reading Summaries (20%)
    • Paper Presentation (15%)
  • Course Project: 50%
    • Proposal (10%)
    • Midterm Presentation (10%)
    • Final Presentation and Demo (15%)
    • Final Report (15%)

Note: Homework and programming assignments are due by 11:59pm on the due date (unless announced otherwise in class). Late homework (non-programming) will NOT be accepted. The penalty for late programming assignments is 10% per day, according to the timestamp of your online submission. Extended due dates will be considered only when verifiable extenuating circumstances can be demonstrated. Such circumstances must be beyond the student's control, such as illness or accidental injury. Poor performance in class is not an extenuating circumstance. Inform the instructor of the circumstances in advance or as soon as possible. In such situations, the instructor will decide the date and nature of the extended due dates.
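
As an illustration only, the grading weights and the late-program penalty above can be sketched in a few lines of Python. The function and key names are hypothetical, and the sketch makes assumptions the policy does not state: scores are on a 0-100 scale, the 10% is taken from the earned score, each day late counts as a whole day, and a score cannot drop below zero. The instructor's actual computation may differ.

```python
# Component weights (percent of the final grade), copied from the grading policy.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "participation": 5,
    "programming_assignments": 10,
    "reading_summaries": 20,
    "paper_presentation": 15,
    "project_proposal": 10,
    "project_midterm_presentation": 10,
    "project_final_presentation_demo": 15,
    "project_final_report": 15,
}


def late_program_score(raw_score, days_late):
    """Return a programming-assignment score after the late penalty.

    Assumptions (not stated in the policy): 10% of the earned score is
    deducted per whole day late, and the result never goes below zero.
    """
    remaining_percent = max(0, 100 - 10 * days_late)
    return raw_score * remaining_percent / 100


def final_grade(scores):
    """Combine per-component scores (0-100 scale) into a weighted total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # The weights should cover the whole grade.
    print(sum(WEIGHTS.values()))
    # A program scored 90 and submitted 3 days late, under the assumptions above.
    print(late_program_score(90, 3))
```

Integer percentages are used inside the penalty calculation so the arithmetic avoids floating-point rounding surprises.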

Attendance Policy:

Attendance is required. Students are responsible for any material covered in class. Much of the material covered in class is not in the textbook. Announcements about homework, projects, programming assignments, etc. may be made in class, online, or by email. Students are encouraged to check WebCourses regularly.