Class Info

DSC 333 Introduction to Big Data Processing

Ahmed Abid

Spring 2023-2024
Class number: 35328
Section number: 901
Th 5:45PM - 9:00PM
CDM 00220 Loop Campus

Summary

This is a graduate course in large-scale data mining applications. The MapReduce programming paradigm is a fundamental tool for processing large data sets and is supported by tools such as Hadoop. In this course, you will learn the basics of MapReduce and its related technologies for big data processing.

Specific topics to be covered include:

  • Fundamentals of distributed file systems and MapReduce (MR) technology.
  • Advantages of an MR-based system compared to a relational database.
  • Tuning MR algorithm performance and tools for mining massive data sets.
  • Applications in clustering, similarity search, classification, data warehousing (e.g., Hive), and machine learning (e.g., Mahout).
  • Modern Big Data frameworks such as Spark, Kafka, and Flink.



Texts

1. Mining of Massive Datasets, by Anand Rajaraman and Jeffrey D. Ullman

2. Hadoop: The Definitive Guide, by Tom White, O'Reilly Media, 4th edition, 2015.



Prerequisites

If you are not sure that you have satisfied the prerequisites, speak to the instructor before the second lecture.

  • Prerequisite Courses: (CSC 453 or DSC 450) and (DSC 441 or CSC 480) or (MAT 491 and MAT 449).
  • Prior experience with Python.


School Policies

Changes to Syllabus

This syllabus is subject to change as necessary during the quarter. If a change occurs, it will be thoroughly addressed during class, posted under Announcements in D2L and sent via email.

Online Course Evaluations

Evaluations are a way for students to provide valuable feedback regarding their instructor and the course. Detailed feedback enables the instructor to continuously tailor teaching methods and course content to meet the learning goals of the course and the academic needs of the students. Evaluations are a requirement of the course and are key to continuing to provide you with the highest quality of teaching. The evaluations are anonymous; the instructor and administration do not track who entered what responses. A program is used to check whether the student completed the evaluations, but the evaluation is completely separate from the student’s identity. Since 100% participation is our goal, students are sent periodic reminders over three weeks. Students do not receive reminders once they complete the evaluation. Students complete the evaluation online in CampusConnect.

Academic Integrity and Plagiarism

This course is subject to the university's academic integrity policy. More information can be found at http://academicintegrity.depaul.edu/. If you have any questions, be sure to consult with your professor.

All students are expected to abide by the University's Academic Integrity Policy, which prohibits cheating and other misconduct in student coursework. Publicly sharing or posting online any prior or current materials from this course (including exam questions or answers) is considered to be providing unauthorized assistance prohibited by the Policy. Both students who share or post such materials and students who access or use them are considered to be cheating under the Policy and will be subject to sanctions for violations of Academic Integrity.

Academic Policies

All students are required to manage their class schedules each term in accordance with the deadlines for enrolling and withdrawing as indicated in the University Academic Calendar. Information on enrollment, withdrawal, grading and incompletes can be found at http://www.cdm.depaul.edu/Current%20Students/Pages/PoliciesandProcedures.aspx.

Students with Disabilities

Students who feel they may need an accommodation based on the impact of a disability should contact the instructor privately to discuss their specific needs. All discussions will remain confidential.
To ensure that you receive the most appropriate accommodation based on your needs, contact the instructor as early as possible in the quarter (preferably within the first week of class), and make sure that you have contacted the Center for Students with Disabilities (CSD) at:
Lewis Center 1420, 25 East Jackson Blvd.
Phone number: (312)362-8002
Fax: (312)362-6544
TTY: (773)325-7296