- Design distributed systems that manage "big data" using Hadoop and related technologies.
- Use HDFS and MapReduce to store and analyze data at scale (a minimal MapReduce sketch follows this list).
- Write Pig scripts to express more complex, multi-step data transformations on a Hadoop cluster.
- Analyze relational data using Hive and MySQL (a Hive query sketch follows this list).
- Analyze non-relational data using HBase and MongoDB, as needed.
- Choose an appropriate data storage technology for your application.
- Understand how Hadoop clusters are managed and orchestrated using YARN, ZooKeeper, Hue, and Oozie.
- To participate in the hands-on activities and exercises, you will need access to a PC running 64-bit Windows, macOS, or Linux with an Internet connection and at least 8 GB of free RAM; 10 GB or more is recommended. If your PC does not meet these requirements, you can still follow the course without doing the hands-on activities.
- Some activities will require prior programming experience, preferably in Python.
- Basic familiarity with the Unix command line will be very helpful.
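
To give a taste of the MapReduce model named in the objectives above, here is a minimal word-count sketch using the Python mrjob library. The library choice, file name, and input format are illustrative assumptions, not material from this course description.

```python
# word_count.py -- a minimal MapReduce word count using mrjob
# (pip install mrjob). Input is assumed to be plain text.
from mrjob.job import MRJob


class MRWordCount(MRJob):
    def mapper(self, _, line):
        # Map step: emit (word, 1) for every word on the line.
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Reduce step: sum the counts emitted for each distinct word.
        yield word, sum(counts)


if __name__ == "__main__":
    MRWordCount.run()
```

Run it locally with `python word_count.py input.txt`, or submit it to a Hadoop cluster with mrjob's `-r hadoop` runner.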
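
In the same spirit, Hive exposes a SQL-like interface that can be queried from Python. This sketch uses the PyHive library against a HiveServer2 endpoint; the host, port, and the `ratings` table are hypothetical placeholders, and it assumes PyHive and its Hive dependencies are installed.

```python
# A sketch of running a HiveQL query from Python via PyHive.
# Host, port, and the `ratings` table are hypothetical placeholders.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000)  # HiveServer2 default port
cursor = conn.cursor()

# Hypothetical schema: ratings(user_id, movie_id, rating).
cursor.execute(
    "SELECT movie_id, COUNT(*) AS num_ratings "
    "FROM ratings GROUP BY movie_id "
    "ORDER BY num_ratings DESC LIMIT 10"
)
for movie_id, num_ratings in cursor.fetchall():
    print(movie_id, num_ratings)

conn.close()
```

Hive compiles queries like this into jobs that run on the cluster, which is why it pairs naturally with the MapReduce material above.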