Learn Big Data

Big Data is one of the hottest topics in the tech world. Here's your chance to learn Big Data by taking the carefully selected courses below.

We start off with a Big Data Fundamentals course which will help you get the big picture, understand what Big Data is, the different types of Big Data available, as well as the kind of Big Data problems affecting most industries.

The other courses in this track will teach you about MapReduce, the key concept that enables us to tackle Big Data problems through parallelism. We'll then move on to Hadoop and Spark, two key frameworks for handling large volumes of data. The two Spark courses will take your ability to handle Big Data to the next level.

Finally, in order to give you a well-rounded understanding of the Big Data ecosystem, you'll learn about IBM Watson Analytics and Cloud-based NoSQL databases.

Watson Analytics is a business tool that can greatly aid programmers and non-programmers alike with visualization and data analysis by making exploration and prototyping easy and intuitive. Our NoSQL introduction, on the other hand, will teach you about an emerging category of databases, why they differ from traditional relational databases, and how to use them for document-based and key-value data.

If you are interested in a Big Data career, we recommend starting with this foundational Big Data learning path.

Featured Courses

Big Data Fundamentals

This course presents a holistic approach to Big Data, taking both a top-down and a bottom-up approach to questions such as: What is Big Data? How do we tackle Big Data? Why are we interested in it? What is a Big Data platform?

The course emphasizes that we study Big Data to gain insight that will help people throughout the enterprise run the business better and provide better service to customers. Rather than an implementation of a single open-source system such as Hadoop, the course recommends that Big Data be processed in a platform that can handle the variety, velocity, and volume of data by using a family of components that require integration and data governance. Big Data is NoHadoop (“not only Hadoop”) as well as NoSQL (“not only SQL”).

Earn an IBM Badge and Share it on Social Media!

By completing this course and passing the assessment, you will earn an IBM Badge! IBM's open badges provide an online representation of skills and achievements, allowing you to share your credential on LinkedIn, Facebook and Twitter. You can also download your badge and add it to your e-mails and websites. When someone clicks on your badge, they are taken to metadata describing your qualifications along with the process required to earn them. This additional level of information helps tell a more complete and compelling story of your achievements!

Big Data University has been chosen by IBM as one of the issuers of badges as part of the IBM Open Badge program. Share your achievements through LinkedIn, Facebook, Twitter, and more!

Big Data University uses the services of Pearson VUE Acclaim to assist in the administration of the IBM Open Badge program. By enrolling in this course, you agree to Big Data University sharing your details with Pearson VUE Acclaim strictly for the purpose of issuing your badge once you complete the badge criteria.

This course was tested and validated for v3 of the QSE VM.

Please download the v3 VM to complete the lab exercises.

https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-beta-iibob

Spark Fundamentals II

This is the second course in the Big Data University Spark curriculum, and it expands on concepts discussed in Spark Fundamentals I. This course covers Spark’s architecture and goes in-depth on how data is distributed and tasks are parallelized. Students will gain a better understanding of how to optimize their data for joins, use Spark’s memory caching, and apply the more advanced operations available in the API.

This course was developed with the support of:

MetiStream, Inc. (metistream.com), experts in Apache Spark implementations and training
IBM Analytics (ibm.com/analytics) helps you make better decisions by gleaning new insights from the volume and variety of big data.

 

Introduction to MapReduce Programming

This course explains the use of the mapper and reducer classes that make up a MapReduce application and where they are invoked in the application process. The student is walked through the development of a simple MapReduce application using a development environment based on Eclipse. The student then repeats the process using a MapReduce development wizard to speed up development.
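To make the mapper and reducer roles concrete, here is a minimal word-count sketch in plain Python. It is not the course's Eclipse-based Java material and does not use the Hadoop API; the function names `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical and chosen only for illustration. The `shuffle` step stands in for the grouping that a MapReduce framework performs between the map and reduce phases.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Reduce step: combine all values emitted for a single key."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

In a real cluster, many mapper calls run in parallel on different blocks of the input and the framework routes each key to a reducer; this single-process sketch only shows the data flow.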

Audience: Beginning Hadoop programmers

Time to complete: 5 hours

Available in: English

This course was recently tested and updated for BigInsights Quick Start 4.0.

http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/

 

Watson Analytics Fundamentals

Learn the fundamentals of Watson Analytics. Watson Analytics offers you the benefits of advanced analytics without the complexity. A smart data discovery service available on the cloud, it guides data exploration, automates predictive analytics and enables effortless dashboard and infographic creation. You can get answers and new insights to make confident decisions in minutes—all on your own.

Hadoop Fundamentals I

Hadoop Fundamentals I teaches you the basics of Apache Hadoop and the concept of Big Data. This Hadoop course is entirely free, and so are the materials and software provided. This is the third version of our most popular Hadoop course. Since Version 2 was published, several more detailed courses covering topics such as MapReduce, Hive, HBase, Pig, Oozie, and ZooKeeper have been added. We recommend you start here and then dig deeper into the specific Hadoop technology you wish to learn more about.

Learn Hadoop

This Hadoop course is designed to give you a basic understanding of key Big Data technologies. In this Hadoop tutorial, we begin by describing what Big Data is and why Hadoop is needed to process that data in a timely manner. We then describe the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS), both from the command line and using the BigInsights Console that is supplied with InfoSphere BigInsights.

This Hadoop course was recently tested and updated for BigInsights Quick Start 4.1 (IBM's edition of Hadoop).


 

Earn an IBM Badge and Share it on Social Media!

By completing this course and passing the assessment, you will earn an IBM Badge! IBM's open badges provide an online representation of skills and achievements, allowing you to share your credential on LinkedIn, Facebook and Twitter. You can also download your badge and add it to your e-mails and websites.

When someone clicks on your badge, they link to metadata describing your qualifications along with the process required to earn them. This additional level of information helps tell a more complete and compelling story of your achievements!

Introduction to NoSQL and DBaaS

Are you building a new application and want to utilize an operational datastore that has a flexible schema for fast and simple development? Do you need to ensure your entire application stack can scale elastically to accommodate a fast-growing dataset and a surge in concurrent users? Are you struggling with the management of an existing datastore and want to offload administration to a service provider? Do you require high availability and disaster recovery redundancy across nodes, data centers, geographies or asynchronous mobile/client access to application data?

If you answered yes to any of the questions above, then you have probably started to explore NoSQL and/or Database-as-a-Service offerings. In this NoSQL course, we will provide an overview of the NoSQL database landscape, the benefits of using a Database-as-a-Service offering, and where Cloudant fits into the picture. Additionally, we’ll get you started with Cloudant by providing tutorials on account sign-up, creating and replicating databases, and loading and querying data, and we’ll conclude by pointing you to additional resources to continue your education.

Big Data University uses the services of Pearson VUE Acclaim to assist in the administration of the IBM Open Badge program. If this course is part of the IBM Open Badges program, you will be awarded the badge upon completion of the badge criteria. Please visit this page to find out more (http://bigdatauniversity.com/bdu-badge/). By enrolling in this course, you agree to Big Data University sharing your details with Pearson VUE Acclaim strictly for the purpose of issuing your badge once you complete the badge criteria.

Spark Fundamentals I

Apache Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is an alternative. Spark can run up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. Spark provides in-memory cluster computing for lightning-fast speed and supports Java, Scala, and Python APIs for ease of development.

Spark combines SQL, streaming and complex analytics together seamlessly in the same application to handle a wide range of data processing scenarios. Spark runs on top of Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources such as HDFS, Cassandra, HBase, or S3.

 

Our learning paths are structured so that you can progress through the courses in sequence. We suggest taking them in the order presented. Big Data University also offers a vast number of other courses on various analytics, big data, data science, and programming topics. Check out our other learning paths and our course catalog.

What is Big Data University?

An IBM community initiative, Big Data University is the world’s best education on big data. Learn about big data, data science and analytic technologies from experts using hands-on exercises and interactive videos. Best of all, it’s completely free.