Learn Data Science

The Data Science curriculum teaches you the core knowledge you need to become a data scientist, from basic concepts and methodologies to advanced algorithms.

Featured Courses

Getting Started with Data Science (Beta Version)

"Harvard Business Review recently called data science 'The Sexiest Job of the 21st Century.'"

It's not just sexy: for millions of managers and students who need to solve business problems with big data, it's indispensable.

Using an interview-style format, Dr. Murtaza Haider describes introductory data science topics and provides interesting examples of how data science is used in the real world, such as:

  • Do attractive professors get better teaching evaluations?
  • Does adding a washroom or bedroom get you more when selling your home?
  • Does taking calcium and magnesium together suggest pregnancy?
  • Does religion have an impact on extramarital affairs?

Murtaza Haider offers careful, jargon-free coverage of basic theory and technique, backed with plenty of clear examples and practice opportunities.

Introduction to OpenRefine

This introductory course is for less technical users, business analysts, and consultants interested in learning data science.

This course covers the foundations of OpenRefine and its scripting language, GREL (General Refine Expression Language). You will learn how to:

  • use the facet and filter features to explore and mine data;
  • use OpenRefine's point-and-click transformations and fuzzy matching functions for quick but powerful data cleaning;
  • write complex transformations in GREL, OpenRefine's scripting language;
  • call APIs and parse the results in OpenRefine.
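
OpenRefine's fuzzy matching works by clustering values that look alike. The sketch below approximates its default "fingerprint" key-collision idea in Python rather than GREL: each value is trimmed, lowercased, stripped of punctuation, folded to ASCII, split into tokens, and the sorted unique tokens are rejoined into a key. Values that share a key become merge candidates. The names `fingerprint` and `cluster` are hypothetical and not part of OpenRefine.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List


def fingerprint(value: str) -> str:
    """Build a key similar in spirit to OpenRefine's 'fingerprint' keyer."""
    s = value.strip().lower()
    # Remove punctuation and control characters; keep letters, digits, whitespace.
    s = re.sub(r"[^\w\s]|_", "", s)
    # Fold accented Latin letters to ASCII, e.g. "gödel" -> "godel".
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Sort the unique whitespace-separated tokens and join them back together.
    return " ".join(sorted(set(s.split())))


def cluster(values: List[str]) -> List[List[str]]:
    """Group values by fingerprint; keep only groups with 2+ distinct spellings."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [vs for vs in groups.values() if len(set(vs)) > 1]


if __name__ == "__main__":
    names = ["Tom Cruise", "Cruise, Tom", "tom cruise ", "Gödel", "Godel", "Nicole Kidman"]
    for group in cluster(names):
        print(group)
```

Here "Tom Cruise", "Cruise, Tom", and "tom cruise " all collapse to the key "cruise tom". OpenRefine also offers other keying and nearest-neighbor methods; this sketch covers only the fingerprint idea.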

All materials and software used are FREE!

Spark Fundamentals I

Apache Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is an alternative. Spark can run up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. It provides in-memory cluster computing for lightning-fast speed and supports Java, Scala, and Python APIs for ease of development.
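
The speedup for iterative algorithms comes largely from keeping the working dataset in memory instead of re-reading it from storage on every pass, as a chain of MapReduce jobs does. The toy simulation below is plain Python, not Spark code: the hypothetical `Storage` class counts how often the full dataset is read, and `fit_mean` is a simple gradient-descent loop that touches every record on each iteration. In PySpark, the corresponding step would be calling `cache()` or `persist()` on an RDD or DataFrame before the loop.

```python
from typing import Callable, List


class Storage:
    """Hypothetical stand-in for a distributed file system; counts full reads."""

    def __init__(self, data: List[float]) -> None:
        self._data = list(data)
        self.reads = 0

    def read(self) -> List[float]:
        self.reads += 1
        return list(self._data)


def fit_mean(load: Callable[[], List[float]], iterations: int = 20, lr: float = 0.5) -> float:
    """Gradient descent on half the mean squared error; converges to the data mean."""
    estimate = 0.0
    for _ in range(iterations):
        points = load()  # every iteration needs the whole dataset
        gradient = sum(estimate - p for p in points) / len(points)
        estimate -= lr * gradient
    return estimate


if __name__ == "__main__":
    # MapReduce-style: re-read the input from storage on every iteration.
    disk = Storage([1.0, 2.0, 3.0, 4.0])
    uncached = fit_mean(disk.read)
    print("uncached reads:", disk.reads)

    # Spark-style: read once, keep the data in memory, reuse it each iteration.
    disk_cached = Storage([1.0, 2.0, 3.0, 4.0])
    in_memory = disk_cached.read()
    cached = fit_mean(lambda: in_memory)
    print("cached reads:", disk_cached.reads)
```

Both runs produce the same estimate; only the number of full reads differs (20 versus 1). Real speedups depend on data size, cluster memory, and the algorithm, so the "up to 100 times" figure is best read as a best case.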

Spark combines SQL, streaming, and complex analytics seamlessly in the same application to handle a wide range of data processing scenarios. Spark runs on Hadoop, on Mesos, standalone, or in the cloud, and it can access diverse data sources such as HDFS, Cassandra, HBase, and S3.

Big Data University has been chosen by IBM as one of the issuers of badges as part of the IBM Open Badge program. Share your achievements through LinkedIn, Facebook, Twitter, and more!

Big Data University leverages the services of Pearson VUE Acclaim to assist in the administration of the IBM Open Badge program. By enrolling in this course, you agree to Big Data University sharing your details with Pearson VUE Acclaim for the sole purpose of issuing your badge upon completion of the badge criteria.

Spark Fundamentals II

This is the second course in the Big Data University Spark curriculum, and it expands on the concepts discussed in Spark Fundamentals I. This course covers Spark's architecture and goes in depth on how data is distributed and how tasks are parallelized. Students will gain a better understanding of how to optimize their data for joins, how to use Spark's memory caching, and how to use the more advanced operations available in the API.
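
One idea behind optimizing joins is co-partitioning: if both datasets are split by the same hash of the join key, matching keys land in the same partition and can be joined locally, without shuffling data across the network. Spark's default `HashPartitioner` assigns a key to a partition based on the key's hash modulo the number of partitions. The plain-Python sketch below imitates that layout; `hash_partition` and `partitioned_join` are hypothetical helpers, and Python lists stand in for partitions that would live on different machines.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Record = Tuple[Hashable, object]


def hash_partition(records: Sequence[Record], num_partitions: int) -> List[List[Record]]:
    """Place each (key, value) record in partition hash(key) % num_partitions."""
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def partitioned_join(left: Sequence[Record], right: Sequence[Record],
                     num_partitions: int = 4) -> List[Tuple[Hashable, Tuple[object, object]]]:
    """Inner join that only ever compares records within matching partitions."""
    left_parts = hash_partition(left, num_partitions)
    right_parts = hash_partition(right, num_partitions)
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        # Build a lookup table for one side of this partition, then probe it.
        index: Dict[Hashable, List[object]] = defaultdict(list)
        for key, value in right_part:
            index[key].append(value)
        for key, left_value in left_part:
            for right_value in index.get(key, []):
                joined.append((key, (left_value, right_value)))
    return joined


if __name__ == "__main__":
    orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "desk")]
    customers = [(1, "Ann"), (2, "Bo"), (4, "Cy")]
    print(sorted(partitioned_join(orders, customers)))
```

Because both sides are partitioned with the same function, each partition pair is joined independently; this is the property that can let Spark avoid a shuffle when two RDDs already share a partitioner.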

This course was developed with the support of:

  • MetiStream, Inc. (metistream.com), experts in Apache Spark implementations and training
  • IBM Analytics (ibm.com/analytics), which helps you make better decisions by gleaning new insights from the volume and variety of big data

Machine Learning and Scala (BETA)

An introduction to performing standard machine learning tasks in Scala, such as feature engineering, evaluating models with Receiver Operating Characteristic (ROC) curves, and more.
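
A ROC curve plots the true positive rate against the false positive rate as a classifier's decision threshold is lowered, and the area under it (AUC) summarizes how well the scores rank positives above negatives. The course uses Scala; to keep all examples in one language, the sketch below is dependency-free Python. `roc_points` and `auc` are illustrative names, and tied scores are handled by moving the threshold past all tied examples at once.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, one per distinct threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past every example tied at this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    pts = roc_points(labels, scores)
    print(pts)
    print("AUC:", auc(pts))
```

For this example the AUC is 0.75, which matches the fraction of positive/negative pairs that the scores rank correctly (3 of 4).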

Data Science Methodology

How does a data scientist think? What are the major steps involved in tackling a data science problem? In Data Science Methodology, John Rollins (Ph.D., Data Scientist at IBM) describes the major steps involved in practicing data science, with interesting real-world examples at each step: from forming a concrete business or research problem, to collecting and analyzing data, to building a model, to understanding the feedback after model deployment. This course is offered for free.

Introduction to Data Analysis using R

Learn how to tackle data analysis problems using the powerful open source language R. The course will take you from learning the basics of R to using it to explore many different types of data. You will learn how to prepare data for analysis, compute various statistical measures, create meaningful data visualizations, create reusable R functions, create R models to predict expected future outcomes, and more!
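
Several of the skills listed here (computing statistical measures, writing reusable functions, and fitting a model to predict outcomes) fit in a few lines. The course does this in R, where functions such as `mean()`, `sd()`, `lm()`, and `predict()` cover the same ground; the sketch below uses Python's standard `statistics` module to stay consistent with the other examples. `describe`, `fit_line`, and `predict` are hypothetical helper names, and the model is ordinary least squares with a single predictor.

```python
from statistics import mean, median, stdev
from typing import Dict, Sequence, Tuple


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Reusable summary-statistics helper (sample standard deviation)."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model: Tuple[float, float], x: float) -> float:
    """Predict an outcome for a new x using a fitted (intercept, slope) pair."""
    intercept, slope = model
    return intercept + slope * x


if __name__ == "__main__":
    print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model, predict(model, 5))
```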

Watson Analytics Fundamentals

Learn the fundamentals of Watson Analytics. Watson Analytics offers you the benefits of advanced analytics without the complexity. A smart data discovery service available in the cloud, it guides data exploration, automates predictive analytics, and enables effortless dashboard and infographic creation. You can get answers and new insights to make confident decisions in minutes, all on your own.

Introduction to NoSQL and DBaaS

Are you building a new application and want to use an operational datastore with a flexible schema for fast and simple development? Do you need to ensure that your entire application stack can scale elastically to accommodate a fast-growing dataset and a surge in concurrent users? Are you struggling with the management of an existing datastore and want to offload administration to a service provider? Do you require high availability and disaster-recovery redundancy across nodes, data centers, and geographies, or asynchronous mobile/client access to application data?

If you answered yes to any of the questions above, then you have probably started to explore NoSQL and/or Database-as-a-Service offerings. In this NoSQL course, we will provide an overview of the NoSQL database landscape, the benefits of using a Database-as-a-Service offering, and where Cloudant fits into the picture. Additionally, we’ll get you started with Cloudant by providing tutorials on account sign-up, creating and replicating databases, and loading and querying data, and we conclude by pointing you to additional resources for continuing your education.
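
Cloudant exposes a CouchDB-compatible HTTP API in which databases and JSON documents are resources. The sketch below only builds the requests for the tutorial steps mentioned (create a database, load documents, query, replicate) with Python's `urllib`; it does not send them. `BASE_URL` is a placeholder account URL, authentication is omitted, and the helper names are hypothetical. The two sample documents deliberately have different fields to illustrate the flexible schema.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request

BASE_URL = "https://ACCOUNT.cloudant.com"  # placeholder; substitute your own account URL


def _json_request(method: str, path: str, body: Optional[Dict[str, Any]] = None) -> Request:
    """Build (but do not send) a JSON request against the database service."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return Request(BASE_URL + path, data=data, method=method,
                   headers={"Content-Type": "application/json"})


def create_database(name: str) -> Request:
    return _json_request("PUT", f"/{name}")


def insert_document(db: str, doc: Dict[str, Any]) -> Request:
    return _json_request("POST", f"/{db}", doc)


def find(db: str, selector: Dict[str, Any]) -> Request:
    # Cloudant Query: POST /{db}/_find with a JSON "selector".
    return _json_request("POST", f"/{db}/_find", {"selector": selector})


def replicate(source: str, target: str) -> Request:
    # One-off replication from one database to another.
    return _json_request("POST", "/_replicate", {"source": source, "target": target})


if __name__ == "__main__":
    pending = [
        create_database("movies"),
        # Documents in the same database need not share a schema.
        insert_document("movies", {"title": "Metropolis", "year": 1927}),
        insert_document("movies", {"title": "Alien", "genres": ["sci-fi", "horror"]}),
        find("movies", {"year": {"$lt": 1950}}),
        replicate("movies", "movies-backup"),
    ]
    for req in pending:
        print(req.get_method(), req.full_url, req.data)
    # To actually send a request, pass it to urllib.request.urlopen()
    # after adding credentials for your account.
```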

Big Data University has been chosen by IBM as one of the issuers of badges as part of the IBM Open Badge program. Share your achievements through LinkedIn, Facebook, Twitter, and more!

Big Data University leverages the services of Pearson VUE Acclaim to assist in the administration of the IBM Open Badge program. If this course is part of the IBM Open Badge program, you will be awarded the badge upon completion of the badge criteria. To find out more, visit http://bigdatauniversity.com/bdu-badge/. By enrolling in this course, you agree to Big Data University sharing your details with Pearson VUE Acclaim for the sole purpose of issuing your badge upon completion of the badge criteria.

Introduction to R

With over 2 million users worldwide, R is rapidly becoming the leading programming language in statistics and data science. Every year, the number of R users grows by 40%, and an increasing number of organizations are using it in their day-to-day activities.

In this introduction to R, you will master the basics of this beautiful open source language, such as factors, lists, and data frames. With the knowledge gained in this course, you will be ready to undertake your very first data analysis.

Special offer from DataCamp:

Complete this course through Big Data University, and gain free access to the entire DataCamp catalog of courses for two weeks!

Introduction to MapReduce Programming

This course explains the use of the mapper and reducer classes that make up a MapReduce application and where they are invoked in the application process. The student is walked through the development of a simple MapReduce application using a development environment based on Eclipse. The student then goes through the same process using a MapReduce development wizard to speed up development.
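
The mapper/reducer split is easiest to see in the classic word-count job. The sketch below simulates the phases in plain Python rather than Hadoop's Java API: `mapper` emits a (word, 1) pair per word, a shuffle step groups the values by key, and `reducer` sums each group. In Hadoop these roles would be subclasses of `Mapper` and `Reducer`; `run_job` here is a hypothetical single-process stand-in for the framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical single-process stand-in for the MapReduce framework."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    # Map phase, then shuffle: collect every emitted value under its key.
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    # Reduce phase: call the reducer once per key, in sorted key order.
    return dict(reducer(key, grouped[key]) for key in sorted(grouped))


if __name__ == "__main__":
    print(run_job(["the cat sat", "the cat ran"]))
```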

Audience: Beginning Hadoop programmers

Time to complete: 5 hours

Available in: English

Big Data University has been chosen by IBM as one of the issuers of badges as part of the IBM Open Badge program. Share your achievements through LinkedIn, Facebook, Twitter, and more!

Big Data University leverages the services of Pearson VUE Acclaim to assist in the administration of the IBM Open Badge program. By enrolling in this course, you agree to Big Data University sharing your details with Pearson VUE Acclaim for the sole purpose of issuing your badge upon completion of the badge criteria.

This course was recently tested and updated for BigInsights Quick Start 4.0.

http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/

SQL Fundamentals

About this SQL course

Much of the software written today relies on relational databases such as MySQL or DB2. In this free SQL course, you'll learn the basics of the relational database model and the SQL language using DB2 Express-C, the free edition of the IBM DB2 database server. You will learn SQL and how to create, read, update, and delete data in a database. This SQL tutorial is aimed at beginners, and it will give you enough information to get started working with databases.
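
The four operations mentioned (create, read, update, delete) map onto the SQL statements INSERT, SELECT, UPDATE, and DELETE. The course uses DB2 Express-C; to keep the example self-contained, the sketch below runs the same kinds of statements against an in-memory SQLite database through Python's built-in `sqlite3` module. The `employees` table and its rows are made up for illustration, and dialect details such as data types and auto-generated keys differ between SQLite and DB2.

```python
import sqlite3
from typing import List, Tuple


def crud_demo() -> Tuple[List[str], List[Tuple[str, int]]]:
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Create: define a table and insert rows (parameters use ? placeholders).
        cur.execute(
            "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER)"
        )
        cur.executemany(
            "INSERT INTO employees (name, salary) VALUES (?, ?)",
            [("Ana", 52000), ("Raj", 61000), ("Lee", 48000)],
        )
        # Read: query the rows that match a condition.
        high_earners = [row[0] for row in cur.execute(
            "SELECT name FROM employees WHERE salary > ? ORDER BY name", (50000,)
        )]
        # Update: change an existing row.
        cur.execute("UPDATE employees SET salary = salary + 2000 WHERE name = ?", ("Lee",))
        # Delete: remove a row.
        cur.execute("DELETE FROM employees WHERE name = ?", ("Raj",))
        conn.commit()
        remaining = cur.execute("SELECT name, salary FROM employees ORDER BY name").fetchall()
        return high_earners, remaining
    finally:
        conn.close()


if __name__ == "__main__":
    print(crud_demo())
```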

Audience: Database beginners

Time to complete: 5 hours

Available in: English

Hadoop Fundamentals I

Hadoop Fundamentals I teaches you the basics of Apache Hadoop and the concept of Big Data. This Hadoop course is entirely free, and so are the materials and software provided. This is the third version of our most popular Hadoop course. Since Version 2 was published, several more detailed courses covering topics such as MapReduce, Hive, HBase, Pig, Oozie, and ZooKeeper have been added. We recommend that you start here and then dig deeper into the specific Hadoop technologies you want to learn more about.

Learn Hadoop

This Hadoop course is designed to give you a basic understanding of key Big Data technologies. In this Hadoop tutorial, we begin by describing what Big Data is and why Hadoop is needed to process that data in a timely manner. We then describe the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS), both from the command line and through the BigInsights Console that is supplied with InfoSphere BigInsights.
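
HDFS stores each file as a sequence of large blocks and copies every block to several DataNodes. The arithmetic below assumes stock Apache Hadoop 2.x defaults (a `dfs.blocksize` of 128 MB and a `dfs.replication` of 3); a BigInsights cluster may be configured differently, and `hdfs_footprint` is a hypothetical helper.

```python
import math
from typing import Tuple

BLOCK_SIZE_MB = 128  # assumed dfs.blocksize (Hadoop 2.x default)
REPLICATION = 3      # assumed dfs.replication (default)


def hdfs_footprint(file_size_mb: float,
                   block_size_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> Tuple[int, int, float]:
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # A final partial block only uses as much disk as the data it holds,
    # so raw storage is file size times replication, not blocks * block size.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


if __name__ == "__main__":
    print(hdfs_footprint(1000))
```

Under these assumptions, a 1,000 MB file becomes 8 blocks (seven full, one partial), 24 block replicas, and about 3,000 MB of raw disk across the cluster.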

This Hadoop course was recently tested and updated for BigInsights Quick Start 4.1 (IBM's edition of Hadoop).

Earn an IBM Badge and Share it on Social Media!

By completing this course and passing the assessment, you will earn an IBM Badge! IBM's open badges provide an online representation of skills and achievements, allowing you to share your credential on LinkedIn, Facebook, and Twitter. You can also download your badge and add it to your emails and websites.

When someone clicks on your badge, they are taken to metadata describing your qualifications and the process required to earn them. This additional level of information helps tell a more complete and compelling story of your achievements!

Big Data University has been chosen by IBM as one of the issuers of badges as part of the IBM Open Badge program. Share your achievements through LinkedIn, Facebook, Twitter, and more!

Big Data University leverages the services of Pearson VUE Acclaim to assist in the administration of the IBM Open Badge program. By enrolling in this Hadoop course, you agree to Big Data University sharing your details with Pearson VUE Acclaim for the sole purpose of issuing your badge upon completion of the badge criteria.

Text mining in action: Analyzing Twitter data for Democratic General Elections (BETA Version)

In the last two decades, the world has seen an incredible shift in text media from print-based to web-based:

  • Literature has been made increasingly available online through resources like Google Books, Project Gutenberg, etc.
  • News organizations have responded by providing online resources with real-time newsfeeds
  • Social media has become ubiquitous and gives unprecedented access to the thoughts and opinions of ordinary people
  • What does this mean for us as data scientists?
    – More data!
    – More easily accessible data!
    – More opportunities for valuable analysis
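
A common first step in analyzing tweets is tokenizing the text and counting terms and hashtags. The sketch below does this with a simple regular expression and `collections.Counter`; the tokenizer, the short stop-word list, and the sample tweets are all made-up assumptions for illustration, not the course's actual pipeline.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed tokenizer: optional leading '#' or '@' followed by word characters.
TOKEN = re.compile(r"[#@]?\w+")
# Assumed, deliberately tiny stop-word list.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "for", "on"}


def term_frequencies(tweets: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count ordinary words and hashtags separately across a batch of tweets."""
    words: Counter = Counter()
    hashtags: Counter = Counter()
    for tweet in tweets:
        for token in TOKEN.findall(tweet.lower()):
            if token.startswith("#"):
                hashtags[token] += 1
            elif token.startswith("@") or token in STOPWORDS:
                continue  # skip @mentions and common function words
            else:
                words[token] += 1
    return words, hashtags


if __name__ == "__main__":
    sample = [  # made-up example tweets
        "Watching the #debate tonight",
        "Great answer on healthcare in the #debate",
        "@newsdesk when is the next #debate?",
    ]
    words, hashtags = term_frequencies(sample)
    print(hashtags.most_common(3))
    print(words.most_common(5))
```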

Advanced Data Science and Scala (BETA)

Delve into deeper topics in data science with Scala, such as random forests and support vector machines.

Big Data University also offers many other courses on analytics, big data, and data science topics. View our complete course catalog.

What is Big Data University?

An IBM community initiative, Big Data University provides education on big data. Learn about big data, data science, and analytics technologies from experts through hands-on exercises and interactive videos. Best of all, it’s completely free.