Spark Fundamentals I

with Henry Quach, Alan Barnes

Audience:
Data scientists, engineers, or anyone who is interested in learning about Spark.

Time to complete:
03:00

Available in:
English

Apache Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is the alternative. Spark runs up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. Spark provides in-memory cluster computing for lightning-fast speed and supports Java, Scala, and Python APIs for ease of development.

Spark combines SQL, streaming, and complex analytics seamlessly in the same application to handle a wide range of data processing scenarios. Spark runs on Hadoop, on Mesos, in standalone mode, or in the cloud. It can access diverse data sources such as HDFS, Cassandra, HBase, or S3.

Big Data University has been chosen by IBM as one of the issuers of badges as part of the IBM Open Badge program. Share your achievements through LinkedIn, Facebook, Twitter, and more!

Big Data University uses the services of Pearson VUE Acclaim to assist in the administration of the IBM Open Badge program. By enrolling in this course, you agree to Big Data University sharing your details with Pearson VUE Acclaim strictly for the purpose of issuing your badge once you complete the badge criteria.

Course Syllabus

After completing this course, you should be able to:

  • Describe what Spark is and why you would want to use it
  • Use Resilient Distributed Dataset (RDD) operations
  • Use Scala, Java, or Python to create and run a Spark application
  • Create applications using Spark SQL, MLlib, Spark Streaming, and GraphX
  • Configure, monitor and tune Spark

General Information

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be taken as many times as you wish.
  • Students who pass the course (by passing the final exam) can print their online certificate of achievement immediately.
  • Your name on the certificate will appear exactly as entered in your profile on BigDataUniversity.com.
  • If you do not pass the course, you can take it again at any time.

Pre-requisites

Before taking this course, you should have the following background:

  • Have taken the Hadoop Fundamentals – Version 3 course on Big Data University

Recommended skills prior to taking this course

  • Basic understanding of Apache Hadoop and Big Data.
  • Basic Linux operating system knowledge.
  • Basic understanding of the Scala, Python, or Java programming languages.

Grading Scheme

The minimum passing mark for the course is 60%. The final test is worth 100% of the course mark, and you have 3 attempts to take it.