UCSC CMPS290H Large scale data integration – Text Analytics




with Gary Robinson

Audience:
UCSC students enrolled in course CMPS290H

Time to complete:
5 hours

This course is offered in support of the UC Santa Cruz course CMPS290H, Large scale data integration - Text Analytics. It teaches the basics of text analytics: how to retrieve relevant text from structured, semi-structured, or unstructured documents based on criteria you define. The course works through a complete case study. You can then apply the same criteria to big data by running them on a Hadoop cluster. All materials and software used are free.

Course Syllabus

  • Text analytics overview
  • Introducing AQL
  • Setting up a Text Analytics project in Eclipse
  • Extracting business information with AQL
  • Extracting paragraphs using a dictionary
  • Extracting sentences using split
  • Final processing and review of case study solution
  • Using UNION and CASE
  • Regular expressions in more detail (Optional)

General Information

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be taken as many times as you wish.
  • Labs can be performed in the cloud or on a 64-bit system. On a 64-bit system, you can either install the required software (Linux only) or use the supplied VMware image. More details are provided in the course.
  • Students who pass the course (by passing the final exam) can immediately print their online certificate of achievement. Your name on the certificate will appear exactly as entered in your BigDataUniversity.com profile.
  • If you do not pass the course, you can take it again at any time.


  • None

Recommended skills prior to taking this course

Grading Scheme

This course is not graded. It supports the UCSC CMPS290H university course, which has its own grading scheme.