“Big Data” analysis is a hot and highly valuable skill. Spark Streaming is a new and quickly developing technology for processing massive data sets as they are created, in real time. Whether it’s clickstream data from a big website, sensor data from a massive “Internet of Things” deployment, financial data, or something else, Spark Streaming is a powerful technology for transforming and analyzing that data as soon as it is created.
This workshop gives you hands-on experience with real, live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models. You’ll learn how to write and run real Spark Streaming jobs through live examples.
By the end of this workshop, you’ll be confidently creating Spark Streaming scripts in Scala and be prepared to tackle massive streams of data in a whole new way. You’ll be surprised at how easy Spark Streaming makes it!
- Process massive streams of real-time data using Spark Streaming
- Create Spark applications using the Scala programming language
- Integrate Spark Streaming with data sources like Cassandra and Kafka
- Output transformed real-time data to Cassandra or file systems
- Integrate Spark Streaming with Spark SQL to query streaming data in real time
- Train machine learning models with streaming data, and use those models for real-time predictions
- Ingest Apache access log data and transform streams of it
- Receive real-time streams of Twitter feeds
- Maintain stateful data across a continuous stream of input data
- Query streaming data across sliding windows of time
Students with some prior programming or scripting ability should take this course.
If you’re working for a company with “big data” that is being generated continuously, or want to jump into the domain of Big Data Analytics, this course is for you.
Students with no prior software engineering or programming experience should seek an introductory programming course first.