rmbayer/iaa_2020

Distributed Data Processing Module - Dan Zaratsian, March 2020


IAA Module - Session 1 - Distributed Services and Platform Overview

Slides


IAA Module - Session 2 - SQL and NoSQL Services

Slides

  • Hadoop 101
  • Intro to Apache Hive
  • Apache Hive Syntax and Schema Design
  • Intro to Apache HBase and Apache Phoenix (NoSQL)
  • Apache HBase Schema Design & Best Practices
  • Apache Phoenix Syntax
  • Intro to Apache SparkSQL
  • Apache SparkSQL
  • BigQuery (Serverless SQL)
  • Google Cloud Firestore (NoSQL)

Assignment - Due on Friday, March 27, 2020


IAA Module - Session 3 - Realtime, Streaming Systems

Slides

  • Apache Kafka
  • Google Pub/Sub
  • Spark Streaming
  • Apache Beam (Google Dataflow)

IAA Module - Session 4 - Spark Data Processing & Machine Learning

Slides

  • Apache Spark Overview
  • Spark Machine Learning (MLlib)
  • ML Pipelines
  • Building and deploying Spark machine learning models
  • Considerations for ML in distributed environments
  • Spark Best Practices and Tuning

Assignment (Coming)


IAA Module - Session 5 - Serverless Technology

Slides

  • Intro to Google Cloud Platform
  • Overview of Serverless
  • Google Cloud Functions
  • Cloud Run
  • Industry trends & Applications
  • Walk-through of Tools and Services

IAA Module - Session 6 - Cloud Overview, Features and Demos / Special Topics

Slides

This session serves as overflow for the previous sessions. It will be used when extra time or a deeper dive into specific content is needed.

  • Machine Learning APIs
  • GCP AutoML
  • GCP AI Platform

References

About

Institute for Advanced Analytics - Class of 2020
