[Kaggle] Titanic: Feature Engineering and CForest (Accuracy = 0.81339)

Posted in Machine Learning, Project, R/Rstudio

This method was my first approach to the Titanic data set. For some parts, I used this tutorial. The final accuracy was 0.81339.

0 – Load libraries
1 – Append test and training data sets
2 – Fare
2.1 – Create categories from Fare
We create 3 categories: “30+”: fares that are above […]

Tableau Training and Tutorials: Syllabus

Posted in Data Visualization, Tableau, Training

1 – Getting Started Getting Started The Tableau Interface Distributing and Publishing   2 – Connecting to Data Getting Started with Data Managing Metadata Managing Extracts Saving and Publishing Data Sources Data Prep with Text and Excel Files Join Types with Union Cross-database Joins Data Blending Additional Data Blending Topics Connecting to PDFs Connecting to […]

ESS 450 – Apache Pig Essentials: Syllabus

Posted in Hadoop, Training

Next Steps This course prepares you for DA 450 – Transform Data with Apache Pig. This curriculum will help prepare you for the MapR Certified Data Analyst (MCDA) certification exam. What’s Covered? Introduction to Apache Pig Define Apache Pig Bonus Activity 1.1: Connect to the Grunt Shell Describe How Apache Pig Fits in the Data Pipeline Understand Data Types in […]

ESS 440 – Apache Hive Essentials: Syllabus

Posted in Hadoop, Training

Next Steps This course prepares you for DA 440 – Query and Store Data with Apache Hive. This curriculum will help prepare you for the MapR Certified Data Analyst (MCDA) certification exam. What’s Covered? Introduction to Apache Hive Introduction to Apache Hive Define Apache Hive Bonus Activity 1.1: Connect to the Hive CLI Explain Apache Hive Use Cases Describe […]

ESS 400 – Apache Drill Essentials: Syllabus

Posted in Hadoop, Training

Next Steps This course prepares you for DA 400 – SQL Analytics with Apache Drill. This curriculum will help prepare you for the MapR Certified Data Analyst (MCDA) certification exam. What’s Covered? Introduction to Apache Drill Describe Apache Drill Explore Key Features of Apache Drill Bonus Activity 1.2a: Explore the Drill SQL Interfaces Bonus Activity 1.2b: Perform Drill SQL […]

ESS 101 – Apache Hadoop Essentials: Syllabus

Posted in Hadoop, Training

Lesson 3 – Core Elements of Apache Hadoop Compare and Contrast Local and Distributed File Systems Explain Data Management in the Hadoop File System Summarize the MapReduce Algorithm Lesson 4 – The Apache Hadoop Ecosystem Define the Apache Hadoop Ecosystem Components: Administration: ZooKeeper, YARN Ingestion: Flume, Oozie, Sqoop Processing: Spark, HBase, Pig Analysis: Hive, Drill, […]

MITx – 15.071x The Analytics Edge: Syllabus

Posted in Data Visualization, Machine Learning, R/Rstudio, Training

Unit 1: An Introduction to Analytics Welcome to Unit 1 Initial Evaluation The Analytics Edge: Intelligence, Happiness, and Health (Lecture Sequence) Working with Data: An Introduction to R Understanding Food: Nutritional Education with Data (Recitation) Assignment 1 Unit 2: Linear Regression Welcome to Unit 2 The Statistical Sommelier: An Introduction to Linear Regression Moneyball: The […]

MinesTelecom Fundamentals for Big Data: Syllabus

Posted in Data Visualization, Machine Learning, Python, R/Rstudio, Training

Week 1 Python – Part 1 Learning objectives Use the Python environment (as well as the interpreter) to write and run programs. Use the standard Python library and its modules in basic programs. Use containers, connections, and loops in Python programs. Use functions in writing Python programs. Write object-oriented Python programs using classes and their […]

Essential Design Principles for Tableau: Main Principles

Posted in Data Visualization, Tableau, Training

Cognitive vs Perceptual. Perceptual: automatic and immediate perception. Example: noticing a dot near a cluster of other dots. Cognitive: slower and more deliberate cognition. Example: is that dot an outlier worthy of investigation? Best Practices. Bar graphs traditionally have zero at the baseline; a non-zero baseline is misleading and exaggerates variations. Inverting the axis: inverts the perception of an increase […]

Creating Dashboards and Storytelling with Tableau: Best Practices

Posted in Data Visualization, Tableau, Training

The 3 Cs of storytelling: Context, Challenge, Conclusion. 2 important considerations: Expressiveness: do you have the data to express the story accurately? Effectiveness: does the presentation style effectively convey the data’s meaning? Becoming a great storyteller: cultivate critical thinking and empathy. Be like an investigative journalist or detective. Ask questions, starting with stakeholders. Good practice: inventory stakeholder requirements. Readings: http://pages.ucsd.edu/~aronatas/project/academic/dawes%20on%20narratives.pdf […]

Creating Dashboards and Storytelling with Tableau: Syllabus

Posted in Data Visualization, Tableau, Training

Week 1: Planning and Preproduction: Aligning your Audience, Stakeholders, and Data Objectives: Define what a story is Build a basic framework for any story Determine the who, what, why, and how of the story Discover the importance of planning before you begin Assess your stakeholders and audience to find the right story in the data […]

Essential Design Principles for Tableau: Control Charts

Posted in Data Visualization, Tableau, Training

Control Chart Theory. Generally speaking, control charts are graphical and statistical tools used to monitor the variation of a system and determine whether these variations are significant (in which case the causes must be investigated) or not. Control Charts in 3 steps: Choose a metric that will be used to monitor the system Determine […]
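To make the idea concrete, here is a minimal Python sketch of a control chart check. The excerpt above is cut off after the first step, so this is not the method from the post: it assumes the common convention of placing control limits at the mean plus or minus 3 standard deviations of a baseline sample. The function names `control_limits` and `out_of_control` and the sample values are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from a baseline sample.

    Assumption: limits sit k standard deviations around the mean
    (k = 3 is a common convention, not taken from the post).
    """
    center = mean(baseline)
    sigma = pstdev(baseline)  # population standard deviation of the baseline
    return center - k * sigma, center, center + k * sigma


def out_of_control(values, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical metric readings used to establish the limits
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
lower, center, upper = control_limits(baseline)

# New readings to monitor; points outside the limits deserve investigation
flagged = out_of_control([10.1, 9.7, 12.5, 10.0], lower, upper)
print(flagged)
```

Points returned by `out_of_control` are the variations that would count as significant under this assumption; everything inside the limits is treated as ordinary variation of the system.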

Essential Design Principles in Tableau: Syllabus

Posted in Data Visualization, Tableau

Course Summary Week 1: Getting Started in Effective and Ineffective Visuals Identify various types of visualizations in Tableau. Discuss the role of ethics in data visualization. Examine and improve an ineffective visualization. Getting Started and How the Human Brain Perceives Our Surroundings Course Introduction The Human Brain and Data Visualization Cognitive vs Perceptual Design […]

Fundamentals of Visualization with Tableau: Syllabus

Posted in Data Visualization, Tableau, Training

Course Description Week 1: Getting Started & Introduction to Data Visualization Objectives: Discuss why we visualize data Define terminology related to visualization Identify what types of software options are available to do visualizations Operate installation procedures to install the Tableau Public software on your computer Create a visualization Summary: Introduction to Data Visualization […]