Data Science

General Assembly
10 E 21st St
3rd Fl
New York, NY 10010
Tuesday, May 21, 2019 - 2:30pm


This is a part-time course.

Skills & Tools: Use Python to mine datasets and predict patterns.
Production Standard: Build statistical models (regression and classification) that generate usable information from raw data.
The Big Picture: Master the basics of machine learning and harness the power of data to forecast what's next.

What You'll Learn:

Unit 1: Research Design and Exploratory Data Analysis

What Is Data Science?
Describe the course syllabus and establish the classroom environment.
Answer the questions "What is data science?" and "What roles exist in data science?"
Define the workflow, tools, and approaches data scientists use to analyze data.

Research Design and Pandas
Define a problem and identify appropriate datasets using the data science workflow.
Walk through the data science workflow using a case study in the Pandas library.
Import, format, and clean data using the Pandas library.

Statistics Fundamentals I
Use the NumPy and Pandas libraries to analyze datasets with basic summary statistics: mean, median, mode, maximum, minimum, quartiles, interquartile range, range, variance, standard deviation, and correlation.
Create data visualizations (scatter plots, scatter matrices, line graphs, box plots, and histograms) to discern characteristics and trends in a dataset.
Identify a normal distribution within a dataset using summary statistics and visualization.

Statistics Fundamentals II
Explain the difference between causation and correlation.
Test a hypothesis within a sample case study.
Validate your findings using statistical analysis (p-values, confidence intervals).

Instructor Choice
Focus on a topic selected by the instructor and class to provide deeper insight into exploratory data analysis.

Unit 2: Foundations of Data Modeling

Introduction to Regression
Define data modeling and linear regression.
Differentiate between categorical and continuous variables.
Build a linear regression model with the scikit-learn library, using a dataset that meets the linearity assumption.

Evaluating Model Fit
Define regularization, bias, and error metrics.
Evaluate model fit using loss functions, including mean absolute error, mean squared error, and root mean squared error.
Select regression methods based on fit and complexity.

Introduction to Classification
Define a classification model.
Build a k-nearest neighbors model using the scikit-learn library.
Evaluate and tune the model using metrics such as classification accuracy and error.

Introduction to Logistic Regression
Build a logistic regression classification model using the scikit-learn library.
Describe the sigmoid function, odds, and odds ratios, and how they relate to logistic regression.
Evaluate a model using metrics such as classification accuracy and error, the confusion matrix, ROC curves and AUC, and loss functions.

Communicate Results from Logistic Regression
Explain the tradeoff between the precision and recall of a model, and articulate the cost of false positives vs. false negatives.
Identify the components of a concise, convincing report and how they relate to specific audiences and stakeholders.
Describe the difference between visualization for presentations and visualization for exploratory data analysis.

Flexible Class Session
Focus on a topic selected by the instructor and class to provide deeper insight into data modeling.

Unit 3: Data Science in the Real World

Decision Trees and Random Forests
Describe the difference between classification trees and regression trees, and how to interpret these models.
Explain and communicate the tradeoffs of decision trees vs. regression models.
Build decision trees and random forests using the scikit-learn library.

Natural Language Processing
Demonstrate how to tokenize natural language text using NLTK.
Categorize and tag unstructured text data.
Explain how to build a text classification model using NLTK.

Dimensionality Reduction
Explain how to perform dimensionality reduction using topic models.
Demonstrate how to refine data using latent Dirichlet allocation (LDA).
Extract information from a sample text dataset.

Working with Time Series Data
Explain why time series data is different from other data and how to account for it.
Create rolling means and plot time series data using the Pandas library.
Perform autocorrelation analysis on time series data.

Creating Models with Time Series Data
Decompose time series data into trend and residual components.
Validate and cross-validate data from different datasets.
Use the ARIMA model to forecast and detect trends in time series data.

The Value of Databases
Describe the use cases for different types of databases.
Explain the differences between relational databases and document-based databases.
Write simple SELECT queries to pull data from a database and use it within Pandas.

Moving Forward with Your Data Science Career
Specify common models used in different industries.
Identify the use cases for common models.
Discuss next steps and additional resources for data science learning.

Flexible Class Session
Focus on a topic selected by the instructor and class to provide deeper insight into data science in the real world.

Final Presentations
Deliver a final presentation to peers, the instructor, and guest panelists, who will identify strengths and areas for improvement.
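The three loss functions named under Evaluating Model Fit reduce to short formulas. Here is a minimal standard-library sketch, not course material; the function names and sample values are illustrative assumptions. In practice, scikit-learn's sklearn.metrics module provides mean_absolute_error and mean_squared_error for the same job.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference; large errors are penalized more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Hypothetical actual and predicted values
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, mse, rmse)
```

Because MSE squares each error, the single 1.5-unit miss contributes more than half of the total here, while MAE weights every unit of error equally.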
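The syllabus builds k-nearest neighbors with scikit-learn (KNeighborsClassifier). To show what such a model does internally, here is a small from-scratch sketch, assuming Euclidean distance and a simple majority vote among the k closest training points; the function name knn_predict and the toy data are hypothetical. Classification accuracy is then the share of test points predicted correctly.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points,
    using Euclidean distance (math.dist requires Python 3.8+)."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: two well-separated clusters
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

# Hypothetical held-out test data
test_points = [(1.2, 1.4), (6.8, 6.1), (2.2, 2.4)]
test_labels = ["A", "B", "A"]

predictions = [knn_predict(train_points, train_labels, p, k=3) for p in test_points]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

Tuning this model mostly means choosing k (n_neighbors in scikit-learn): small values follow the training data closely, while larger values smooth the decision boundary.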
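Logistic regression treats the log-odds of the positive class as a linear function of the inputs, log(p / (1 - p)) = b0 + b1 * x, and the sigmoid function converts those log-odds back into a probability. The sketch below assumes hypothetical coefficients b0 = -3.0 and b1 = 0.5 and checks numerically that the odds ratio for a one-unit increase in x equals exp(b1).

```python
import math


def sigmoid(z):
    """Map any real number (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of an event that occurs with probability p."""
    return p / (1.0 - p)


# Hypothetical fitted coefficients: log-odds = b0 + b1 * x
b0, b1 = -3.0, 0.5


def predict_probability(x):
    return sigmoid(b0 + b1 * x)


p4 = predict_probability(4.0)  # log-odds = -1.0
p5 = predict_probability(5.0)  # log-odds = -0.5

# Odds ratio for a one-unit increase in x; it equals exp(b1) whatever x is
odds_ratio = odds(p5) / odds(p4)
print(round(p4, 4), round(p5, 4), round(odds_ratio, 4), round(math.exp(b1), 4))
```

This is why a logistic regression coefficient is often reported as exp(b1): each one-unit increase in x multiplies the odds of the positive class by that factor.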
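The precision/recall tradeoff comes from the decision threshold applied to a classifier's predicted probabilities. In this minimal sketch (the labels, scores, and helper names are hypothetical), raising the threshold produces fewer false positives and therefore higher precision, but more false negatives and therefore lower recall.

```python
def confusion_counts(y_true, scores, threshold):
    """Count true positives, false positives, and false negatives when
    every score at or above `threshold` is predicted positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    return tp, fp, fn


def precision_recall(y_true, scores, threshold):
    tp, fp, fn = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged cases that are real
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of real cases that are flagged
    return precision, recall


# Hypothetical labels (1 = positive) and model probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.55, 0.30, 0.60, 0.40, 0.20, 0.10]

for threshold in (0.25, 0.5, 0.75):
    print(threshold, precision_recall(y_true, scores, threshold))
```

Which threshold to choose depends on whether a false positive or a false negative is more costly for the problem at hand.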
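The rolling means and autocorrelation listed under Working with Time Series Data can be written out in plain Python so each step is visible. This is a minimal sketch with hypothetical sales figures. The course uses Pandas, where Series.rolling(window).mean() and Series.autocorr(lag) compute the same quantities. Pandas' autocorr is the Pearson correlation between the series and its lagged copy, which is the definition assumed here.

```python
import math


def rolling_mean(values, window):
    """Trailing moving average over `window` points; the first window - 1
    positions lack enough history and are None (NaN in Pandas)."""
    means = []
    for i in range(len(values)):
        if i + 1 < window:
            means.append(None)
        else:
            means.append(sum(values[i + 1 - window : i + 1]) / window)
    return means


def lag_autocorrelation(values, lag=1):
    """Pearson correlation between the series and a copy shifted by `lag`
    steps (assumes lag >= 1 and a non-constant series)."""
    xs, ys = values[:-lag], values[lag:]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical monthly sales with an upward trend
sales = [10, 12, 14, 13, 15, 17, 16, 18]

smoothed = rolling_mean(sales, window=3)
lag1 = lag_autocorrelation(sales, lag=1)
print(smoothed)
print(round(lag1, 3))
```

A strongly positive lag-1 value like this one signals that consecutive observations are not independent, which is one reason time series data needs different handling from ordinary tabular data.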
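For The Value of Databases, here is a minimal sketch of pulling SELECT query results into Pandas. It assumes pandas is installed and uses Python's built-in sqlite3 module with an in-memory database; the sales table and its rows are hypothetical. pandas.read_sql_query accepts a sqlite3 connection directly, while other database engines are typically connected through SQLAlchemy.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical `sales` table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "2019-01", 1200.0),
        ("East", "2019-02", 1350.0),
        ("West", "2019-01", 900.0),
        ("West", "2019-02", 1100.0),
    ],
)
conn.commit()

# A simple SELECT with a parameterized filter; the result arrives as a DataFrame
query = "SELECT region, month, revenue FROM sales WHERE revenue > ? ORDER BY revenue DESC"
df = pd.read_sql_query(query, conn, params=(1000,))
conn.close()

# From here on, the data is an ordinary Pandas DataFrame
print(df)
print(df.groupby("region")["revenue"].sum())
```

Passing the threshold through params instead of formatting it into the SQL string keeps the query safe from injection when values come from user input.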

Additional Sessions