ByungKoo Kim

## [Summer 2024] Machine Learning for Social Scientists

This course provides an overview of machine learning methods and their applications to social science research. We will learn the basic framework of machine learning, popular methods for supervised, semi-supervised, and unsupervised learning, and their social science applications. Topics include the bias-variance trade-off, the curse of dimensionality, decision trees, random forests, naive Bayes, boosting models, support vector machines, principal component analysis (PCA), k-nearest neighbors, hierarchical clustering, kernel-based clustering techniques, and topic models for text data.

## [Summer 2024] Text Analysis for Social Scientists

In this course, we learn statistical and computational theories and tools for text analysis. The course has three parts. First, we cover statistical and computational preliminaries such as the Dirichlet distribution, the multinomial distribution, and cosine similarity. The second part is dedicated to the core concepts and frameworks of text analysis, in which we learn how to represent text data for quantitative analysis. Specifically, students will be introduced to concepts such as tokens, the document-feature matrix, bag-of-words, word embeddings, and tf-idf. Finally, we cover statistical and computational models for text data, including LDA-based methods, word2vec, and a gentle introduction to large language models such as ChatGPT.

## [Spring 2024] R Programming for Social Scientists

This course provides introductory and intermediate-level practice in R programming for social science applications. Topics include data cleaning, data wrangling, data visualization, tools for data analysis, loops, and basic web scraping.

## [Fall 2024] Statistical Foundations for Data Science

This course covers essential mathematical and statistical theory for data science. Topics include integrals, probability theory, vector spaces, eigenvalues, gradients, and numerical optimization. Together with R Programming for Social Scientists, this course serves as a prerequisite for other data science courses such as Machine Learning for Social Scientists and Text Analysis for Social Scientists.