Machine learning

INFO 555: Applied Natural Language Processing

Most web data today consists of unstructured text. This course will cover the fundamental knowledge necessary to organize such texts, search them in a meaningful way, and extract relevant information from them. This course will teach natural language processing through the design and development of end-to-end natural language understanding applications, including sentiment analysis (e.g., is this review positive or negative?), information extraction (e.g., extracting named entities and their relations from text), and question answering (retrieving exact answers to natural language questions such as "What is the capital of France?" from large document collections). We will use several natural language processing toolkits, such as NLTK and Stanford's CoreNLP. The main programming language used in the course will be Python, but code written in Java or Scala will be accepted as well. Graduate-level requirements include implementing more complex, state-of-the-art algorithms for the three proposed projects. This will require additional reading of conference papers and journal articles.

Course Credits

Course Topics

Descriptive statistics

Machine learning

Python

Text

INFO 523: Data Mining and Discovery

This course will introduce students to the concepts and techniques of data mining for knowledge discovery. It includes methods developed in the fields of statistics, large-scale data analytics, machine learning, pattern recognition, database technology and artificial intelligence for automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns. Topics include understanding varieties of data, data preprocessing, classification, association and correlation rule analysis, cluster analysis, outlier detection, and data mining trends and research frontiers. We will use software packages for data mining, explaining the underlying algorithms and their use and limitations. The course includes laboratory exercises, with data mining case studies using data from many different resources such as social networks, linguistics, geo-spatial applications, marketing and/or psychology.

Course Credits

Course Topics

Categorical

Data management

Descriptive statistics

INFO 521: Introduction to Machine Learning

Machine learning describes the development of algorithms which can modify their internal parameters (i.e., "learn") to recognize patterns and make decisions based on example data. These examples can be provided by a human, or they can be gathered automatically as part of the learning algorithm itself. This course will introduce the fundamentals of machine learning, will describe how to implement several practical methods for pattern recognition, feature selection, clustering, and decision making for reward maximization, and will provide a foundation for the development of new machine learning algorithms.

Course Credits

Course Topics

Bayesian inference

Categorical

Descriptive statistics

Frequentist inference

INFO 510: Bayesian Modeling and Inference

Bayesian modeling and inference is a powerful modern approach to representing the statistics of the world, reasoning about the world in the face of uncertainty, and learning about it from data. It cleanly separates the notions of representation, reasoning, and learning. It provides a principled framework for combining multiple sources of information, such as prior knowledge about the world, with evidence about a particular case in observed data. This course will provide a solid introduction to the methodology and associated techniques, and show how they are applied in diverse domains ranging from computer vision to molecular biology to astronomy. Graduate-level requirements include different exams requiring greater depth of understanding of the topics; graduate students will also be assigned questions based on graduate-specific assignment topics.

Course Credits

Course Topics

Bayesian inference

Categorical

Descriptive statistics

Machine learning

Numerical

Other Programming Language

Python

Science/Reproducibility

Visualization

PA 572: Digital Research in Politics and Policy

Quantitative methods in political science and policy research are changing rapidly. The rise of the internet has brought in new sources of text, network, geographical, image, video, and other data. Meanwhile, computing storage and processing capabilities continue to expand, while data and code sharing norms have made it so that anyone with a computer and internet connection can have access to a growing set of tools and methods for modeling and interpreting patterns. This course focuses on the extraordinary work that is emerging in politics and policy as a result of these recent advances, with a broad set of applications ranging from health and defense to environmental and agricultural policy. The course highlights current trends, challenges, and new directions for political and policy researchers in academia, government, and the private sector, focusing on how these new data sources and methodologies are being used to solve problems in social science and public policy.

Course Credits

Course Topics

Categorical

Descriptive statistics

Discrimination/Inequality

Environment/Sustainability

Ethics/Privacy

Frequentist inference

INFO 514: Computational Social Science

This course will guide students through advanced applications of computational methods for social science research. Students will be encouraged to consider social problems from across sectors, like health science, education, environmental policy and business. Particular attention will be given to the collection and use of data to study social networks, online communities, electronic commerce and digital marketing. Students will consider the many research designs used in contemporary social research and will learn to think critically about claims of causality, mechanisms, and generalization in big data studies. Graduate requirements include additional readings and a more in-depth final paper than is required at the undergraduate level.

Course Credits

Course Topics

Categorical

Data management

Descriptive statistics

Discrimination/Inequality

Ethics/Privacy

Frequentist inference