Advances in large language models, recently popularized by ChatGPT, represent a remarkable leap forward in language processing by machines. We invite you to join the conversation shaping the future of communication technology. What does this mean for us, how can we make the most of these advancements, and what are the risks? What research opportunities have opened up? What kinds of evaluation are called for? We will bring together a group of practitioners and experts for guided discussions, hands-on experimentation, and project critiques. If you want to join the class, please fill out this interest form and come to the first class on Wednesday, 2/8. Bring a laptop and be prepared to start experimenting!

This course will be formatted as a combination workshop and seminar. Students will engage through readings, class participation, and project work. Students may choose to either complete a project or produce a research project proposal. For the active project track, students will form teams, pitch projects, and get feedback along the way. For the project proposal track, students will present a literature review mid-semester and submit a written research project proposal. Projects should focus on one of the main areas identified by the course. We will come together to share and critique projects throughout the semester, culminating in final project presentations. Students will also be expected to present to the class on readings and hands-on workshop output.

The current class schedule is below (subject to change).

Date Description Course Materials
Feb 8 Part 1: Background on LLMs [Slides]
  1. Introduction and motivation
  2. Class structure and logistics
  3. Language modeling overview
    1. Definitions
    2. A brief history of LMs
    3. LLM fundamentals
  4. Overview of ways to train, tune, and prompt LLMs
    1. Fine-tuning, zero-shot prompts, few-shot prompts, chain-of-thought prompts
    2. Prompt tuning
  5. Examples of LLM prompts
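The prompting styles listed above can be illustrated with a minimal sketch. This is a hypothetical example using a toy sentiment-classification task; the functions only build prompt strings (no model calls), and the task, labels, and wording are illustrative, not from the course materials.

```python
# Sketch of three prompting styles: zero-shot, few-shot, and
# chain-of-thought. Each function returns a prompt string that
# could be sent to any LLM API.

def zero_shot(review: str) -> str:
    # Zero-shot: state the task directly, with no examples.
    return (
        "Classify the sentiment of this review as positive or negative.\n"
        f"Review: {review}\nSentiment:"
    )

def few_shot(review: str) -> str:
    # Few-shot: prepend a handful of labeled examples so the model
    # can infer the task format from demonstrations.
    examples = (
        "Review: The plot dragged on forever.\nSentiment: negative\n\n"
        "Review: A delightful surprise from start to finish.\nSentiment: positive\n\n"
    )
    return examples + f"Review: {review}\nSentiment:"

def chain_of_thought(question: str) -> str:
    # Chain-of-thought: elicit step-by-step reasoning before the answer.
    return f"Q: {question}\nA: Let's think step by step."

print(zero_shot("I loved every minute."))
print(few_shot("I loved every minute."))
print(chain_of_thought("If I have 3 apples and eat one, how many remain?"))
```

Fine-tuning and prompt tuning, by contrast, update model or soft-prompt parameters rather than the prompt text, and are covered in the session itself.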
Part 2: Get your hands dirty with ChatGPT [Slides]
Create a prompting task in groups.
Required Readings:
  1. Percy Liang's introduction to LLMs
Feb 15 Part 1: Evaluating models [Slides]
How can we best evaluate these models for accuracy, fairness, bias, robustness, and other factors?

Speaker: Rishi Bommasani (Stanford)
Title: Holistically Evaluating Language Models on the Path to Evaluating Foundation Models

Part 2: LLMs in Applications [Slides]
People are increasingly interacting with human-facing tools that incorporate LLMs, like ChatGPT, writing assistants, and character generators. How might we go about evaluating these systems and their impacts on people? In this session we will consider 10 recent commercial and research applications of LLMs. Students will be asked to come prepared to critique the designs of one of these applications along different dimensions that we will describe in week 1.
Required Readings:
  1. Holistic Evaluation of Language Models (HELM)
Recommended Readings:
  1. On the Opportunities and Risks of Foundation Models
  2. Discovering Language Model Behaviors with Model-Written Evaluations
  3. All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text
  4. Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
  5. How to do human evaluation: A brief introduction to user studies in NLP
  6. Dynabench: Rethinking Benchmarking in NLP
  7. Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes
Feb 22 Part 1: Using LLMs for Consensus Across Preferences [Slides]

Speaker: Michiel Bakker (DeepMind)
Title: Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences

Part 2: Project Pitch [Slides]
Students present their project idea and form teams.
Required Readings:
  1. Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences
  2. Engaging Politically Diverse Audiences on Social Media
Mar 1 Part 1: Emergent Abilities of LLMs [Slides]
This talk will cover broad intuitions about how large language models work. First, we will begin by examining some examples of what language models can learn by reading the internet. Second, we will consider why language models have gained traction recently and what new abilities they have that were not present in the past. Third, we will cover how language models can perform complex reasoning tasks. Finally, the talk will discuss how language models can have an improved user interface via instruction following.

Speaker: Jason Wei (OpenAI)
Title: Emergence in Large Language Models

Part 2: NLP Evaluation Methods and Red Teaming [Slides]
Required Readings:
  1. Emergent Abilities of Large Language Models
  2. Chain of Thought Prompting Elicits Reasoning in Large Language Models
  3. Scaling Instruction-Finetuned Language Models
Recommended Readings:
  1. Dissociating Language and Thought in Large Language Models: A Cognitive Perspective
  2. Discovering Latent Knowledge in Language Models Without Supervision
  3. Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
  4. Learning to Summarize with Human Feedback
Mar 8 Part 1: Evaluating Human-model Interactions [Slides]

Speaker: Mina Lee (Stanford)
Title: Designing and Evaluating Language Models for Human Interaction

Part 2: Human Experiments and Evaluation Methods [Slides]
Required Readings:
  1. Evaluating Human-Language Model Interaction
  2. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities
Recommended Readings:
  1. Power to the People? Opportunities and Challenges for Participatory AI
Mar 15 Media Lab Research Panel

Speaker: Ziv Epstein (PhD Student at MIT Media Lab, Human Dynamics)
Title: Social Science Methods for Understanding Generative AI

Speaker: Matt Groh (PhD Student at MIT Media Lab, Affective Computing)
Title: Deepfake Detection

Speaker: Trudy Painter (UROP/MEng at MIT Media Lab, Viral Communications)
Title: Latent Lab: Generative ML as an Exploration Partner

Speaker: Belén Saldias Fuentes (PhD Student at MIT Media Lab, MIT Center for Constructive Communication)
Title: Community-aligned Content Moderation with Rationale Generation

Speaker: Hang Jiang (PhD Student at MIT Media Lab, MIT Center for Constructive Communication)
Title: CommunityLM: Probing Partisan Worldviews from Language Models
Related Readings:

Ziv Epstein:
  1. Who Gets Credit for AI-Generated Art?
  2. Deceptive AI Systems That Give Explanations Are Just as Convincing as Honest AI Systems in Human-Machine Decision Making
Matthew Groh:
  1. Deepfake Detection by Human Crowds, Machines, and Machine-informed Crowds
  2. Human Detection of Political Deepfakes across Transcripts, Audio, and Video
Trudy Painter:
  1. Latent Lab
Belén Saldías Fuentes:
  1. Human-AI Collaboration for Content Curation @ Reddit
Hang Jiang:
  1. CommunityLM: Probing Partisan Worldviews from Language Models
  2. Relevant work from Eric Chu (CCC): Language Models Trained on Media Diets Can Predict Public Opinion
Mar 22 Part 1: AI-Mediated Communication [Slides]
This talk will discuss the phenomenon of AI-Mediated Communication (AI-MC) and its potential impact on human communication outcomes, language use, and interpersonal trust. The author outlines early experimental findings showing that AI involvement can shift written content and opinions, change message ownership, impact blame assignment, and affect trust evaluations, highlighting the need for new approaches to the development and deployment of these technologies.

Speaker: Mor Naaman (Cornell Tech)
Title: "My AI must have been broken": Understanding our Future of AI-Mediated Communication

Part 2: Discussion of public policies on AI-generated content. [Slides]
Required Readings:
  1. Human Heuristics for AI-Generated Language Are Flawed
  2. AI-Mediated Communication: How the Perception that Profile Text was Written by AI Affects Trustworthiness
Recommended Readings:
  1. Interacting with Opinionated Language Models Changes Users’ Views
  2. Artificial Intelligence Can Persuade Humans on Political Issues
Mar 29 Break
Apr 5 Part 1: LLMs as Simulated Agents [Slides]

Speaker: John Horton (MIT)
Title: Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?

Part 2: Discussion of the call for a 6-month AI moratorium: "Pause Giant AI Experiments: An Open Letter". [Slides]
Required Readings:
  1. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?
Recommended Readings:
  1. Language Models as Agent Models
  2. Out of One, Many: Using Language Models to Simulate Human Samples
  3. Quantifying the Narrative Flow of Imagined versus Autobiographical Stories
  4. Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
  5. Generative Agents: Interactive Simulacra of Human Behavior
  6. Can AI Language Models Replace Human Participants?
  7. Social Simulacra: Creating Populated Prototypes for Social Computing Systems
  8. Want To Reduce Labeling Cost? GPT-3 Can Help
Apr 12 Societal Impacts of LLMs [Slides]
Required Readings:
  1. Anatomy of an AI System
  2. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
  3. Lessons from the GPT-4chan Controversy
  4. GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Recommended Readings:
  1. Foundation Models and Fair Use
  2. Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
  3. Ethical and Social Risks of Harm from Language Models
  4. GPT-4 Chan Controversy
  5. Evaluating Verifiability in Generative Search Engines
  6. “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI
  7. What Happens When ChatGPT Starts to Feed on Its Own Writing?
Apr 19 Risks and Tools for Transparency [Slides]

Required Readings:
  1. Auditing Large Language Models: A Three-layered Approach
  2. Using Algorithm Audits to Understand AI
  3. Google denies Bard was trained with ChatGPT data
  4. Assessing the Risks of Language Model “Deepfakes” to Democracy
Recommended Readings:
  1. How ChatGPT Hijacks Democracy
  2. How generative AI impacts democratic engagement
  3. A Watermark for Large Language Models
  4. Extracting Training Data from Large Language Models
  5. Locating and Editing Factual Associations in GPT
  6. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
  7. Datasheets for Datasets
Apr 26 Final project presentations I
May 3 Final project presentations II
May 10 No Class (work on final papers)
May 17 Project Submission Deadline