CCAI9003 Artificial Intelligence
AI Safety: Catastrophe, Alignment, and Governance


Course Description

Artificial Intelligence (AI) systems are rapidly improving in capability, and this rapid advancement poses potentially catastrophic risks to humanity. This course is an interdisciplinary introduction to AI safety. Students will learn about (i) present and future capabilities of AI systems; (ii) the potentially catastrophic risks that AI systems pose to humanity; (iii) AI alignment, which seeks to give AI systems goals that reflect human values; (iv) AI governance, which seeks to design effective regulations and institutions for controlling AI systems; and (v) foundational ethical and philosophical challenges in designing safe AI. The course draws on perspectives from computer science, philosophy, political science, and complex systems theory. Through discussions, case studies, and policy briefs, students will critically analyse real-world scenarios where AI safety plays a pivotal role.

Course Learning Outcomes

On completing the course, students will be able to:

  1. Communicate effectively about the core catastrophic risks posed by AI systems.
  2. Understand core questions in AI alignment.
  3. Explain key proposals in AI governance.
  4. Demonstrate competence with core concepts from safety engineering and complex systems theory.

Offer Semester and Day of Teaching

First semester (Wed)


Study Load

Activities                                      Number of hours
Lectures                                        24
Tutorials                                       12
Reading / Self-study                            60
Assessment: Essay / Report writing              24
Assessment: Presentation (incl. preparation)    12
Total                                           132

Assessment: 100% coursework

Assessment Tasks        Weighting (%)
Case analysis           20
Issue papers            20
Design proposal         20
Reflective journal      10
Group presentation      20
Tutorial discussion     10

Required Reading / Viewing

Overview of Catastrophic AI Risks

  • Hendrycks, D., et al. (2024). An Overview of Catastrophic AI Risks. arXiv. From https://arxiv.org/abs/2306.12001

AI Fundamentals

Single Agent Safety

  • Carlsmith, J. (2022). Is Power-Seeking AI an Existential Risk? arXiv. From https://arxiv.org/abs/2206.13353
  • John, Y. J., Caldwell, L., McCoy, D. E., & Braganza, O. (2023). Dead rats, dopamine, performance metrics, and peacock tails: Proxy failure is an inherent risk in goal-oriented systems. Behavioral and Brain Sciences, 47, e67. From https://doi.org/10.1017/S0140525X23002753

Safety Engineering, Complex Systems

  • Hendrycks, D. (2025). AI Safety, Ethics, and Society (Chap. 5).

Beneficial AI and Machine Ethics

Collective Action Problems

  • Fearon, J. D. (1995). Rationalist Explanations for War. International Organization, 49(3), 379-414.

Governance

  • Erdil, E. & Besiroglu, T. (2023). Explosive growth from AI automation: A review of the arguments. arXiv. From https://arxiv.org/abs/2309.11690
  • Shavit, Y. (2023). What does it take to catch a Chinchilla? Verifying rules on large-scale neural network training via compute monitoring. arXiv. From https://arxiv.org/abs/2303.11341

Course Co-ordinator and Teacher(s)

Course Co-ordinator: Professor S.D. Goldstein
School of Humanities (Philosophy), Faculty of Arts
Tel:
Email: sgold@hku.hk

Teacher(s): Professor S.D. Goldstein
School of Humanities (Philosophy), Faculty of Arts
Tel:
Email: sgold@hku.hk