Speaker: Andrej Risteski, Associate Professor, Machine Learning Department, Carnegie Mellon University
Registration is preferred for all CUID holders. If you do not have an active CUID, registration is required by 12:00 PM the day before the seminar. We cannot guarantee entry to Columbia’s Morningside campus for anyone who registers after that deadline. Thank you for understanding!
Title: Architectural Choices in Scientific ML: A View Through the Lens of Theory
Abstract:
In deep learning, small architectural changes, such as residual connections or normalization layers, have often had outsized impact. This talk examines how similar effects arise in recent applications of deep learning to the sciences. The central theme is that the architectural changes we identify are not suggested by current benchmarks, which remain much less mature than those in the image or language domains. Instead, they become visible through the right theoretical lenses. We will showcase several vignettes spanning graph neural networks (GNNs), time-dependent partial differential equations (PDEs), and steady-state PDEs.
The first setting concerns graphs with bottlenecks or hubs: augmenting GNNs with edge-level state yields provable gains under constraints on depth and memory. We establish this using techniques from time–space tradeoffs in theoretical computer science, and show that neither “symmetry-only” theoretical accounts nor standard GNN benchmarks would detect this separation. The next setting concerns time-dependent PDEs, where adding an explicit memory layer via state-space models (e.g., S4) has negligible effect under full observability but substantial impact under partial observation. This kind of phenomenon is predicted by Mori–Zwanzig theory, which also inspired the architectural change. Finally, in steady-state PDEs and operator learning, we show that architectural changes based on Deep Equilibrium Models (DEQs) improve efficiency and robustness. Here, the design is motivated by representation-theoretic constructions that simulate “unrolled” gradient descent in function space.