Episode 18 - John Urschel

Kevin Knudson: Welcome to My Favorite Theorem. I’m your host Kevin Knudson, professor of mathematics at the University of Florida. I’m joined by your cohost.

Evelyn Lamb: Hi, I’m Evelyn Lamb. I’m a math and science writer in Salt Lake City, Utah, where it is very cold now, and so I’m very jealous of Kevin living in Florida.

KK: It’s a dreary day here today. It’s raining and it’s “cold.” Our listeners can’t see me doing the air quotes. It’s only about 60 degrees and rainy. It’s actually kind of lousy, but it’s our department holiday party today, and I have my festive candy cane tie on, and I’m good to go. And I’m super excited.

John Urschel: So I haven’t been introduced yet, but can I jump in on this weather conversation? I’m in Cambridge right now, and I must say, I think it’s probably nicer in Cambridge, Massachusetts than it is in Utah right now. It’s a nice breezy day, high 40s, low 50s, put on a little sweater and you’re good to go.

EL: Yeah, I’m jealous of both of you.

KK: Evelyn, I don’t know about you, but I’m super excited about this one. I mean, I’m always excited to do these, but it’s the rare day you get to talk to a professional athlete about math. This is really very cool. So our guest on this episode is John Urschel. John, do you want to tell everyone about yourself?

JU: Yes, I’d be happy to. I think I might actually be the only person, the only professional athlete you can ask high-level math about.

KK: That might be true. Emily Riehl, Emily Riehl counts, right?

EL: Yeah.

KK: She’s a category theorist at Johns Hopkins. She’s on the US women’s Australian rules football team.

EL: Yeah.

JU: Australian rules football? You mean rugby?

KK: Australian rules football is like rugby, but it’s a little different. See, you guys aren’t old enough. I’m old enough to remember ESPN in the early days when they didn’t have the high-end contracts, they’d show things like Australian rules football. It’s fascinating. It’s kind of like rugby, but not really at the same time. It’s very weird.

JU: What are the main differences?

EL: You punch the ball sometimes.

KK: They don’t have a scrum, but they have this thing where they bounce the ball really hard. (We should get Emily on here.) They bounce the ball up in the air, and they jump up to get it. You can run with it, and you can sort of punch the ball underhanded, and you can kick it through these three posts on either end. [Editor's note: there are 4 poles on either end.] It’s sort of this big oval-shaped field, and there are three poles at either end, and you try to kick it. If you get it through the middle pair, that’s a goal. If you get it on either of the sides, that’s called a “behind.” The referees wear a coat and tie and a little hat. I used to love watching it.

JU: Wait, you say the field is an oval shape?

KK: It’s like an oval pitch, yeah.

JU: Interesting.

KK: Yeah. You should look this up. It’s very cool. It is a bit like rugby in that there are no pads, and they’re wearing shorts and all of that.

JU: And it’s a very continuous game like rugby?

KK: Yes, very fast. It’s great.

JU: Gotcha.

KK: Anyway, that’s enough of us. You didn’t tell us about yourself.

JU: Oh yeah. My name is John Urschel. I’m a retired NFL offensive lineman. I played for the Baltimore Ravens. I’m also a mathematician. I am getting my Ph.D. in applied math at MIT.

KK: Good for you.

EL: Yeah.

KK: Do you miss the NFL? I don’t want to belabor the football thing, but do you miss playing in the NFL?

JU: No, not really. I really loved playing in the NFL, and it was a really amazing experience to be an elite, elite at whatever sport you love, but at the same time I’m very happy to be focusing on math full-time, focusing on my Ph.D. I’m in my third year right now, and being able to sort of devote more time to this passion of mine, which is ideally going to be my lifelong career.

EL: Right. Yeah, so not to be creepy, but I have followed your career and the writing you’ve done and stuff like that, and it’s been really cool to see what you’ve written about combining being an athlete with being a mathematician and how you’ve changed your focus as you’ve left playing in the NFL and moved to doing this full-time. It’s very neat.

KK: So, John, what’s your favorite theorem?

JU: Yes, so I guess this is the name of the podcast?

KK: Yeah.

JU: So I should probably give you a theorem. So my favorite theorem is a theorem by Batson, Spielman, and Srivastava.

EL: No, I don’t. Please educate us.

JU: Good! So this is perfect because I’m about to introduce you to my mathematical idol.

KK: Okay, great.

JU: Pretty much who I think is the most amazing applied mathematician of this generation, Dan Spielman at Yale. Dan Spielman got his Ph.D. at MIT. He was advised by Mike Sipser, and he was a professor at MIT and eventually moved to Yale. He’s done amazing work in a number of fields, but this paper, it’s a very elegant paper in applied math that doesn’t really have direct algorithmic applications but has some elegance. The formulation is as follows. So suppose you have some graph, vertices and edges. What I want to tell you is that there exists some other weighted graph with at most a constant times the order of the graph number of edges, so linear in number of edges with respect to vertices, that approximates the Laplacian of this original very dense graph, no matter how dense it is.

So I’m doing not the very best job of explaining this, but let me put it like this. You have a graph. It’s very dense. You have this elliptic operator on this graph, and there’s somehow some way to find a graph that’s not dense at all, but extremely, extremely sparse, but somehow with the exact, well not exact, but nearly the exact same properties. These operators are very, very close.
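
[Editor's note: one common way to write the "nearly the exact same properties" claim is sketched below. The symbols are notation chosen for this note, not used in the conversation: $G$ is the original graph on $n$ vertices, $H$ is the sparse weighted graph, $L_G$ and $L_H$ are their Laplacians, and $\varepsilon$ is the allowed error. In this form the number of edges of $H$ grows linearly in $n$ for any fixed $\varepsilon$.]

```latex
\[
  (1-\varepsilon)\, x^{\top} L_G\, x
  \;\le\; x^{\top} L_H\, x
  \;\le\; (1+\varepsilon)\, x^{\top} L_G\, x
  \qquad \text{for all } x \in \mathbb{R}^{n},
\]
\[
  \text{where } H \text{ has } O\!\left(n/\varepsilon^{2}\right) \text{ edges.}
\]
```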

KK: Can you remind our reader—readers, our listeners—what the Laplacian is?

JU: Yeah, so the graph Laplacian, what you can do, the way I like to introduce it, especially for people not in graph theory type things, is you can define a gradient on a graph. You take every edge, directed in some way, and you can think of the gradient as being a discrete derivative along the edge. And now, as in the continuous case, you take this gradient, you get your Laplacian, and the same way you get a Laplacian in the continuous case, this is how you get your graph Laplacian.
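
[Editor's note: the "gradient, then Laplacian" description can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy and an unweighted graph; the helper `incidence_matrix` and the triangle example are made up for this note. The incidence matrix plays the role of the discrete gradient, and multiplying it by its own transpose gives the Laplacian.]

```python
import numpy as np

def incidence_matrix(n, edges):
    """Discrete gradient: one row per edge, -1 at the tail and +1 at the head."""
    B = np.zeros((len(edges), n))
    for row, (u, v) in enumerate(edges):
        B[row, u] = -1.0
        B[row, v] = 1.0
    return B

# Hypothetical example graph: a triangle on vertices 0, 1, 2.
edges = [(0, 1), (1, 2), (0, 2)]
B = incidence_matrix(3, edges)

# Laplacian = (gradient transposed) times gradient.
L = B.T @ B

# The same matrix written as degree matrix minus adjacency matrix.
A = np.zeros((3, 3))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
D = np.diag(A.sum(axis=1))
```

The direction chosen for each edge does not matter: flipping an edge negates one row of `B`, which leaves `B.T @ B` unchanged.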

KK: This theorem, so the problem is that dense graphs are kind of hard to work with because, well, they’re dense?

EL: So can I jump in? Dense meaning a lot of edges, I assume?

JU: Lots of edges, as many edges as you want.

KK: So a high degree on every vertex.

JU: Lots of edges, edges going everywhere.

EL: And then with the weighting, that might also mean something like, not that many total edges, but they have a high weight? Does that also make it dense, or is that a different property?

JU: No, in that case, we wouldn’t really consider it very dense.

KK: But the new graph you construct is weighted?

JU: And the old graph can be weighted as well.

KK: All right. What do the weights tell you?

JU: What do you mean?

KK: On the new graph. You generate this new graph that’s more sparse, but it’s weighted. Why do you want the weights? What do the weights get you?

JU: The benefit of the weights is it gives you additional leeway about how you’re scaling things because the weights actually come into the Laplacian because for weighted graphs, when you take this Laplacian, it’s the difference between the average of each node, of all its neighbors, and the node, in a way, and the weights tell you how much each edge counts for. In that way, it allows you greater leeway. If you weren’t able to weight this very sparse graph, this wouldn’t work very well at all.
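
[Editor's note: the "difference between the node and the average of its neighbors" description can be written out as follows. The notation is chosen for this note: $w_{ij}$ is the weight of the edge between $i$ and $j$, $d_i$ is the weighted degree of node $i$, and $x$ is any function on the vertices.]

```latex
\[
  (Lx)_i \;=\; \sum_{j \sim i} w_{ij}\,(x_i - x_j)
         \;=\; d_i \left( x_i - \frac{1}{d_i} \sum_{j \sim i} w_{ij}\, x_j \right),
  \qquad d_i = \sum_{j \sim i} w_{ij}.
\]
```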

KK: Right, because like you said, you think of sort of having a gradient on your graph, so this new graph should somehow have the same kind of dynamics as your original.

JU: Exactly. And the really interesting thing is that you can capture these dynamics. Not only can you capture them, but you can capture them with a linear number of edges, linear in the order of the graph.
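
[Editor's note: a toy numerical check of what "capturing the dynamics" means, assuming NumPy. The functions `laplacian` and `spectral_bounds` are illustrative names for this note, not taken from the paper, and the example is far too small to show real sparsification. It compares the complete graph on 4 vertices with a 4-cycle, and shows that reweighting the cycle's edges centers the approximation around 1.]

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Weighted graph Laplacian from a list of (u, v, weight) triples."""
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

def spectral_bounds(L_G, L_H, tol=1e-9):
    """Min and max of (x^T L_H x) / (x^T L_G x) over x with x^T L_G x > 0.

    Assumes G is connected, so L_G vanishes only on constant vectors.
    """
    vals, vecs = np.linalg.eigh(L_G)
    keep = vals > tol
    # Columns span the range of L_G, each scaled by 1/sqrt(eigenvalue).
    W = vecs[:, keep] / np.sqrt(vals[keep])
    ratios = np.linalg.eigvalsh(W.T @ L_H @ W)  # ascending order
    return ratios[0], ratios[-1]

n = 4
complete = [(u, v, 1.0) for u in range(n) for v in range(u + 1, n)]  # 6 edges
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]                              # 4 edges

L_G = laplacian(n, complete)
unweighted = spectral_bounds(L_G, laplacian(n, [(u, v, 1.0) for u, v in cycle]))
reweighted = spectral_bounds(L_G, laplacian(n, [(u, v, 4.0 / 3.0) for u, v in cycle]))
print(unweighted)  # approximately (0.5, 1.0)
print(reweighted)  # approximately (0.667, 1.333)
```

With unit weights the cycle's quadratic form is never larger than the complete graph's; scaling the weights by 4/3 places the ratio in a window symmetric around 1, which is the kind of leeway the weights provide.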

KK: Right.

JU: So Dan Spielman is famous for many things. One of the things he’s famous for is he was one of the first people to give provable guarantees for algorithms that can solve, like, a Laplacian system of equations in near-linear time, so O(n) plus some logs. From his work there have been many, many different sorts of improvements, and this one is extremely interesting to me because you only use a linear number of edges, which implies that this technique, given this graph you have should be extremely efficient. And that’s exactly what you want because it’s a linear number of edges, you apply this via some iterative algorithm, and you can use this guy as a sort of preconditioner, and things get very nice. The issue is, I believe—and it has been a little bit since I’ve read the paper—I believe the amount of time it takes to find this graph, I think is cubic.

EL: Okay.

JU: So it’s not a sort of paper where it’s extremely useful algorithmically, I would say, but it is a paper that is very beautiful from a mathematical perspective.

KK: Has the algorithm been improved? Has somebody found a better than cubic way to generate this thing?

JU: Don’t quote me on that, I do not know, but I think that no one has found a good way yet. And by good I mean good enough to make it algorithmically useful. For instance, if the amount of time it takes to find this thing is quadratic, or even maybe n to the 1.5 or something like that, this is already not useful for anything greater than near-linear. It’s a very interesting thing, and it’s something that really spoke to me, and I really just fell in love with it, and I think what I like about it most is that it is a very sort of applied area, and it is applied mathematics, theoretical computer science type things, but it is very theoretical and very elegant. Though I am an applied mathematician, I do like very clean things. I do like very nice looking things. And perhaps I can be a bad applied mathematician because I don’t always care about applications. Which kind of makes you a bad applied mathematician, but in all my papers I’m not sure I’ve ever really, really cared about the applications, in the sense that if I see a very interesting problem that someone brings to me, and it happens to have, like some of the things I’ve gotten to do in machine learning, great, this is like the cherry on top, but that isn’t the motivating thing. If it’s an amazing application but some ugly, ugly thing, I’m not touching it.

EL: Well, before we actually started recording, we talked a little bit about how there are different flavors of applied math. There are ones that are more on the theoretical side, and probably people who do a lot of things with theoretical computer science would tend towards that more, and then there are the people who are actually looking at a biological system and solving differential equations or something like this, where they’re really getting their hands dirty. It sounds like you’re more interested in the theoretical side of applied math.

JU: Yeah.

KK: Applied math needs good theory, though.

JU: That’s just true.

KK: You’ve got to develop good theory so that you know your algorithms work, and you want them to be efficient and all that, but if you can’t prove that they actually work, then you’re a physicist.

JU: There’s nothing I hate more than heuristics. But heuristics do have a place in this world. They’re an important thing, but there’s nothing I dislike more in this world than doing things with heuristics without being able to give any guarantees.

EL: So where did you first encounter this theorem? Was it in the research you’ve been doing, the study you’ve been doing for your Ph.D.?

JU: Yes, I did encounter this, I think it was when I was preparing for my qualifying exams. I was reading a number of different things on so-called spectral graph theory, which is this whole field of, you have a graph and some sort of elliptic operator on it, and this paper obviously falls under this category. I saw a lecture on it, and I was just fascinated. You know it’s a very nice result when you hear about it and you’re almost in disbelief.

KK: Right.

JU: I heard about it and I thought I didn’t quite hear the formulation correctly, but in fact I did.

KK: And I seem to remember reading in Sports Illustrated — that’s an odd sentence to say — that you were working on some version of the traveling salesman problem.

JU: That is true. But I would say,

KK: That’s hard.

JU: Just because I’m working on the asymmetric traveling salesman problem does not mean you should be holding your breath for me to produce something on the traveling salesman problem. This is an interesting thing because I am getting my Ph.D., and you do want, you want to try to find a research project where yes, it’s tough and it’s challenging you, but at the end of your four or five years you have something to show for it.

KK: Right. Is this version of the problem NP-hard?

JU: Yes, it is. But this version, there isn’t any sort of inapproximability result as in some of the other versions of TSP. But my advisor Michel Goemans, who... for the record, I’m convinced I have the single best advisor in the world, like he is amazing, amazing. He has a strong background in combinatorial optimization, which is the idea that you have some set of discrete objects. You need to pick your best option when the number of choices you have is often not polynomial in the size of your input. But you need to pick the best option in some reasonable amount of time that perhaps is polynomial.

EL: Yeah, so are these results that will say something like, we know we can get within 3 percent of the optimal…

JU: Exactly. These sorts of things are called approximation algorithms. If it runs in polynomial time and you can guarantee it’s within, say, a constant factor of the optimal solution, then you have a constant approximation algorithm. We’ve been reading up on some of the more recent breakthroughs on ATSP. There was a breakthrough this August: someone proved the first constant approximation algorithm for the asymmetric traveling salesman problem, and Michel Goemans, who also is the department head at MIT of math, he had the previous best paper on this. He had a log log approximation algorithm from maybe 2008 or 2009, but don’t quote me on this. Late 2000s, so this is something we’ve been reading about and thinking about.
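
[Editor's note: for readers who have not seen a constant approximation algorithm, here is a minimal Python sketch of a classic textbook one. It is not the asymmetric-TSP algorithm discussed in the episode. It is the minimum-spanning-tree doubling heuristic for the symmetric TSP on points in the plane, whose tour is at most twice the optimal length when distances satisfy the triangle inequality. The function names and the six-point instance are made up for this note.]

```python
import itertools
import math

def mst_double_tour(points):
    """Tour from the MST-doubling heuristic (at most 2x optimal for metric distances)."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Prim's algorithm: grow a minimum spanning tree from vertex 0.
    best = {v: (dist(0, v), 0) for v in range(1, n)}  # cheapest link into the tree
    children = {v: [] for v in range(n)}
    while best:
        v = min(best, key=lambda k: best[k][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for u in best:
            d = dist(v, u)
            if d < best[u][0]:
                best[u] = (d, v)

    # A preorder walk of the tree is the doubled tree with repeat visits skipped.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

def tour_cost(points, tour):
    """Length of the closed tour that returns to its starting point."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def optimal_cost(points):
    """Brute force over all tours; only feasible for a handful of points."""
    rest = range(1, len(points))
    return min(tour_cost(points, (0,) + perm) for perm in itertools.permutations(rest))

# Hypothetical instance: six points in the plane.
points = [(0, 0), (2, 1), (5, 0), (6, 3), (3, 4), (1, 3)]
tour = mst_double_tour(points)
ratio = tour_cost(points, tour) / optimal_cost(points)
```

The guarantee comes from two facts: the spanning tree costs no more than the optimal tour, and skipping repeated vertices never lengthens the walk when the triangle inequality holds. The asymmetric problem is harder precisely because this shortcutting argument is not available in the same way.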

EL: Trying to chip away a little bit at that.

JU: Exactly. It’s interesting because this constant approximation algorithm that came out, it used this approach that, I think Michel won’t mind me saying this, it used an approach that Michel didn’t think was the right way to go about it, and so it’s very interesting. There are different ways to construct an approximation algorithm. At its core, you have something you’re trying to solve, and this thing is hard, but now you have to ask yourself, what makes it hard? Then you need to sort of take one of the things that makes it hard and you need to loosen that. And his approach in his previous paper was quite different than their approach, so it’s interesting.

KK: So the other thing we like to do on this show is to ask our guest to pair their theorem with something. So what have you chosen to pair your theorem with?

JU: I still haven’t fully thought about this, but you’ve put me on the spot, and so I’m going to say this: I would pair this with, I think this is a thing, Miller 64. That’s a thing, right?

KK: This is a beer?

JU: Yeah, the beer.

KK: It’s a super low-calorie beer?

JU: It’s a beer, and they advertise it on TV.

KK: I see, it’s very sparse.

JU: People weightlifting, people running, and then drinking a 64-calorie beer. It’s the beer for athletes.

EL: Okay.

JU: I think it’s a very, very good beer because it at least claims to taste like a beer, be very much like a beer, and yet be very sparse.

EL: Okay, so it’s, yeah, I guess I don’t know a good name for this kind of graphs, but it’s this graph of beers.

JU: Yes, it’s like, these things are called spectral sparsifiers.

EL: Okay, it’s the spectral sparsifier of beers.

KK: That’s it.

EL: So they’ve used the “Champagne of beers” slogan before, but I really think they should switch to the “spectral sparsifier of beers.” That’s a free idea, by the way, Miller, you can just take that.

JU: Hold on.

KK: John’s all about the endorsements, right?

JU: Let’s not start giving things away for free now.

KK: John has representation.

EL: That’s true.

JU: We will give this to you guys, but you need to sponsor the podcast. This needs to be done.

EL: Okay. I’m sure if they try to expand their market share of mathematicians, this will be the first podcast they come to.

KK: That’s right. So hey, do you want to talk some smack? Were you actually the smartest athlete in the NFL?

JU: I am not the person to ask about that.

KK: I knew you would defer.

JU: Trust me, I’ve gone through many, many hours of media training. You need something a little more high-level to catch me than that.

KK: I’m sure. You know, I wasn’t really trying to catch you. You know, Aaron Rodgers looked good on Jeopardy. I don’t know if you saw him on Celebrity Jeopardy a couple years ago.

JU: No.

KK: He won his game. My mother—sorry—was a huge Packers fan. She grew up near Green Bay, and she loved Aaron Rodgers, and I think she recorded that episode of Jeopardy and watched it all the time.

JU: I was invited to go on Family Feud once, the celebrity Family Feud.

KK: Yeah?

JU: But I don’t know why, but I wasn’t really about that life. I wasn’t really into it.

KK: You didn’t want Steve Harvey making fun of you?

JU: Also, I’m not sure I’m great at guessing what people think.

EL: Yeah.

JU: That’s not one of my talents.

EL: Finger isn’t on the pulse of America?

JU: No, my finger is not on the pulse. What do people, what’s people’s favorite, I can’t even think of a question.

EL: Yeah.

KK: Well, John, this has been great. Thanks for joining us.

JU: Thanks for having me. I can say this with certainty, this is my second favorite podcast I have ever done.

KK: Okay. We’ll take that. We won’t even put you on the spot and ask you what the favorite was. We won’t even ask.

JU: When I started the sentence, know that I was going to say favorite, and then I remembered that one other. I’ve done many podcasts, and this is one of my favorites. It’s a fascinating idea, and I think my favorite thing about the podcast is that the audience is really the people I really like.

KK: Thanks, John.

EL: Thanks for being here.
