WEBVTT
00:00:00.000 --> 00:00:00.000
Okay, maybe it's a good time to get started.
00:00:00.000 --> 00:00:04.000
So, do you want to start sharing your slides?
00:00:04.000 --> 00:00:05.000
Yeah.
00:00:05.000 --> 00:00:13.000
Let me share my screen.
00:00:13.000 --> 00:00:15.000
Great.
00:00:15.000 --> 00:00:17.000
Okay so hello everyone.
00:00:17.000 --> 00:00:30.000
So today will be the first CDS lunch seminar. You know, typically this used to be a free lunch situation where you go to CDS and you get some lunch. Unfortunately, we're still bound to doing this on Zoom.
00:00:30.000 --> 00:00:36.000
But one of the benefits is that we get to invite a lot more diverse speakers from across the world, or at least the country.
00:00:36.000 --> 00:00:47.000
And so today we're excited to have Lihua Lei from Stanford, who will talk about conformal inference of counterfactuals and individual treatment effects.
00:00:47.000 --> 00:00:52.000
So Lihua is right now a postdoc at Stanford, working with Emmanuel Candès.
00:00:52.000 --> 00:01:00.000
And he's been there for the past couple of years, and before that he was a PhD student at Berkeley, working with Peter Bickel and Michael Jordan.
00:01:00.000 --> 00:01:03.000
So Lihua, the floor is yours.
00:01:03.000 --> 00:01:12.000
Thank you so much for the nice introduction, and thanks for having me in the seminar. It's my great pleasure to share one of my recent works on conformal inference.
00:01:12.000 --> 00:01:17.000
And this is joint work with my postdoc advisor Emmanuel Candès.
00:01:17.000 --> 00:01:24.000
So, by the way, during the talk, if you have any questions please feel free to interrupt me.
00:01:24.000 --> 00:01:42.000
So nowadays we're deploying machine learning tools in many areas, including high-stakes decision making like self-driving cars and disease diagnosis. So an essential question for statisticians to ask is: can we have reliable uncertainty quantification?
00:01:42.000 --> 00:01:48.000
Are we confident in these predictions? Because they're making critical decisions.
00:01:48.000 --> 00:01:53.000
And it's important to have confidence.
00:01:53.000 --> 00:02:03.000
But one problem is that today's predictive algorithms are super complicated, and they could be things like random forests, gradient boosting, or neural nets.
00:02:03.000 --> 00:02:11.000
And they're too complicated to analyze compared to many classical statistical techniques.
00:02:11.000 --> 00:02:29.000
So while there are many attempts to open the black box and analyze these algorithms, the opening involves significant simplification of the algorithms. In practice, you usually have very complicated preprocessing steps,
00:02:29.000 --> 00:02:41.000
and also the tuning process is so complicated. And in this case, there will be a gap between the analysis and the real performance of these algorithms.
00:02:41.000 --> 00:02:45.000
But still, we want to get confidence.
00:02:45.000 --> 00:02:53.000
Fortunately, there's something called conformal inference, which can provide a certain kind of uncertainty quantification.
00:02:53.000 --> 00:03:08.000
So roughly speaking, conformal inference wraps around a predictive model and turns its predictions into an interval estimate.
00:03:08.000 --> 00:03:25.000
So more specifically, conformal inference will take i.i.d. training samples, and then, given a test sample with only the covariates, conformal inference will try to estimate the missing outcome.
00:03:25.000 --> 00:03:31.000
But the estimate is in the form of an interval, rather than a point
00:03:31.000 --> 00:03:33.000
estimate.
00:03:33.000 --> 00:03:53.000
It will try to construct a predictive interval which depends on the covariate value, so it's covariate-dependent, such that this interval estimate covers the future outcome Y with probability 90%, where the randomness comes from both X and Y, and the training set.
00:03:53.000 --> 00:04:06.000
So what's amazing about conformal inference is that this guarantee can be achieved in finite samples, without any assumption on the distribution of X and Y,
00:04:06.000 --> 00:04:19.000
and without any assumption on the predictive algorithm, meaning that you can use random forests, you can use gradient boosting, you can use neural nets, and you can use an arbitrarily complex tuning process, and that's fine.
00:04:19.000 --> 00:04:28.000
And this procedure will wrap around the black box and turn that into a valid interval.
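As a rough illustration of the split conformal recipe just described, here is a minimal Python sketch; the synthetic data, the absolute-residual score, and all variable names are my own assumptions, not the speaker's code.

```python
# Minimal split conformal prediction sketch: wrap an arbitrary black-box
# regressor and return intervals with ~90% marginal coverage for i.i.d. data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=2000)

# Split into a proper training set and a calibration set.
X_tr, y_tr = X[:1000], y[:1000]
X_cal, y_cal = X[1000:], y[1000:]

model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Conformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)
# The 90% empirical quantile with the finite-sample correction.
q = np.quantile(scores, np.ceil(0.9 * (n + 1)) / n, method="higher")

def predict_interval(x_new):
    # Interval = point prediction plus/minus the calibrated margin q.
    pred = model.predict(x_new)
    return pred - q, pred + q
```

Note that nothing here depends on the choice of regressor; the finite-sample guarantee comes only from exchangeability of the calibration scores.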
00:04:28.000 --> 00:04:39.000
But, as you can see here, an implicit assumption is that the outcome Y_i should be observable for every training sample.
00:04:39.000 --> 00:04:44.000
But in modern science, we often move from the factual to the counterfactual.
00:04:44.000 --> 00:04:52.000
And this counterfactual reasoning has become ubiquitous in modern science. For example, it's mostly studied in causal inference.
00:04:52.000 --> 00:04:59.000
And there, the counterfactual means: what would have been one's response, had they been given the treatment?
00:04:59.000 --> 00:05:09.000
And it's also been popular in machine learning, like offline policy evaluation, algorithmic fairness, and explainable machine learning.
00:05:09.000 --> 00:05:18.000
So the counterfactual is such an important concept, but the problem is that counterfactuals, by definition, cannot be observed for everybody.
00:05:18.000 --> 00:05:23.000
So we're in a situation where the Y_i's are not fully observed.
00:05:23.000 --> 00:05:30.000
And in this case, we cannot directly apply conformal inference. So we have to do some adjustment.
00:05:30.000 --> 00:05:34.000
And that will be the central topic of this talk.
00:05:34.000 --> 00:05:42.000
And of course, the counterfactual is a very philosophical concept, and there are many frameworks to define it.
00:05:42.000 --> 00:05:48.000
And just to set up the stage, we will focus on the potential outcome framework in this talk.
00:05:48.000 --> 00:05:57.000
Although if you are an expert in causal inference, you might know that there are many other frameworks, like causal diagrams, invariant prediction, etc.,
00:05:57.000 --> 00:06:08.000
and it turns out our methods can work with all of them. But just to make things more concise and more precise, I will focus on the potential outcomes.
00:06:08.000 --> 00:06:16.000
So, the potential outcomes can be characterized by this table, which is called the science table by Donald Rubin.
00:06:16.000 --> 00:06:19.000
And just to illustrate, suppose we have 10 units.
00:06:19.000 --> 00:06:22.000
Five treated and five control.
00:06:22.000 --> 00:06:26.000
Then we can observe all of their covariate values.
00:06:26.000 --> 00:06:30.000
And these are often called pretreatment covariates.
00:06:30.000 --> 00:06:34.000
And then we can observe their treatment assignment.
00:06:34.000 --> 00:06:48.000
And we will assume that for the treated units, we can observe their Y(1), while for the control units, we can observe their Y(0), where (Y(1), Y(0)) is the pair of potential outcomes for each unit.
00:06:48.000 --> 00:06:58.000
So you can conceptualize this by considering two parallel universes, and in each of them there are the identical 10 units.
00:06:58.000 --> 00:07:03.000
And the only thing different is that their treatment assignments are different.
00:07:03.000 --> 00:07:14.000
So then, there will be two outcomes in the two parallel universes, and they're called potential outcomes. But in the real world, in our universe,
00:07:14.000 --> 00:07:28.000
each unit can only be assigned to one group. So there's only one observed outcome, and we assume that it is equal to Y(1) if the unit is treated, and Y(0) if it is in control.
00:07:28.000 --> 00:07:41.000
So you can see that this is already embedding some assumptions. For example, we assume that for each unit, there are only two potential outcomes, depending on their own treatment assignment.
00:07:41.000 --> 00:07:55.000
And this excludes the case where my outcome can depend on other people's assignments, for example when we are in a network and people can interact in a very complicated way.
00:07:55.000 --> 00:07:59.000
So, this science table has already simplified that.
00:07:59.000 --> 00:08:07.000
And this, in the literature, is sometimes called the stable unit treatment value assumption, or SUTVA.
00:08:07.000 --> 00:08:18.000
And the other two assumptions we're going to make are the i.i.d. assumption, which assumes that the quadruples (X_i, T_i, Y_i(1), Y_i(0)) are i.i.d.,
00:08:18.000 --> 00:08:32.000
and the other one is called strong ignorability, or unconfoundedness in some other areas. It basically says that, conditioning on the covariates, we have a completely randomized experiment.
00:08:32.000 --> 00:08:37.000
So I would like to emphasize that all three assumptions are strong.
00:08:37.000 --> 00:08:39.000
But on the other hand, they are standard.
00:08:39.000 --> 00:08:51.000
So there are many works in the classical causal inference literature trying to relax these. For the first one, there are finite-population analyses, or design-based analyses. And for the second one,
00:08:51.000 --> 00:09:05.000
there's a recent literature on interference. For the third one, there are many attempts, including some non-standard identification strategies or sensitivity analysis.
00:09:05.000 --> 00:09:15.000
So, I think all three assumptions can also be relaxed under conformal inference, and actually there are some ongoing works trying to relax some of them.
00:09:15.000 --> 00:09:26.000
But in this talk, you can take this as a starting point. So I will focus on the simplified assumptions, just to make things clear.
00:09:26.000 --> 00:09:33.000
And based on this notation, we can define our goal as follows. So our goal is to find an interval estimate,
00:09:33.000 --> 00:09:46.000
and again, it's covariate-dependent, such that it covers Y(1) for a future unit who is in the control group, with probability 90%.
00:09:46.000 --> 00:09:49.000
So, let me parse this a bit.
00:09:49.000 --> 00:09:57.000
So first, this conditioning event is conditioning on the fact that this person is in the control group.
00:09:57.000 --> 00:10:08.000
And by that, we can see this Y(1) would be the missing counterfactual. We cannot observe Y(1) for control units, by definition, so that's why this is a counterfactual.
00:10:08.000 --> 00:10:18.000
But in general, later on, we will show that this conditioning event is not essential. You can define it in many different ways.
00:10:18.000 --> 00:10:23.000
And I choose this in this talk, just to make things clear.
00:10:23.000 --> 00:10:25.000
And second,
00:10:25.000 --> 00:10:35.000
when you see this criterion, you might connect it to confidence intervals. But strictly speaking, it's very different from confidence intervals.
00:10:35.000 --> 00:10:48.000
Because, first, this interval is covariate-dependent, so it's not something like a fixed interval. And more importantly, our inferential target Y(1) is a random variable
00:10:48.000 --> 00:10:56.000
instead of a parameter. So a confidence interval is meant to cover a parameter, but here we try to cover a random variable.
00:10:56.000 --> 00:11:05.000
So in the literature this is often called a prediction interval, and our goal is pretty much in a similar spirit to that.
00:11:05.000 --> 00:11:11.000
And finally, this 90% is not sacred. I just use this number
00:11:11.000 --> 00:11:14.000
To illustrate the procedures.
00:11:14.000 --> 00:11:25.000
So maybe let me take a brief pause here to see if there's any question on the philosophical side of this.
00:11:25.000 --> 00:11:28.000
Great. So if there's no question.
00:11:28.000 --> 00:11:35.000
I'll go ahead and move on. So before talking about our methods, let me talk about the data generating process.
00:11:35.000 --> 00:11:47.000
So here, let's say we have 10 units. And each of them will have a propensity score, defined as the probability of being treated given the covariate value.
00:11:47.000 --> 00:11:55.000
And for each unit here, we will toss an unfair coin to decide whether this person gets treated or is in control.
00:11:55.000 --> 00:11:59.000
So for illustration, let's say we have five treated and five control.
00:11:59.000 --> 00:12:02.000
I see a question pop up.
00:12:02.000 --> 00:12:04.000
Okay.
00:12:04.000 --> 00:12:17.000
Yeah, and then by definition, by SUTVA, we know that for the observed treated units, we can observe their Y(1), while their Y(0) are unobserved counterfactuals.
00:12:17.000 --> 00:12:25.000
While for the control units, we can observe their Y(0), while their Y(1) are unobserved counterfactuals.
00:12:25.000 --> 00:12:33.000
And now our goal becomes ensuring that this pink interval covers this blue missing counterfactual.
00:12:33.000 --> 00:12:45.000
And a natural first idea is to use the observed treated units as the study population, and try to learn something from there and generalize that to the hidden population here.
00:12:45.000 --> 00:12:56.000
And that's because this observed treated group has the observed outcomes. So it's very natural to go from here to make an inference.
00:12:56.000 --> 00:13:01.000
But there's a problem and the problem is the distribution mismatch.
00:13:01.000 --> 00:13:17.000
So it turns out that under the ignorability assumption, we can show that the distribution of (X, Y(1)) for the observed treated units can be decomposed in this way, where the first component is the covariate distribution.
00:13:17.000 --> 00:13:31.000
And you can see there's a conditioning event T = 1 here; that's because this is the treated group, so we need to condition on these people being in the treated group.
00:13:31.000 --> 00:13:45.000
While the conditional distribution of Y(1) given both X and T reduces to this. So it no longer depends on whether this person is in the treated or control group, but only depends on the covariates.
00:13:45.000 --> 00:13:54.000
And applying the same logic to the hidden population here, we can show that the conditional distribution is still the same.
00:13:54.000 --> 00:13:59.000
And that's good, because that means these two populations are sharing some information.
00:13:59.000 --> 00:14:04.000
Because otherwise, there's no hope that you can make an inference from a completely different population.
00:14:04.000 --> 00:14:09.000
So this part is good. But because this group is the control group,
00:14:09.000 --> 00:14:16.000
their covariate distribution would be the distribution of X given T = 0.
00:14:16.000 --> 00:14:23.000
And unless you have a completely randomized experiment, these two covariate distributions are generally different.
00:14:23.000 --> 00:14:30.000
And in particular, this is called covariate shift in the machine learning literature.
00:14:30.000 --> 00:14:35.000
So, having talked about this, we can illustrate what's going on here.
00:14:35.000 --> 00:14:39.000
So in the left column, we have the real-world part,
00:14:39.000 --> 00:14:52.000
while in the right column we have the counterfactual-world part. And in the top row, we have the scatter plot of Y(1) versus X. And you can see that these two plots look very different.
00:14:52.000 --> 00:14:58.000
But strong ignorability guarantees that for each slice of X,
00:14:58.000 --> 00:15:09.000
the distribution of Y(1) is the same. So the whole difference is driven by the covariate shift, or the difference between the covariate distributions.
00:15:09.000 --> 00:15:13.000
And that's roughly what's going on behind this problem.
00:15:13.000 --> 00:15:27.000
And now we can rephrase our goal as: using i.i.d. samples from one distribution, construct a prediction interval for the outcome under another distribution, which has a covariate shift.
00:15:27.000 --> 00:15:43.000
And in our problem, applying a simple Bayes formula, we can show that the covariate shift, defined as the density ratio between the two covariate distributions, is proportional to one minus the propensity score, divided by the propensity score.
00:15:43.000 --> 00:15:50.000
So if you work in causal inference, this is a pretty standard IPW-type weight.
00:15:50.000 --> 00:16:05.000
But if you haven't worked in causal inference, that's totally fine. The only takeaway message here is that the selection process drives the whole covariate shift, so it only depends on the selection process.
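As a hypothetical illustration of this IPW-type weight: one can estimate the propensity score e(x) = P(T = 1 | X = x) with any classifier and form w(x) proportional to (1 - e(x)) / e(x). The data-generating choices and the logistic model below are my own assumptions for the sketch.

```python
# Sketch: estimate the propensity score and form the covariate-shift weight
# w(x) = (1 - e(x)) / e(x) between treated (study) and control (target) units.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
# Treatment assigned by an unfair coin with covariate-dependent bias.
e_true = 1 / (1 + np.exp(-X[:, 0]))
T = rng.binomial(1, e_true)

# Estimate the propensity score with any classifier (logistic here).
e_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# IPW-type weight that drives the covariate shift.
w = (1 - e_hat) / e_hat
```

In a randomized experiment e(x) is known, so w(x) is known exactly rather than estimated.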
00:16:05.000 --> 00:16:20.000
So that's pretty much the background of this work. So maybe let me take a pause again to see if there are any quick questions on this.
00:16:20.000 --> 00:16:28.000
Great. So now I'm going to move on to the actual meat of this talk, which is conformal inference.
00:16:28.000 --> 00:16:40.000
So as I alluded to at the beginning of the talk, conformal inference is a very flexible framework, which takes i.i.d. samples as input.
00:16:40.000 --> 00:16:57.000
And then it can wrap around any black-box algorithm and turn that into an interval estimate, such that for future units from the same distribution, that interval would cover the outcome with probability, say, 90%, in finite samples.
00:16:57.000 --> 00:17:07.000
And of course we cannot apply this directly, because we have this covariate shift: our target population is not the same as our study population.
00:17:07.000 --> 00:17:21.000
So here, we will apply a variant of conformal inference, which is called weighted conformal inference, proposed by Tibshirani, Barber, Candès, and Ramdas. And similar to standard conformal inference,
00:17:21.000 --> 00:17:26.000
this framework will take i.i.d. samples again as input.
00:17:26.000 --> 00:17:44.000
And then, as long as the covariate shift is known, it can still wrap around any black-box algorithm and turn that into an interval estimate that covers the outcome with probability 90%, but now it works under this covariate shift.
00:17:44.000 --> 00:17:53.000
So at this moment, this is still very abstract. So in the next few slides, I'm going to illustrate one concrete instance of this framework, which is called
00:17:53.000 --> 00:18:00.000
weighted split conformal quantile regression, or weighted split-CQR for short.
00:18:00.000 --> 00:18:02.000
So let me start on that.
00:18:02.000 --> 00:18:12.000
So this procedure starts by randomly splitting the observed treated units into two folds: one proper training set, and one calibration set.
00:18:12.000 --> 00:18:15.000
And then on the training set,
00:18:15.000 --> 00:18:23.000
you can apply your favorite algorithm to estimate the 5th and 95th percentiles of Y(1) given X.
00:18:23.000 --> 00:18:40.000
So by that, I mean you can apply quantile linear regression, you can apply quantile random forests, you can apply a quantile neural net, or you can apply any other method,
00:18:40.000 --> 00:18:46.000
as long as you believe they can give you a good estimate of these two quantiles.
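One possible way to fit the two conditional quantile estimates on the proper training set is gradient boosting with the quantile loss; this is an illustrative choice on synthetic data (quantile random forests or quantile neural nets would serve equally well).

```python
# Sketch: fit 5th and 95th conditional quantile estimates on the
# proper training set using gradient boosting with the quantile loss.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X_tr = rng.uniform(0, 1, size=(800, 1))
y_tr = X_tr[:, 0] + rng.normal(scale=0.2, size=800)

# alpha selects which conditional quantile the model targets.
q_lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X_tr, y_tr)
q_hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X_tr, y_tr)
```

The two fitted models play the role of the lower and upper envelopes in the procedure.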
00:18:46.000 --> 00:18:52.000
And then once you get these two quantile estimates, you apply them to the calibration set.
00:18:52.000 --> 00:19:06.000
And then for each point in the calibration set, we calculate something called the signed distance, which is defined as the distance between each point and whichever of the two envelopes is closer,
00:19:06.000 --> 00:19:13.000
multiplied by plus one if it's outside the range, or minus one if it's inside.
00:19:13.000 --> 00:19:20.000
So here's the mathematical formula for this signed distance, but the formula is not important.
00:19:20.000 --> 00:19:27.000
So roughly speaking, it's just this distance, which is allowed to be negative.
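The signed distance just described (the CQR conformity score) can be written in one line; the function name here is my own.

```python
import numpy as np

def signed_distance(y, lo, hi):
    # max(lo - y, y - hi): negative when y is inside [lo, hi]
    # (distance to the nearer envelope, with a minus sign),
    # positive when y is outside the band.
    return np.maximum(lo - y, y - hi)
```

So, for example, a point sitting inside the band gets a negative score, and a point above the upper envelope gets its distance to that envelope.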
00:19:27.000 --> 00:19:31.000
Suppose we have 1000 points in this calibration set.
00:19:31.000 --> 00:19:36.000
Then we can calculate 1000 signed distances.
00:19:36.000 --> 00:19:41.000
And we can collect all of them to get a histogram.
00:19:41.000 --> 00:19:47.000
So standard conformal inference would directly operate on this histogram.
00:19:47.000 --> 00:19:55.000
But here, because of the covariate shift, we need to do one more step, which is the weighting step.
00:19:55.000 --> 00:19:59.000
So in particular, suppose here we have a one dimensional x.
00:19:59.000 --> 00:20:06.000
And suppose this shift is monotonically increasing, just for illustration.
00:20:06.000 --> 00:20:21.000
Then, roughly speaking, we're going to upweight the signed distances on the right, in the places where the covariate shift is large, meaning the places where our target population
00:20:21.000 --> 00:20:24.000
differs most from our training population.
00:20:24.000 --> 00:20:31.000
Then we're going to upweight those points, and also vice versa.
00:20:31.000 --> 00:20:35.000
And in this case, we can see that we need to shift the mass to the right.
00:20:35.000 --> 00:20:37.000
In this histogram.
00:20:37.000 --> 00:20:45.000
So this gives us the blue histogram, which you can see shifts a lot of mass to the right.
00:20:45.000 --> 00:21:07.000
And specifically, this weighting step is given by this formula, where for the signed distance at each point, we will reweight that point by this amount, where this amount is roughly proportional to the covariate shift.
00:21:07.000 --> 00:21:15.000
And again, this formula is not important. The only thing important here is that once you know the covariate shift,
00:21:15.000 --> 00:21:22.000
or if you can estimate the covariate shift, then you can at least estimate the reweighting scheme.
00:21:22.000 --> 00:21:28.000
And this will give you the histogram that we want.
00:21:28.000 --> 00:21:37.000
And after that, we will take the 90th percentile of this histogram, and call that Q(x).
00:21:37.000 --> 00:21:46.000
And finally, this procedure will produce an interval, which is the two quantile estimates, plus and minus Q(x).
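The weighting-then-quantile step can be sketched as follows, assuming the normalized-weight construction of weighted conformal inference in which the test point's own weight is placed at plus infinity; the function name and the tie-breaking details are my own.

```python
import numpy as np

def weighted_conformal_quantile(scores, w_cal, w_test, level=0.9):
    # Sort calibration scores and carry each point's (unnormalized) weight.
    order = np.argsort(scores)
    s, p = scores[order], w_cal[order]
    # The test point contributes mass w_test at +infinity.
    p = np.append(p, w_test)
    p = p / p.sum()
    cum = np.cumsum(p[:-1])
    # Smallest score whose cumulative reweighted mass reaches the level;
    # if the calibration mass never reaches it, the quantile is +infinity.
    idx = np.searchsorted(cum, level)
    return s[idx] if idx < len(s) else np.inf

# The final interval at x would then be [q_lo(x) - Q(x), q_hi(x) + Q(x)].
```

With all weights equal this reduces to the ordinary empirical quantile with the usual finite-sample correction, which is the unweighted split conformal case.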
00:21:46.000 --> 00:21:55.000
So that's the procedure. It's very simple, and you can see it's very flexible, because these two quantile estimates can be anything.
00:21:55.000 --> 00:22:03.000
But let's stare at this interval for a bit and try to understand what this Q(x) does.
00:22:03.000 --> 00:22:12.000
So suppose we don't have this Q(x); then that would give us the raw interval, which is given by the 5th and 95th percentile estimates.
00:22:12.000 --> 00:22:17.000
So you can see that if our percentile estimates are very accurate,
00:22:17.000 --> 00:22:26.000
then we don't need to add and subtract this Q(x), because this raw interval can already deliver 90% coverage.
00:22:26.000 --> 00:22:30.000
But that's usually not the case in practice.
00:22:30.000 --> 00:22:48.000
And one empirical observation in practice is that many machine learning algorithms are overconfident, in the sense that the 5th percentile estimate might actually be estimating the 15th percentile, while the 95th percentile estimate might actually be estimating the 85th percentile.
00:22:48.000 --> 00:22:52.000
So in that case, without this extra Q(x),
00:22:52.000 --> 00:22:58.000
we only have 70% coverage, instead of 90% coverage.
00:22:58.000 --> 00:23:01.000
And that's very bad.
00:23:01.000 --> 00:23:17.000
So it turns out this Q(x) acts as a recalibration.
00:23:17.000 --> 00:23:38.000
And this is the procedure. But maybe let me take a pause again to see if there's anything unclear about the operation of this procedure.
00:23:38.000 --> 00:23:38.000
Yeah.
00:23:38.000 --> 00:23:41.000
Yeah.
00:23:41.000 --> 00:23:44.000
So I just have a small question here.
00:23:44.000 --> 00:23:59.000
Does X have to be one-dimensional, or can it be of any dimension in this picture? It can be anything. It can be one-dimensional, it can be high-dimensional, it can be of mixed type, there could be missingness, and it can be unstructured data.
00:23:59.000 --> 00:24:16.000
So as long as you can estimate the quantiles. For example, if you have an NLP problem where X is even unstructured, but suppose you have a great deep neural net that can give me these quantile estimates for a certain outcome, that's fine.
00:24:16.000 --> 00:24:24.000
So does the covariate shift allow us, like,
00:24:24.000 --> 00:24:36.000
to have two laws of Y, depending on whether you have the covariate shift or not? And do you need these two laws to be linked through
00:24:36.000 --> 00:24:37.000
the covariate process?
00:24:37.000 --> 00:24:43.000
Sorry, I didn't completely follow it. Could you repeat the question?
00:24:43.000 --> 00:24:47.000
So, um, well, under the covariate shift,
00:24:47.000 --> 00:24:50.000
Y has another law, right?
00:24:50.000 --> 00:25:02.000
So what is a valid law of Y? Yeah, it has a different distribution, depending on whether you're under the covariate shift or not.
00:25:02.000 --> 00:25:14.000
Right, right. But here we assume that we have a generic covariate shift. So if there's no covariate shift, then you just don't need to do reweighting; you just directly operate on this raw histogram.
00:25:14.000 --> 00:25:27.000
So my question was, what do you need to assume about the law under the covariate shift? Can it be anything, or do you need some strong assumption on what the law under the covariate shift is? Like, can it be very different from the original law?
00:25:27.000 --> 00:25:44.000
It can be very different from the original law, yeah, yeah. So, um, I will give you the assumptions in the next few slides. Okay, roughly speaking, if you recall the slides, we know that if we know the propensity score, then we know the
00:25:44.000 --> 00:25:56.000
covariate shift. So in randomized experiments, we know this. So, in other words, in that case, we know the covariate shift, and it turns out...
00:25:56.000 --> 00:26:03.000
Yeah, maybe I can give you more details in the next slide on the theoretical properties, and that's going to be more clear. Thank you.
00:26:03.000 --> 00:26:08.000
Thanks for the question.
00:26:08.000 --> 00:26:09.000
Okay.
00:26:09.000 --> 00:26:17.000
Okay, so now let's talk about the theoretical guarantees. So the first one is on randomized experiments.
00:26:17.000 --> 00:26:27.000
And in this case we know the propensity score, so we can directly plug it into our procedure. And it turns out what we can show is that, without any additional assumptions
00:26:27.000 --> 00:26:36.000
beyond the superpopulation i.i.d., SUTVA, and strong ignorability, we can have this coverage guarantee.
00:26:36.000 --> 00:26:48.000
And this guarantee, again, is in finite samples, and it holds for any joint distribution, for any sample size, and for any procedure to fit the conditional quantiles.
00:26:48.000 --> 00:27:06.000
But on the other hand, you might say that if we just want this guarantee, we can simply set C(x) to be minus infinity to infinity. That will give us 100% coverage, and clearly that's not something we want. In statistics, we often control
00:27:06.000 --> 00:27:13.000
one side of the error, while trying to do something to minimize the other side. And here, we want to do the same thing.
00:27:13.000 --> 00:27:20.000
So it turns out we can also show that this coverage is upper bounded by 90% plus a small constant divided by n,
00:27:20.000 --> 00:27:24.000
if the signed distances are almost surely distinct,
00:27:24.000 --> 00:27:31.000
and moreover, an overlap condition holds for the covariate shift, so to speak.
00:27:31.000 --> 00:27:48.000
And if you haven't seen this before, this overlap is sometimes referred to as positivity, or common support, and it refers to the assumption that everybody should have at least a positive probability of getting into both the treatment and the control group.
00:27:48.000 --> 00:28:04.000
So that's probably answering the question that Winston just asked. So, in order to get an upper bound for this coverage, we do need the covariate shift to be, say for example, non-zero or...
00:28:04.000 --> 00:28:08.000
Sorry, it can be zero, but not infinity.
00:28:08.000 --> 00:28:18.000
Because if the covariate shift is infinity somewhere, that means some subset of the target population is completely different from our training population,
00:28:18.000 --> 00:28:29.000
in which case, unless we make strong assumptions to extrapolate our findings to that population, there's no way that we can get any information on it.
00:28:29.000 --> 00:28:49.000
So our interval would necessarily be wide. So yeah, that's why we need this for the upper bound. But for the lower bound we don't need that, because an infinite shift just gives us a very wide interval, and it won't affect our coverage.
00:28:49.000 --> 00:28:53.000
So maybe, does that address some of your question, Winston?
00:28:53.000 --> 00:28:58.000
Yes, it definitely makes sense.
00:28:58.000 --> 00:29:08.000
Okay, so this is about randomized experiments. Of course, we want to handle more general observational studies, and it turns out
00:29:08.000 --> 00:29:14.000
in this case we can estimate the propensity score and use that in our procedure.
00:29:14.000 --> 00:29:19.000
And what we can show is, very informally, that
00:29:19.000 --> 00:29:28.000
the coverage will be approximately 90% if either the propensity score or the conditional quantiles are well estimated.
00:29:28.000 --> 00:29:35.000
So I highlighted the "or" here to emphasize that this is something like a double robustness.
00:29:35.000 --> 00:29:42.000
So it sounds very similar to double robustness for the average treatment effect, if you work in causal inference.
00:29:42.000 --> 00:29:57.000
But it is a very different concept, because here we are talking about the coverage of an interval estimate, while classical double robustness is in terms of the consistency of a point estimate.
00:29:57.000 --> 00:30:03.000
So, though they share a similar spirit, I would say they're fundamentally different.
00:30:03.000 --> 00:30:10.000
So the underlying reasoning and the proof techniques are completely different.
00:30:10.000 --> 00:30:12.000
And moreover,
00:30:12.000 --> 00:30:33.000
if we can estimate our conditional quantiles well, then we can achieve a stronger guarantee, which is conditional coverage. So you see that in this coverage statement, we only talk about an average coverage over all possible individuals.
00:30:33.000 --> 00:30:42.000
But what we want to achieve in practice is that, for a particular individual, we want to say that this interval covers the outcome.
00:30:42.000 --> 00:30:45.000
So that's the conditional coverage. And it turns out
00:30:45.000 --> 00:30:52.000
this can be achieved if we can estimate the conditional quantiles well.
00:30:52.000 --> 00:30:58.000
So another way to interpret this result is as a free-lunch result.
00:30:58.000 --> 00:31:02.000
So, you can see that
00:31:02.000 --> 00:31:10.000
if we can estimate our propensity score well, the previous slide shows that we can afford arbitrarily bad quantile estimates
00:31:10.000 --> 00:31:18.000
in order to achieve this coverage. But that shouldn't be the end of it, right? In practice, if you can estimate both components well, you should do your best.
00:31:18.000 --> 00:31:20.000
And this result says just that:
00:31:20.000 --> 00:31:33.000
even if you can estimate the propensity score well, you should still try your best to estimate those conditional quantiles, because if they happen to be good, you can achieve a more desirable property.
00:31:33.000 --> 00:31:45.000
And that's the story here. But the good thing, on the other side, is that even if you happen to mess it up, you can still get some coverage, some protection.
00:31:45.000 --> 00:31:53.000
And that's basically the theoretical picture of this work.
00:31:53.000 --> 00:32:07.000
And before moving on to the next part, I want to highlight, first, that this can be generalized to other types of counterfactual inference. So you can see that previously we only talked about the coverage over the control units.
00:32:07.000 --> 00:32:12.000
But in general, we can allow for any covariate shift.
00:32:12.000 --> 00:32:27.000
And instead of setting this to be the covariate distribution for controls, you can set it to be the x distribution for the general population; that's similar to the ATE in causal inference.
00:32:27.000 --> 00:32:40.000
And you can also define the ATT and the ATC, but more interestingly, you can study generalization or transportability, which is a very popular topic in causal inference recently.
00:32:40.000 --> 00:32:48.000
So in that case, suppose you are running your study in California, and you want to generalize this study to New York.
00:32:48.000 --> 00:33:01.000
So there, other than the selection process in California, there's another shift, due to the difference in characteristics between the two populations, Californians and New Yorkers.
00:33:01.000 --> 00:33:09.000
So there's an additional covariate shift, but you can still adjust for it in the same way as in the literature.
00:33:09.000 --> 00:33:16.000
And you can apply the procedure in the same way, as long as you replace the w(x) with this one.
00:33:16.000 --> 00:33:23.000
And everything still works; all the theoretical properties still hold.
00:33:23.000 --> 00:33:37.000
So, before talking about the simulation results, maybe let me pause to see if there's any question on the theory.
00:33:37.000 --> 00:33:53.000
Could you say one more time what exactly you're estimating with the conditional quantile? Is it the quantile of the actual response, or is it the quantile of the score function, the conformity score, if I
00:33:53.000 --> 00:33:54.000
read everything right?
00:33:54.000 --> 00:34:15.000
Yeah, that's a great question. Thanks for bringing this up. So by this one, I mean the estimate of the quantile of Y(1) given x, not the score given x. So it's just that you give me Y, you give me X, and I use any technique to get this quantile estimate.
00:34:15.000 --> 00:34:30.000
So for example, if you're a Bayesian, you just fit your favorite model of Y on X, and then you calculate the posteriors, and you use, say, the 5th and 95th percentiles of the posterior distribution.
00:34:30.000 --> 00:34:48.000
Is it our capital Q of x? That was... oh yeah, sorry. Yeah, that one is the 90th percentile of this histogram, unconditionally. So here, suppose we have 1000 points; you just take the 900th one in this histogram.
00:34:48.000 --> 00:34:50.000
And that's the capital Q.
00:34:50.000 --> 00:34:53.000
Why is it a function of x?
00:34:53.000 --> 00:35:08.000
Oh yeah, yeah. Okay, yeah, that's a little bit technical. So if there's no covariate shift, then this will not depend on x, because this is just an unweighted histogram.
00:35:08.000 --> 00:35:20.000
But here we have a covariate shift, and you can see that the formula for the weighting scheme involves w(x) in the denominator. So in other words, the weight depends on your testing point.
00:35:20.000 --> 00:35:24.000
So for different testing points, you will have different weights.
00:35:24.000 --> 00:35:36.000
And suppose your testing points are not too different from your training samples; you can see that this term is almost negligible, because there's a giant sum in the denominator.
00:35:36.000 --> 00:35:48.000
And if this term is not too large, it will not affect this weight. So in that case, formally, this big capital Q(x) depends on x, but the dependence is very weak.
00:35:48.000 --> 00:35:57.000
But the reason we need to include this term is that we want to guard against the cases where the testing point is very far away from the training distribution.
00:35:57.000 --> 00:36:07.000
So in that case, suppose w(x) is infinite, or close to infinite; then this term will be affected by x a lot.
00:36:07.000 --> 00:36:21.000
So, yeah, so in the case of conformal inference with no covariate shift, the adjustment to the interval is simply adding and subtracting a constant to make it bigger, and this is a slight tweak to account for the shift.
00:36:21.000 --> 00:36:23.000
Yeah.
00:36:23.000 --> 00:36:28.000
Yeah. In that case, this pi_i is just one over n plus one.
00:36:28.000 --> 00:36:31.000
If there's no covariate shift.
00:36:31.000 --> 00:36:36.000
Thanks for the question. That's a great question.
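As an aside for readers of the transcript, the weighted quantile just discussed can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation (their package is in R); the function name, the interface, and the conformity-score convention are assumptions, spelled out in the comments.

```python
import numpy as np

def weighted_conformal_interval(q_lo, q_hi, scores, w_train, w_test, alpha=0.1):
    """Sketch of a weighted split-conformal cutoff (hypothetical helper).

    scores:  conformity scores on the calibration set, e.g.
             max(q_lo(Xi) - Yi, Yi - q_hi(Xi))
    w_train: covariate-shift weights w(Xi) for the calibration points
    w_test:  weight w(x) at the test point; this is why the cutoff
             Q(x) depends on x under covariate shift
    """
    # Normalized weights pi_i; with no covariate shift (all w equal),
    # each pi_i reduces to 1 / (n + 1), the usual conformal weighting.
    w = np.append(w_train, w_test)
    pi = w / w.sum()
    # Weighted (1 - alpha) quantile of the score histogram, with the
    # test point's own mass conventionally placed at +infinity.
    order = np.argsort(scores)
    cum = np.cumsum(pi[:-1][order])
    idx = np.searchsorted(cum, 1 - alpha)
    if idx >= len(scores):   # test-point weight too large: infinite cutoff
        cutoff = np.inf
    else:
        cutoff = scores[order][idx]
    # Expand the fitted quantile band by the calibrated cutoff.
    return q_lo - cutoff, q_hi + cutoff
```

With equal weights this reduces to the usual unweighted cutoff, matching the answer above about the no-shift case; a very large w_test yields an infinite interval, which is the guard against test points far from the training distribution.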
00:36:36.000 --> 00:36:43.000
Okay.
00:36:43.000 --> 00:36:53.000
So, you know, usually the goal of simulation results is to confirm that our method works well, and, you know, that's not that interesting.
00:36:53.000 --> 00:36:59.000
But hopefully these simulation results will involve something that is a little bit surprising.
00:36:59.000 --> 00:37:04.000
And the surprising part is not about our method.
00:37:04.000 --> 00:37:13.000
So here I consider a very simple data-generating process with 1000 samples and a 100-dimensional X, which is correlated Gaussian.
00:37:13.000 --> 00:37:24.000
Then we set Y(0) to be equal to zero, such that the treatment effect reduces to the counterfactual, because only Y(1) needs to be inferred.
00:37:24.000 --> 00:37:40.000
And we sample Y(1) as a conditional Gaussian variable with a very smooth and sparse mean function, and the standard deviation can be either homoscedastic or heteroscedastic, while in the latter case it is also very smooth and sparse.
00:37:40.000 --> 00:37:48.000
And finally, the propensity score is smooth and sparse and also has great overlap.
00:37:48.000 --> 00:37:52.000
So, you see that this is a very simple data-generating process.
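For concreteness, a data-generating process with the stated qualitative features might look as follows in Python. The particular mean, noise, and propensity functions are illustrative assumptions: the talk only describes them as smooth, sparse, and (for the propensity) having good overlap.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 100

# Correlated Gaussian covariates; an AR(1)-style correlation is an
# illustrative choice, since the talk only says "correlated Gaussian".
rho = 0.5
cov = rho ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
X = rng.multivariate_normal(np.zeros(d), cov, size=n)

# Smooth, sparse mean function: depends on a few coordinates only.
mu = 2 * np.sin(X[:, 0]) + X[:, 1]

# Homoscedastic vs heteroscedastic noise scale.
heteroscedastic = True
sigma = 0.5 + 0.5 * np.abs(np.sin(X[:, 2])) if heteroscedastic else np.ones(n)

# Y(0) = 0, so the treatment effect reduces to the counterfactual Y(1).
y0 = np.zeros(n)
y1 = mu + sigma * rng.standard_normal(n)

# Smooth, sparse propensity with good overlap (bounded away from 0 and 1).
e = 0.25 + 0.5 / (1 + np.exp(-X[:, 0]))   # strictly inside (0.25, 0.75)
T = rng.binomial(1, e)
Y = np.where(T == 1, y1, y0)              # only one potential outcome observed
```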
00:37:52.000 --> 00:38:03.000
And the point of this is to show that, although under such a process every method would be expected to work, in fact,
00:38:03.000 --> 00:38:05.000
That's not true.
00:38:05.000 --> 00:38:12.000
So I want to use this simple data-generating process to show how challenging this problem is.
00:38:12.000 --> 00:38:20.000
And before showing you the results: here we have our package, which is free to download from my GitHub.
00:38:20.000 --> 00:38:35.000
And this package implements the general framework of counterfactual inference, as well as some convenient wrappers which are easy to use. So if you're interested, you can just download it and play with it.
00:38:35.000 --> 00:38:53.000
And the first figure I'm going to show you is the marginal coverage of something called the CATE, which is not our target, but is the conditional expectation of Y(1) minus Y(0) given X, because this is more standard in the literature: whenever we talk about treatment
00:38:53.000 --> 00:39:01.000
effects, we're talking about some conditional expectation, we're talking about certain averages, and usually people are investigating the CATE.
00:39:01.000 --> 00:39:04.000
And here I include six methods.
00:39:04.000 --> 00:39:22.000
And the first three of them are our method, wrapping around BART, a Bayesian method, and around quantile boosting and quantile random forest; and the other three are very popular methods in the heterogeneous treatment effects literature, which are BART, the X-learner, and causal
00:39:22.000 --> 00:39:36.000
forest. So they are good not only because they are published in top journals, not only because they have very good theory, but also because they have very good software; so I include them because
00:39:36.000 --> 00:39:41.000
I mean, every aspect of these methods, as I talked about, is great.
00:39:41.000 --> 00:39:43.000
And.
00:39:43.000 --> 00:39:51.000
And also, another reason for including these three methods is that they rely on qualitatively different strategies to quantify uncertainty.
00:39:51.000 --> 00:40:06.000
So BART is the Bayesian method, so it's going to use a credible interval to quantify uncertainty; the X-learner is using the bootstrap, while causal forest is using the infinitesimal jackknife.
00:40:06.000 --> 00:40:15.000
And so here you can see that all the three methods are targeted at covering the CATE, but our method is not targeting it.
00:40:15.000 --> 00:40:22.000
And you can see, in both the homoscedastic and heteroscedastic cases, that our methods are conservative.
00:40:22.000 --> 00:40:40.000
So it's not ideal; we don't want to be conservative, but compared to being anti-conservative, it is slightly better to be conservative in many cases. And for the other three methods, we can see that BART is achieving better coverage in the homoscedastic case,
00:40:40.000 --> 00:40:45.000
which is great, but it undercovers in the heteroscedastic case.
00:40:45.000 --> 00:40:53.000
While for the other two methods, you can see that they undercover in most cases.
00:40:53.000 --> 00:41:01.000
And there's a clear coverage deficit in both scenarios, even if this data-generating process is simple.
00:41:01.000 --> 00:41:15.000
So when I present this figure, a lot of people get confused, because, as I mentioned, these procedures are shown to be great both empirically and theoretically.
00:41:15.000 --> 00:41:26.000
Why do we get this figure? So, one point I want to highlight is that coverage is a fundamentally different concept from accuracy.
00:41:26.000 --> 00:41:35.000
So, in principle, you can get a very accurate point estimate but a very poor interval; and on the other hand, you can get a very good interval
00:41:35.000 --> 00:41:42.000
but a very poor point estimate. Like, the first case could be:
00:41:42.000 --> 00:41:56.000
you have a very good point estimate, but because your procedure is too complicated, you tend to underestimate your standard error; while the second case could be partial identification, in which case you cannot even
00:41:56.000 --> 00:42:03.000
get a point estimate, but you may be able to get a reasonable interval.
00:42:03.000 --> 00:42:12.000
So the point is, having good accuracy doesn't mean having good coverage, and that's what this picture shows.
00:42:12.000 --> 00:42:20.000
So the second picture is our actual target, which is the coverage of Y(1) rather than the CATE. And now you see, for our methods,
00:42:20.000 --> 00:42:27.000
we almost achieve the exact coverage, as the theory says.
00:42:27.000 --> 00:42:41.000
And for BART: because BART is a flexible Bayesian algorithm, if we replace the credible interval by the posterior predictive interval, then it is supposed to cover Y(1).
00:42:41.000 --> 00:42:51.000
And in the homoscedastic case, we can see that is indeed the case. And actually, it has smaller variability compared to our method, meaning that it is better
00:42:51.000 --> 00:43:06.000
in this case. But in the heteroscedastic case, it undercovers.
00:43:06.000 --> 00:43:16.000
And that's one side of the story, because coverage is just like the type I error. And now we can look at the tightness, which is like the type II error.
00:43:16.000 --> 00:43:26.000
So here the blue lines give the oracle lengths; sorry, the lengths of the oracle intervals, pretending that you know the 5th and 95th percentiles.
00:43:26.000 --> 00:43:33.000
And now you can see that in the homoscedastic case, BART is having a very good performance.
00:43:33.000 --> 00:43:42.000
And what's interesting is that if we wrap around BART, we just lose a little bit, and we can preserve the good performance.
00:43:42.000 --> 00:43:54.000
But in the heteroscedastic case, BART undercovers, so we know it's not doing a good job. And somehow, the interval lengths of our procedure become much more variable.
00:43:54.000 --> 00:44:02.000
I think that's a good thing, because it somehow detects that BART is not doing a good job, and it has to compensate for that.
00:44:02.000 --> 00:44:11.000
Second, if we wrap around some algorithms that are more robust to heteroscedasticity, we can see that they're pretty stable.
00:44:11.000 --> 00:44:19.000
And finally, the reason that these two methods have pretty short intervals is just because they undercover.
00:44:19.000 --> 00:44:32.000
The last figure I'm going to show you is the conditional coverage, because, as I mentioned before, the theoretical guarantees are mostly on the marginal coverage, but in practice we do want to get conditional coverage.
00:44:32.000 --> 00:44:36.000
So here, because we have a 100-dimensional covariate, which is very hard to visualize,
00:44:36.000 --> 00:44:49.000
I stratify the coverage using a one-dimensional summary, which is the conditional variance, because intuitively, if the conditional variance is large, it is harder to cover.
00:44:49.000 --> 00:45:06.000
And we do see this monotone pattern if we use BART without calibration. You see that even if the overall coverage is around 80%, towards the right end the conditional coverage is only 50%, which is quite worrisome.
00:45:06.000 --> 00:45:13.000
But if we wrap around BART, we can see that, although this monotone trend still persists, it's much better.
00:45:13.000 --> 00:45:26.000
And finally, if we wrap around random forest and boosting, we can see a pretty flat coverage, meaning that, at least for this case, we can achieve conditional coverage.
00:45:26.000 --> 00:45:39.000
So that's the story given by the simulations, and maybe let me pause again to see if there's any question on this part.
00:45:39.000 --> 00:45:49.000
Great. So, a question. You showed that your coverage of the CATE is kind of conservative,
00:45:49.000 --> 00:45:54.000
and then the coverage of Y(1) seems very nice.
00:45:54.000 --> 00:46:05.000
In terms of the lengths: did you ever look at the length of the intervals? Did you ever evaluate the method in terms of how conservative it is for estimating the CATE?
00:46:05.000 --> 00:46:08.000
Yeah, your method is not really designed to estimate the CATE, right?
00:46:08.000 --> 00:46:26.000
Right. That's a great question. So I guess I can answer this question from two perspectives. So first, the conformal inference is really designed for the outcomes, so there's no simple way to modify it into estimating the CATE.
00:46:26.000 --> 00:46:34.000
And also there are some impossibility results, which I'm going to touch upon in the next few slides, that say:
00:46:34.000 --> 00:46:46.000
without further assumptions on this function, it is impossible to get useful uncertainty quantification for the conditional expectation if X is continuous.
00:46:46.000 --> 00:47:00.000
So this may sound very surprising, but I think, on the other hand, it's very intuitive. So suppose you know this function is smooth, but you don't know how smooth it is; then essentially you know nothing about this function, because basically anything
00:47:00.000 --> 00:47:15.000
you can observe in practice can be more or less approximated by some differentiable function. But if the slope is very steep, then there's no way we can estimate it using a small sample.
00:47:15.000 --> 00:47:33.000
So I guess that's one part of the answer, but the other part is that there are ways to get calibrated predictions for the CATE, but in another sense. So that's one of my ongoing works, but I think the setting will be quite different, so the criteria
00:47:33.000 --> 00:47:38.000
are very different. So maybe we can chat offline if you're interested.
00:47:38.000 --> 00:47:39.000
Yeah, thanks a lot.
00:47:39.000 --> 00:47:43.000
Thanks for the question.
00:47:43.000 --> 00:47:45.000
Okay, So if there's no question.
00:47:45.000 --> 00:47:52.000
Let me ask: how much time do I have? About 10 minutes left. Okay, that's good, thanks.
00:47:52.000 --> 00:48:02.000
So maybe in the last five to ten minutes, I will talk about how to generalize the conformal intervals to individual treatment effects.
00:48:02.000 --> 00:48:08.000
And, roughly speaking, you just get intervals for Y(0) and Y(1) separately,
00:48:08.000 --> 00:48:11.000
and somehow contrast these two intervals.
00:48:11.000 --> 00:48:28.000
And one naive way is: you get an interval for Y(1) and an interval for Y(0), and then the upper end of the interval for the treatment effect is just the upper end for Y(1) minus the lower end for Y(0); and then you can similarly define the lower end for
00:48:28.000 --> 00:48:35.000
the interval. So I'm not going to talk about the methods; you can look at more details in our paper.
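The naive combination described here is easy to write down; this sketch shows only that naive construction, not the refined procedure in the paper, and the function name is illustrative.

```python
import numpy as np

def naive_ite_interval(y1_lo, y1_hi, y0_lo, y0_hi):
    """Naive combination of separate intervals for Y(1) and Y(0):
    the ITE interval's upper end is the upper end for Y(1) minus the
    lower end for Y(0), and symmetrically for the lower end."""
    ite_lo = np.asarray(y1_lo) - np.asarray(y0_hi)
    ite_hi = np.asarray(y1_hi) - np.asarray(y0_lo)
    return ite_lo, ite_hi
```

For example, intervals [1.0, 3.0] for Y(1) and [-0.5, 0.5] for Y(0) combine to [0.5, 3.5] for the ITE. By a Bonferroni argument, two 95% intervals combined this way give at least 90% coverage for the difference, which is one reason the naive construction tends to be conservative.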
00:48:35.000 --> 00:48:52.000
But what I want to emphasize is a slightly philosophical point that might be of interest to many people. So here, our guarantee is on the ITE, defined as Y(1) minus Y(0), not the CATE; and in particular, the CATE can be formulated as the conditional expectation
00:48:52.000 --> 00:48:55.000
of the ITE given X.
00:48:55.000 --> 00:49:08.000
So I want to talk a bit about what's the difference between them, and which one we should look at in practice, because this is one of the most fundamental problems in this literature.
00:49:08.000 --> 00:49:15.000
So unless the ITE is a deterministic function of x, which is rarely the case in practice,
00:49:15.000 --> 00:49:18.000
it is not equal to the CATE.
00:49:18.000 --> 00:49:32.000
So I think the CATE has a lot of advantages; for example, you don't need to worry about the dependency structure between Y(1) and Y(0), because the CATE is additive, so you can decouple Y(1) and Y(0).
00:49:32.000 --> 00:49:46.000
And also, there have been a lot of good methods to estimate and cover the CATE, so I think the CATE is a great estimand, but in some applications the CATE does have some very critical limitations.
00:49:46.000 --> 00:49:59.000
So the first one is that the CATE ignores the uncertainty of the response around the conditional expectation. So for example, this is an old story.
00:49:59.000 --> 00:50:05.000
So suppose you have a stratum given by age, gender, height, and smoking.
00:50:05.000 --> 00:50:10.000
Then you may have different scenarios where the CATE at x is exactly the same.
00:50:10.000 --> 00:50:19.000
Well, in the first one, everybody gets a positive ITE, while in the last one, only 20% get a positive ITE.
00:50:19.000 --> 00:50:29.000
So I think we would make different decisions if we were able to distinguish those three scenarios.
00:50:29.000 --> 00:50:30.000
Okay.
00:50:30.000 --> 00:50:50.000
So the second limitation, which is actually related to David's question, is that the CATE is a conditional expectation, and the uncertainty of its estimator due to finite samples is very difficult to quantify.
00:50:50.000 --> 00:51:05.000
And from the theory perspective, we have this paper by Rina Barber last year, showing what I just said: suppose you don't have any assumption on Y given X; even if Y is binary,
00:51:05.000 --> 00:51:20.000
getting reasonable uncertainty quantification for the expectation of Y given X is very difficult, in the sense that, even if you have enough samples, the intervals will not shrink to a point.
00:51:20.000 --> 00:51:35.000
And also, empirically, this is the figure I showed you a couple slides ago: you can see all these methods have more or less theoretical guarantees on covering the CATE, but you see that even under such a simple data-generating process,
00:51:35.000 --> 00:51:38.000
the theoretical guarantee doesn't kick in.
00:51:38.000 --> 00:51:49.000
Or, their theoretical guarantees are more or less asymptotic, but the asymptotic regime doesn't happen in these problems.
00:51:49.000 --> 00:52:00.000
So the final thing is a quote from a person who has spent a lot of time advocating distinguishing the ITE from the CATE.
00:52:00.000 --> 00:52:07.000
Because the CATE is still an average, so usually I would call it quite a heterogeneous average effect.
00:52:07.000 --> 00:52:19.000
But the individualized treatment effect is really Y(1) minus Y(0), because at the individual level there is still some uncertainty that cannot be explained away by the covariates,
00:52:19.000 --> 00:52:21.000
In most of the cases.
00:52:21.000 --> 00:52:27.000
So that's about the philosophical side of the difference between the CATE and the ITE.
00:52:27.000 --> 00:52:42.000
And now, let me wrap up. So in this talk, we propose conformal inference methods for counterfactuals and individual treatment effects, which are reliable, in the sense that for a randomized experiment they can achieve near-exact coverage in finite
00:52:42.000 --> 00:52:44.000
samples with any black box.
00:52:44.000 --> 00:52:51.000
While in observational studies, they can achieve a doubly robust guarantee of coverage.
00:52:51.000 --> 00:53:09.000
And also, this conformal inference is hugely flexible and customizable, and I'm a big fan of it, and it's not limited to counterfactual inference. So our recent work involves extending this to survival analysis, and also to computer vision,
00:53:09.000 --> 00:53:22.000
and to outlier or out-of-distribution detection. So if you're interested, check them out. And with that, I would like to thank you all for your attention.
00:53:22.000 --> 00:53:25.000
Fantastic. Great, thanks.
00:53:25.000 --> 00:53:38.000
Thanks. I'm seeing some applause in the chat as well, and in the participants list. So we do have a few minutes for any additional questions; I know we had many questions throughout the talk, but if folks have additional questions, you're welcome
00:53:38.000 --> 00:53:47.000
to raise your hand or put it in the chat, however you feel so moved.
00:53:47.000 --> 00:53:57.000
Well, while we pause as people think about any lingering questions they have, one sort of very basic question that I had is:
00:53:57.000 --> 00:54:10.000
the weighting procedure requires knowing, or being able to measure, the covariate shift, and that seems like a nontrivial thing to be able to do; so I would just love to hear your thoughts on that challenge.
00:54:10.000 --> 00:54:30.000
Yeah, that's a great question. So there's one thing I didn't mention, which is that here we only need to know this ratio. So that means we only need to estimate the covariate shift up to a multiplicative constant, so we don't need to exactly
00:54:30.000 --> 00:54:32.000
know that.
00:54:32.000 --> 00:54:44.000
And although this sounds like a simple trick, or a simple thing, it actually makes this estimation problem much easier. So what we can do is:
00:54:44.000 --> 00:54:56.000
given the treatment assignment, you can fit a probabilistic classifier of the assignment on X. So for example, you can fit a logistic regression, and it turns out
00:54:56.000 --> 00:55:11.000
that the probability estimate given by the model differs from the true covariate shift just by a constant, so you can use that directly. So now, instead of estimating a high-dimensional density, which is almost impossible,
00:55:11.000 --> 00:55:28.000
we just need to estimate a probability, and we have a lot of flexible tools. And moreover, that's something special in causal inference and also survival analysis, in the sense that, you know, mathematically estimating w(x) is almost as hard as estimating
00:55:28.000 --> 00:55:44.000
the conditional quantiles, because one is T given X and the other is Y given X; but in practice, they're often asymmetric, because the selection process is often driven by a few important factors, while the outcome process can be driven by a lot of other things.
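The classifier trick described here can be sketched with scikit-learn. The helper name is hypothetical, the clipping threshold is an arbitrary stabilizer, and the specific shift shown (inferring Y(1) on the general population from treated units) is just one example of the ratios discussed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def shift_weights(X_train, T_train, X_test):
    """Estimate the covariate-shift weight up to a multiplicative constant
    via a probabilistic classifier of T on X.

    For inferring Y(1) on the general population from treated units,
    w(x) = dP_X / dP_{X|T=1}(x) = P(T=1) / e(x), so 1 / e_hat(x) equals
    w(x) up to the constant P(T=1); the constant is harmless because the
    weighted conformal procedure only uses ratios of weights.
    """
    clf = LogisticRegression(max_iter=1000).fit(X_train, T_train)
    e_hat = clf.predict_proba(X_test)[:, 1]    # estimated P(T=1 | x)
    return 1.0 / np.clip(e_hat, 1e-3, 1.0)     # clip for numerical stability
```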
00:55:44.000 --> 00:55:47.000
So, like, think of a policy setting.
00:55:47.000 --> 00:56:11.000
So let's say you want to evaluate a certain law on a certain biological aspect of the population; then the lawmaking or policymaking usually depends on some things that we know much better than how those covariates affect, say, the BMI of people.
00:56:11.000 --> 00:56:20.000
So yeah, this asymmetry also allows us to leverage the information from the selection process.
00:56:20.000 --> 00:56:22.000
Great, thank you so much.
00:56:22.000 --> 00:56:31.000
And so we are just about out of time; I will pause again in case anybody has any final questions they want to ask.
00:56:31.000 --> 00:56:38.000
But seeing no hands or unmuted mics, feel free to talk over me if you're just waiting for me to stop talking.
00:56:38.000 --> 00:56:53.000
But seeing no hands or unmuted mics, I will thank our speaker one last time, and thank you all for coming to this kickoff of our fall speaker series; hopefully at some point we will be in person again.
00:56:53.000 --> 00:56:57.000
This was a really fantastic talk so thank you so much and thanks everyone for coming.
00:56:57.000 --> 00:57:12.000
Thank you all; another round of applause.