CDS Data Science Seminar Series - Shared screen with speaker view
Nate Gruver
32:07
The results for CoAtNet seem to imply adding the locality property to the self-attention helps a lot
Naomi Subtlety Saphra
32:40
+ ConViT too
kamalesh palanisamy
01:00:59
Why do we look at the accuracy of task 2 when forgetting happens on task 1? (For the frozen-layers experiment.)
kamalesh palanisamy
01:08:25
I cannot unmute myself, sorry.
kamalesh palanisamy
01:09:26
Okay that makes sense. Thank you for the talk!
mimee
01:11:52
Related: an empirical study on whether all examples get forgotten: https://arxiv.org/abs/1812.05159