Feel free to type questions into the chat; I’ll moderate them when Yonatan breaks to take questions.
(You can send your questions to “panelists and attendees” instead of just “panelists”; knowing what other people are planning to ask may be useful to other attendees.)
Where do you get your prompts and how many do you use? Keita Kurita et al. (2019) have some that I think are better than most. For this sort of template project, my impression is that the choice of templates/word pairs is pretty important (from having done a few studies on this myself). How brittle are bias templates/word lists in your opinion?
Sorry, didn’t realize participants couldn’t unmute themselves
Is this specific to certain heads or layers? For example, if we remove these layers/heads, I suppose some other heads would then have a large indirect effect?
How should we interpret the y axis in the indirect effect plots? Is 0.05 big?
How do you choose hardcoded counterfactual values for the mediators when measuring direct/indirect effects? Is the choice inspired by how the mediator (e.g., attention head) works or is it random?
(sorry my internet connection is very bad)
thanks Yonatan, very neat stuff!