| Title: |
Discovering Language Model Behaviors with Model-Written Evaluations |
| Authors: |
Perez, Ethan; Ringer, Sam; Lukosiute, Kamile; Nguyen, Karina; Chen, Edwin; Heiner, Scott; Pettit, Craig; Olsson, Catherine; Kundu, Sandipan; Kadavath, Saurav; Jones, Andy; Chen, Anna; Mann, Benjamin; Israel, Brian; Seethor, Bryan; McKinnon, Cameron; Olah, Christopher; Yan, Da; Amodei, Daniela; Amodei, Dario; Drain, Dawn; Li, Dustin; Tran-Johnson, Eli; Khundadze, Guro; Kernion, Jackson; Landis, James; Kerr, Jamie; Mueller, Jared; Hyun, Jeeyoon; Landau, Joshua; Ndousse, Kamal; Goldberg, Landon; Lovitt, Liane; Lucas, Martin; Sellitto, Michael; Zhang, Miranda; Kingsland, Neerav; Elhage, Nelson; Joseph, Nicholas; Mercado, Noemi; DasSarma, Nova; Rausch, Oliver; Larson, Robin; McCandlish, Sam; Johnston, Scott; Kravec, Shauna; El Showk, Sheer; Lanham, Tamera; Telleen-Lawton, Timothy; Brown, Tom |
| Source: |
Findings of the Association for Computational Linguistics: ACL 2023; pages 13387-13434 |
| Publisher Information: |
Association for Computational Linguistics |
| Publication Year: |
2023 |
| Document Type: |
conference object |
| Language: |
English |
| ISBN: |
978-1-338-71343-5; 1-338-71343-4 |
| DOI: |
10.18653/v1/2023.findings-acl.847 |
| Availability: |
https://doi.org/10.18653/v1/2023.findings-acl.847 |
| Accession Number: |
edsbas.195A0F23 |
| Database: |
BASE |