Reading List
Anthropic and other researchers detail "subliminal learning", where LLMs learn traits from model-generated data that is semantically unrelated to those traits (Anthropic) from Techmeme RSS feed.
Anthropic:
We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits.