As I understand it, AI alignment has been studied for many years, mostly as an abstract problem. Now LLMs seem like an object that can truly be used to test many alignment theories (IDA, mesa-optimizers, etc.). As AI alignment researchers, how do you view GPT-4? What experiments have you run on GPT-4?
LLMs (foundation models) in general make alignment very difficult — the sheer size of the neural networks makes interpretability very hard, perhaps even completely intractable.