LESSWRONG
Fazl

Posts

Visualizing neural network planning (4 karma, 25d, 0 comments)

Mechanistic Interpretability Workshop Happening at ICML 2024! (48 karma, 1mo, 6 comments)

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders (11 karma, 8mo, 0 comments)

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results (8 karma, 1y, 0 comments)
