Fazl
Posts
4 · Visualizing neural network planning · 25d · 0
48 · Mechanistic Interpretability Workshop Happening at ICML 2024! · 1mo · 6
11 · Early Experiments in Reward Model Interpretation Using Sparse Autoencoders · 8mo · 0
8 · Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results · 1y · 0