Joar Skalse
DPhil Student in Computer Science, Oxford University
Verified email at cs.ox.ac.uk
Title · Cited by · Year
Defining and characterizing reward gaming
J Skalse, N Howe, D Krasheninnikov, D Krueger
Advances in Neural Information Processing Systems 35, 9460-9471, 2022
Cited by 98 · 2022
Risks from learned optimization in advanced machine learning systems
E Hubinger, C van Merwijk, V Mikulik, J Skalse, S Garrabrant
arXiv preprint arXiv:1906.01820, 2019
Cited by 94 · 2019
Is SGD a Bayesian sampler? Well, almost
C Mingard, G Valle-Pérez, J Skalse, AA Louis
Journal of Machine Learning Research 22 (79), 1-64, 2021
Cited by 44 · 2021
Invariance in policy optimisation and partial identifiability in reward learning
JMV Skalse, M Farrugia-Roberts, S Russell, A Abate, A Gleave
International Conference on Machine Learning, 32033-32058, 2023
Cited by 24 · 2023
Neural networks are a priori biased towards boolean functions with low entropy
C Mingard, J Skalse, G Valle-Pérez, D Martínez-Rubio, V Mikulik, ...
arXiv preprint arXiv:1909.11522, 2019
Cited by 24 · 2019
Misspecification in inverse reinforcement learning
J Skalse, A Abate
Proceedings of the AAAI Conference on Artificial Intelligence 37 (12), 15136 …, 2023
Cited by 15 · 2023
Reinforcement learning in Newcomblike environments
J Bell, L Linsefors, C Oesterheld, J Skalse
Advances in Neural Information Processing Systems 34, 22146-22157, 2021
Cited by 14 · 2021
Lexicographic multi-objective reinforcement learning
J Skalse, L Hammond, C Griffin, A Abate
arXiv preprint arXiv:2212.13769, 2022
Cited by 13 · 2022
Risks from learned optimization in advanced machine learning systems. arXiv
E Hubinger, C van Merwijk, V Mikulik, J Skalse, S Garrabrant
arXiv preprint arXiv:1906.01820, 2019
Cited by 13 · 2019
The reward hypothesis is false
JMV Skalse, A Abate
Cited by 4 · 2022
STARC: A General Framework For Quantifying Differences Between Reward Functions
J Skalse, L Farnik, SR Motwani, E Jenner, A Gleave, A Abate
arXiv preprint arXiv:2309.15257, 2023
Cited by 3 · 2023
On the limitations of Markovian rewards to express multi-objective, risk-sensitive, and modal tasks
J Skalse, A Abate
Uncertainty in Artificial Intelligence, 1974-1984, 2023
Cited by 2 · 2023
A General Counterexample to any Decision Theory and Some Responses
J Skalse
arXiv preprint arXiv:2101.00280, 2021
Cited by 2 · 2021
Goodhart's Law in Reinforcement Learning
J Karwowski, O Hayman, X Bai, K Kiendlhofer, C Griffin, J Skalse
arXiv preprint arXiv:2310.09144, 2023
Cited by 1 · 2023
A general framework for reward function distances
E Jenner, JMV Skalse, A Gleave
NeurIPS ML Safety Workshop, 2022
Cited by 1 · 2022
All’s Well That Ends Well: Avoiding Side Effects with Distance-Impact Penalties
C Griffin, JMV Skalse, L Hammond, A Abate
NeurIPS ML Safety Workshop, 2022
Cited by 1 · 2022
Safety Properties of Inductive Logic Programming.
G Leech, N Schoots, J Skalse
SafeAI@AAAI, 2021
Cited by 1 · 2021
Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification
J Skalse, A Abate
arXiv preprint arXiv:2403.06854, 2024
2024
On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning
R Subramani, M Williams, M Heitmann, H Holm, C Griffin, J Skalse
arXiv preprint arXiv:2310.11840, 2023
2023