One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention. (arXiv:2307.03576v1 [cs.LG])

12 Jul 2023

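Shows that in a neural network with a single layer of linear self-attention, in-context learning implements one step of gradient descent. The covariance of the input data affects the preconditioning matrix of the gradient step, whereas nonlinearity of the target does not change the algorithm. See also Ahn et al. 2023 and Zhang et al. 2023.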
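To make the summary concrete, here is a minimal NumPy sketch of why one gradient step fits in a linear attention form. It assumes a least-squares objective, zero initialization, and a step size `eta`. The function names, the preconditioner argument `P`, and the synthetic data are illustrative assumptions, not the paper's construction or notation.

```python
import numpy as np


def one_step_gd_prediction(X, y, x_query, eta, P=None):
    """Predict at x_query after one (preconditioned) GD step from w = 0.

    Objective (assumed for this sketch): L(w) = 1/(2n) * sum_i (w @ x_i - y_i)^2.
    """
    n, d = X.shape
    if P is None:
        P = np.eye(d)  # plain gradient descent
    grad_at_zero = -(X.T @ y) / n      # gradient of L at w = 0
    w = -eta * (P @ grad_at_zero)      # one preconditioned step
    return w @ x_query


def linear_attention_prediction(X, y, x_query, eta, P=None):
    """Same quantity written as an unnormalized (softmax-free) attention sum.

    Each context token contributes its label y_i, weighted by the bilinear
    score x_i^T P^T x_query.
    """
    n, d = X.shape
    if P is None:
        P = np.eye(d)
    scores = X @ (P.T @ x_query)       # one score per context example
    return eta / n * (y @ scores)


# Hypothetical synthetic data, only to exercise the two functions.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 4))
y = X @ rng.standard_normal(4)
x_query = rng.standard_normal(4)

gd = one_step_gd_prediction(X, y, x_query, eta=0.5)
attn = linear_attention_prediction(X, y, x_query, eta=0.5)
print(np.isclose(gd, attn))
```

Passing a non-identity `P` (for example, an inverse covariance estimate) gives the preconditioned variant. The labels `y` enter only linearly in both functions, so the same computation applies whatever function generated them.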

Basic information

@misc{mahankali2023step,
      title={One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention}, 
      author={Arvind Mahankali and Tatsunori B. Hashimoto and Tengyu Ma},
      year={2023},
      eprint={2307.03576},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}



