Blog (492)
[Arxiv 2025] O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
2025-05-07 09:35:45 · 801 views
[Arxiv 2025] L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
2025-04-29 21:00:40 · 792 views
[Arxiv 2025] Rethinking Layer Removal: Preserving Critical Components with Task-Aware SVD
2025-01-03 20:47:57 · 752 views
[Arxiv 2024] ProcessBench: Identifying Process Errors in Mathematical Reasoning
2024-12-23 10:16:43 · 363 views
[ACL 2024] ReFT: Reasoning with REinforced Fine-Tuning
2024-12-09 23:27:28 · 640 views
[Arxiv 2024] Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning
2024-12-09 17:29:54 · 1095 views
[Arxiv 2024] Subtle Errors Matter: Preference Learning via Error-injected Self-editing
2024-12-09 16:29:51 · 982 views
[Arxiv 2024] Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
2024-12-09 15:37:26 · 688 views
[Arxiv 2024] Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
2024-12-04 21:51:47 · 796 views
[Arxiv 2024] rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
2024-12-04 20:54:24 · 1137 views
[NeurIPS 2022] Leveraging Inter-Layer Dependency for Post-Training Quantization
2024-11-28 23:31:42 · 639 views
A New Chapter in LLM Quantization: Nearly Lossless 4-bit Weight-Activation Quantization! FlatQuant's Path to Flatness
This post introduces **FlatQuant (Fast and Learnable Affine Transformation)**, recent joint work on large language model quantization from Huawei Noah's Ark Lab, Tsinghua University, and The Chinese University of Hong Kong. FlatQuant equips each linear layer with a lightweight, learnable affine transformation that effectively smooths LLM outliers, yielding flatter weight and activation distributions and substantially reducing quantization loss. Compared with previous quantization methods [1][2], it is the first to reach W4A4 on LLaMA-3-70B…
2024-10-22 19:08:00 · 2140 views
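To make the idea concrete, here is a minimal PyTorch sketch (not the official FlatQuant code) of folding a learnable invertible affine transform into a linear layer: the transform leaves the full-precision output unchanged, but can be optimized so that the transformed activations and weights are flatter and easier to quantize. The class name `AffineFlattenedLinear`, the per-tensor fake quantizer, and the use of a single full matrix (the paper uses more efficient factored transforms) are illustrative assumptions.

```python
import torch

def fake_quant(t: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor round-to-nearest fake quantization."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = t.abs().amax().clamp(min=1e-8) / qmax
    return (t / scale).round().clamp(-qmax - 1, qmax) * scale

class AffineFlattenedLinear(torch.nn.Module):
    """Illustrative sketch: wrap a linear layer with a learnable invertible
    affine transform P. Since (x @ P) @ (W @ P^{-T}).T == x @ W.T, the
    full-precision output is unchanged, while P can be trained to flatten
    the distributions seen by the 4-bit quantizers."""

    def __init__(self, weight: torch.Tensor, n_bits: int = 4):
        super().__init__()
        d_in = weight.shape[1]
        self.P = torch.nn.Parameter(torch.eye(d_in))  # init as identity
        self.register_buffer("weight", weight)        # frozen FP weight (d_out, d_in)
        self.n_bits = n_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        P_inv = torch.linalg.inv(self.P)
        x_q = fake_quant(x @ self.P, self.n_bits)             # 4-bit activations
        w_q = fake_quant(self.weight @ P_inv.T, self.n_bits)  # 4-bit weights
        return x_q @ w_q.T
```

A real implementation would train `P` on calibration data to minimize the layer-output error against the full-precision layer, using a straight-through estimator to backpropagate through the rounding.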
[Arxiv 2024] PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
2024-10-17 19:43:13 · 1256 views
[Arxiv 2024] Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs
2024-10-17 13:19:42 · 282 views
[NeurIPSW 2024] Self-Data Distillation for Recovering Quality in Pruned Large Language Models
2024-10-17 10:29:38 · 473 views
[NeurIPS 2022] STaR: Bootstrapping Reasoning With Reasoning
2024-10-05 21:03:33 · 1026 views
[ICLR 2024] Let's Verify Step by Step
2024-10-05 10:03:30 · 987 views
[Arxiv 2024] Self-Rewarding Language Models
2024-08-28 11:42:02 · 1192 views
[NeurIPS 2023] Self-Refine: Iterative Refinement with Self-Feedback
2024-08-25 10:57:32 · 397 views
[ACL 2024] Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
2024-08-23 00:46:47 · 1032 views
[ACL 2024] Revisiting Knowledge Distillation for Autoregressive Language Models
2024-08-21 10:55:25 · 906 views
[Arxiv 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
2024-08-05 15:59:28 · 386 views
[Arxiv 2024] EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
2024-08-05 15:09:40 · 795 views
[ICLR 2024] On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
2024-08-04 21:40:55 · 1272 views
[ACL 2023] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
2024-08-04 11:31:14 · 929 views
[NeurIPS 2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2024-08-04 09:54:14 · 275 views
Multi-Head Latent Attention: Boosting Inference Efficiency
2024-08-01 16:17:48 · 3161 views
LLM Preference Alignment (PPO, DPO, SimPO, GRPO)
2024-08-01 11:18:05 · 1419 views
Introduction to Deep Reinforcement Learning (Policy Gradient, Actor-Critic, PPO)
2024-07-30 10:48:51 · 1131 views
Introduction to popular LLM components
2024-06-05 16:51:43 · 656 views
[ICLR 2025] SpinQuant: LLM Quantization with Learned Rotations
2024-05-29 18:48:03 · 931 views
[NeurIPS 2022] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
2024-05-29 15:47:34 · 1182 views
A Simple Trick to Easily Boost Quantization Accuracy! IntactKV: An LLM Quantization Method that Keeps Pivot Tokens Lossless
This post introduces IntactKV, our work on large language model quantization. It can be used as a plug-in to effectively improve existing mainstream quantization methods such as GPTQ, AWQ, and QuaRot. The authors are from Tsinghua University, Huawei Noah's Ark Lab, the Institute of Automation of the Chinese Academy of Sciences, and The Chinese University of Hong Kong. The code has been open-sourced; everyone is welcome to use it!
2024-05-29 15:07:29 · 1472 views
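As a rough illustration of the plug-in idea, the sketch below computes the KV cache of the first few "pivot" tokens with the full-precision model, then lets the quantized model decode on top of that intact, lossless cache. It assumes HuggingFace-style causal LMs (`use_cache` / `past_key_values`); the function name and the greedy decoding loop are illustrative assumptions, not the released IntactKV code.

```python
import torch

@torch.no_grad()
def generate_with_intact_kv(fp_model, quant_model, tokenizer, prompt: str,
                            num_pivot: int = 1, max_new_tokens: int = 64) -> str:
    """Sketch of the IntactKV trick: keep the KV cache of the first
    `num_pivot` tokens lossless by computing it with the full-precision
    model, then run the quantized model on top of that intact cache."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids

    # 1) Full-precision prefill of the pivot tokens -> lossless KV cache.
    intact_kv = fp_model(ids[:, :num_pivot], use_cache=True).past_key_values

    # 2) Quantized model consumes the rest of the prompt over the intact cache.
    out = quant_model(ids[:, num_pivot:], past_key_values=intact_kv, use_cache=True)

    # 3) Greedy decoding with the quantized model.
    next_id = out.logits[:, -1:].argmax(dim=-1)
    generated, past = [next_id], out.past_key_values
    for _ in range(max_new_tokens - 1):
        out = quant_model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1:].argmax(dim=-1)
        generated.append(next_id)
    return tokenizer.decode(torch.cat(generated, dim=-1)[0])
```

Because the pivot tokens' keys and values are never quantized, their attention contributions stay exact, which is where the accuracy gain over a fully quantized cache comes from.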
[SC 2020] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
2024-05-10 14:52:27 · 813 views
[Blog 2023] Flash-Decoding for long-context inference
2024-05-07 21:08:31 · 594 views
[ICLR 2024] FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
2024-05-07 18:32:36 · 849 views
Resources (1)
软件加密解密.rar (software encryption/decryption archive) · 2021-02-07