Funding: Supported by the National Key R&D Program of China (No. 2021YFB0301200) and the National Natural Science Foundation of China (No. 62025208).
Abstract: Large-scale Language Models (LLMs) have achieved significant breakthroughs in Natural Language Processing (NLP), driven by the pre-training and fine-tuning paradigm. While this approach allows models to specialize in specific tasks with reduced training costs, the substantial memory requirements during fine-tuning present a barrier to broader deployment. Parameter-Efficient Fine-Tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), and parameter quantization methods have emerged as solutions to these challenges by optimizing memory usage and computational efficiency. Among these, QLoRA, which combines PEFT and quantization, has demonstrated notable success in reducing memory footprints during fine-tuning, prompting the development of various QLoRA variants. Despite these advancements, the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored. This study presents a comprehensive analysis of these key variables, focusing on their influence across different layer types and depths within LLM architectures. Our investigation uncovers several critical findings: (1) larger layers, such as MLP layers, can maintain performance despite reductions in adapter rank, while smaller layers, such as self-attention layers, are more sensitive to such changes; (2) the effectiveness of balancing factors depends more on specific values than on layer type or depth; (3) in quantization-aware fine-tuning, larger layers can effectively utilize smaller adapters, whereas smaller layers struggle to do so. These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized LLMs. Moreover, for the same reduction in trainable parameters, shrinking the trainable parameters of a larger layer preserves fine-tuning accuracy better than doing so in a smaller one. This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM fine-tuning in resource-constrained environments.
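As a minimal illustration of how that layer-type finding might be applied in practice (a sketch under stated assumptions, not the paper's method; the module shapes, ranks, and scaling below are illustrative), a plain PyTorch LoRA wrapper can assign a smaller adapter rank to the wide MLP projections than to the self-attention projections:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, r: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the (possibly quantized) base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Hypothetical per-layer-type ranks reflecting the finding above: the wide MLP
# projection keeps performance with a small rank, the attention projection gets more.
rank_by_layer_type = {"attn": 16, "mlp": 4}

attn_proj = LoRALinear(nn.Linear(1024, 1024), r=rank_by_layer_type["attn"])
mlp_proj = LoRALinear(nn.Linear(1024, 4096), r=rank_by_layer_type["mlp"])

x = torch.randn(2, 1024)
print(attn_proj(x).shape, mlp_proj(x).shape)  # torch.Size([2, 1024]) torch.Size([2, 4096])
```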
Abstract: The High Efficiency Video Coding (HEVC) standard is one of the most advanced techniques used in today's growing real-time multimedia applications. However, it requires large bandwidth for transmission, and the required bandwidth varies with different video sequences/formats. This paper proposes an adaptive information-based variable quantization matrix (AIVQM) developed for different video formats with variable energy levels. The quantization method is adapted to each video sequence using statistical analysis, improving the bit budget, quality, and complexity reduction. Further, to provide precise control over bit rate and quality, a multi-constraint prune algorithm is proposed in the second stage of the AIVQM technique to pre-calculate K candidate paths, allowing the encoder to self-adapt and automatically choose one of the K paths as the available bandwidth changes. After extensive testing of the proposed algorithm in the multi-constraint environment for multiple paths, and evaluating performance in terms of peak signal-to-noise ratio (PSNR), bit budget, and time complexity for different videos, a noticeable improvement in rate-distortion (RD) performance is achieved. Using the proposed AIVQM technique, video sequences are encoded more feasibly and efficiently, with less loss in PSNR than the variable quantization method (VQM) algorithm, an improvement of approximately 10%–20% depending on the video sequence/format.
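The paper's prune algorithm is not reproduced here; the hypothetical Python sketch below only illustrates the general two-stage idea the abstract describes: pre-compute up to K candidate operating points under rate and complexity constraints, then select one at run time as the available bandwidth changes (all names and numbers are illustrative):

```python
from dataclasses import dataclass

@dataclass
class CodingPath:
    """One pre-computed rate/quality/complexity operating point (values are illustrative)."""
    qp_scale: float      # scaling applied to the quantization matrix
    bitrate_kbps: float  # expected bit budget
    psnr_db: float       # expected quality
    encode_ms: float     # expected per-frame complexity

def prune_paths(candidates, max_bitrate, max_encode_ms, k=4):
    """Stage 2: keep at most k paths that satisfy the rate and complexity constraints, best quality first."""
    feasible = [p for p in candidates
                if p.bitrate_kbps <= max_bitrate and p.encode_ms <= max_encode_ms]
    return sorted(feasible, key=lambda p: -p.psnr_db)[:k]

def select_path(paths, available_bandwidth_kbps):
    """Run time: pick the highest-quality pre-computed path that fits the current bandwidth."""
    fitting = [p for p in paths if p.bitrate_kbps <= available_bandwidth_kbps]
    return max(fitting, key=lambda p: p.psnr_db) if fitting else min(paths, key=lambda p: p.bitrate_kbps)

candidates = [CodingPath(0.8, 4500, 41.2, 28), CodingPath(1.0, 3200, 39.8, 24),
              CodingPath(1.4, 2100, 37.5, 20), CodingPath(2.0, 1400, 35.1, 17)]
k_paths = prune_paths(candidates, max_bitrate=5000, max_encode_ms=30, k=3)
chosen = select_path(k_paths, available_bandwidth_kbps=2500)
print(chosen)
```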
Funding: Project (No. 2006CB303104) supported by the National Basic Research Program (973) of China.
Abstract: At low bitrates, all block discrete cosine transform (BDCT) based video coding algorithms suffer from visible blocking and ringing artifacts in the reconstructed images, because the quantization is too coarse and high-frequency DCT coefficients tend to be quantized to zero. Preprocessing algorithms can enhance coding efficiency, and thus reduce the likelihood of blocking and ringing artifacts in the video coding process, by applying a low-pass filter before video encoding to remove relatively insignificant high-frequency components. In this paper, we introduce a new adaptive preprocessing algorithm, which employs an improved bilateral filter to provide adaptive edge-preserving low-pass filtering that is adjusted according to the quantization parameters. Whether at low or high bit rate, the preprocessing provides appropriate filtering that makes the video encoder more efficient and yields better reconstructed image quality. Experimental results demonstrate that our proposed preprocessing algorithm can significantly improve both subjective and objective quality.
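A minimal sketch of the general idea, assuming a simple linear mapping from quantization parameter to filter strength (the paper's improved bilateral filter and its tuned adaptation rule are not reproduced here), using OpenCV's standard bilateral filter:

```python
import cv2
import numpy as np

def preprocess_frame(frame_bgr: np.ndarray, qp: int) -> np.ndarray:
    """Edge-preserving low-pass prefilter whose strength scales with the quantization parameter.

    The linear QP-to-sigma mapping below is an illustrative assumption; the paper
    derives its adaptation from the encoder's quantization parameters, not this rule.
    """
    # Stronger smoothing at coarser quantization (higher QP), almost none at low QP.
    strength = float(np.clip((qp - 22) / (51 - 22), 0.0, 1.0))
    sigma_color = 10.0 + 40.0 * strength   # range kernel: how dissimilar pixels may be mixed
    sigma_space = 3.0 + 7.0 * strength     # spatial kernel: neighborhood influence
    # Arguments: src, d (0 = derive diameter from sigma_space), sigmaColor, sigmaSpace.
    return cv2.bilateralFilter(frame_bgr, 0, sigma_color, sigma_space)

# Usage: filter a frame before handing it to the encoder.
frame = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)
filtered = preprocess_frame(frame, qp=38)
```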