
Accepted papers

poster: What is Lost in Knowledge Distillation?
poster: NLLB-CLIP - train performant multilingual image retrieval model on a budget
poster: DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
poster: LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment
poster: DYAD: A Descriptive Yet Abjuring Density Efficient Approximation to Linear Neural Network Layers
poster: Transfer Learning for Structured Pruning under Limited Task Data
poster: Embedding User-Generated Content using Structural Supervision and Generative Models
poster: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
poster: GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
poster: Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL
poster: Structure Discovery in Prompted Weak Supervision
poster: SPEED: Speculative Pipelined Execution for Efficient Decoding
poster: Efficiently Adapting Pretrained Language Models to New Languages
poster: MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning
poster: Efficient LLM Inference on CPUs
poster: FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
poster: Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
poster: IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
poster: On the Zero-Shot Generalization of Machine-Generated Text Detectors
poster: Intra-Class Similarity-Guided Feature Distillation
poster: Less is More! A slim architecture, optimal for language tasks
poster: Comprehensive Bench-marking of Entropy and Margin Based Scoring Metrics for Data Selection
poster: Lightweight Retrieval Tuning for Black-Box Language Models
poster: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
poster: Investigating the Impact of Compression on Parametric Knowledge in Language Models
poster: Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
poster: Exploiting Transformer Activation Sparsity with Dynamic Inference
poster: Retrieval Augmented Generation for Dialog Modeling
poster: Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data
poster: TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
poster: Ensemble of low-rank adapters for large language model fine-tuning
poster: Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
poster: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
poster: BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
poster: Sparse Fine-Tuning for Inference Acceleration of Large Language Models
poster: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
poster: LoDA: Low-Dimensional Adaptation of Large Language Models
poster: MUX-PLMs: Data Multiplexing for High-throughput Language Models
poster: Towards End-to-end 4-Bit Inference on Generative Large Language Models
poster: SortedNet, a Place for Every Network and Every Network in its Place
poster: FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
poster: KronA: Parameter Efficient Tuning with Kronecker Adapter
poster: ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
poster: SwiftLearn: A Data-Efficient Training Method of Deep Learning Models using Importance Sampling
poster: MatFormer: Nested Transformer for Elastic Inference
poster: LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
poster: Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer
poster: Efficient Online Data Mixing For Language Model Pre-Training
poster: Student as an Inherent Denoiser of Noisy Teacher
poster: UT5: Pretraining Non autoregressive T5 with unrolled denoising
poster: LatticeGen: A Cooperative Framework Which Hides Generated Text in A Lattice For Privacy-Aware Generation on Cloud
poster: Measuring and Improving Recall in Convolutional Language Models
poster: Multimodal Multi-Hop Question Answering Through a Conversation Between Tools and Efficiently Finetuned Large Language Models
poster: Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
poster: Continual Pre-Training of Large Language Models: How to (re)warm your model?
poster: Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion
poster: Improving Linear Attention via Softmax Mimicry
poster: Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness
poster: DiffTune: A Diffusion-Based Approach to Diverse Instruction-Tuning Data Generation
poster: PaSS: Parallel Speculative Sampling
poster: QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning
poster: Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
poster: Group Preference Optimization: Few-Shot Alignment of Large Language Models
poster: Fast-ELECTRA for Efficient Pre-training
poster: Parameter-Efficient Fine-tuning of InstructBLIP for Visual Reasoning Tasks
poster: Local LoRA: Memory-Efficient Fine-Tuning of Large Language Models
poster: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
poster: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
poster: DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
poster: Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM
poster: Multimodal Data and Resource Efficient Device-directed Speech Detection with Large Foundation Models
poster: Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition
poster: ASR Data Selection from Multiple Sources: A Practical Approach on Performance Scaling
poster: Fed-EE: Federating Heterogeneous ASR Models using Early-Exit Architectures
poster: Recursive Joint Cross-Attention for Audio-Visual Speaker Verification
poster: Efficient infusion of self-supervised representations in Automatic Speech Recognition
poster: An efficient clustering algorithm for self-supervised speaker recognition
poster: HateXplain Space Model: Fusing Robustness with Explainability in Hate Speech Analysis
poster: Revealing the Bias in Large Language Models via Reward Based Questioning
poster: Evaluating task specific finetuning for protein language models