What is the difference between pipeline() and AutoModel?

pipeline() is the high-level API: one line runs inference on any supported task. For example, pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english') creates a ready-to-use sentiment analyzer. The AutoModel classes are the lower-level API: you control tokenization, the forward pass, and output processing yourself. Use them when no pipeline exists for your task, when you need custom pre- or post-processing, when integrating a model into a larger architecture, or when fine-tuning. For quick experimentation and standard tasks, start with pipeline(); for production inference that needs fine-grained control over batching or processing, use AutoModel.

How do I fine-tune a Hugging Face model?

The fine-tuning workflow has six steps: 1) Load a pretrained model and tokenizer with a task-specific class (for example AutoModelForSequenceClassification) and AutoTokenizer. 2) Prepare the dataset: tokenize the inputs and make sure the labels match the task format. 3) Define TrainingArguments (learning rate, batch size, epochs, output directory). 4) Create a Trainer with the model, arguments, train/eval datasets, and a data collator. 5) Call trainer.train(). 6) Evaluate and save the model. For memory-efficient fine-tuning of large models, use PEFT (Parameter-Efficient Fine-Tuning) with LoRA, which trains only a small fraction of the parameters and can fit large models on consumer GPUs. Fine-tuning BERT-base as a classifier on a small custom dataset typically takes 10-30 minutes on a single GPU; larger datasets take proportionally longer.

How do I use Hugging Face models for text generation?

Load a generative model (GPT-2, LLaMA, Mistral) with AutoModelForCausalLM, tokenize the input, and call model.generate(). The main parameters are max_new_tokens (output length limit), temperature (randomness: lower values are more deterministic), top_p (nucleus sampling), top_k (top-k sampling), repetition_penalty (discourages repetition), and do_sample=True for non-deterministic output; the sampling parameters only take effect when do_sample=True. For instruction-tuned (chat) models, use the tokenizer's apply_chat_template() to format messages in the model's expected format. Hugging Face's text-generation-inference (TGI) server provides an OpenAI-compatible API for production deployment of HF models.
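To make the sampling parameters concrete, here is a minimal pure-Python sketch of how temperature, top-k, and top-p can narrow down the next-token candidates. It is a simplified illustration, not the library's implementation; the function name `filter_and_sample` and the toy logits are assumptions made for this example.

```python
import math
import random


def filter_and_sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample one token index from raw logits using temperature, top-k and top-p."""
    rng = rng or random.Random()

    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Softmax (subtract the max for numerical stability).
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Renormalise over the tokens that survived top-k.
    mass = sum(probs[i] for i in ranked)
    renorm = {i: probs[i] / mass for i in ranked}

    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += renorm[i]
        if cumulative >= top_p:
            break

    # Draw one of the surviving tokens, weighted by probability.
    choice = rng.choices(kept, weights=[renorm[i] for i in kept], k=1)[0]
    return choice, kept


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
token, candidates = filter_and_sample(
    toy_logits, temperature=0.7, top_k=3, top_p=0.9, rng=random.Random(0)
)
print(token, candidates)
```

With these toy values, top-k first drops the least likely token and top-p then trims the tail further, so only the two most likely tokens remain as candidates.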

What is PEFT and when should I use it?

PEFT (Parameter-Efficient Fine-Tuning) is a library for fine-tuning large models while training only a small fraction of their parameters. Its main method is LoRA (Low-Rank Adaptation), which adds small trainable low-rank matrices alongside existing weight matrices and trains these instead of the full model. A 7B-parameter model fine-tuned with LoRA typically trains well under 1% of its parameters (around 0.1% with small ranks), so it can fit on a single high-end consumer GPU, whereas full fine-tuning generally needs several data-center GPUs. Use PEFT when the model is too large for full fine-tuning on your hardware, when you want faster fine-tuning, or when you want multiple fine-tunes from one base model (one LoRA adapter per task). QLoRA (quantized LoRA) adds 4-bit quantization of the base model; its authors report fine-tuning a 65B-parameter model on a single 48GB GPU.
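The parameter savings follow from simple arithmetic. The sketch below counts the parameters one LoRA adapter adds to a single weight matrix; the helper name `lora_trainable_params` and the 4096 x 4096 layer size are hypothetical values chosen for illustration.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters added by one LoRA adapter on a d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in), so the update B @ A
    has the same shape as the frozen weight but far fewer parameters.
    """
    return rank * (d_out + d_in)


# Hypothetical layer: a 4096 x 4096 projection adapted with rank 16.
full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, rank=16)
fraction = adapter / full
print(f"full: {full:,}  adapter: {adapter:,}  fraction: {fraction:.2%}")
```

The fraction for a whole model is usually lower still, because adapters are attached to only some of its weight matrices.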

Hugging Face Transformers Tutorial: Complete Guide to Using Pretrained Models

⚡ Quick Answer

Hugging Face Transformers tutorial — load, fine-tune, and deploy pretrained models for text classification, generation, summarization, and translation with practical Python examples.

AiTechWorlds Team May 27, 2026 6 min read

When I started working with language models, the barrier was enormous — implementing BERT from scratch, managing training loops, debugging tensor shapes. Hugging Face Transformers changed this entirely.

The library provides access to hundreds of thousands of pretrained models with a consistent API. Today, sentiment analysis on a custom dataset takes about 50 lines of Python, and fine-tuning BERT for text classification takes an afternoon. This guide covers the patterns you'll use in most real projects.

Installation

pip install transformers datasets evaluate accelerate
pip install torch torchvision  # Or tensorflow
pip install peft  # For efficient fine-tuning

# Optional: quantized model loading on GPU
pip install bitsandbytes  # 4-bit/8-bit quantization

The Pipeline API: One-Line Inference

from transformers import pipeline

# Sentiment analysis
sentiment = pipeline("sentiment-analysis")
result = sentiment("I absolutely loved this product!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]

# Batch processing
texts = [
    "This is great!",
    "Not what I expected.",
    "Completely useless.",
]
results = sentiment(texts)

# Named entity recognition
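ner = pipeline("ner", aggregation_strategy="simple")  # Merge sub-word tokens into whole entities (replaces the deprecated grouped_entities=True)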
entities = ner("Apple CEO Tim Cook announced new products in San Francisco.")
for e in entities:
    print(f"{e['entity_group']}: {e['word']} ({e['score']:.2f})")

# Question answering
qa = pipeline("question-answering")
context = "LangChain is a framework for building LLM applications. It was created in 2022."
answer = qa(question="When was LangChain created?", context=context)
print(f"Answer: {answer['answer']} (score: {answer['score']:.3f})")

# Text generation
generator = pipeline("text-generation", model="gpt2", max_length=100)
output = generator("The future of artificial intelligence is")
print(output[0]["generated_text"])

# Summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
long_text = """[Your long article text here...]"""
summary = summarizer(long_text, max_length=130, min_length=30)
print(summary[0]["summary_text"])

# Translation
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
translated = translator("Hello, how are you doing today?")
print(translated[0]["translation_text"])  # e.g. "Bonjour, comment allez-vous aujourd'hui?" (exact wording may vary)

# Zero-shot classification (no training needed)
classifier = pipeline("zero-shot-classification")
result = classifier(
    "I need to cancel my subscription immediately.",
    candidate_labels=["account management", "billing", "technical support", "complaint"]
)
print(result["labels"][0])  # Most likely label

AutoModel and AutoTokenizer

For more control:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenization
text = "This movie was absolutely fantastic!"
inputs = tokenizer(
    text,
    return_tensors="pt",       # PyTorch tensors
    truncation=True,            # Truncate to max_length
    padding=True,               # Pad to same length in batch
    max_length=512
)

print(f"Input IDs shape: {inputs['input_ids'].shape}")
print(f"Token IDs: {inputs['input_ids'][0][:10].tolist()}")

# Forward pass
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
probabilities = torch.softmax(logits, dim=1)
predicted_class = torch.argmax(logits, dim=1).item()

labels = model.config.id2label
print(f"Prediction: {labels[predicted_class]} ({probabilities[0][predicted_class]:.3f})")

# Batch processing efficiently
texts = ["I love this!", "I hate this.", "It's okay.", "Absolutely amazing!"]

inputs = tokenizer(
    texts,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128
)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.softmax(outputs.logits, dim=1)
for text, pred in zip(texts, predictions):
    label = labels[torch.argmax(pred).item()]
    confidence = torch.max(pred).item()
    print(f"[{label} {confidence:.2f}] {text}")
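The `padding=True` option used above pads every sequence in a batch to the longest one and returns an attention mask alongside the input IDs. The following pure-Python sketch shows the idea; `pad_batch` and the token IDs are hypothetical and not produced by a real tokenizer.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad lists of token IDs to a common length and build attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token IDs for two sentences of different lengths.
batch = pad_batch([[101, 7592, 102], [101, 2023, 2003, 2307, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding only to the longest sequence in each batch, rather than to a fixed maximum, keeps the tensors as small as possible, which is also why the fine-tuning section relies on a data collator instead of padding at tokenization time.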

Text Generation with LLaMA

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",        # Distribute across available GPUs
)

# Format as instruction following (LLaMA 3.1 chat format)
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find prime numbers up to n."}
]

# Apply chat template (handles special tokens automatically)
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.3,        # Lower = more deterministic for code
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    repetition_penalty=1.1   # Avoid repetitive output
)

# Decode only the generated tokens (not the prompt)
new_tokens = outputs[0][inputs['input_ids'].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
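Conceptually, a chat template turns a list of role-tagged messages into one prompt string, and `add_generation_prompt=True` appends the opening of the assistant turn so the model continues from there. The sketch below uses made-up `<|role|>` and `<|end|>` markers purely for illustration; it is not the LLaMA 3.1 format, which the tokenizer supplies itself.

```python
def render_chat(messages, add_generation_prompt=True):
    """Render chat messages into a single prompt string (hypothetical marker format)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in a role header and an end-of-turn marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so generation starts with the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


demo_messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find prime numbers up to n."},
]
prompt = render_chat(demo_messages)
print(prompt)
```

Because each model family expects its own markers, it is safer to call the tokenizer's `apply_chat_template()` than to hand-write the format.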

Fine-Tuning for Classification

from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    TrainingArguments, Trainer, DataCollatorWithPadding,
    pipeline,  # Used at the end to run the fine-tuned model
)
from datasets import Dataset
import evaluate
import numpy as np

# Prepare your data
train_data = {
    "text": ["Great product!", "Terrible experience", "Works as expected", "Love it!", "Waste of money"],
    "label": [1, 0, 1, 1, 0]  # 1 = positive, 0 = negative
}
eval_data = {
    "text": ["Really enjoyed it", "Not satisfied"],
    "label": [1, 0]
}

train_dataset = Dataset.from_dict(train_data)
eval_dataset = Dataset.from_dict(eval_data)

# Load model and tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1}
)

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=128
    )

train_tokenized = train_dataset.map(tokenize_function, batched=True)
eval_tokenized = eval_dataset.map(tokenize_function, batched=True)

# Data collator (handles padding)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Metrics
accuracy = evaluate.load("accuracy")
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=100,
    weight_decay=0.01,
    learning_rate=2e-5,
    eval_strategy="epoch",  # Named evaluation_strategy in older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_dir="./logs",
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized,
    eval_dataset=eval_tokenized,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()
trainer.save_model("./my-sentiment-model")

# Use fine-tuned model
fine_tuned = pipeline("sentiment-analysis", model="./my-sentiment-model")
print(fine_tuned("This is wonderful!"))

Efficient Fine-Tuning with PEFT/LoRA

from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TrainingArguments, Trainer

# Load a larger model that would normally be too big to fine-tune
model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                    # LoRA rank (lower = fewer parameters)
    lora_alpha=32,           # Scaling factor
    lora_dropout=0.1,
    target_modules=["q", "v"]  # Which weight matrices to add LoRA to
)

# Wrap model with LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Prints the trainable and total parameter counts and the trainable percentage.
# With this configuration, well under 1% of the parameters are trained.

# Now train as normal — much smaller memory footprint
# ... (rest of training is identical to standard Trainer usage)

# Save LoRA adapter only (much smaller than full model)
model.save_pretrained("./lora-adapter")
# Later: load base model + LoRA adapter together
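Reloading the adapter later can be sketched as follows, assuming the adapter was saved to `./lora-adapter` as above. The helper name `load_with_adapter` and its `merge` flag are hypothetical; the heavy libraries are imported inside the function so that defining it does not load them.

```python
def load_with_adapter(base_name="google/flan-t5-large",
                      adapter_dir="./lora-adapter",
                      merge=False):
    """Load the frozen base model and attach a saved LoRA adapter to it."""
    # Imported lazily so that defining this helper stays cheap.
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM

    base = AutoModelForSeq2SeqLM.from_pretrained(base_name)
    model = PeftModel.from_pretrained(base, adapter_dir)
    if merge:
        # Fold the LoRA weights into the base weights for adapter-free inference.
        model = model.merge_and_unload()
    return model
```

Calling `load_with_adapter()` downloads the base model if it is not cached. Keeping the adapter separate lets several adapters share one base model, while merging trades that flexibility for a single standalone set of weights.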

Pushing Models to Hugging Face Hub

from huggingface_hub import notebook_login
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Login (get token from huggingface.co/settings/tokens)
notebook_login()  # Or: huggingface-cli login in terminal

# Load the fine-tuned model and tokenizer saved earlier
model = AutoModelForSequenceClassification.from_pretrained("./my-sentiment-model")
tokenizer = AutoTokenizer.from_pretrained("./my-sentiment-model")

# Push model to Hub
model.push_to_hub("your-username/my-sentiment-model")
tokenizer.push_to_hub("your-username/my-sentiment-model")

# Pull your model anywhere
model = AutoModelForSequenceClassification.from_pretrained("your-username/my-sentiment-model")

Conclusion

The Hugging Face ecosystem — Transformers, Datasets, PEFT, Evaluate, Hub — forms a complete ML platform. The pipeline API makes inference trivial; AutoModel gives you full control; Trainer handles the fine-tuning boilerplate; PEFT enables large model fine-tuning on consumer hardware.

The pattern that works in practice: start with a pretrained model from the Hub, fine-tune with LoRA on your domain data, evaluate with Evaluate metrics, and deploy via the Transformers pipeline or TGI server.

For using Hugging Face models in RAG pipelines, see our RAG system tutorial. For understanding the transformer architecture these models are built on, see our transformer architecture guide.

Frequently Asked Questions

What is Hugging Face Transformers?

Transformers is an open-source Python library providing access to 500,000+ pretrained models for NLP, computer vision, audio, and multimodal tasks. It abstracts the complexity of running BERT, GPT, LLaMA, T5, and hundreds of other architectures behind a consistent API. Its core abstractions are Pipeline (the simplest: one-line inference), AutoModel/AutoTokenizer (flexible loading), and Trainer (fine-tuning). The Hugging Face Hub is the model repository: you reference models by 'organization/model-name' and the library downloads them automatically. It is an essential tool for Python ML engineers working with language models.
