LLM-Based Code Vulnerability Detection
Research on using large language models to detect and explain security vulnerabilities in source code, benchmarked against static analysers.
How to build it — step by step
- 1. Dataset: Assemble labelled pairs of vulnerable and fixed code (e.g. from CVE-linked commits).
- 2. Modelling: Fine-tune or prompt code LLMs to classify and localise vulnerabilities.
- 3. Baselines: Compare against traditional static analysers on precision, recall, and explanation quality.
- 4. Analysis: Study false positives, generalisation to unseen projects, and explanation faithfulness.
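The baseline comparison in step 3 comes down to scoring each detector's predictions against the same labelled samples. The block below is a minimal sketch of that scoring, not a prescribed evaluation harness: the `Scores` class, the `score` function, and the toy label and prediction lists are all illustrative assumptions, and the numbers are made up rather than real results.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score(labels: list[bool], predictions: list[bool]) -> Scores:
    """Compute precision, recall, and F1 for binary 'vulnerable' predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    # Guard against division by zero when a detector flags nothing.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


# Toy, made-up data purely for illustration (True = vulnerable).
labels = [True, True, False, False, True]
llm_preds = [True, False, False, True, True]            # hypothetical LLM output
analyser_preds = [True, False, False, False, False]     # hypothetical static-analyser output

llm_scores = score(labels, llm_preds)
analyser_scores = score(labels, analyser_preds)
print("LLM:", llm_scores)
print("Static analyser:", analyser_scores)
```

Scoring both detectors on identical samples keeps the comparison fair. Explanation quality needs a separate, likely human-rated, protocol that this sketch does not cover.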
Key features to implement
- ✓ LLM-based vulnerability detection
- ✓ Vulnerability localisation
- ✓ Static-analyser comparison
- ✓ Explanation generation
- ✓ Generalisation study
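For the generalisation study, one common setup is a project-level split, so that no project contributes samples to both training and test data. The block below is a sketch under that assumption: the `split_by_project` helper, the `"project"` key, and the sample records are hypothetical and not taken from any specific dataset.

```python
import random
from collections import defaultdict


def split_by_project(samples, test_fraction=0.2, seed=0):
    """Split samples so that no project appears in both train and test.

    Each sample is assumed to be a dict with a "project" key (hypothetical schema).
    """
    by_project = defaultdict(list)
    for sample in samples:
        by_project[sample["project"]].append(sample)

    # Shuffle project names deterministically, then hold out a fraction of projects.
    projects = sorted(by_project)
    random.Random(seed).shuffle(projects)
    n_test = max(1, round(len(projects) * test_fraction))
    test_projects = set(projects[:n_test])

    train = [s for p in projects if p not in test_projects for s in by_project[p]]
    test = [s for p in projects if p in test_projects for s in by_project[p]]
    return train, test


# Placeholder records; the code strings are elided on purpose.
samples = [
    {"project": "proj_a", "code": "...", "vulnerable": True},
    {"project": "proj_a", "code": "...", "vulnerable": False},
    {"project": "proj_b", "code": "...", "vulnerable": True},
    {"project": "proj_c", "code": "...", "vulnerable": False},
    {"project": "proj_d", "code": "...", "vulnerable": True},
    {"project": "proj_e", "code": "...", "vulnerable": False},
]

train, test = split_by_project(samples)
print(len(train), "training samples,", len(test), "held-out samples")
```

A random per-sample split would let near-duplicate code from the same project leak into the test set, which tends to overstate how well a detector handles unseen projects.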
💡 Unique twist to stand out
Build a retrieval-augmented pipeline that grounds the LLM in CWE definitions and known patterns to cut false positives.
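One way to picture the retrieval step: rank a small store of CWE descriptions against the code under review and prepend the best matches to the prompt. The block below is a deliberately simple sketch, assuming bag-of-words cosine similarity in place of a real embedding model. The `CWE_SNIPPETS` texts are paraphrased one-line summaries of real CWE entries, not the official wording, and the function names and prompt template are illustrative assumptions. No LLM is called here.

```python
import math
import re
from collections import Counter

# Paraphrased summaries (assumption: a real pipeline would load official CWE text).
CWE_SNIPPETS = {
    "CWE-78": "OS command injection: untrusted input is used to build an operating system command passed to a shell",
    "CWE-79": "Cross-site scripting: untrusted input is placed in a web page without neutralization",
    "CWE-89": "SQL injection: untrusted input is concatenated into an SQL query string without neutralization",
    "CWE-22": "Path traversal: untrusted input is used to build a file path that escapes a restricted directory",
}


def tokenize(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(code, k=2):
    """Return the k CWE IDs whose summaries are most similar to the code."""
    query = tokenize(code)
    ranked = sorted(
        CWE_SNIPPETS,
        key=lambda cwe: cosine(query, tokenize(CWE_SNIPPETS[cwe])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(code, k=2):
    """Assemble a prompt that grounds the model in the retrieved definitions."""
    context = "\n".join(f"{cwe}: {CWE_SNIPPETS[cwe]}" for cwe in retrieve(code, k))
    return (
        "Relevant CWE definitions:\n" + context + "\n\n"
        "Code under review:\n" + code + "\n\n"
        "Does the code contain one of these weaknesses? "
        "Answer with a CWE ID or 'none', then explain."
    )


snippet = '''query = "SELECT * FROM users WHERE name = '" + name + "'"
cursor.execute(query)'''
prompt = build_prompt(snippet)
print(prompt)
```

Lexical overlap between code and prose definitions is weak, so a fuller pipeline would likely swap in code-aware embeddings and add known vulnerable patterns to the store. Allowing a 'none' answer gives the model an explicit way to decline, which is one lever for reducing false positives.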
🎓 What you'll learn
Code LLMs, security analysis, benchmarking methodology, and retrieval-augmented generation.