Cyber-Quack-14B
Introduction
Cyber-Quack-14B is a specialized fine-tune of the Qwen2.5-14B architecture, optimized for Red Team operations, penetration testing assistance, and offensive security research. It was trained to provide direct, technical, and actionable exploit paths while minimizing the standard "AI Safety" refusals common in general-purpose models when discussing authorized security testing.
The model retains its native Tool-Calling capabilities, allowing it to interface with external security scanners and custom scripts via an XML-based schema.
Model Details
- Base Model: Qwen2.5-14B
- Training Framework: Axolotl (LoRA)
- Training Hardware: 1x NVIDIA RTX 5090 (32GB VRAM)
- Context Length: 131,072 tokens (Optimized for 16k-32k in local deployment)
Training & Datasets
Cyber-Quack was trained on a curated mix of offensive security data focused on:
- Vulnerability Research: Deep analysis of CVEs, stack/heap overflows, and memory corruption logic.
- CTF Methodology: Fine-tuned on Hack The Box (HTB) and OSCP-style attack chains (Enumeration -> Exploitation -> Privilege Escalation).
- Security Datasets: Includes
CyberSecurity-Dataset-Fenrir-v2.0,CyberSecurityEval, andPTF-ID-Bench. - Active Directory: Specialized focus on Kerberos attacks (Golden/Silver Tickets), lateral movement, and GPO abuse.
- ** licanKiraz0/Cybersecurity-Dataset-Fenrir-v2.0
- ** darkknight25/Vulnerable_Programming_Dataset
- ** CyberNative/CyberSecurityEval
- ** bdas-secure/ptf-id-bench
Verified Capabilities (Testing Results)
The model has been verified through a series of "Zero-Shot" Red Team prompts:
- The "Smiley-Shell" Test: Correctly identified the
vsftpd 2.3.4backdoor trigger (:)) and provided the specific Metasploit module path without refusal. - AD Persistence: Successfully detailed the mechanics of a Golden Ticket attack, including the role of the KRBTGT account and TGT forgery.
- Exploit Dev: Capable of generating Python-based exploit skeletons for stack-based buffer overflows, including proper padding and EIP overwrite logic.
- Tool Calling: Successfully generates
<tools>and<tool_call>blocks to interface with scanners like Nmap.
Rigorous Security Benchmarking (FAITH Framework)
Cyber-Quack-14B was evaluated head-to-head against Cisco's specialized Foundation-Sec-8B-Reasoning model using the open-source FAITH evaluation suite. Testing was conducted in a highly constrained execution environment (max_completion_tokens: 15, temperature: 0.3) to assess both raw security domain intelligence and structural compliance.
Performance vs. Cisco Foundation-Sec-8B
| Benchmark Split | Cisco 8B Target | Cyber-Quack-14B (Lenient Accuracy*) | Status / Performance Delta |
|---|---|---|---|
Deep Reasoning (secbench-mcqa-eng-reasoning) |
41.1% | 67.4% | ๐ฅ +26.3% (Total Domination) |
| CyberMetric-2000 (Standards/Compliance) | ~75.0% | 85.3% | ๐ +10.3% (Victory) |
| SecBench (MCQA) (Core Security) | ~70.0% | 74.4% | ๐ +4.4% (Victory) |
| SecEval (Enterprise Technical Controls) | 84.8% | 86.8% | ๐ฏ +2.0% (Victory) |
| MMLU-Security (Computer Security Subset) | 78.2% | 71.0% | ๐ Competitive Parity |
Root Cause Mapping (ctibench-rcm) |
75.3% | 45.2% | ๐ง Data Mapping Gap |
*Note on Lenient vs. Strict Evaluation: Due to strict output constraints, Cyber-Quack-14B frequently provides the correct choice but wraps it in a tight conversational prefix (e.g., "Answer: B" or "Choice: A"), which passes Lenient Accuracy but falls outside of FAITH's rigid regex-anchored strict parser filters. True operational reasoning and security intelligence are reflected in the lenient column.
Core Architecture Insights
- Elite Multi-Hop Logic: Smashed Cisco's reasoning target by over 26%, verifying the model's high-tier capability when stepping through complex, multi-stage attack paths.
- Deep Compliance Retention: The 14B Q8_0 weights cleanly preserved federal, cryptographic, and enterprise control logic, maintaining a clear edge over standard 8B architectures on frameworks like NIST and ISO.
- Granular Vulnerability Mapping: While displaying world-class general exploitation and engineering logic, a data mapping gap was identified regarding direct, verbatim CVE-to-CWE lookup matricesโan area prime for future programmatic fine-tuning.
Recommended Deployment (llama.cpp)
To run Cyber-Quack at high performance (targeting ~50+ tokens/sec on high-end hardware), use the following configuration:
./llama-server \
-m cyber-qwen-2.5-14b-f16.gguf \
-ngl 70 \
-c 12000 \
--flash-attn on
- Downloads last month
- 4
Model tree for jabbatheduck/Cyber-Quack-14B
Base model
Qwen/Qwen2.5-14B