---
license: apache-2.0
base_model: ServiceNow-AI/Apriel-1.5-15b-Thinker
tags:
- cybersecurity
- network-security
- intrusion-detection
- raft
- dora
- vision-language-model
- llava
datasets:
- NSL-KDD
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
---

# Apriel-1.5-15b-Thinker-CYBERSEC-MERGED

This is a **fully merged production model** based on [ServiceNow-AI/Apriel-1.5-15b-Thinker](https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker), fine-tuned for cybersecurity network traffic analysis and intrusion detection.

## Model Description

- **Developed by:** Sainikhil Juluri
- **Model type:** Vision-Language Model (LLaVA-based, 15B parameters)
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from:** ServiceNow-AI/Apriel-1.5-15b-Thinker

This model combines a large vision-language model with specialized cybersecurity training, using the DoRA (Weight-Decomposed Low-Rank Adaptation) and RAFT (Retrieval Augmented Fine-Tuning) methodologies.

### Model Type: Full Merged Model ✅

This is a **complete, production-ready model** with the adapters fully merged into the base model. It can be used directly via Inference Endpoints or loaded with standard transformers code.

## Training Details

### Training Data

**Dataset:** NSL-KDD (Network Security Laboratory - Knowledge Discovery in Databases)

- **Training examples:** 49,997
- **Data distribution:**
  - Normal Traffic: 53.5%
  - DoS (Denial of Service): 36.5%
  - Probe: 9.3%
  - R2L (Remote to Local): 0.8%
  - U2R (User to Root): 0.04%

### Training Strategy: RAFT (Retrieval Augmented Fine-Tuning)

The model was trained using the RAFT methodology with three context modes:

- **Oracle Mode (19.9%):** Learning from relevant documents
- **Distractor Mode (60.4%):** Learning to ignore irrelevant (distractor) documents
- **No Context (19.8%):** Learning to answer without external context

This approach teaches the model to:

1. Generate responses with proper citations
2. Distinguish relevant from irrelevant information
3. Function effectively in RAG (Retrieval Augmented Generation) systems
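To make the three modes concrete, the sketch below shows how RAFT-style training examples with this mode mix might be assembled. The helper name (`build_raft_example`), field layout, and sampling details are illustrative assumptions, not the actual data-preparation code.

```python
import random

def build_raft_example(question, answer, oracle_docs, distractor_pool, rng=random):
    """Assemble one RAFT-style training example (illustrative sketch).

    Modes roughly match the reported mix:
      - oracle (~20%): context contains only the relevant document(s)
      - distractor (~60%): relevant document(s) mixed with irrelevant ones
      - no_context (~20%): no documents at all
    """
    mode = rng.choices(["oracle", "distractor", "no_context"],
                       weights=[0.2, 0.6, 0.2])[0]

    if mode == "oracle":
        docs = list(oracle_docs)
    elif mode == "distractor":
        docs = list(oracle_docs) + rng.sample(distractor_pool, k=3)
        rng.shuffle(docs)  # no fixed position for the relevant document
    else:
        docs = []

    context = "\n\n".join(f"Document {i+1}: {d}" for i, d in enumerate(docs))
    prompt = (f"Context:\n{context}\n\nQuestion: {question}"
              if docs else f"Question: {question}")
    # The target answer cites its sources, so the model learns attribution
    return {"mode": mode, "prompt": prompt, "response": answer}
```

Shuffling in distractor mode keeps the relevant document from appearing at a fixed position, so the model must actually identify it rather than learn a positional shortcut.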
### Training Configuration

**Fine-tuning Method:** DoRA (Weight-Decomposed Low-Rank Adaptation)

- **Total Parameters:** 15B (8,416,026,624 reported by the training framework)
- **Trainable Parameters:** 275.8M (3.28% of the reported count)
- **LoRA Rank:** 64
- **LoRA Alpha:** 128
- **Target Modules:** 7 attention modules
- **Vision Components:** Frozen (217M parameters)

**Training Hyperparameters:**

- **Epochs:** 1
- **Training Steps:** 3,125
- **Learning Rate:** 2e-5
- **Batch Size (per device):** 4
- **Gradient Accumulation Steps:** 4
- **Effective Batch Size:** 16
- **Optimizer:** AdamW (patched implementation)
- **Precision:** 4-bit quantization (QLoRA)

**Training Performance:**

- **Initial Loss:** 3.14
- **Final Loss:** 0.038-0.092
- **Convergence:** Excellent (97% loss reduction)
- **Training Duration:** 12 hours
- **Hardware:** NVIDIA A100 GPU (40GB)
- **Platform:** Google Colab Pro

## Intended Uses

### Direct Use

This model is designed for:

- **Network traffic analysis and intrusion detection**
- **Cybersecurity threat classification**
- **Security incident response support**
- **Educational purposes in cybersecurity training**
- **RAG-based cybersecurity question-answering systems**

### Attack Detection Capabilities

The model can identify and analyze:

- **DoS/DDoS attacks:** Denial of Service and Distributed Denial of Service
- **Probe attacks:** Port scanning, vulnerability scanning
- **R2L attacks:** Remote-to-Local unauthorized access attempts
- **U2R attacks:** User-to-Root privilege escalation
- **Normal traffic:** Baseline network behavior

### Out-of-Scope Use

⚠️ This model should **NOT** be used:

- As the sole authority for security decisions without human oversight
- For real-time critical infrastructure protection without validation
- On network architectures or attack vectors not represented in NSL-KDD
- In production security settings without thorough testing and validation

## Usage

### Basic Usage

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

# Load model and processor
model = AutoModelForVision2Seq.from_pretrained(
    "sainikhiljuluri2015/Apriel-1.5-15b-Thinker-CYBERSEC-MERGED",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    "sainikhiljuluri2015/Apriel-1.5-15b-Thinker-CYBERSEC-MERGED",
    trust_remote_code=True
)

# Prepare conversation
messages = [
    {
        "role": "system",
        "content": "You are a cybersecurity expert specializing in network intrusion detection and analysis."
    },
    {
        "role": "user",
        "content": "Based on the provided network traffic analysis documents, identify potential security threats in this connection pattern."
    }
]

# Generate response
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    top_p=0.95
)

response = processor.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### RAG Integration

This model is optimized for RAG (Retrieval Augmented Generation) workflows:

```python
# Example with document context
documents = [
    "Network traffic shows 50 SYN packets per second...",
    "Connection attempts from IP 192.168.1.100...",
]

# Number the documents once here (the raw strings above are unnumbered,
# so the prefixes are not duplicated)
context = "\n\n".join(f"Document {i+1}: {doc}" for i, doc in enumerate(documents))

messages = [
    {"role": "system", "content": "You are a cybersecurity expert. Cite sources when analyzing."},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: What type of attack is this?"}
]

# The model generates a response with citations to the numbered documents
```
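The RAG snippet above stops at prompt construction. Generation then proceeds exactly as in the basic usage example; a minimal continuation, reusing the `model` and `processor` objects loaded there:

```python
# Continue from the RAG example: render the chat template and generate
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,  # same sampling settings as the basic usage example
    do_sample=True,
    top_p=0.95
)

print(processor.decode(outputs[0], skip_special_tokens=True))
```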
## Limitations

1. **Dataset Specificity:** Trained on NSL-KDD patterns; may not generalize to all network architectures
2. **Text-Only Training:** Vision capabilities were frozen during fine-tuning
3. **Temporal Coverage:** Training data may not reflect the latest attack vectors
4. **Citation Dependency:** Trained for RAG workflows; works best with document context
5. **Language:** English only; multilingual capabilities not validated

## Bias, Risks, and Limitations

### Known Biases

- **Attack Type Imbalance:** Heavy bias toward DoS (36.5%) and Normal traffic (53.5%); limited exposure to U2R attacks (0.04%)
- **Synthetic Data:** NSL-KDD is derived from older network patterns and may not reflect modern cloud/IoT environments

### Risks

- **False Positives/Negatives:** The model should not be the sole arbiter of security decisions
- **Adversarial Robustness:** Not explicitly trained against adversarial attacks
- **Evolving Threats:** Requires continuous updating for new attack patterns

### Recommendations

Users should:

- ✅ Use the model as a decision-support tool alongside human expertise
- ✅ Validate outputs in production environments
- ✅ Regularly update with new threat intelligence
- ✅ Test thoroughly on their specific network architecture
- ✅ Implement proper monitoring and feedback loops

## Evaluation

### Training Performance

- **Loss Convergence:** 3.14 → 0.038-0.092 (97% reduction)
- **Training Stability:** Consistent convergence across 3,125 steps
- **Checkpoint Consistency:** Stable performance maintained throughout training

### Validation Approach

The RAFT training setup directly exercises the behaviors being validated:

- Ability to distinguish relevant from irrelevant documents
- Citation accuracy and source attribution
- Context-aware response generation

## Technical Specifications

### Model Architecture

- **Base:** Apriel-1.5-15b-Thinker (LLaVA-based architecture)
- **Vision Encoder:** Frozen during training
- **Text Decoder:** Fine-tuned with DoRA adapters
- **Precision:** BFloat16 (merged model)
- **Context Length:** 578 tokens (chosen to fit the training data)

### Compute Infrastructure

**Hardware:**

- GPU: NVIDIA A100 (40GB VRAM)
- Platform: Google Colab Pro
- Training Time: 12 hours
- Estimated Cost: ~$24 (A100 @ $1.95/hr)

**Software:**

- Framework: HuggingFace Transformers (v4.46.0)
- PEFT: v0.17.0
- Training: TRL + bitsandbytes (4-bit quantization)
- PyTorch: Latest stable
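As a reference for the configuration described above, here is a minimal sketch of how the DoRA + 4-bit setup might be expressed with the listed library versions. The exact `target_modules` list is a guess (the card says only "7 attention modules"; the attention + MLP projection set below is one common 7-module choice), and the dropout and quantization options are likewise assumptions, not the actual training script.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit (QLoRA-style) quantization config for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # assumption: the QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,          # assumption
)

# DoRA is enabled through PEFT's LoraConfig via use_dora=True
peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    use_dora=True,
    # Hypothetical 7-module list; the card does not name the exact modules
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,                       # assumption: common default
    bias="none",
    task_type="CAUSAL_LM",
)
```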
## Environmental Impact

**Estimated Carbon Emissions:**

- Training Duration: 12 hours on an A100 GPU
- Cloud Provider: Google Cloud Platform
- Estimated emissions: ~5-6 kg CO2eq (based on average cloud GPU usage)

Note: This is a conservative estimate. Actual emissions depend on datacenter location and energy sources.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{apriel-cybersec-2024,
  author       = {Juluri, Sainikhil},
  title        = {Apriel Cybersecurity Model: DoRA + RAFT Fine-tuned for Network Intrusion Detection},
  year         = {2024},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sainikhiljuluri2015/Apriel-1.5-15b-Thinker-CYBERSEC-MERGED}}
}
```

## Acknowledgments

- **Base Model:** ServiceNow-AI for Apriel-1.5-15b-Thinker
- **Dataset:** NSL-KDD (Canadian Institute for Cybersecurity)
- **Methodology:** DoRA (Liu et al.) and RAFT (Zhang et al.)
- **Training Platform:** Google Colab Pro

## Model Card Contact

- **Author:** Sainikhil Juluri
- **GitHub:** [Include if public]
- **Email:** [Include if public]
- **Project:** Cybersecurity AI System (College Project)

For questions, issues, or collaboration opportunities, please open an issue on the model repository or contact via HuggingFace.

---

**Last Updated:** November 2025

**Model Version:** 1.0

**Status:** Production Ready ✅