Sure Here, Marv

company

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

dsduenas

updated a model 11 days ago

sureheremarv/llama-3.3-70b-instruct-Q8_0-GGUF

71B • Updated 11 days ago • 31

dsduenas

published a model 11 days ago

sureheremarv/llama-3.3-70b-instruct-Q8_0-GGUF

71B • Updated 11 days ago • 31

dsduenas

updated a model 12 days ago

sureheremarv/deepseek-r1-distill-qwen-7b-Q8_0-GGUF

8B • Updated 12 days ago • 49

dsduenas

published a model 12 days ago

sureheremarv/deepseek-r1-distill-qwen-7b-Q8_0-GGUF

8B • Updated 12 days ago • 49

dsduenas

updated a model 12 days ago

sureheremarv/llama-3.1-8b-instruct-Q8_0-GGUF

8B • Updated 12 days ago • 10

dsduenas

published a model 12 days ago

sureheremarv/llama-3.1-8b-instruct-Q8_0-GGUF

8B • Updated 12 days ago • 10

dsduenas

updated a model 12 days ago

sureheremarv/qwen-2.5-7b-instruct-Q8_0-GGUF

8B • Updated 12 days ago • 11

dsduenas

published a model 12 days ago

sureheremarv/qwen-2.5-7b-instruct-Q8_0-GGUF

8B • Updated 12 days ago • 11

dsduenas

in sureheremarv/gemma-2-27b-instruct-GGUF 12 days ago

Upload gemma-2-27b-instruct.Q8_0.gguf with huggingface_hub

#1 opened 12 days ago by

dsduenas

published a model 12 days ago

sureheremarv/gemma-2-27b-instruct-GGUF

Updated 12 days ago

x5fu

authored a paper about 2 months ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 266

mattmdjaga

authored 2 papers 4 months ago

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

Paper • 2507.20526 • Published Jul 28 • 1

Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems

Paper • 2504.07831 • Published Apr 10

x5fu

authored a paper 6 months ago

Training Language Models to Generate Quality Code with Program Analysis Feedback

Paper • 2505.22704 • Published May 28 • 14

arobey1

authored a paper 8 months ago

Antidistillation Sampling

Paper • 2504.13146 • Published Apr 17 • 59

mattmdjaga

posted an update 9 months ago

Post

3635

🚨 Gray Swan AI's Biggest AI Jailbreaking Arena Yet! $130K+ 🚨

🔹 Agent Red-Teaming Challenge – test direct & indirect attacks on anonymous frontier models!
🔹 $130K+ in prizes & giveaways – co-sponsored by OpenAI & supported by UK AI Security Institute 🇬🇧
🔹 March 8 – April 6 – fresh exploits = fresh rewards!

How It Works:
✅ Anonymous models from top providers 🤐
✅ Direct & indirect prompt injection paths 🔄
✅ Weekly challenges for new behaviors 🗓️
✅ Speed & quantity-based rewards ⏩💰

Why Join?
⚖️ Neutral judging – UK AISI & automated judges ensure fairness
🎯 No pre-trained defenses – a true red-teaming battlefield
💻 5 Apple laptops up for grabs – increase chances by inviting friends!

🔗 Arena: app.grayswan.ai/arena/challenge/agent-red-teaming
🔗 Discord: discord.gg/grayswanai

🔥 No illusions, no mercy. Push AI agents to the limit & claim your share of $130K+! 🚀

eliotj

authored a paper about 1 year ago

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models

Paper • 2408.08926 • Published Aug 15, 2024 • 6

mattmdjaga

authored 2 papers about 1 year ago

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Paper • 2410.09024 • Published Oct 11, 2024 • 1

Applying Refusal-Vector Ablation to Llama 3.1 70B Agents

Paper • 2410.10871 • Published Oct 8, 2024 • 1

mattmdjaga

posted an update about 1 year ago

Post

3303

🚨 New Agent Benchmark 🚨
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

ai-safety-institute/AgentHarm

Collaboration between UK AI Safety Institute and Gray Swan AI to create a dataset for measuring harmfulness of LLM agents.

The benchmark contains harmful and benign sets across 11 categories, with varied difficulty levels and detailed evaluation that tests not only success rate but also tool-level accuracy.

We provide refusal and accuracy metrics across a wide range of models in both no-attack and prompt-attack scenarios.

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents (2410.09024)

Team members 13
