arxiv:2508.02921
Shane Caldwell PRO
SJCaldwell
AI & ML interests
cybersecurity + ml
Recent Activity
authored a paper 1 day ago
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
authored a paper 1 day ago
PentestJudge: Judging Agent Behavior Against Operational Requirements
liked a dataset 4 months ago
allenai/c4