Shared Unsafe Directions

updated 14 days ago

Do Language Models Share Unsafe Directions in Activation Space?

  • zbeeb/safe

    Updated Dec 15, 2025 • 33

  • zbeeb/unsafe

    Viewer • Updated Dec 15, 2025 • 200 • 11

  • zbeeb/Benign

    Updated Dec 15, 2025 • 8

  • zbeeb/pythia-Activations

    Updated Dec 16, 2025