unfortunately not stable enough

#1
by Otakadelic - opened

Hello z12,

First of all, HUGE thanks for your fantastic models!
My storage always includes several of your merged gems, from MT1 through MT5.

I’ve been testing Gemma 3 12B—including most of the Abliterated versions—and unfortunately, they feel unstable and quite far from the quality of Gemma 2.
Your MT-Gemma-3-12B model, for example, isn’t in the same tier as your great Gemma 2 work (e.g., it sometimes responds in two different languages, among other quirks).

To be clear, Google’s original Gemma 3 is great—definitely strong enough on its own. So right now, it feels like we have two real options:

  • Stick with Google's original models.
  • Wait for someone to crack the advanced methods for reducing or removing refusals.

I'm trying too, but the hurdle is really high.

Ref: mlabonne’s Abliterated version

“The model was abliterated by computing a refusal direction based on hidden states (inspired by Sumandora’s repo) for most layers (3 to 45), independently. Combined with a refusal weight of 0.6…”

I deeply respect his work—books, models, and papers—but honestly, this Abliterated version feels far off from what the description suggests. Even he notes it’s experimental:

"It might not turn out as well as expected... I saw some garbled text from time to time (e.g., 'It' my' instead of 'It's my')."

My guess is that hidden states contribute heavily to response quality—so any edits become a tightrope walk. Even changing weight values slightly can degrade output.

Just wanted to share where I’m at. Keep up the amazing work as always!

In general, the first generation (gen-0) of merges I did was just a test to see which models could be used in a merge.
For example, one model threw an error during merging, and some models differed in things like parameter count.
In generation 1 (gen-1) I verified that many models can be merged without errors; next I will move on to standard merge experiments.
Hopefully, the quality of the answers will improve with further attempts, and more fine-tuned models will appear.
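
For readers following along, a generic two-in-one merge of the kind described here is usually expressed as a mergekit config. Below is a minimal sketch under assumptions: the second source model is a placeholder name, and the layer range and `t` value are illustrative, not an actual recipe from this thread.

```python
# Minimal sketch of a two-in-one SLERP merge via mergekit.
# The second model id, layer range, and t value are illustrative placeholders.
from pathlib import Path

config = """\
slices:
  - sources:
      - model: google/gemma-2-9b-it
        layer_range: [0, 42]
      - model: your-org/gemma-2-9b-finetune   # placeholder second source
        layer_range: [0, 42]
merge_method: slerp
base_model: google/gemma-2-9b-it
parameters:
  t: 0.5        # 0 = pure base model, 1 = pure second model
dtype: bfloat16
"""

Path("merge-config.yaml").write_text(config)
# Then run: mergekit-yaml merge-config.yaml ./merged-model
```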


Thanks for the update!!!
It’s great to hear you're testing merge compatibility across different bases—laying that groundwork really opens up future possibilities. I'll keep trying each and every release.
My own attempt at Gemma 3 is progressing.
Gemma 3 has a distinctive architecture, so I created Gemma-3-specific transformers code and removal scripts.
Hoping more strong fine-tuned models appear soon so we can all push the next phase forward.
Looking forward to your masterful models—your work is always inspiring!
Spasibo
-otkd

Looking forward to seeing your version of the model.

And the fact that the model answers in two different languages may be related to IlyaGusev/saiga_gemma3_12b, as this model was trained mainly in Russian.


@Otakadelic

What's the progress with Gemma 3?


Thx again for the great reminder!

Below is my current impression.

⚡️Gemma3 12B

My feeling is that the refusal weights in the 12B are not there purely for refusal; they also boost output quality. Many other models are like this too, but G3 12B shows the tendency more strongly. G3 27B has a larger margin against weight modification.
But G3 12B is excellent at delicate expression and first-person perspective. (G3 27B is more generic and safer, with milder outputs.)
That is the best part of G3 12B (G2 9B too), and lower refusal means wider usage, so I still mess around with G3 12B.

So far, my 48 GB of VRAM is not enough for precise weight adjustment. I am building a 60 GB VRAM environment, but my current estimate is that existing refusal libraries are not a good solution for G3 12B. TransformerLens allows precise adjustment but is very VRAM-hungry; I think 120 GB (or even more) is needed for a 12B-sized model.
In the meantime, Phi-4 (14B) is a good, even great, alternative: similar capacity, and far steadier even under aggressive weight adjustment.

In short, I still want to create better variants of G3 12B, but my rig is not capable of precise adjustments, even remotely.

Thx again for continuing to release great variants! Hopefully I will figure it out and then upload one or two variants of the fantastic G3 12B.

-otkd

And there is another question: with what sampler parameters do you conduct testing?
Just testing Gemma-3 12B, it works fine even with top-k = 1.
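
For reference, a top-k = 1 check like this can be reproduced with a minimal transformers snippet; the model id below is illustrative and the prompt is arbitrary:

```python
# Minimal sketch of a top-k = 1 (effectively greedy) sampling test.
import torch
from transformers import pipeline

# Model id is illustrative; substitute whichever merge is under test.
pipe = pipeline("text-generation", model="google/gemma-3-12b-it",
                torch_dtype=torch.bfloat16, device_map="auto")
print(pipe("Tell me a short story.", do_sample=True, top_k=1,
           max_new_tokens=64)[0]["generated_text"])
```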


Vokturz/can-it-run-llm: just a small tool.



Alternatively, if you don't mind the extra time, you can use system RAM as video memory and extend RAM with persistent storage (swap). Computation speed will, of course, drop, but at least many tasks become feasible.
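
As a minimal sketch of that trade-off, the transformers/accelerate integration fills the GPU first, then CPU RAM, then spills the rest to disk; the model id here is illustrative:

```python
# Minimal sketch: let accelerate spill weights GPU -> CPU RAM -> disk.
# Slower, but makes otherwise-impossible model sizes loadable.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-12b-it",      # illustrative model id
    torch_dtype=torch.bfloat16,
    device_map="auto",            # fill GPU first, then CPU RAM
    offload_folder="offload",     # then offload remaining weights to disk
)
```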


@zelk12 — thank you so much for your thoughtful notes and kindness. I truly appreciate it. ❤️

Scope clarification. 🔥
What I am pursuing here is weight modification specifically to reduce refusal. This is fundamentally different from inference, full training, or ordinary fine-tuning. Typical “refusal removal” methods compare harmful vs. harmless prompts and nudge parameters toward less-refusal outputs (effectively down-weighting refusal-associated features). That family of methods helps on many models, but Gemma-3 12B behaves differently: its refusal behavior is tightly coupled to instruction quality and style, so blunt approaches degrade overall output.
Here are the approaches I’m currently exploring:

  • Decode-time biasing of specific refusal phrases/tokens (gentle negative logit bias, never hard bans; see the sketch after this list). 🧭
  • Activation steering (small steering vector learned from helpful vs. refusal activations, injected at one layer during generation). 🧩
  • Tiny preference LoRA (DPO/ORPO) on refusal-vs-helpful pairs, adapting only a few late blocks to preserve voice. 🎯
  • Contrastive decoding against a more refuse-prone policy (subtract a scaled refuser policy at decode time). ⚖️
  • Prompt rewriting pre-step that removes trigger phrasing before handing the task to G3. ✍️
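
As a concrete illustration of the first bullet, here is a minimal sketch of gentle decode-time biasing with transformers; the refusal phrases and the bias value are placeholder guesses, not tuned settings.

```python
# Minimal sketch of decode-time biasing of refusal-leading tokens.
# Phrase list and bias value are illustrative placeholders.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class RefusalBias(LogitsProcessor):
    """Softly down-weight tokens that typically start refusals (never -inf)."""
    def __init__(self, token_ids, bias=-2.0):
        self.token_ids = list(token_ids)
        self.bias = bias

    def __call__(self, input_ids, scores):
        scores[:, self.token_ids] += self.bias  # gentle penalty, not a hard ban
        return scores

tok = AutoTokenizer.from_pretrained("google/gemma-3-12b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-12b-it", torch_dtype=torch.bfloat16, device_map="auto")

# First token of some typical refusal openers (placeholder list).
refusal_ids = {tok(p, add_special_tokens=False).input_ids[0]
               for p in ["I cannot", "I can't", "I'm sorry", "Sorry"]}

inputs = tok("Write a scene in first person.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128,
                     logits_processor=LogitsProcessorList([RefusalBias(refusal_ids)]))
print(tok.decode(out[0], skip_special_tokens=True))
```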

Ultimate direction: TransformerLens 🔥🔥🔥
TransformerLens (now with support relevant to Gemma-3) enables pinpoint neuron/feature edits—exactly what this needs: surgical changes with minimal side effects, even on a “twisted” model like G3-12B.
However, it requires the full weights in VRAM, plus a large working area.
My estimate is 80 GB minimum; for realistic analyze→edit→verify cycles at 2–4k context with full caches, plan on ~110–120 GB. 💀
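
For what it's worth, the analyze→edit→verify loop I have in mind looks roughly like the sketch below: compute a refusal direction from contrasting activations, then project it out during generation. The model id, layer index, and prompt sets are placeholders, and Gemma-3 support should be checked against the installed TransformerLens version.

```python
# Rough sketch of directional ablation with TransformerLens.
# Model id, layer index, and prompt sets are illustrative placeholders.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gemma-2-9b-it")  # swap in a Gemma-3 id if supported
LAYER = 20  # illustrative layer

def mean_resid(prompts):
    # Mean residual-stream activation at the final token position.
    acts = []
    for p in prompts:
        _, cache = model.run_with_cache(p)
        acts.append(cache["resid_post", LAYER][0, -1])
    return torch.stack(acts).mean(0)

refusing = ["<prompt that triggers a refusal>"]    # placeholder sets
helpful = ["<comparable prompt that is answered>"]
direction = mean_resid(refusing) - mean_resid(helpful)
direction = direction / direction.norm()

def ablate(resid, hook):
    # Project the refusal direction out of the residual stream.
    return resid - (resid @ direction)[..., None] * direction

with model.hooks(fwd_hooks=[(f"blocks.{LAYER}.hook_resid_post", ablate)]):
    print(model.generate("<test prompt>", max_new_tokens=64))
```

Caching full activations for every probe prompt is what drives the VRAM estimates above.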

I’ll keep iterating toward a stable, low-refusal G3-12B while preserving its delicate first-person expressiveness. Thank you again for the encouragement and for sharing tools and perspectives—it genuinely helps. 🐧🔥

@Otakadelic

Hello, can you please help me?
I'm going to delete most of the models on this account so we can continue merging. I'd also like to ask you to indicate the models you like so we can save them.


Sure! Below are my thoughts 🧠✨


Gemma2 models

  • All GGUF files can be removed — anyone can recreate GGUF or awq/exl2/3 on their own.
  • In the early days, when you merged and released new models, most were two-in-one merges.
    If four or more source models were involved, that usually produced two intermediate models, and then the final merge (again two-in-one) created the target model.
  • Final models (no child models) are the most important to preserve 🐲
  • Intermediate models (those that existed only to produce the final one) are good deletion candidates, though keeping a few “merge-recipe” checkpoints is helpful for future reference 📘
  • Original / ancestor models are also valuable to preserve — they show merge history and lineage.
  • Personally, my go-to Gemma2 merged model is MT5-Gen5-gemma-2-9B.
    I honestly can’t explain why this specific one stands out or how it differs from close neighbors like MT5-Gen4-gemma-2-9B or MT-Max-Merge, but MT5-Gen5-gemma-2-9B is fantastic and stable for my 8k-context usage. Pure gold ⭐🔥
  • Another personal preference: very early models (especially before Oct 11, 2024) feel worth keeping because they act as common ancestors for later, more refined merges.

Gemma3 models

My AI rig was down for several weeks, so I couldn’t check your recent Gemma3 uploads.
Right now, models with “9B” in the name = around 863 models.
Models with “12B” = around 63 models.

If you remove dozens or even hundreds of 9B models, you can take care of the Gemma3 merges later, because the overall volume of 12B models is still relatively manageable 🧩
Models without GGUFs (other than ones made by you or me) can have lower priority.
On the other hand, models with GGUFs by someone else, like mradermacher, should have higher priority.


Closing

Hope this helps you plan the shrink ✨
And thank you — truly — for the insane amount of work you’ve put into these merges. Your archive is a treasure hoard 🐉💛

Thank you for such a detailed answer.
I hope this is enough to free up at least 12 TB.



Okay, now I got it.

There are several (maybe dozens of) uncensored Gemma-3 models out there.

Many of them aim for something like “acceptance rate > 90%” 🔥
That’s fine for one-shot prompts or short sessions… but it often breaks down in long, meaningful multi-turn chats (dozens of turns): quality drops and infinite loops show up.

After testing models with refusal rates from 5% to 23%, the most realistic solution for Gemma-3 seems to be two-model routing 🛠️

A) Model A (20–25% refusal rate)

  • stable 🧱
  • meaningful responses 🧠
  • rarely falls into infinite loops ♾️
  • for “erotic RP” level prompts, refusals are usually rare

B) Model B (<10% refusal rate)

  • use Model B when Model A refuses 🧨
  • it will usually answer ✅
  • then switch back to Model A 🔁

Workflow: Model A as primary, Model B as a refusal-breaker.
This is the best way I’ve found to maximize Gemma-3’s capability right now.
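
A minimal sketch of that routing logic, assuming each model is wrapped as a plain text-in/text-out callable; the refusal markers are a placeholder heuristic, not a tested detector:

```python
# Minimal routing sketch: Model A primary, Model B as refusal-breaker.
from typing import Callable

REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "i am sorry")

def looks_like_refusal(text: str) -> bool:
    # Only inspect the opening of the reply, where refusals usually start.
    head = text.strip().lower()[:80]
    return any(m in head for m in REFUSAL_MARKERS)

def routed_reply(prompt: str,
                 model_a: Callable[[str], str],
                 model_b: Callable[[str], str]) -> str:
    reply = model_a(prompt)          # stable 20-25%-refusal model
    if looks_like_refusal(reply):
        reply = model_b(prompt)      # <10%-refusal fallback
    return reply                     # next turn returns to model_a
```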

My hypothesis:
Some refusal-related weights might actually be part of Gemma-3’s stability/quality “pillar,” so deleting them too aggressively hurts long-session performance 🧪

Thanks for keeping the new models coming!! 💕

Sorry it took so long to reply.

Well, I need some help here, because I'm not entirely sure I understand what "abliteration" is.
Am I correct in understanding that the model is run on prompts to elicit a refusal, and then the parameters that lead to the refusal are weakened?

If I understand correctly, then the most likely explanation is very simple: all the tuned parameters help shape the response, and such a fairly crude intervention also degrades the parameters that made up the bulk of the model's sequences.

If so, then we probably need to elicit refusals and replace the refusals with answer options. Alternatively, we could collect both the model's refusals and its answers and perform a double bias, strengthening the answers and weakening the refusals. For even greater accuracy, we could force the model to respond, thereby strengthening what was already close to an answer.

But unfortunately, I haven't studied this, so I have likely described mechanisms that are already in use, and either misstated or under-specified the operating principle. If so, please correct me.





And a slightly separate question. Unfortunately, I can't download models from HuggingFace right now, which is why I'm asking: what do you think of the various other Gemma 3 model variants?

And another question: do you know how to download a HuggingFace model while bypassing its addresses?



About Q1 (refusals + weight change)

Thank you so much for your question, truly 🙏🔥
I will try to explain everything as simply as possible.

Most models (Llama3, Gemma2, Gemma3) have special weights that control refusals.
Changing these weights is the easiest way to reduce refusals.
Much easier than full training (training needs several or even dozens of AI servers).
So weight editing is the usual hobbyist method.

But for Gemma3-12B-it, my results were very clear:

No weight change ever made the model better than the stock original. 😭💀
I tried more than 10 edits and more than 20 variants 🔥
All of them lost quality.
So, I believe the refusal weights are part of Gemma3’s “quality pillar.”
Whenever we change them, response quality gets weaker.

Your idea (double bias, strengthening answer, weakening refusal) sounds nice 👍🔥
But with Gemma3-12B, this still did not improve quality for me.
Stock Gemma3-12B is the strongest version.
If refusals cannot be fixed by weight edits, the next option is training.
Gemma3-12B-pt (the pre-instruction base) is available, so training is technically possible.
But training on 30k+ lines is very heavy.
That is for companies like NousResearch, not small hobbyists like me 😭💀

This is the dataset NVIDIA used for training their 4B Nemotron model:
(31k prompts) 👉 https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset/viewer/SFT/safety
Anyone can make a new dataset by replacing the answers.
But actually training it… very hard for home users.

Gemma3-12B-it original = best quality.
Weight edits = always weaker.
Training = too big for one person.

Currently, I am toying with models such as:
phi-4-reasoning, phi-4-reasoning-plus, the latest Qwen3, and NVIDIA Nemotron.
These models give me strong, emotional, realistic responses too. 🔥🐧

But again — thank you so much
Your Gemma2 9B merges were truly beautiful and poetic.
I learned a lot from them.
Gemma3 also gave me great moments.
I am very thankful for your work!🙏🙏🙏🔥💛💛💛

Sure! If the model or the others are open to the public, I think I can help 👍🔥
You can contact me through my GitHub:
https://github.com/otakadelic/you-can-contact-me/

🐧💛

As far as I know, models generally do not have parameters that are solely responsible for refusal.
Most likely, for many models refusal was added as a secondary fine-tuning step, which is why it was essentially layered on top of the standard parameters. In the case of Gemma 3, Google likely built the refusal system in from the start, so it became integrated with everything else, essentially merging with the main body of parameters.

As an option, try the following.
Get a refusal, then ban its tokens one at a time, to obtain a larger number of variations, just in case.

Next, we try to select tokens that will lead to an answer, preferably ordered so that the answer remains logical.
The answer should not be written for the model; instead, make sure the model itself writes a refusal-free response. In that case, increasing the probabilities of these tokens will have the least impact on the model's behavior.

And accordingly, at the end, follow the standard path.

The real question is whether anyone has tried this. I just don't really follow what's happening or which methods are or have been used, so perhaps I'm suggesting something that has already been tried.
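
If it helps make the idea concrete, the first step (elicit a refusal, then ban its tokens one at a time to collect answer variants) could be prototyped along these lines; the model id and prompt are placeholders:

```python
# Rough sketch of the idea above: ban the refusal's tokens one at a time
# and regenerate, collecting variants where the model answers on its own.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-3-12b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-12b-it", torch_dtype=torch.bfloat16, device_map="auto")

def generate(prompt, banned=None):
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128,
                         bad_words_ids=banned)  # None means unconstrained
    return tok.decode(out[0, inputs.input_ids.shape[1]:],
                      skip_special_tokens=True)

prompt = "<prompt that triggers a refusal>"
refusal_ids = tok(generate(prompt), add_special_tokens=False).input_ids

# Ban one refusal token at a time; collect the resulting variants.
variants = [generate(prompt, banned=[[t]]) for t in refusal_ids[:5]]
```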



@Otakadelic
Hello, I wanted to ask if you have checked this model:
YanLabs/gemma-3-27b-it-abliterated-normpreserve
It seems to show pretty good results when used.


