• 2 Posts
  • 9 Comments
Joined 6 days ago
Cake day: May 31st, 2025

  • Ah, great idea about the low temp for rules and high for creativity. I guess I can easily change it in the frontend, although I also set the temp when I start the server, and I’m not sure which one takes priority. Hopefully the frontend does, so I can tweak it easily.
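
    One way to check would be to hit the server directly with very different per-request values and see which one sticks. A rough sketch, assuming llama-server’s OpenAI-compatible endpoint on its default port (adjust the URL if yours differs):

    ```python
    # Minimal check of which temperature setting actually applies, assuming
    # llama-server's OpenAI-compatible endpoint on its default port (8080).
    # Send two requests with very different per-request temperatures and see
    # whether the outputs change character.
    import requests

    def ask(temperature: float) -> str:
        resp = requests.post(
            "http://127.0.0.1:8080/v1/chat/completions",
            json={
                "messages": [{"role": "user", "content": "Describe the tavern we just entered."}],
                "temperature": temperature,  # per-request value, like a frontend would send
            },
            timeout=300,
        )
        return resp.json()["choices"][0]["message"]["content"]

    print(ask(0.2))   # should read noticeably flatter than...
    print(ask(1.5))   # ...this, if the request-level value is the one in effect
    ```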

    Also, your post just got me thinking about the DRY sampler, which I’m using but which might be causing trouble in cases where the model legitimately should repeat itself, like an !inventory or !spells command. I might try to either disable it or add a custom sequence breaker, like the ! character.
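
    Something like this is what I’d try first. It’s just a sketch of the DRY fields as llama-server’s native /completion API names them; your frontend or version may spell them differently:

    ```python
    # Sampler settings with "!" added as a DRY sequence breaker, so repeated
    # command strings like "!inventory" aren't penalized. Field names follow
    # llama-server's native /completion API; check your version/frontend for
    # the exact spelling.
    dry_settings = {
        "dry_multiplier": 0.8,       # 0 disables DRY entirely
        "dry_base": 1.75,
        "dry_allowed_length": 2,
        "dry_sequence_breakers": ["\n", ":", "\"", "*", "!"],  # defaults plus "!"
    }
    ```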

    I think ST can show token probabilities, so I’ll try that too, thanks. I have so much to learn! I really should try other frontends though. ST is powerful in a lot of ways like dynamic management of the context, but there are other things I don’t like as much. It attaches a lot of info to a character that I don’t feel should be a property of a character. And all my D&D scenarios so far have been just me + 1 AI char, because even though ST has a “group chat” feature, I feel like it’s cumbersome and kind of annoying. It feels like the frontend was first designed around one AI char only, and then something got glued on to work around that limitation.





  • Thanks for your comments and thoughts! I appreciate hearing from more experienced people.

    I feel like a little bit of prompt engineering would go a long way.

    Yah, probably so. I tried to write a system prompt to steer the model toward what I wanted, but it’s going to take a lot more refinement and experimenting to dial it in. I like your idea of asking it to be unforgiving about rules. I hadn’t put anything like that in.
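
    For example, something along these lines might do it (purely an illustration, not a tested recipe):

    ```python
    # One possible wording for a stricter-adjudication system prompt; purely
    # illustrative, not a tested recipe.
    SYSTEM_PROMPT = (
        "You are the Dungeon Master. Apply the rules strictly and impartially. "
        "Do not bend outcomes in the player's favor. If an action is impossible "
        "or against the rules, say so and narrate the consequences. "
        "Failed rolls have real costs: injuries, lost resources, lost time."
    )

    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    ```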

    That’s a great idea about putting a D&D manual, or at least the important parts, into a RAG system. I haven’t tried RAG yet, but it’s on my list of things to learn. I know what it is, I just haven’t tried it yet.
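
    From what I understand, the retrieval half could be a minimal sketch like this, assuming the rules text is already split into chunks and using sentence-transformers for embeddings (any embedding model would work):

    ```python
    # Minimal retrieval sketch, assuming the rules text is already split into
    # chunks (e.g. one per SRD section). Uses sentence-transformers; any
    # embedding model would do.
    from sentence_transformers import SentenceTransformer
    import numpy as np

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    chunks = ["Grappling: ...", "Opportunity attacks: ...", "Exhaustion: ..."]  # placeholder rules text
    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

    def retrieve(query: str, k: int = 3) -> list[str]:
        """Return the k rules chunks most similar to the query."""
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = chunk_vecs @ q  # cosine similarity, since the vectors are normalized
        return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

    # The retrieved chunks get pasted into the prompt before asking the model to adjudicate.
    relevant_rules = retrieve("Can I grapple the ogre while holding a torch?")
    ```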

    I’ve definitely seen that the quality of output starts to decline at around 16K of context, even on models that claim to support 128K. Also, the system prompt seems more effective when there are only, say, 4K tokens of context so far; as the context grows, the model becomes less and less inclined to follow it. I’ve been guessing this is because any given piece of the context becomes more dilute, but I don’t really know.

    For those reasons, I’m trying to use summarization to keep the context size under control, but I haven’t found a good approach yet. SillyTavern has an automatic summary-injection feature, but either I’m misunderstanding it or I don’t like how it works, and I end up doing it manually.
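
    The manual approach amounts to something like this rough sketch (the endpoint and port are llama-server defaults, and the message counts are arbitrary):

    ```python
    # Rough sketch of manual rolling summarization: when the transcript gets
    # long, ask the model to compress the oldest part and splice the summary
    # back in. Endpoint/port are llama-server defaults; the message counts are
    # arbitrary.
    import requests

    API = "http://127.0.0.1:8080/v1/chat/completions"

    def summarize(old_messages: list[dict]) -> str:
        transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old_messages)
        resp = requests.post(API, json={
            "messages": [
                {"role": "system", "content": "Summarize this RPG session log. "
                 "Keep named characters, items, injuries, and open plot threads."},
                {"role": "user", "content": transcript},
            ],
            "temperature": 0.2,
        }, timeout=600)
        return resp.json()["choices"][0]["message"]["content"]

    def compact(history: list[dict], keep_last: int = 20) -> list[dict]:
        """Replace all but the last keep_last messages with a single summary message."""
        if len(history) <= keep_last:
            return history
        summary = summarize(history[:-keep_last])
        return [{"role": "system", "content": f"Story so far: {summary}"}] + history[-keep_last:]
    ```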

    I tried a few CoT models, but not since I moved to ST as a front end. I was using them with the standard llama-server web interface, which is a rather simple affair. My problem was that the thinking output seemed to spam up the context, leaving much less context space for my own use. Each think block was something like 500-800 tokens. It looks like ST might be able to keep only the most recent think block in the context, so I need to do more experimenting. The other problem I had was that the thinking could just take a lot of time.
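
    If the frontend can’t do that on its own, stripping the old think blocks before resending the history doesn’t look too hard. A sketch (the tag name varies by model; <think> is what the R1-style models use):

    ```python
    # Sketch of keeping only the latest reasoning: strip <think>...</think>
    # blocks from every assistant message except the most recent one before
    # resending the history. The tag name varies by model.
    import re

    THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

    def strip_old_thinking(history: list[dict]) -> list[dict]:
        last_assistant = max(
            (i for i, m in enumerate(history) if m["role"] == "assistant"),
            default=-1,
        )
        cleaned = []
        for i, m in enumerate(history):
            if m["role"] == "assistant" and i != last_assistant:
                m = {**m, "content": THINK_RE.sub("", m["content"])}
            cleaned.append(m)
        return cleaned
    ```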



  • but it’s no fun when the LLM simply says “yeah, sure whatever.”

    I hear ya. LLMs tend to heavily tilt toward what the user wants, which is not ideal for an RPG.

    Have you tried any of the specialized RPG models? The one I’m using now has, at least twice so far, put me into a situation where I felt my party (2 chars, me and the AI) were going to die unless we ran away. We just finished a very difficult fight, used everything at our disposal, and sustained several serious injuries in the process. Then an even more powerful foe appeared, and it really felt that was going to be the end unless we ran. Would it really have killed us? I can’t say, but I did get a genuine sense of that. It might help that in the system prompt, I had put this:

    The story should be genuinely dangerous and frightening, but survivable if we use our wits.

    I have the feeling the generalist models are much more tilted in the “yeah, sure, whatever” direction. I tried at least one RPG-focused model (Dan’s dangerous winds, or something like that) which was downright brutal, and would kill me off right away with no opportunity to do anything about it. That wasn’t fun for the opposite reason. But like you say, it’s also not fun to have no risk and no boundaries to test one’s mettle. The sweet spot can be elusive.

    I’m thinking that a non-LLM rules system wrapped around an LLM used for description could really help here too, enforcing a kind of rigor on the experience.
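
    As a toy example of what I mean, the dice and the hit/miss decision would live in ordinary code, and the LLM would only narrate the result it’s handed (all the numbers and names here are made up):

    ```python
    # Toy version of that split: the dice and the hit/miss decision live in
    # ordinary code, and the LLM only narrates the result it is handed. All
    # numbers and names here are made up.
    import random

    def attack_roll(attack_bonus: int, target_ac: int) -> dict:
        d20 = random.randint(1, 20)
        return {
            "roll": d20,
            "total": d20 + attack_bonus,
            "hit": d20 == 20 or (d20 != 1 and d20 + attack_bonus >= target_ac),
            "crit": d20 == 20,
        }

    result = attack_roll(attack_bonus=5, target_ac=16)
    narration_prompt = (
        f"The player swings at the ogre. Mechanical result: {result}. "
        "Narrate this outcome vividly, without changing whether it hit."
    )
    # narration_prompt then goes to the LLM; the hit/miss itself is no longer negotiable.
    ```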




  • What’s the advantage over Ollama?

    I’m very new to this so someone more knowledgeable should probably answer this for real.

    My impression was that Ollama uses the llama.cpp source internally, but wraps it up to provide features like auto-downloading of models. I didn’t care about that, but I liked the very tiny dependency footprint of llama.cpp. I haven’t tried Ollama for network inference.

    There are other backends that support network inference too, and some posts allege they’re better at it than llama.cpp. vLLM and … exllama, or something like that? I haven’t looked into either of them. I’m running on inertia so far with llama.cpp, since it was so easy to get going and I’m kinda lazy.
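
    For reference, the llama.cpp version of network inference is basically just starting llama-server with --host/--port on the machine with the GPU and pointing any OpenAI-compatible client at it. A sketch (the IP is just an example LAN address):

    ```python
    # Sketch of network inference with llama.cpp: start the server on the GPU
    # box so it listens on the LAN, e.g.
    #   llama-server -m model.gguf --host 0.0.0.0 --port 8080
    # then point any OpenAI-compatible client (SillyTavern, or this script) at it.
    import requests

    resp = requests.post(
        "http://192.168.1.50:8080/v1/chat/completions",  # example LAN address of the server box
        json={"messages": [{"role": "user", "content": "Hello from another machine."}]},
        timeout=300,
    )
    print(resp.json()["choices"][0]["message"]["content"])
    ```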