Post
87
I did some testing on the scalability of FWKV. It hits a speed bottleneck at 1B because of the T4's bandwidth limitations. In theory, it should match RWKV's inference speed on a GPU with more bandwidth, so the speed I measured at the 1B size isn't an accurate picture of how it scales.
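For a rough sense of why bandwidth would cap speed at this size, here is a back-of-envelope sketch. It assumes things not stated above: that "1B" means 1 billion parameters, that weights are stored in fp16 (2 bytes each), that every weight is read once per generated token at batch size 1, and that the T4 peaks at roughly 320 GB/s (an approximate figure, check the datasheet). The function name is hypothetical and this is not a measurement of FWKV itself.

```python
def bandwidth_bound_tokens_per_sec(n_params, bytes_per_param, bandwidth_gb_s):
    """Rough ceiling on decode speed if every weight is read once per token.

    Ignores activations, recurrent state, and kernel overhead, so real
    throughput will be lower than this estimate.
    """
    model_bytes = n_params * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes


# Assumed values (not from the post): fp16 weights, ~320 GB/s peak on a T4.
ASSUMED_T4_BANDWIDTH_GB_S = 320
BYTES_PER_PARAM_FP16 = 2

for n_params in (1e8, 5e8, 1e9):
    ceiling = bandwidth_bound_tokens_per_sec(
        n_params, BYTES_PER_PARAM_FP16, ASSUMED_T4_BANDWIDTH_GB_S
    )
    print(f"{n_params:.0e} params: at most ~{ceiling:.0f} tokens/s")
```

Under these assumptions the ceiling falls in proportion to model size, which would fit a bottleneck showing up at 1B regardless of how efficient the architecture is.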
None defined yet.
floor(W·K·V). I'm hoping to make it much more efficient and scalable than RWKV.