There’s many examples of noisily encoding a large embedding vocabulary. This sou...

There’s many examples of noisily encoding a large embedding vocabulary. This sounds a bit like T-free or H-net? Or BLT?

One of the main issues with lines of work around this are that you end up trading embedding parameters for active parameters. This is rarely a good trade-off for the sake of compute.