How do mixture-of-experts layers affect transformer models?

This new LLM technique has started improving the results of models without additional training. 
