Text classification with Switch Transformer - Keras
May 10, 2020 · Introduction. This example demonstrates an implementation of the Switch Transformer model for text classification. The Switch Transformer replaces the feedforward network (FFN) layer in the standard Transformer with a Mixture of Experts (MoE) routing layer, in which each expert operates independently on the tokens in the sequence.
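To make the routing idea concrete, here is a minimal sketch in plain Python rather than Keras, so that it runs without any dependencies. It assumes top-1 routing, in which a learned router sends each token to the single expert with the highest softmax probability and the expert output is scaled by that probability. The names `ToyExpert`, `ToySwitchLayer`, and `route` are hypothetical and are not taken from the Keras example, the weights are random rather than trained, and the sketch leaves out expert capacity limits and the auxiliary load-balancing loss.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyExpert:
    """Hypothetical expert: a two-layer feedforward network with ReLU."""

    def __init__(self, embed_dim, hidden_dim, rng):
        self.w_in = make_matrix(hidden_dim, embed_dim, rng)
        self.w_out = make_matrix(embed_dim, hidden_dim, rng)

    def __call__(self, token):
        hidden = [max(0.0, h) for h in matvec(self.w_in, token)]
        return matvec(self.w_out, hidden)


class ToySwitchLayer:
    """Hypothetical stand-in for the FFN: a router plus several experts."""

    def __init__(self, embed_dim, hidden_dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.router = make_matrix(num_experts, embed_dim, rng)
        self.experts = [
            ToyExpert(embed_dim, hidden_dim, rng) for _ in range(num_experts)
        ]

    def route(self, token):
        """Return (index of the chosen expert, its gate probability)."""
        probs = softmax(matvec(self.router, token))
        expert_index = max(range(len(probs)), key=probs.__getitem__)
        return expert_index, probs[expert_index]

    def __call__(self, tokens):
        outputs = []
        for token in tokens:
            # Each token is processed on its own by exactly one expert.
            expert_index, gate = self.route(token)
            expert_out = self.experts[expert_index](token)
            # Scale by the gate so the router would receive gradients in training.
            outputs.append([gate * value for value in expert_out])
        return outputs


layer = ToySwitchLayer(embed_dim=4, hidden_dim=8, num_experts=3)
data_rng = random.Random(1)
tokens = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
outputs = layer(tokens)
assignments = [layer.route(token)[0] for token in tokens]
print("expert chosen per token:", assignments)
```

Because only one expert runs per token, adding experts increases the parameter count while the per-token compute stays roughly constant, which is the usual motivation for this kind of layer.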