Tags: Text Generation · Safetensors · English · llava_phi · conversational · custom_code

🎉 CompeteSMoE-5.1B

CompeteSMoE-5.1B is a lightweight, integrated variant of the Mixture-of-Experts (MoE) architecture, built upon the Phi-3.5 Mini and SigLIP baselines and incorporating the latest CompeteSMoE algorithm enhancements. CompeteSMoE-5.1B performs strongly across a range of MoE routing strategies, from standard to state-of-the-art routing methods, and achieves competitive results compared to recent MoE architectures such as SharedE-V2 and SharedE-V3, which are inspired by DeepSeek. Despite the architectural innovations of these models, especially their use of shared experts, CompeteSMoE-5.1B consistently delivers superior or comparable results.
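
To make the competition-based routing idea concrete, here is a minimal, illustrative sketch (not the authors' implementation): every expert processes the token, experts "compete" through the magnitude of their responses, and only the top responses are mixed. Class and variable names are assumptions for illustration; see the paper for the exact training procedure.

```python
# Illustrative competition-based MoE layer (a sketch, not the CompeteSMoE code).
import torch
import torch.nn as nn


class CompetitionMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                for _ in range(num_experts)
            ]
        )
        # A lightweight router can be trained to imitate the competition outcome
        # (not shown here), so full expert evaluation can be skipped at inference.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Run every expert and let them compete:
        # an expert's affinity is the magnitude of its response.
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (T, E, D)
        affinity = expert_out.norm(dim=-1)                             # (T, E)

        # Keep only the top-k winners and mix them with softmax weights.
        weights, winners = affinity.topk(self.top_k, dim=-1)           # (T, k)
        weights = weights.softmax(dim=-1)
        picked = expert_out.gather(
            1, winners.unsqueeze(-1).expand(-1, -1, x.size(-1))        # (T, k, D)
        )
        return (weights.unsqueeze(-1) * picked).sum(dim=1)             # (T, D)
```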

πŸ“ Note: This version of CompeteSMoE-5.1B was trained on a small-scale dataset. 🚧 We're actively working on a stronger, more robust release β€” coming soon! πŸš€ Stay tuned for updates. πŸ’‘

Hardware Resources

| Stage          | MoE Method  | Hardware |
|----------------|-------------|----------|
| Pre-Training   |             | 4xH100   |
| Pre-FineTuning |             | 4xH100   |
| VIT            | CompeteSMoE | 4xH100   |

Citation Information

More details can be found in our paper.

If you use CompeteSMoE, please cite it using this BibTeX:

@misc{nguyen2025competesmoe,
    title={CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition},
    author={Nam V. Nguyen and Huy Nguyen and Quang Pham and Van Nguyen and Savitha Ramasamy and Nhat Ho},
    year={2025},
    eprint={2505.13380},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}