405B or 410B?

#8
by alielfilali01 - opened

The name and the announcements say 405B, but the safetensors tag shows the model as 410B! Given the overall size the difference may be negligible, but that's still 5B parameters unaccounted for. Is there a specific reason?

@Ali-C137 it's probably ignoring the embedding params.

According to the Llama 3 tech paper, 405B is supposed to use 8 key-value heads (the same as 8B and 70B); in that case, the model would be 405B (including embeddings). Later they changed to 16 key-value heads (the currently published model) but didn't want to change the model name. They should mention it in the tech paper, though.
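A rough sketch to check the arithmetic behind both explanations. It assumes the reported Llama 3.1 405B hyperparameters (126 layers, hidden size 16,384, FFN size 53,248, 128 attention heads of dim 128, vocab 128,256), no bias terms, and untied input/output embeddings. `llama_param_count` and `CONFIG_405B` are hypothetical helpers written for this post, not part of any library.

```python
# Rough parameter count for a Llama-style decoder (assumed: no biases, untied embeddings).
# Hyperparameters are assumed from the published Llama 3.1 405B config.
CONFIG_405B = dict(
    vocab_size=128_256,
    hidden=16_384,
    intermediate=53_248,
    layers=126,
    n_heads=128,
)


def llama_param_count(vocab_size, hidden, intermediate, layers, n_heads, n_kv_heads,
                      tie_embeddings=False):
    """Return (total, non_embedding) parameter counts."""
    head_dim = hidden // n_heads
    # q_proj and o_proj are full size; k_proj and v_proj scale with the number of KV heads (GQA)
    attn = 2 * hidden * (n_heads * head_dim) + 2 * hidden * (n_kv_heads * head_dim)
    mlp = 3 * hidden * intermediate  # gate_proj, up_proj, down_proj
    norms = 2 * hidden               # two RMSNorm weight vectors per layer
    embed = vocab_size * hidden      # input embedding table
    lm_head = 0 if tie_embeddings else vocab_size * hidden  # output projection
    total = layers * (attn + mlp + norms) + embed + lm_head + hidden  # + final RMSNorm
    return total, total - embed - lm_head


for kv in (8, 16):
    total, non_emb = llama_param_count(**CONFIG_405B, n_kv_heads=kv)
    print(f"{kv:>2} KV heads: total {total / 1e9:.2f}B, non-embedding {non_emb / 1e9:.2f}B")
```

Under these assumptions, 8 KV heads gives about 405.85B in total, while 16 KV heads gives about 410.08B in total, or about 405.88B if the embedding and output matrices are excluded. So the numbers alone are compatible with both explanations in this thread.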

