AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era. Paper • 2412.10255 • Published Dec 13, 2024
RoboBrain, a 32B open embodied AI model enabling multi-robot collaboration, released by BAAI (Beijing).
Model: BAAI/robobrain-681e1389c64d06b3e4a45e44
Dataset: BAAI/ShareRobot
- Task decomposition into 20+ precise actions
- Operable-region detection (e.g. teapot handles, drawers)
- Motion trajectory prediction to avoid collisions
Seed-Coder, code models released by ByteDance: ByteDance-Seed/seed-coder-680de32c15ead6555c75b0e4
- 8B models: base / instruct / reasoning
- MIT licensed
- Model-centric data filtering (less manual effort)
May 2025 - Open works from the Chinese community. Collection • 6 items • Updated about 6 hours ago
HunyuanCustom, a multimodal video generation framework supporting image, audio, video & text conditions, released by Tencent Hunyuan: tencent/HunyuanCustom
Paper: HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation (2505.04512)
- Strong identity consistency
- Outperforms state-of-the-art baselines
On Path to Multimodal Generalist: General-Level and General-Bench. Paper • 2505.04620 • Published 5 days ago
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation. Paper • 2505.04512 • Published 5 days ago
A ton of impactful models and datasets in open AI this past week; let's summarize the best: merve/releases-apr-21-and-may-2-6819dcc84da4190620f448a3
- Qwen made it rain! They released Qwen3, new dense and MoE models ranging from 0.6B to 235B, as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B sizes!
- Microsoft AI released Phi4 reasoning models (which also come in mini and plus sizes)
- NVIDIA released new CoT reasoning datasets
- ByteDance released UI-TARS-1.5, a native multimodal UI-parsing agentic model
- Meta released EdgeTAM, an on-device object tracking model (a SAM2 variant)
- NVIDIA released parakeet-tdt-0.6b-v2, a small 600M automatic speech recognition model
- Nari released Dia, a 1.6B text-to-speech model
- Moonshot AI released Kimi Audio, a new audio understanding, generation, and conversation model
- JetBrains released Mellum models in base and SFT variants for coding
- Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model
A real-time object detector much faster and more accurate than YOLO, with an Apache 2.0 license, just landed in Hugging Face transformers. D-FINE is a SOTA real-time object detector that runs on a T4 (free Colab).
Collection with all checkpoints and demo: ustc-community/d-fine-68109b427cbe6ee36b4e7352
Notebooks:
- Tracking: https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_tracking.ipynb
- Inference: https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_inference.ipynb
- Fine-tuning: https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_finetune_on_a_custom_dataset.ipynb
h/t @vladislavbro @qubvel-hf @ariG23498 and the authors of the paper.
Regular object detectors attempt to predict bounding boxes as pixel-perfect (x, y, w, h) coordinates, which is rigid and hard to optimize. D-FINE instead formulates object detection as predicting a distribution over bounding box coordinates and refines it iteratively, which is more accurate. Another core idea behind this model is Global Optimal Localization Self-Distillation: the final layer's distribution output acts as a teacher, distilled into earlier layers to make them more performant.
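The distribution-based localization idea described above can be illustrated with a toy sketch: instead of regressing one scalar per box edge, the head predicts logits over a set of discrete candidate offsets and decodes the edge as the expectation of that distribution. The bin values and logits below are invented for illustration; this is not D-FINE's actual implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_offset(logits, bin_values):
    """Decode one box edge as the expectation of a discrete
    distribution over candidate offsets, rather than a single
    regressed scalar."""
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, bin_values))

# Toy example: 5 candidate offsets (in pixels); the logits peak at 2 px.
bins = [0.0, 1.0, 2.0, 3.0, 4.0]
logits = [0.1, 0.5, 3.0, 0.5, 0.1]
offset = expected_offset(logits, bins)  # symmetric logits -> exactly 2.0
```

Because the full distribution is available (not just its expectation), a later refinement stage or a distillation loss can compare distributions directly, which is roughly what the self-distillation step in the post exploits.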
ACE-Step, a music generation foundation model released by StepFun & ACE Studio.
Model: ACE-Step/ACE-Step-v1-3.5B
Demo: ACE-Step/ACE-Step
- 3.5B parameters, Apache 2.0 licensed
- 15× faster than LLM-based baselines (4 minutes of music in 20 s on an A100)
- Diffusion + DCAE + linear transformer = speed + coherence
- Supports voice cloning, remixing, lyric editing & more
CCI4.0-M2, a powerful dataset with 3 specialized subsets, released by BAAI (Beijing): BAAI/cci40-68199d90bbc798680df16d7c
- M2-Base: 3.5 TB of web data (EN/ZH) with LLM-augmented content, Apache 2.0
- M2-CoT: 4.2 TB of auto-synthesized CoT reasoning data
- M2-Extra: domain-specific knowledge