AI & ML interests

Local LLMs

Recent Activity

Parveshiiii 
posted an update 4 days ago
Hey everyone!
We’re excited to introduce our new Telegram group: https://t.me/XenArcAI

This space is built for **model builders, tech enthusiasts, and developers** who want to learn, share, and grow together. Whether you’re just starting out or already deep into AI/ML, you’ll find a supportive community ready to help with knowledge, ideas, and collaboration.

💡 Join us to:
- Connect with fellow developers and AI enthusiasts
- Share your projects, insights, and questions
- Learn from others and contribute to a growing knowledge base

👉 If you’re interested, hop in and be part of the conversation: https://t.me/XenArcAI
prithivMLmods 
posted an update 6 days ago
Introducing demos for new SOTA models from AI2: SAGE-MM (Smart Any-Horizon Agents for Long-Video Reasoning) and Molmo2, an open vision-language model that supports multi-image (QA and pointing) and video (QA, pointing, and tracking). The demos and related collections are listed below. 🎃🔥

✨ SAGE-MM [Video-Reasoning]: prithivMLmods/SAGE-MM-Video-Reasoning
✨ Molmo2 [Demo]: prithivMLmods/Molmo2-HF-Demo

🎃 GitHub[SAGE-MM]: https://github.com/PRITHIVSAKTHIUR/SAGE-MM-Video-Reasoning
🎃 GitHub[Molmo2]: https://github.com/PRITHIVSAKTHIUR/Molmo2-HF-Demo
🎃 Multimodal Implementations: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!
prithivMLmods 
posted an update 7 days ago
Introducing TRELLIS.2 Text-to-3D. The demo for the TRELLIS.2-4B (Image-to-3D) model is chained with the Z-Image Turbo image generation model to enable Text-to-3D: Z-Image Turbo generates an image from the prompt, and TRELLIS.2-4B turns it into a 3D asset. No input assets are needed, which makes it a small leap forward for ideation. Image-to-3D inference from directly uploaded images is also supported by default. Find the demo and related collections below... 🤗🔥

✨ TRELLIS.2-Text-to-3D [Demo]: prithivMLmods/TRELLIS.2-Text-to-3D
✨ Multimodal Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Github: https://github.com/PRITHIVSAKTHIUR/TRELLIS.2-Text-to-3D
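The two-stage routing described in this post can be sketched as follows. This is a minimal, hypothetical sketch rather than the Space's code: the two models are passed in as plain callables (`text_to_image` and `image_to_3d` are illustrative names), and an uploaded image simply skips the text-to-image stage.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model wrappers: prompt -> image bytes, image bytes -> 3D asset bytes
TextToImage = Callable[[str], bytes]
ImageTo3D = Callable[[bytes], bytes]


@dataclass
class TextTo3DPipeline:
    text_to_image: TextToImage  # e.g. a Z-Image Turbo wrapper (assumed)
    image_to_3d: ImageTo3D      # e.g. a TRELLIS.2-4B wrapper (assumed)

    def run(self, prompt: Optional[str] = None, image: Optional[bytes] = None) -> bytes:
        """Return a 3D asset, synthesizing the input image from the prompt when none is given."""
        if image is None:
            if not prompt:
                raise ValueError("provide a prompt or an image")
            # Text-to-3D path: no input asset, so generate one first
            image = self.text_to_image(prompt)
        # Image-to-3D stage shared by both paths
        return self.image_to_3d(image)
```

Keeping the image stage optional is what lets the same app serve both Text-to-3D and plain Image-to-3D requests.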

To know more about it, visit the app page or the respective model page!
Nymbo 
posted an update 8 days ago
🚨 New tool for the Nymbo/Tools MCP server: Agent_Skills provides full support for Agent Skills (like Claude Skills, but open source).

How it works: the tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, so no new infrastructure is required.

```python
Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets
```
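How a server might back these actions, as a minimal sketch: it assumes each skill is a folder under `Skills/` containing a `SKILL.md` whose YAML frontmatter declares `name` and `description`, plus any bundled scripts or assets. The `agent_skills` function and its return shapes are hypothetical, not the Nymbo/Tools implementation; the frontmatter parser only handles flat `key: value` lines rather than full YAML, and script execution through Shell_Command is left out.

```python
from pathlib import Path


def _frontmatter(text: str) -> dict:
    """Parse flat `key: value` lines between leading '---' fences (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing fence: treat as having no frontmatter


def agent_skills(root: Path, action: str, skill_name: str = ""):
    """Hypothetical dispatcher for the discover/info/resources/validate actions."""
    skills_dir = root / "Skills"
    if action == "discover":
        # Only names and one-line descriptions, to keep the listing cheap in context
        found = []
        for folder in sorted(p for p in skills_dir.iterdir() if p.is_dir()):
            md = folder / "SKILL.md"
            if md.is_file():
                meta = _frontmatter(md.read_text(encoding="utf-8"))
                found.append({"name": folder.name, "description": meta.get("description", "")})
        return found

    skill = skills_dir / skill_name
    md = skill / "SKILL.md"
    if action == "info":
        # Full SKILL.md text, loaded only when a specific skill is requested
        return md.read_text(encoding="utf-8")
    if action == "resources":
        # Every bundled file except SKILL.md itself, as POSIX-style relative paths
        return sorted(p.relative_to(skill).as_posix()
                      for p in skill.rglob("*") if p.is_file() and p != md)
    if action == "validate":
        if not md.is_file():
            return ["missing SKILL.md"]
        meta = _frontmatter(md.read_text(encoding="utf-8"))
        return [f"missing frontmatter field: {field}"
                for field in ("name", "description") if not meta.get(field)]
    raise ValueError(f"unknown action: {action!r}")
```

The split between a short `discover` listing and an on-demand `info` call means the model only reads a skill's full instructions when it actually needs that skill.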


I've included a music-downloader skill as a working demo; it wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot
prithivMLmods 
posted an update 9 days ago
The Molmo2 demo on Hugging Face is now live, covering Single/Multi-Image VQA, Visual Pointing/Grounding, Video VQA, and Video Point Tracking. Find the demo and related collections below. 🔥🤗

● Molmo2 HF Demo🖥️: prithivMLmods/Molmo2-HF-Demo
● Model Collection: https://huggingface.co/collections/allenai/molmo2
● Related Multimodal Space Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!
prithivMLmods 
posted an update 10 days ago
Introducing the Z Image Turbo LoRA DLC App, a gallery space for plug-and-play Z-Image-Turbo LoRAs. It features a curated collection of impressive LoRAs for generating high-quality images. By default, it runs on the base model. Simply choose a LoRA, type your prompt, and generate images. You can find the app and more details below. 🤗🧪

● Space [Demo]: prithivMLmods/Z-Image-Turbo-LoRA-DLC
● Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
● Check the list of Z-Image LoRAs: https://huggingface.co/models?other=base_model:adapter:Tongyi-MAI/Z-Image-Turbo
● Github: https://github.com/PRITHIVSAKTHIUR/Z-Image-Turbo-LoRA-DLC

Other related image generation Spaces:

● FLUX-LoRA-DLC2: prithivMLmods/FLUX-LoRA-DLC2
● FLUX-LoRA-DLC: prithivMLmods/FLUX-LoRA-DLC
● Qwen-Image-LoRA-DLC: prithivMLmods/Qwen-Image-LoRA-DLC
● Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
● Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion

& more...

To know more about it, visit the app page or the respective model page!
Aurelien-Morgan 
posted an update 17 days ago
leonardlin 
posted an update 17 days ago
We just released our latest Shisa V2.1 Japanese multilingual models: https://huggingface.co/collections/shisa-ai/shisa-v21

Besides updates to our 14B and 70B models, we have new LFM2-based 1.2B, Llama 3.2-based 3B, and Qwen3-based 8B models, all with class-leading Japanese language capabilities.

As usual, there are lots of details in the Model Cards for those interested.
prithivMLmods 
posted an update 18 days ago
Introducing the D.Markdown Experimental Models, Proxima and Epsilon OCR models, built on top of Qwen3-VL and Qwen2.5-VL respectively. Proxima is optimized for Markdown generation and is capable of embedding inline programming code snippets and generating rich nodes such as HTML, XML, JSON, and YAML. Epsilon is optimized for reconstructing complex layouts including tables, forms, and mathematical content. 🌌✨

● proxima-ocr-d.markdown-post3.0.l: prithivMLmods/proxima-ocr-d.markdown-post3.0.l
● epsilon-ocr-d.markdown-post3.0.m: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m
● proxima-ocr-d.markdown-post3.0.l-gguf: prithivMLmods/proxima-ocr-d.markdown-post3.0.l-GGUF
● epsilon-ocr-d.markdown-post3.0.m-gguf: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m-GGUF

● Collection: https://huggingface.co/collections/prithivMLmods/dynamic-markdowns
● Multimodal Apps: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

👉 These are stage-progression (work-in-progress) models and may currently produce artifacts.

To know more about it, visit the app page or the respective model page!
prithivMLmods 
posted an update 20 days ago
Try the CUA GUI Operator 🖥️ Space, a single app that demos several ultra-compact multimodal Computer Use Agent (CUA) models, including Fara-7B, UI-TARS-1.5-7B, and Holo models, for GUI localization tasks.

● CUA-GUI-Operator [Demo]: prithivMLmods/CUA-GUI-Operator
● Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Other related multimodal spaces

● Qwen3-VL: prithivMLmods/Qwen3-VL-HF-Demo
● Multimodal-VLM-v1.0: prithivMLmods/Multimodal-VLM-v1.0
● Vision-to-VibeVoice-en: prithivMLmods/Vision-to-VibeVoice-en

I plan to add Chrome sandboxes to the same Space soon, turning it into a browser-based multimodal CUA tool.

To know more about it, visit the app page or the respective model page!
prithivMLmods 
posted an update 21 days ago
One speech model with seven voices, combined with a vision-language model for vision tasks: the app performs vision (image + text) to audio inference with Qwen2.5-VL + VibeVoice-Realtime-0.5B. The Vision-to-VibeVoice (EN) demo is live. 🗣️🔥

🤗 Vision-to-VibeVoice-en [Demo]: prithivMLmods/Vision-to-VibeVoice-en
✨ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Speech [VibeVoice-Realtime-0.5B]: microsoft/VibeVoice-Realtime-0.5B
✨ Vision [Qwen2.5-VL]: Qwen/Qwen2.5-VL-7B-Instruct

To know more about it, visit the app page or the respective model page!
prithivMLmods 
posted an update 26 days ago
Hello everyone,

The strangerzonehf [HF] Community / Organization Page, which I maintain, has reached 6th place in the Top 10 Developer Pages ranking, contributing 3.4% over the calendar cycle from August 2024 to August 2025. It is also the only South Asian / Indian page on the list. I could not be prouder to be building things for the community. ❤️🤗

Source: https://www.dataprovenance.org/economies-of-open-intelligence.pdf

It is a pleasure to be a part of it.
Thank you!
@prithivMLmods
prithivMLmods 
posted an update 30 days ago
Introducing the Super-OCRs Demo, which compares state-of-the-art multimodal OCR VLMs (HunyuanOCR, DeepSeek-OCR, Dots.OCR, and Nanonets-OCR2) in one Space for OCR, LaTeX and Markdown rendering, and visual grounding (layout). Find the related Spaces and models below. 🤗🔥

✨Super-OCRs[Demo]: prithivMLmods/Super-OCRs-Demo
✨Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨GitHub: https://github.com/PRITHIVSAKTHIUR/Super-OCRs-Demo

⭐ Models Used:
✦ HunyuanOCR: tencent/HunyuanOCR
✦ DeepSeek-OCR: (-) deepseek-ai/DeepSeek-OCR (+) prithivMLmods/DeepSeek-OCR-Latest-BF16.I64
✦ Dots.OCR: (-) rednote-hilab/dots.ocr (+) prithivMLmods/Dots.OCR-Latest-BF16
✦ Nanonets-OCR2-3B: nanonets/Nanonets-OCR2-3B

⭐ Some Other Relevant Apps:
✦ Qwen3-VL-HF-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✦ Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
✦ Multimodal-OCR: prithivMLmods/Multimodal-OCR
✦ Multimodal-OCR2: prithivMLmods/Multimodal-OCR2
✦ Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
✦ DeepSeek-OCR-experimental: prithivMLmods/DeepSeek-OCR-experimental

To know more about it, visit the app page or the respective model page!
Nymbo 
posted an update about 1 month ago
🚀 I've just shipped a major update to the Nymbo/Tools MCP server: the Agent_Terminal, a single "master tool" that cuts token usage by over 90%!

Anthropic found 98.7% context savings using code execution with MCP, and Cloudflare published similar findings. This is my open-source implementation of the same idea.

# The Problem

Traditional MCP exposes every tool definition directly to the model. With 12 tools, that's thousands of tokens consumed *before the conversation even starts*. Each tool call also passes intermediate results through the context window — a 10,000-row spreadsheet? That's all going into context just to sum a column.
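To make the spreadsheet example concrete, here is a minimal sketch comparing how much text each approach pushes into the model's context. It assumes a hypothetical 10,000-row tool result and uses character counts as a rough stand-in for tokens; the figures it prints come from this synthetic data, not from any benchmark.

```python
import json
import random

# Hypothetical tool result: a 10,000-row spreadsheet
random.seed(0)
rows = [
    {"id": i, "region": random.choice(["EU", "US", "APAC"]), "amount": random.randint(1, 1000)}
    for i in range(10_000)
]

# Direct tool calling: the full result is serialized into the model's context window
direct_context = json.dumps(rows)

# Code execution: the script runs in a sandbox and only its printed output returns to the model
total = sum(row["amount"] for row in rows)
code_context = f"total amount: {total}"

# Fraction of context avoided, using characters as a rough proxy for tokens
savings = 1 - len(code_context) / len(direct_context)
print(f"direct: {len(direct_context):,} chars | code execution: {len(code_context):,} chars")
print(f"context avoided: {savings:.2%}")
```

The saving grows with the size of the intermediate result, because the code-execution path only ever returns the final answer.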

# The Solution: One Tool to Rule Them All

Agent_Terminal wraps all 12 tools (including Web_Search, Web_Fetch, File_System, Generate_Image, Generate_Speech, Generate_Video, Deep_Research, Memory_Manager, Obsidian_Vault, Shell_Command, and Code_Interpreter) into a single Python code execution gateway.

Instead of the model making individual tool calls, it writes Python code that orchestrates the tools directly:

```python
# Search for Bitcoin price
result = Web_Search("current price of bitcoin", max_results=3)
print(result)
```


Don't know what tools are available? The agent can discover them at runtime:

```python
print(search_tools('image'))  # Find tools by keyword
print(usage('Generate_Image'))  # Get full docs for a specific tool
```
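One way `search_tools` and `usage` could be backed is a small in-process registry. This is a hypothetical sketch: the `tool` decorator and the two stub tools are illustrative, not Nymbo's code. A search returns only tool names, and a tool's full signature and docstring enter the context only when `usage` is called for it.

```python
import inspect
from typing import Callable, Dict, List

TOOLS: Dict[str, Callable] = {}


def tool(fn: Callable) -> Callable:
    """Register a function so the agent can discover it at runtime."""
    TOOLS[fn.__name__] = fn
    return fn


def search_tools(keyword: str) -> List[str]:
    """Names of tools whose name or docstring mentions the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sorted(name for name, fn in TOOLS.items()
                  if kw in name.lower() or kw in (inspect.getdoc(fn) or "").lower())


def usage(name: str) -> str:
    """Signature plus full docstring for one tool, loaded only on demand."""
    fn = TOOLS[name]
    return f"{name}{inspect.signature(fn)}\n\n{inspect.getdoc(fn) or ''}"


# Hypothetical tool stubs for illustration only
@tool
def Generate_Image(prompt: str, steps: int = 8) -> str:
    """Generate an image from a text prompt and return its file path."""
    return "image.png"


@tool
def Web_Search(query: str, max_results: int = 5) -> list:
    """Search the web and return result snippets."""
    return []
```

With these stubs, `print(search_tools('image'))` prints `['Generate_Image']`, and `usage('Generate_Image')` starts with `Generate_Image(prompt: str, steps: int = 8) -> str`.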


The individual direct tool calls are still available, but they can be disabled when using Agent_Terminal. Try it now: https://www.nymbo.net/nymbot
prithivMLmods 
posted an update about 1 month ago
Introducing the advanced sketch-board editor "Nano-Banana-Pro-Sketch-Board" powered by the Gemini 2.5 Flash Image and Gemini 3 Pro Preview Image models through the Gemini API. This version includes more features than the Nano-Banana-AIO app for drawing and prompt-based concept transformation of freestyle sketches. 🔥🍌

✨Nano-Banana-Pro-Sketch-Board: prithivMLmods/Nano-Banana-Pro-Sketch-Board
✨Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
✨Github: https://github.com/PRITHIVSAKTHIUR/Nano-Banana-Pro-Sketch-Board
✨Model-Garden: https://tinyurl.com/4xxs9dvy

Some Other Relevant Apps [OSS]

⭐Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⭐Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
⭐Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⭐Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2

Note: The Nano-Banana-Pro-Sketch-Board demo requires a Gemini API key for editing. The key is cleared when the app is reloaded or closed and is not exposed anywhere else. Also, the Gemini 3 Pro Preview Image model may require a paid API key from a Google Cloud project with billing enabled.

To know more about it, visit the app info section or the respective Model Garden page!
prithivMLmods 
posted an update about 1 month ago
Try the demo of NVIDIA Nemotron Parse v1.1, NVIDIA's latest VLM for understanding document semantics and extracting text and table elements with spatial grounding. It performs comprehensive text understanding and document structure analysis, and can return bounding-box coordinates for the extracted elements.

⭐Space[Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-OCR
⭐Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Some relevant Spaces

⭐DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐Multimodal-OCR3: prithivMLmods/Multimodal-OCR3

Check out the other spaces in the multimodal implementation collection.

To know more about it, visit the app page or the respective model page!
prithivMLmods 
posted an update about 1 month ago
Try the all-new trending Qwen-Image-Edit-2509 (Multi-Image-Edits) specialized adapter demos, including Cloth-Design-Fuse, Texture Edit, Guided-Objects-Patching, and more — all in a single Hugging Face Space. The demo link is provided below. 🤗🔥

⮞ Space[Demo]: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⮞ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
⮞ Base Model: Qwen/Qwen-Image-Edit-2509

Similar applications↗️

⮞ Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2
⮞ Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⮞ Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast

To know more about it, visit the app page or the respective model page!
prithivMLmods 
posted an update about 1 month ago
I made a demo Space for Qwen3-VL multimodal understanding, covering tasks including point annotation, detection, captioning, guided text inference, and more. Find the demo link below. 🤗↗️

⮞ Space[Demo]: prithivMLmods/Qwen3-VL-HF-Demo
⮞ Model Used: Qwen/Qwen3-VL-4B-Instruct
⮞ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
⮞ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-3VL-Multimodal-Understanding

To know more about it, visit the app page or the respective model page!
Parveshiiii 
posted an update about 1 month ago
Another banger from XenArcAI! 🔥

We’re thrilled to unveil three powerful new releases that push the boundaries of AI research and development:

🔗 XenArcAI/SparkEmbedding-300m

- A lightning-fast embedding model built for scale.
- Optimized for semantic search, clustering, and representation learning.

🔗 XenArcAI/CodeX-7M-Non-Thinking

- A massive dataset of 7 million code samples.
- Designed for training models on raw coding patterns without reasoning traces.

🔗 XenArcAI/CodeX-2M-Thinking

- A curated dataset of 2 million code samples.
- Focused on reasoning-driven coding tasks, enabling smarter AI coding assistants.

Together, these projects represent a leap forward in building smarter, faster, and more capable AI systems.

💡 Innovation meets dedication.
🌍 Knowledge meets responsibility.