SJTU VisionXLab

community

https://yangxue.site/publication

yangxue0827

Recent Activity

Qingyun authored a paper 24 days ago

Co-Training Vision Language Models for Remote Sensing Multi-task Learning

Paper • 2511.21272 • Published 29 days ago

Qingyun authored 3 papers 3 months ago

Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration

Paper • 2509.10059 • Published Sep 12

Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model

Paper • 2503.04543 • Published Mar 6 • 1

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Paper • 2509.15221 • Published Sep 18 • 111

sharon11 updated a dataset 4 months ago

VisionXLab/RSDet-datasets

Preview • Updated Aug 25 • 193 • 2

Qingyun authored a paper 5 months ago

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25 • 31

wsdwJohn1231 authored a paper 8 months ago

Decoupled Global-Local Alignment for Improving Compositional Understanding

Paper • 2504.16801 • Published Apr 23 • 14

yangxue authored 13 papers 9 months ago

H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection

Paper • 2210.06742 • Published Oct 13, 2022 • 1

Self-supervised Character-to-Character Distillation for Text Recognition

Paper • 2211.00288 • Published Nov 1, 2022

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Paper • 2410.08202 • Published Oct 10, 2024 • 6

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

Paper • 2411.18624 • Published Nov 27, 2024

Parameter-Inverted Image Pyramid Networks

Paper • 2406.04330 • Published Jun 6, 2024 • 1

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Paper • 2312.09238 • Published Dec 14, 2023

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published Jan 14 • 8

A Simple Aerial Detection Baseline of Multimodal Language Models

Paper • 2501.09720 • Published Jan 16 • 2

PointOBB: Learning Oriented Object Detection via Single Point Supervision

Paper • 2311.14757 • Published Nov 23, 2023

H2RBox-v2: Incorporating Symmetry for Boosting Horizontal Box Supervised Oriented Object Detection

Paper • 2304.04403 • Published Apr 10, 2023

ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection

Paper • 2303.04989 • Published Mar 9, 2023

FLoRA: Low-Rank Core Space for N-dimension

Paper • 2405.14739 • Published May 23, 2024

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Paper • 2311.14758 • Published Nov 23, 2023

Team members 6
