DAMO-NLP-SG/VideoLLaMA2.1-7B-AV

Visual Question Answering
Transformers
Safetensors
English
videollama2_qwen2
text-generation
Audio-visual Question Answering
Audio Question Answering
multimodal large language model

Instructions to use DAMO-NLP-SG/VideoLLaMA2.1-7B-AV with libraries, inference providers, notebooks, and local apps.

  • Libraries
  • Transformers

    How to use DAMO-NLP-SG/VideoLLaMA2.1-7B-AV with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("visual-question-answering", model="DAMO-NLP-SG/VideoLLaMA2.1-7B-AV")

    # Load model directly
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-SG/VideoLLaMA2.1-7B-AV", dtype="auto")

Some weights of Videollama2Qwen2ForCausalLM were not initialized from the model checkpoint at ./VideoLLaMA2.1-7B-AV and are newly initialized:

10
#4 opened over 1 year ago by deleted

Does this model support 'image' inference?

#3 opened over 1 year ago by thesby

Can you please tell me if the paths in config.json here need to be modified when I run the AV branch?

#2 opened over 1 year ago by FoerKent

process_video() got an unexpected keyword argument 'va'

5
#1 opened over 1 year ago by fragrantly