An open-source framework for training large multimodal models.
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
A curated list of Multimodal Related Research.
[CVPR 2024 & TPAMI 2025] UniRepLKNet
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
A Comparative Framework for Multimodal Recommender Systems
ICCV 2023-2025 Papers: Discover cutting-edge research from ICCV 2023-25, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ Star the repository to support visual intelligence development!
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
[ICCV 2023 & TPAMI 2025] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
Multi-modality pre-training
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
[IEEE Transactions on Medical Imaging/TMI 2023] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"