awesome-colab-notebooks

GitHub · MIT license
AI overview (generated automatically, for reference only)

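awesome-colab-notebooks is a curated collection of Google Colab notebooks for machine learning experiments. It gathers high-quality code resources from leading institutions worldwide (such as Google, Meta, and UC Berkeley), covering areas from LLM agents and interactive image segmentation to reinforcement learning course assignments.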

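For many developers and researchers, configuring a deep learning environment is time-consuming, and the barrier to reproducing paper results is high. awesome-colab-notebooks addresses this by providing ready-to-run experiment environments. Users do not need to install dependencies locally or worry about insufficient GPU resources; clicking a link is enough to run and modify the code on a free cloud GPU, which shortens the cycle from idea to validation.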

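The collection is particularly suited to AI researchers, data scientists, and developers who want to get up to speed with new techniques quickly. Researchers who need to reproduce the latest paper algorithms and students who want to learn LLM applications through hands-on examples can both find matching templates here. Beyond collecting demo notebooks for popular open-source projects (such as LangGraph and Qwen3-Omni), it also tracks community trends to keep the content current and practical.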

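Use case

A computer vision graduate student urgently needs to reproduce a recent interactive image segmentation paper (such as ritm-interactive-segmentation) to validate it on a custom dataset, but is held back by insufficient local GPU memory and a complicated environment setup.

Without awesome-colab-notebooks

  • Environment setup nightmare: CUDA, cuDNN, and the various dependencies must be installed manually on the local machine; version conflicts often lead to "dependency hell", and the code may still not run after days of effort.
  • High hardware barrier: a local consumer GPU does not have enough memory to load large models, and forcing a run causes out-of-memory (OOM) errors, so the researcher has to rent expensive cloud servers or queue for the lab cluster.
  • Unclear reproduction path: the official repositories scattered across GitHub lack a debugged, standardized entry point, so it is hard to locate runnable demo code quickly, and much time is lost troubleshooting basic errors.
  • Slow experiment iteration: every change of hyperparameters or dataset requires reconfiguring the environment, which rules out quick out-of-the-box validation and seriously slows research progress.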

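With awesome-colab-notebooks

  • One-click experiment launch: click the corresponding Colab link in the collection (such as the ritm-interactive-segmentation notebook); nothing needs to be installed locally, and the notebook runs on a cloud GPU straight from the browser.
  • Free compute: the free T4/P100 GPUs offered by Google Colab lift the local hardware limit and run deep learning models that would otherwise require a high-end graphics card.
  • Standardized reproduction workflow: community-maintained notebooks provide code with dependencies preinstalled and the data loading and inference pipeline already wired up, cutting reproduction time from days to minutes.
  • Agile iteration: edit code cells directly in the cloud to test different parameters or upload private data, and view segmentation results immediately, which makes algorithm validation and tuning much more efficient.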
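
Since the GPU assigned to a Colab session can vary, it can help to check what the runtime actually provides before running a heavy notebook. The following is a minimal sketch, not part of the collection itself: the helper name `describe_gpu` is hypothetical, and it assumes an NVIDIA runtime where the `nvidia-smi` command-line tool is on the PATH.

```python
import shutil
import subprocess


def describe_gpu() -> str:
    """Return a one-line description of the attached NVIDIA GPU, if any."""
    # nvidia-smi ships with the NVIDIA driver; if it is absent, assume no GPU runtime.
    if shutil.which("nvidia-smi") is None:
        return "no GPU detected (nvidia-smi not found)"
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        )
    except (subprocess.SubprocessError, OSError) as exc:
        # Covers a non-zero exit code, a timeout, or a failure to launch the tool.
        return f"nvidia-smi failed: {exc}"
    return result.stdout.strip() or "nvidia-smi returned no devices"


print(describe_gpu())
```

If the reported memory is smaller than a notebook needs, reducing batch size or input resolution in the relevant cell is usually the first thing to try.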

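By packaging complex engineering setup into click-to-run cloud experiment templates, awesome-colab-notebooks frees researchers from tedious environment setup so they can focus on developing and validating core algorithms.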

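Runtime requirements

Operating system
  • Not specified
GPU

Not specified (the project is a collection of Colab notebooks; the GPU requirement depends on the individual notebook, and the GPU is typically provided by Google Colab, for example T4, P100, or A100 depending on the plan)

Memory

Not specified (depends on the experiment; the free tier of Colab typically provides about 12 GB of RAM)

Dependencies
Notes: The project itself is not a single installable tool but a curated list of machine learning Colab notebooks. The runtime environment relies entirely on the Google Colab cloud platform, so no local operating system or driver configuration is needed. Library dependencies and hardware requirements (such as GPU memory) vary with the notebook the user chooses to run (for example LLaMA Factory or Gaussian Splatting). The recommended approach is to open the links directly in a browser, or to reproduce the environment locally by following the instructions of the specific notebook.
Python: Not specified (determined by the Google Colab environment, typically Python 3.10+)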
torch
transformers
tensorflow
jax
langgraph
sglang
swanlab
opik
accelerate
diffusers
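
Because each notebook needs only some of these libraries, a quick check of what the current runtime already has installed can save a failed import later. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical, and it assumes the names in the list above are also the installable distribution names.

```python
import sys
from importlib import metadata

# Package names copied from the dependency list above (assumed to match
# the distribution names used by pip).
LISTED_PACKAGES = [
    "torch", "transformers", "tensorflow", "jax", "langgraph",
    "sglang", "swanlab", "opik", "accelerate", "diffusers",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python", ".".join(map(str, sys.version_info[:3])))
for name, version in installed_versions(LISTED_PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Anything reported as not installed can then be added in a notebook cell according to that notebook's own setup instructions.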

Quick start

Awesome Colab notebooks collection for ML experiments

Trending

Repositories Papers Packages
  • agent-starter-pack
  • PaddleHub
  • ritm-interactive-segmentation
  • verl
  • Qwen3-Omni
  • dinov3
  • wmar
  • cwm
  • circle-guard-bench
  • SAELens
  • ARENA_3.0
  • TabPFN
  • vjepa2
  • opik
  • alphaevolve_results
  • SwanLab
  • homework_fall2023
  • langgraph
  • sglang
  • part_1_ml_cv
  • felix
  • prompt-eng-interactive-tutorial
  • presidio

Courses
name description authors links colaboratory update
LLM Engineering Essentials course 12-week course, created by experts from academia and industry, is designed specifically for developers and engineers Open In Colab 22.05.2025
Understanding Language Models Course deals with language models, in particular (but not exclusively) transformer-based language models like GPT-x or LLaMA Open In Colab 17.04.2025
Practical RL An open course on reinforcement learning in the wild Open In Colab 02.03.2025
Deep Learning School course (ML + CV) Nina Konovalova Open In Colab 14.02.2025
Introduction to Deep Learning course Tatiana Gaintseva
  • medium, medium
  • tf
  • wiki
  • yt, yt, yt, yt
Open In Colab 24.01.2025
ARENA Provide talented individuals with the skills, tools, and environment necessary for upskilling in ML engineering, for the purpose of contributing directly to AI alignment in technical roles Callum McDougall Open In Colab 30.12.2024
Deep Learning Course at the University of Amsterdam Series of Jupyter notebooks that are designed to help you understand the "theory" from the lectures by seeing corresponding implementations Open In Colab 17.10.2024
The Autodiff Cookbook You'll go through a whole bunch of neat autodiff ideas that you can cherry pick for your own work, starting with the basics Open In Colab 20.09.2024
Machine Learning Simplified A Gentle Introduction to Supervised Learning Andrew Wolf Open In Colab 29.08.2024
Anthropic courses Anthropic's educational courses Anthropic
  • docs
  • git
  • pypi
  • reddit
Open In Colab 22.08.2024
mlcourse.ai Open Machine Learning Course Yury Kashnitsky Open In Colab 19.08.2024
Deep RL Course The Hugging Face Deep Reinforcement Learning Course Open In Colab 24.06.2024
Anthropic's Prompt Engineering Interactive Tutorial Course is intended to provide you with a comprehensive step-by-step understanding of how to engineer optimal prompts within Claude Anthropic
  • docs
  • git
  • medium
  • yt, yt
Open In Colab 02.04.2024
Generative AI for Beginners - A Course A 12 Lesson course teaching everything you need to know to start building Generative AI applications microsoft Open In Colab 22.02.2024
Deep Reinforcement Learning CS 285 at UC Berkeley Open In Colab 29.08.2023
npNLG The course introduces the basics of NLG, neural language models and their implementation in PyTorch, as well as a selection of recent pragmatic neural NLG approaches Michael Franke Open In Colab 09.11.2022
DSP theory Theory of digital signal processing: signals, filtration (IIR, FIR, CIC, MAF), transforms (FFT, DFT, Hilbert, Z-transform) etc Open In Colab 18.10.2022
Machine learning course This course is broad and shallow, but the author will provide additional links so that you can deepen your understanding of the ML method you need Тимчишин Віталій Open In Colab 02.09.2021
Udacity Deep Learning class with TensorFlow Learn how to apply deep learning to solve complex problems Mark Daoust Open In Colab 20.01.2021
Intro to TensorFlow for Deep Learning Dive into deep learning with this practical course on TensorFlow and the Keras API Open In Colab 09.09.2020
Introduction to TensorFlow Lite Learn how to deploy deep learning models on mobile and embedded devices with TensorFlow Lite Open In Colab 09.09.2020
NYU-DLSP20 This course concerns the latest techniques in deep learning and representation learning, focusing on supervised and unsupervised deep learning, embedding methods, metric learning, convolutional and recurrent nets, with applications to computer vision, natural language understanding, and speech recognition Open In Colab 30.10.2019
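
Each "Open In Colab" entry above is a link that opens the notebook in Colab. For notebooks hosted in a public GitHub repository, such links conventionally take the form `https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>`. A minimal sketch of building one follows; the helper name `colab_url` and the repository and path in the example are hypothetical and do not refer to any entry in the table.

```python
from urllib.parse import quote


def colab_url(owner: str, repo: str, path: str, branch: str = "main") -> str:
    """Build the Colab URL for a notebook stored in a public GitHub repository."""
    # Keep "/" unescaped so the directory structure of the path is preserved.
    safe_path = quote(path.lstrip("/"), safe="/")
    return (
        "https://colab.research.google.com/github/"
        f"{quote(owner)}/{quote(repo)}/blob/{quote(branch, safe='/')}/{safe_path}"
    )


# Hypothetical repository and notebook path, used only for illustration.
print(colab_url("example-user", "example-repo", "notebooks/demo.ipynb"))
# prints https://colab.research.google.com/github/example-user/example-repo/blob/main/notebooks/demo.ipynb
```

The same pattern is handy for opening a fork or a feature branch of a notebook by passing a different `owner` or `branch`.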

Research
name description authors links colaboratory update
GigaAM SSL pretraining framework that leverages masked language modeling with targets derived from a speech recognition model Open In Colab 20.11.2025
Segment Anything 3 Unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., “yellow school bus”), image exemplars, or a combination of both Open In Colab 19.11.2025
AlphaFold Highly accurate protein structure prediction Open In Colab 23.10.2025
OWL-ViT Simple Open-Vocabulary Object Detection with Vision Transformers
  • arxiv
  • hf
Open In Colab 29.09.2025
CWM Code World Model, a 32-billion-parameter open-weights LLM, to advance research on code generation with world models
  • hf, hf, hf
  • medium, medium
  • meta, meta
  • twitter
  • yt, yt
Open In Colab 24.09.2025
Qwen3-Omni Single multimodal model that, for the first time, maintains state-of-the-art performance across text, image, audio, and video without any degradation relative to single-modal counterparts Open In Colab 22.09.2025
WMAR Custom tokenizer-detokenizer finetuning procedure that improves RCC, and a complementary watermark synchronization layer
  • arxiv
  • git, git, git, git
Open In Colab 19.09.2025
LIMIT On the Theoretical Limitations of Embedding-Based Retrieval
  • arxiv
  • git
  • hf, hf
  • medium
Open In Colab 27.08.2025
DINOv3 Produces high-quality dense features that achieve outstanding performance on various vision tasks, significantly surpassing previous self- and weakly-supervised foundation models
  • arxiv, arxiv
  • hf, hf
  • kaggle
  • medium
  • meta, meta, meta
  • yt
Open In Colab 14.08.2025
Hogwild! Inference Run LLM "workers" in parallel, allowing them to synchronize via a concurrently-updated attention cache and prompt these workers to decide how best to collaborate Open In Colab 15.07.2025
Grounding DINO Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
  • arxiv
  • git, git, git, git, git, git, git
  • pwc, pwc, pwc, pwc
  • yt, yt, yt, yt
Open In Colab 10.07.2025
Hunyuan Open-source large language model built on a fine-grained Mixture-of-Experts architecture manayang Open In Colab 01.07.2025
Whisper Automatic speech recognition system trained on 680,000 hours of multilingual and multitask supervised data collected from the web Open In Colab 26.06.2025
IT³ Idempotent Test-Time Training, approach that enables on-the-fly adaptation to distribution shifts using only the current test instance, without any auxiliary task design Open In Colab 25.06.2025
AlphaEvolve Evolutionary coding agent that substantially enhances capabilities of state-of-the-art LLMs on highly challenging tasks such as tackling open scientific problems or optimizing critical pieces of computational infrastructure
  • deepmind, deepmind
  • medium
  • reddit
  • wiki
  • yt, yt, yt, yt, yt, yt, yt, yt
Open In Colab 17.06.2025
V-JEPA 2 Self-supervised approach that combines internet-scale video data with a small amount of interaction data, to develop models capable of understanding, predicting, and planning in the physical world FAIR Open In Colab 11.06.2025
TimesFM Time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset Open In Colab 26.05.2025
Qwen2.5-Omni End-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner Open In Colab 29.04.2025
EAT Emotional Adaptation for Audio-driven Talking-head method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner through parameter-efficient adaptations Open In Colab 22.04.2025
Moshi Speech-text foundation model and full-duplex spoken dialogue framework
  • arxiv, arxiv, arxiv, arxiv, arxiv
  • demo
  • git, git, git, git
  • hf
  • pypi
  • reddit
  • yt, yt, yt, yt, yt
Open In Colab 31.03.2025
BiRefNet Bilateral reference framework for high-resolution dichotomous image segmentation Open In Colab 24.03.2025
ESM Evolutionary Scale Modeling: Pretrained language models for proteins Open In Colab 21.03.2025
Video Seal Comprehensive framework for neural video watermarking and a competitive open-sourced model Open In Colab 17.03.2025
SigLIP 2 Family of new multilingual vision-language encoders that build on the success of the original SigLIP
  • arxiv, arxiv
  • git, git, git
  • hf
  • medium, medium
  • yt
Open In Colab 17.03.2025
DeepLabCut Efficient method for markerless pose estimation based on transfer learning with deep neural networks that achieves excellent results with minimal training data Open In Colab 28.02.2025
STAR Spatial Temporal Augmentation with T2V models for Real-world video super-resolution, a novel approach that leverages T2V models for real-world video super-resolution, achieving realistic spatial details and robust temporal consistency Open In Colab 22.01.2025
InvSR Image super-resolution technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance
  • arxiv
  • git
  • hf, hf
  • yt
Open In Colab 21.01.2025
ModernBERT Bringing modern model optimizations to encoder-only models and representing a major Pareto improvement over older encoders
  • arxiv
  • git, git
  • hf
  • medium
  • yt, yt, yt, yt
Open In Colab 22.12.2024
GraphCast Learning skillful medium-range global weather forecasting
  • arxiv
  • data
  • deepmind
  • git, git, git, git, git
  • medium
  • yt, yt, yt, yt, yt
Open In Colab 04.12.2024
TAPIR Tracking Any Point with per-frame Initialization and temporal Refinement Open In Colab 30.11.2024
ConsisID Tuning-free DiT-based controllable IPT2V model to keep human identity consistent in the generated video Open In Colab 28.11.2024
T2M-GPT Conditional generative framework based on Vector Quantised-Variational AutoEncoder and Generative Pre-trained Transformer for human motion generation from textual descriptions Open In Colab 24.11.2024
PuLID Pure and Lightning ID customization, a tuning-free ID customization method for text-to-image generation
  • arxiv
  • git, git, git
  • reddit
Open In Colab 09.11.2024
CoTracker Architecture that jointly tracks multiple points throughout an entire video Open In Colab 16.10.2024
PIFu Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization
  • arxiv
  • yt
Open In Colab 08.10.2024
DifFace Method that is capable of coping with unseen and complex degradations more gracefully without complicated loss designs
  • arxiv
  • git, git, git, git
  • hf
Open In Colab 05.10.2024
Segment Anything 2 Foundation model towards solving promptable visual segmentation in images and videos Open In Colab 01.10.2024
Open-Unmix A deep neural network reference implementation for music source separation, applicable for researchers, audio engineers and artists Open In Colab 25.09.2024
Deep Painterly Harmonization Algorithm that produces significantly better results than photo compositing or global stylization techniques and enables creative painterly edits that would otherwise be difficult to achieve
  • arxiv, arxiv
  • git, git, git
Open In Colab 23.09.2024
CogVideo Large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixels
  • arxiv, arxiv
  • demo
  • discord
  • git, git, git, git, git, git, git, git, git, git, git, git, git, git, git
  • hf, hf, hf, hf
  • reddit
  • yt, yt, yt, yt, yt
Open In Colab 18.09.2024
audio2photoreal Framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction Open In Colab 13.09.2024
Fast Segment Anything CNN Segment Anything Model trained using only 2% of the SA-1B dataset published by SAM authors
  • arxiv, arxiv
  • git
  • medium
  • yt, yt, yt
Open In Colab 10.09.2024
Neuralangelo Framework for high-fidelity 3D surface reconstruction from RGB video captures Open In Colab 02.09.2024
SPIN Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop Open In Colab 21.08.2024
YOLOv10 Aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and model architecture Open In Colab 20.08.2024
SpecVQGAN Taming the visually guided sound generation by shrinking a training dataset to a set of representative vectors Open In Colab 12.07.2024
LivePortrait Video-driven portrait animation framework with a focus on better generalization, controllability, and efficiency for practical usage Open In Colab 10.07.2024
Wav2Lip A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild Open In Colab 27.06.2024
FELIX Feature Engineering with LLMs for Interpretability and Explainability, a novel approach harnessing the vast world knowledge embedded in pre-trained Large Language Models to automatically generate a set of features describing the data Open In Colab 13.06.2024
PoolFormer MetaFormer Is Actually What You Need for Vision
  • arxiv
  • git, git, git
  • hf
Open In Colab 01.06.2024
StoryDiffusion Way of self-attention calculation, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent pretrained diffusion-based text-to-image models in a zero-shot manner Open In Colab 04.05.2024
FILM A frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion Open In Colab 03.05.2024
VoiceCraft Token infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech on audiobooks, internet videos, and podcasts Open In Colab 21.04.2024
ZeST Method for zero-shot material transfer to an object in the input image given a material exemplar image Open In Colab 16.04.2024
InstantMesh Feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability
  • arxiv
  • git, git, git
  • hf
  • reddit
  • yt
Open In Colab 16.04.2024
Würstchen Architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models
  • arxiv
  • hf
  • reddit
  • yt
Open In Colab 06.04.2024
BEiT Self-supervised vision representation model, which stands for Bidirectional Encoder representation from Image Transformers
  • arxiv, arxiv, arxiv
  • git, git, git, git
  • hf
  • medium
  • pwc
  • yt, yt
Open In Colab 30.03.2024
AudioSep Foundation model for open-domain audio source separation with natural language queries Open In Colab 15.03.2024
AQLM Extreme Compression of Large Language Models via Additive Quantization
  • arxiv
  • hf, hf, hf
  • reddit
  • yt, yt
Open In Colab 08.03.2024
YOLOv9 Learning What You Want to Learn Using Programmable Gradient Information Open In Colab 05.03.2024
Multi-LoRA Composition LoRA Switch and LoRA Composite, approaches that aim to surpass traditional techniques in terms of accuracy and image quality, especially in complex compositions Open In Colab 03.03.2024
AMARETTO Multiscale and multimodal inference of regulatory networks to identify cell circuits and their drivers shared and distinct within and across biological systems of human disease Open In Colab 28.02.2024
LIDA Tool for generating grammar-agnostic visualizations and infographics Victor Dibia Open In Colab 06.02.2024
ViT Vision Transformer and MLP-Mixer Architectures Open In Colab 06.02.2024
Qwen Comprehensive language model series that encompasses distinct models with varying parameter counts qwenlm
  • arxiv, arxiv, arxiv, arxiv
  • discord
  • docker
  • git, git, git, git, git, git
  • hf
  • pt
  • yt, yt, yt
Open In Colab 30.01.2024
3D Ken Burns A reference implementation of 3D Ken Burns Effect from a Single Image using PyTorch - given a single input image, it animates this still image with a virtual camera scan and zoom subject to motion parallax Manuel Romero
  • arxiv
  • yt
Open In Colab 24.01.2024
VALL-E X Cross-lingual neural codec language model for cross-lingual speech synthesis Open In Colab 19.01.2024
PhotoMaker Efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information Open In Colab 18.01.2024
DDColor End-to-end method with dual decoders for image colorization
  • arxiv
  • git, git
Open In Colab 15.01.2024
PASD Pixel-aware stable diffusion network to achieve robust Real-ISR as well as personalized stylization
  • arxiv
  • git
  • hf, hf
  • reddit
Open In Colab 12.01.2024
HandRefiner Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
  • arxiv
  • git, git, git
  • reddit
  • yt
Open In Colab 08.01.2024
LLaVA Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding Open In Colab 22.12.2023
Background Matting V2 Real-time, high-resolution background replacement technique which operates at 30fps in 4K resolution, and 60fps for HD on a modern GPU Open In Colab 22.12.2023
FreeInit Concise yet effective method to improve temporal consistency of videos generated by diffusion models Open In Colab 21.12.2023
Gaussian Splatting State-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution Open In Colab 19.12.2023
SMPLer-X Scaling up EHPS towards the first generalist foundation model, with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources Open In Colab 18.12.2023
DeepCache Training-free paradigm that accelerates diffusion models from the perspective of model architecture Open In Colab 18.12.2023
MagicAnimate Diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity Open In Colab 18.12.2023
DiffBIR Towards Blind Image Restoration with Generative Diffusion Prior Open In Colab 18.12.2023
Segment and Track Anything Framework that allows users to precisely and effectively segment and track any object in a video
  • arxiv, arxiv, arxiv
  • git
  • hf
  • neurips, neurips
  • yt, yt, yt, yt, yt, yt, yt, yt, yt, yt, yt, yt
Open In Colab 08.12.2023
AudioLDM Text-to-audio system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining latents Open In Colab 02.12.2023
TabPFN Neural network that learned to do tabular data prediction Open In Colab 29.11.2023
Concept Sliders Plug-and-play low rank adaptors applied on top of pretrained models Open In Colab 26.11.2023
Qwen-VL Set of large-scale vision-language models designed to perceive and understand both text and images Open In Colab 24.11.2023
AnimeGANv3 Double-tail generative adversarial network for fast photo animation Open In Colab 23.11.2023
Ithaca First Deep Neural Network for the textual restoration, geographical and chronological attribution of ancient Greek inscriptions Open In Colab 21.11.2023
PixArt-Σ Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Open In Colab 07.11.2023
Zero123++ Image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view
  • arxiv
  • git, git
  • hf, hf
  • medium
  • reddit
  • yt
Open In Colab 26.10.2023
UniFormerV2 Unified Transformer for Efficient Spatiotemporal Representation Learning
  • arxiv
  • git, git, git, git
  • hf
  • pwc, pwc, pwc, pwc, pwc
Open In Colab 20.10.2023
Show-1 Hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation Open In Colab 15.10.2023
DA-CLIP Degradation-aware vision-language model to better transfer pretrained vision-language models to low-level vision tasks as a universal framework for image restoration Open In Colab 11.10.2023
SadTalker Generates 3D motion coefficients of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation Open In Colab 10.10.2023
Musika Music generation system that can be trained on hundreds of hours of music using a single consumer GPU, and that allows for much faster than real-time generation of music of arbitrary length on a consumer CPU Open In Colab 09.10.2023
YOLOv6 Single-stage object detection framework dedicated to industrial applications Open In Colab 08.10.2023
DreamGaussian Algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details Open In Colab 04.10.2023
ICON Given a set of images, method estimates a detailed 3D surface from each image and then combines these into an animatable avatar Open In Colab 31.08.2023
DINOv2 Produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning Open In Colab 31.08.2023
StyleGAN 3 Alias-Free Generative Adversarial Networks Open In Colab 13.08.2023
FateZero Zero-shot text-based editing method on real-world videos without per-prompt training or user-specific mask Open In Colab 13.08.2023
Big GAN Large Scale GAN Training for High Fidelity Natural Image Synthesis
  • arxiv
Open In Colab 03.08.2023
LaMa Resolution-robust Large Mask Inpainting with Fourier Convolutions Open In Colab 02.08.2023
MakeItTalk A method that generates expressive talking-head videos from a single facial image with audio as the only input Open In Colab 27.07.2023
HiDT A generative image-to-image model and a new upsampling scheme that allows applying image translation at high resolution Open In Colab 24.07.2023
AWQ Activation-aware Weight Quantization, a hardware-friendly approach for LLM low-bit weight-only quantization Open In Colab 24.07.2023
Once-for-All Train a once-for-all network that supports diverse architectural settings by decoupling training and search, to reduce the cost Open In Colab 19.07.2023
Recognize Anything & Tag2Text Vision language pre-training framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features Open In Colab 09.07.2023
Thin-Plate Spline Motion Model End-to-end unsupervised motion transfer framework Open In Colab 07.07.2023
MobileSAM Towards Lightweight SAM for Mobile Applications
  • arxiv
  • git, git, git, git, git, git, git, git
  • twitter
  • yt
Open In Colab 30.06.2023
T5X Modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of sequence models at many scales
  • arxiv, arxiv
  • docs
  • git, git
  • tf, tf, tf
Open In Colab 27.06.2023
CodeTalker Cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty Open In Colab 16.06.2023
Gen-L-Video Extending off-the-shelf short video diffusion models for generating and editing videos comprising hundreds of frames with diverse semantic segments without introducing additional training, all while preserving content consistency Open In Colab 04.06.2023
First Order Motion Model for Image Animation Transferring facial movements from video to image Aliaksandr Siarohin Open In Colab 04.06.2023
PolyGen Approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture
  • arxiv, arxiv, arxiv
  • git
  • yt
Open In Colab 02.06.2023
Parallel WaveGAN State-of-the-art non-autoregressive models to build your own great vocoder Tomoki Hayashi Open In Colab 01.06.2023
ECON Designed for "Human digitization from a color image", it combines the best properties of implicit and explicit representations to infer high-fidelity 3D clothed humans from in-the-wild images, even with loose clothing or in challenging poses
  • arxiv
  • discord
  • docker
  • git, git, git, git, git, git, git
  • reddit
  • twitter
  • yt, yt, yt, yt
Open In Colab 31.05.2023
MMS The Massively Multilingual Speech project expands speech technology from about 100 languages to over 1000 by building a single multilingual speech recognition model supporting over 1100 languages, language identification models able to identify over 4000 languages, pretrained models supporting over 1400 languages, and text-to-speech models for over 1100 languages
  • arxiv
  • hf, hf, hf
  • meta
  • yt, yt
Open In Colab 26.05.2023
FAB Flow AIS Bootstrap uses AIS to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes
  • arxiv
  • git, git
  • yt
Open In Colab 29.04.2023
CodeFormer Transformer-based prediction network to model global composition and context of the low-quality faces for code prediction, enabling the discovery of natural faces that closely approximate the target faces even when the inputs are severely degraded Open In Colab 21.04.2023
Text2Video-Zero Text-to-Image Diffusion Models are Zero-Shot Video Generators Open In Colab 11.04.2023
Segment Anything The Segment Anything Model produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image Open In Colab 10.04.2023
FollowYourPose Two-stage training scheme that can utilize image pose pair and pose-free video datasets and the pre-trained text-to-image model to obtain the pose-controllable character videos Open In Colab 07.04.2023
EVA3D High-quality unconditional 3D human generative model that only requires 2D image collections for training Open In Colab 06.04.2023
Stable Dreamfusion Using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis Open In Colab 04.04.2023
PIFuHD Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
  • arxiv
  • yt, yt
Open In Colab 26.03.2023
VideoReTalking System to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion Open In Colab 19.03.2023
Visual ChatGPT Connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting
  • arxiv
  • git, git, git, git
  • yt, yt
Open In Colab 15.03.2023
Tune-A-Video One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation Open In Colab 23.02.2023
GPEN GAN Prior Embedded Network for Blind Face Restoration in the Wild Open In Colab 15.02.2023
PyMAF-X Regression-based approach to recovering parametric full-body models from monocular images Open In Colab 14.02.2023
Disco Diffusion A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations
  • git
  • yt, yt, yt
Open In Colab 11.02.2023
GrooVAE Some applications of machine learning for generating and manipulating beats and drum performances Open In Colab 02.02.2023
Multitrack MusicVAE The models in this notebook are capable of encoding and decoding single measures of up to 8 tracks, optionally conditioned on an underlying chord Open In Colab 02.02.2023
MusicVAE A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music Open In Colab 02.02.2023
Learning to Paint Learning to Paint With Model-based Deep Reinforcement Learning Manuel Romero
  • arxiv
  • reddit
  • yt
Open In Colab 01.02.2023
LoRA Low-Rank Adaptation, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks
  • arxiv, arxiv, arxiv, arxiv, arxiv
  • git
  • hf, hf
  • medium, medium
  • pypi
  • reddit, reddit
  • yt, yt, yt, yt, yt, yt
Open In Colab 30.01.2023
Instant-NGP Instant Neural Graphics Primitives with a Multiresolution Hash Encoding Open In Colab 18.01.2023
Fourier Feature Networks Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains Open In Colab 17.01.2023
HybrIK Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation Open In Colab 01.01.2023
Composable-Diffusion Method can generate scenes at test time that are substantially more complex than those seen in training, composing sentence descriptions, object relations, human facial attributes, and even generalizing to new combinations that are rarely seen in the real world Open In Colab 23.12.2022
Score Jacobian Chaining Apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field Open In Colab 05.12.2022
Demucs Hybrid Spectrogram and Waveform Source Separation Alexandre Défossez
  • arxiv, arxiv, arxiv, arxiv
  • git, git, git, git
Open In Colab 21.11.2022
FSGAN Face Swapping GAN for face swapping and reenactment Open In Colab 16.11.2022
StyleCLIP Text-Driven Manipulation of StyleGAN Imagery
  • arxiv, arxiv
  • git
  • yt, yt, yt, yt
Open In Colab 30.10.2022
AST Audio Spectrogram Transformer, the first convolution-free, purely attention-based model for audio classification
  • arxiv, arxiv, arxiv, arxiv
  • git, git, git
  • medium
  • yt, yt, yt
Open In Colab 18.10.2022
MotionDiffuse The first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods Open In Colab 13.10.2022
VToonify Leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details Open In Colab 07.10.2022
PyMAF Pyramidal Mesh Alignment Feedback loop in regression network for well-aligned body mesh recovery and extend it for the recovery of expressive full-body models Open In Colab 06.10.2022
AlphaTensor Discovering faster matrix multiplication algorithms with reinforcement learning
  • deepmind
  • yt, yt, yt, yt
Open In Colab 04.10.2022
Swin2SR Novel Swin Transformer V2, to improve SwinIR for image super-resolution, and in particular, the compressed input scenario
  • arxiv, arxiv, arxiv, arxiv
  • git, git, git
  • hf
  • kaggle, kaggle, kaggle
Open In Colab 03.10.2022
Functa From data to functa: Your data point is a function and you can treat it like one
  • arxiv
  • git, git
  • tf
Open In Colab 24.09.2022
DeOldify (photo) Colorize your own photos! Open In Colab 19.09.2022
DeOldify (video) Colorize your own videos! Jason Antic Open In Colab 19.09.2022
Real-ESRGAN Extend the powerful ESRGAN to a practical restoration application, which is trained with pure synthetic data
  • arxiv
  • git, git, git, git, git
Open In Colab 18.09.2022
IDE-3D Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
  • git, git, git, git
  • yt
Open In Colab 08.09.2022
Decision Transformers An architecture that casts the problem of RL as conditional sequence modeling Open In Colab 06.09.2022
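To make "RL as conditional sequence modeling" concrete, the sketch below turns a trajectory into the kind of token sequence a Decision Transformer conditions on: each step contributes a return-to-go, a state, and an action. The function names and the string placeholders for states and actions are assumptions for illustration.

```python
def returns_to_go(rewards):
    """Return-to-go at step t is the sum of rewards from t to the end."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples into one token list."""
    sequence = []
    for rtg, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("rtg", rtg), ("state", s), ("action", a)])
    return sequence

tokens = build_sequence(
    states=["s0", "s1", "s2"],      # placeholder states
    actions=["a0", "a1", "a2"],     # placeholder actions
    rewards=[1.0, 0.0, 2.0],
)
print(tokens[:3])
```

At inference time the first return-to-go would be set to the desired target return, and the model would predict actions autoregressively.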
textual-inversion An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion Open In Colab 21.08.2022
StyleGAN-Human A Data-Centric Odyssey of Human Generation Open In Colab 19.08.2022
Make-A-Scene Scene-Based Text-to-Image Generation with Human Priors
  • arxiv
  • yt
Open In Colab 12.08.2022
StyleGAN-NADA Zero-Shot non-adversarial domain adaptation of pre-trained generators Open In Colab 09.08.2022
YOLOv7 Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors Open In Colab 09.08.2022
GLIP Grounded language-image pre-training model for learning object-level, language-aware, and semantic-rich visual representations Open In Colab 30.07.2022
Anycost GAN Interactive natural image editing Open In Colab 20.07.2022
GFPGAN Towards Real-World Blind Face Restoration with Generative Facial Prior Open In Colab 13.07.2022
EPro-PnP Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation Open In Colab 12.07.2022
Text2Human Text-driven controllable framework for high-quality and diverse human generation Open In Colab 04.07.2022
VQ-Diffusion Based on a VQ-VAE whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model
  • arxiv, arxiv
  • git, git
Open In Colab 30.06.2022
OPT Open Pre-trained Transformers is a family of NLP models trained on billions of tokens of text obtained from the internet Open In Colab 29.06.2022
Customizing a Transformer Encoder We will learn how to customize the encoder to employ new network architectures Chen Chen
  • arxiv
  • git
Open In Colab 22.06.2022
MTTR End-to-End Referring Video Object Segmentation with Multimodal Transformers
  • arxiv, arxiv, arxiv
  • git
  • hf
  • yt
Open In Colab 20.06.2022
SwinIR Image Restoration Using Swin Transformer
  • arxiv, arxiv
  • git, git, git
Open In Colab 17.06.2022
VRT A Video Restoration Transformer
  • arxiv
  • git, git, git
Open In Colab 15.06.2022
Omnivore A single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters Open In Colab 14.06.2022
Dream Fields Zero-Shot Text-Guided Object Generation Open In Colab 10.06.2022
Detic Detecting Twenty-thousand Classes using Image-level Supervision
  • arxiv
  • git
Open In Colab 07.06.2022
SimCTG Contrastive training objective to calibrate the model's representation space, and a decoding method -- contrastive search -- to encourage diversity while maintaining coherence in the generated text
  • arxiv, arxiv
  • git, git
  • hf, hf, hf
  • neurips
  • pypi
Open In Colab 04.06.2022
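The contrastive search decoding mentioned in the SimCTG entry trades model confidence against similarity to the existing context. One decoding step can be sketched as below; the function names, the toy two-dimensional hidden states, and the candidate tuples are assumptions for illustration, not the SimCTG package API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_search_step(candidates, context_states, alpha):
    """Pick the top-k candidate maximizing (1 - alpha) * prob - alpha * max similarity.

    candidates: list of (token, probability, hidden_state) for the top-k tokens.
    context_states: hidden states of the tokens generated so far.
    """
    best_token, best_score = None, -math.inf
    for token, prob, state in candidates:
        penalty = max(cosine(state, h) for h in context_states)  # degeneration penalty
        score = (1.0 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token

context = [[1.0, 0.0]]                       # toy hidden state of the previous token
top_k = [("the", 0.6, [1.0, 0.0]),           # likely, but identical to the context
         ("cat", 0.4, [0.0, 1.0])]           # less likely, but dissimilar
print(contrastive_search_step(top_k, context, alpha=0.6))
```

With `alpha=0` the step reduces to greedy selection among the candidates; larger values push the choice away from tokens whose representations repeat the context.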
T0 Multitask Prompted Training Enables Zero-Shot Task Generalization
  • arxiv
  • yt, yt
Open In Colab 29.05.2022
AvatarCLIP A zero-shot text-driven framework for 3D avatar generation and animation Open In Colab 15.05.2022
Text2Mesh Text-Driven Neural Stylization for Meshes Open In Colab 14.05.2022
T5 Text-To-Text Transfer Transformer
  • arxiv
  • git
  • tf
Open In Colab 11.05.2022
XLS-R Self-supervised Cross-lingual Speech Representation Learning at Scale Open In Colab 10.05.2022
MAGIC Training-free framework, iMAge-Guided text generatIon with CLIP, for plugging in visual controls in the generation process and enabling LMs to perform multimodal tasks in a zero-shot manner
  • arxiv
Open In Colab 02.05.2022
DiffCSE Unsupervised contrastive learning framework for learning sentence embeddings
  • arxiv, arxiv, arxiv
  • git
  • hf
  • twitter
Open In Colab 24.04.2022
ViDT+ An Extendable, Efficient and Effective Transformer-based Object Detector
  • arxiv, arxiv
  • git, git
Open In Colab 20.04.2022
BasicVSR++ Redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment Open In Colab 18.04.2022
NAFNet Nonlinear Activation Free Network for Image Restoration
  • arxiv, arxiv
  • pwc, pwc
Open In Colab 15.04.2022
Panini-Net GAN Prior based Degradation-Aware Feature Interpolation for Face Restoration
  • arxiv
  • git, git
Open In Colab 13.04.2022
E2FGVI An End-to-End framework for Flow-Guided Video Inpainting through elaborately designed three trainable modules, namely, flow completion, feature propagation, and content hallucination modules Open In Colab 06.04.2022
LDM High-Resolution Image Synthesis with Latent Diffusion Models
  • arxiv, arxiv, arxiv
  • git, git, git, git
  • hf
Open In Colab 04.04.2022
GP-UNIT Novel framework, Generative Prior-guided UNsupervised Image-to-image Translation, to improve the overall quality and applicability of the translation algorithm Open In Colab 02.04.2022
DualStyleGAN More challenging exemplar-based high-resolution portrait style transfer by introducing a novel DualStyleGAN with flexible control of dual styles of the original face domain and the extended artistic portrait domain Open In Colab 24.03.2022
CLIPasso Semantically-Aware Object Sketching Open In Colab 21.03.2022
StyleSDF A high resolution, 3D-consistent image and shape generation technique Open In Colab 05.03.2022
Disentangled Lifespan Face Synthesis LFS model is proposed to disentangle the key face characteristics including shape, texture and identity so that the unique shape and texture age transformations can be modeled effectively Open In Colab 22.02.2022
ClipCap CLIP Prefix for Image Captioning Open In Colab 15.02.2022
ROMP Monocular, One-stage, Regression of Multiple 3D People
  • arxiv, arxiv, arxiv
  • git, git, git
  • yt, yt, yt
Open In Colab 11.02.2022
Mask2Former Masked-attention Mask Transformer for Universal Image Segmentation Open In Colab 09.02.2022
BertViz Tool that visualizes attention at multiple scales, each of which provides a unique perspective on the attention mechanism Jesse Vig
  • arxiv, arxiv, arxiv, arxiv, arxiv
  • medium
  • pypi
  • yt
Open In Colab 05.02.2022
JoJoGAN One Shot Face Stylization
  • arxiv
  • git, git
Open In Colab 02.02.2022
Pose with Style Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN Open In Colab 19.01.2022
ConvNeXt A pure ConvNet model constructed entirely from standard ConvNet modules
  • arxiv
  • git, git
  • hf
  • yt, yt, yt
Open In Colab 19.01.2022
diffsort Differentiable Sorting Networks
  • arxiv, arxiv
  • yt
Open In Colab 17.01.2022
Taming Transformers for High-Resolution Image Synthesis We combine the efficiency of convolutional approaches with the expressivity of transformers by introducing a convolutional VQGAN, which learns a codebook of context-rich visual parts, whose composition is modeled with an autoregressive transformer Open In Colab 13.01.2022
GFM Glance and Focus Matting network, which employs a shared encoder and two separate decoders to learn both tasks in a collaborative manner for end-to-end natural image matting
  • arxiv
  • git, git, git, git, git
  • yt
Open In Colab 05.01.2022
RealBasicVSR Investigating Tradeoffs in Real-World Video Super-Resolution
  • arxiv
  • hf
  • reddit
Open In Colab 25.12.2021
GLIDE Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
  • arxiv
  • yt
Open In Colab 22.12.2021
Nerfies First method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones Open In Colab 06.12.2021
HyperStyle A hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space Open In Colab 03.12.2021
encoder4editing Designing an Encoder for StyleGAN Image Manipulation
  • arxiv
  • git
Open In Colab 02.12.2021
StyleCariGAN Caricature Generation via StyleGAN Feature Map Modulation Open In Colab 30.11.2021
CartoonGAN The implementation of the cartoon GAN model with PyTorch Tobias Sunderdiek Open In Colab 24.11.2021
SimSwap An efficient framework, called Simple Swap, aiming for generalized and high fidelity face swapping
  • arxiv
  • git
Open In Colab 24.11.2021
RVM Robust High-Resolution Video Matting with Temporal Guidance Open In Colab 24.11.2021
RVM Robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance Open In Colab 24.11.2021
AnimeGANv2 An improved version of AnimeGAN - it prevents the generation of high-frequency artifacts by simply changing the normalization of features in the network Open In Colab 17.11.2021
SOAT StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN
  • arxiv
  • git, git
  • hf
Open In Colab 13.11.2021
Arnheim Generative Art Using Neural Visual Grammars and Dual Encoders
  • arxiv, arxiv, arxiv, arxiv, arxiv
  • git
  • wiki
  • yt, yt, yt, yt
Open In Colab 11.11.2021
StyleGAN 2 Generation of faces, cars, etc. Mikael Christensen
  • arxiv
  • git
  • yt
Open In Colab 05.11.2021
ByteTrack Multi-Object Tracking by Associating Every Detection Box Open In Colab 30.10.2021
GPT-2 Retrain an advanced text generating neural network on any text dataset using gpt-2-simple! Max Woolf Open In Colab 18.10.2021
ConvMixer An extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in that it operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network
  • arxiv
  • git, git
  • medium
  • yt
Open In Colab 06.10.2021
IC-GAN Instance-Conditioned GAN Open In Colab 01.10.2021
Skillful Precipitation Nowcasting Using Deep Generative Models of Radar Open-sourced dataset and model snapshot for precipitation nowcasting Open In Colab 29.09.2021
Live Speech Portraits Real-Time Photorealistic Talking-Head Animation Open In Colab 26.09.2021
StylEx Training a GAN to explain a classifier in StyleSpace Open In Colab 25.08.2021
VITS Parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models Open In Colab 23.08.2021
Bringing Old Photo Back to Life Restoring old photos that suffer from severe degradation through a deep learning approach Open In Colab 13.07.2021
PTI Pivotal Tuning Inversion enables employing off-the-shelf latent based semantic editing techniques on real images using StyleGAN
  • arxiv
  • git, git
Open In Colab 01.07.2021
TediGAN Framework for multi-modal image generation and manipulation with textual descriptions
  • arxiv, arxiv
  • git, git, git, git
  • yt
Open In Colab 30.06.2021
SCALE Modeling Clothed Humans with a Surface Codec of Articulated Local Elements Open In Colab 26.06.2021
CogView Mastering Text-to-Image Generation via Transformers Open In Colab 21.06.2021
GANs N' Roses Stable, Controllable, Diverse Image to Image Translation
  • arxiv, arxiv
  • git, git
  • yt
Open In Colab 19.06.2021
Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes A method to stylize images by optimizing parameterized brushstrokes instead of pixels Open In Colab 02.06.2021
Pixel2Style2Pixel Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation Open In Colab 01.06.2021
Fine-tuning a BERT We will work through fine-tuning a BERT model using the tensorflow-models PIP package
  • arxiv
  • tf
Open In Colab 25.05.2021
ReStyle A Residual-Based StyleGAN Encoder via Iterative Refinement Open In Colab 21.05.2021
Motion Representations for Articulated Animation Novel motion representations for animating articulated objects consisting of distinct parts Open In Colab 29.04.2021
SAM Age Transformation Using a Style-Based Regression Model Open In Colab 26.04.2021
Geometry-Free View Synthesis Is a geometric model required to synthesize novel views from a single image? Open In Colab 22.04.2021
NeRViS An algorithm for full-frame video stabilization by first estimating dense warp fields Open In Colab 11.04.2021
NeX View synthesis based on enhancements of multiplane image that can reproduce NeXt-level view-dependent effects in real time Open In Colab 25.03.2021
Score SDE Score-Based Generative Modeling through Stochastic Differential Equations
  • arxiv, arxiv, arxiv, arxiv
  • git, git
  • yt
Open In Colab 18.03.2021
Talking Head Anime from a Single Image The network takes as input an image of an anime character's face and a desired pose, and it outputs another image of the same character in the given pose Pramook Khungurn Open In Colab 23.02.2021
NFNet An adaptive gradient clipping technique, a significantly improved class of Normalizer-Free ResNets
  • arxiv, arxiv
  • git
  • yt, yt
Open In Colab 17.02.2021
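The adaptive gradient clipping technique named in the NFNet entry rescales a gradient when it is large relative to the weights it updates. A minimal sketch follows, applied row by row; the function name and the default `clip` and `eps` values are assumptions for this example.

```python
import math

def adaptive_grad_clip(weights, grads, clip=0.01, eps=1e-3):
    """Clip each row of grads so that ||g|| / max(||w||, eps) does not exceed clip."""
    clipped = []
    for w_row, g_row in zip(weights, grads):
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)  # guard zero-initialized weights
        g_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = clip * w_norm
        if g_norm > max_norm:
            g_row = [g * max_norm / g_norm for g in g_row]       # rescale, keep direction
        clipped.append(list(g_row))
    return clipped

weights = [[3.0, 4.0], [3.0, 4.0]]        # each row has norm 5
grads = [[0.3, 0.4], [0.003, 0.004]]      # first row is too large, second is fine
print(adaptive_grad_clip(weights, grads, clip=0.01))
```

Unlike clipping by a fixed global norm, the threshold here scales with the size of the weights, so small layers are not swamped by comparatively large updates.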
RITM Simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps
  • arxiv
  • git
  • pwc, pwc
Open In Colab 13.02.2021
CLIP A neural network which efficiently learns visual concepts from natural language supervision Open In Colab 29.01.2021
Adversarial Patch A method to create universal, robust, targeted adversarial image patches in the real world Tom Brown
  • arxiv
Open In Colab 27.01.2021
MSG-Net Multi-style Generative Network with a novel Inspiration Layer, which retains the functionality of optimization-based approaches and has the fast speed of feed-forward networks Open In Colab 25.01.2021
Neural Style Transfer Implementation of Neural Style Transfer in Keras 2.0+ Somshubra Majumdar
  • arxiv, arxiv, arxiv
Open In Colab 22.01.2021
SkyAR A vision-based method for video sky replacement and harmonization, which can automatically generate realistic and dramatic sky backgrounds in videos with controllable styles Zhengxia Zou Open In Colab 18.01.2021
MusicXML Documentation The goal of this notebook is to explore one of the magenta libraries for music Open In Colab 08.01.2021
SVG VAE A colab demo for the SVG VAE model Raphael Gontijo Lopes Open In Colab 08.01.2021
Neural Magic Eye Learning to See and Understand the Scene Behind an Autostereogram Open In Colab 01.01.2021
FGVC Method first extracts and completes motion edges, and then uses them to guide piecewise-smooth flow completion with sharp edges Open In Colab 30.12.2020
VIBE Video Inference for Body Pose and Shape Estimation, which makes use of an existing large-scale motion capture dataset together with unpaired, in-the-wild, 2D keypoint annotations
  • arxiv
  • git, git, git, git, git
  • pwc
  • yt, yt, yt, yt, yt, yt, yt, yt, yt
Open In Colab 23.12.2020
SeFa A closed-form approach for unsupervised latent semantic factorization in GANs Open In Colab 06.12.2020
Stylized Neural Painting An image-to-painting translation method that generates vivid and realistic painting artworks with controllable styles Open In Colab 01.12.2020
BiT Big Transfer: General Visual Representation Learning
  • arxiv, arxiv
  • hf
  • medium
  • yt, yt, yt
Open In Colab 12.11.2020
LaSAFT Latent Source Attentive Frequency Transformation for Conditioned Source Separation Woosung Choi Open In Colab 01.11.2020
Lifespan Age Transformation Synthesis Multi-domain image-to-image generative adversarial network architecture, whose learned latent space models a continuous bi-directional aging process Open In Colab 31.10.2020
IDInvert In-domain GAN inversion approach, which not only faithfully reconstructs the input image but also ensures the inverted code to be semantically meaningful for editing Open In Colab 22.10.2020
HiGAN Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis Open In Colab 14.10.2020
InterFaceGAN Interpreting the Latent Space of GANs for Semantic Face Editing Open In Colab 13.10.2020
Instance-aware Image Colorization Novel deep learning framework to achieve instance-aware colorization Jheng-Wei Su Open In Colab 30.08.2020
MnasNet Automated mobile neural architecture search approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency Open In Colab 27.08.2020
MoCo Momentum Contrast for unsupervised visual representation learning
  • arxiv, arxiv, arxiv
  • git
  • yt, yt, yt
Open In Colab 20.08.2020
CAPE Learning to Dress 3D People in Generative Clothing Open In Colab 05.08.2020
Rewriting a Deep Generative Model We ask if a deep network can be reprogrammed to follow different rules, by enabling a user to directly change the weights, instead of training with a data set Open In Colab 01.08.2020
SIREN Implicit Neural Representations with Periodic Activation Functions Open In Colab 25.06.2020
3D Photo Inpainting Method for converting a single RGB-D input image into a 3D photo, i.e., a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view Open In Colab 04.05.2020
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose Deep neural network model that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a synthesized high-quality talking face video with personalized head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V)
  • arxiv
  • git, git, git
  • yt
Open In Colab 29.04.2020
Motion Supervised co-part Segmentation A self-supervised deep learning method for co-part segmentation
  • arxiv
  • git
  • yt
Open In Colab 07.04.2020
Onsets and Frames Onsets and Frames is an automatic music transcription framework with piano and drums models Open In Colab 02.04.2020
FBA Matting Low-cost modification to alpha matting networks to also predict the foreground and background colours
  • arxiv
  • git
  • hf
  • pwc
Open In Colab 19.03.2020
BERT score An automatic evaluation metric for text generation Tianyi Zhang
  • arxiv
  • pypi
Open In Colab 05.03.2020
Deep Image Prior Structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning Open In Colab 30.10.2019
ProxylessNAS Directly learn the architectures for large-scale target tasks and target hardware platforms
  • arxiv, arxiv, arxiv
  • medium
  • pt
  • reddit
  • yt, yt
Open In Colab 29.10.2019
Generating Piano Music with Transformer This Colab notebook lets you play with pretrained Transformer models for piano music generation, based on the Music Transformer Open In Colab 16.09.2019
SSGAN Self-Supervised GANs via Auxiliary Rotation Loss
  • arxiv, arxiv, arxiv, arxiv, arxiv, arxiv, arxiv, arxiv, arxiv
  • kaggle
  • medium
  • yt
Open In Colab 20.06.2019
S3GAN High-Fidelity Image Generation With Fewer Labels
  • arxiv
  • kaggle, kaggle
  • tf
Open In Colab 10.06.2019
HMR End-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image Open In Colab 15.03.2019
GANSynth This notebook is a demo of GANSynth, which generates audio with Generative Adversarial Networks Jesse Engel Open In Colab 25.02.2019
AmoebaNet Regularized Evolution for Image Classifier Architecture Search
  • arxiv
  • medium
  • yt, yt, yt
Open In Colab 25.10.2018
Latent Constraints Conditional Generation from Unconditional Generative Models Open In Colab 27.11.2017
Performance RNN This notebook shows you how to generate new performed compositions from a trained model Open In Colab 11.07.2017
NSynth This colab notebook has everything you need to upload your own sounds and use NSynth models to reconstruct and interpolate between them Open In Colab 06.04.2017

Tutorials
name description authors links colaboratory update
Kornia Library is composed of a subset of packages containing operators that can be inserted within neural networks to train models to perform image transformations, epipolar geometry, depth estimation, and low-level image processing such as filtering and edge detection that operate directly on tensors Open In Colab 28.11.2025
LM Evaluation Harness Framework for few-shot evaluation of language models. Lintang Sutawika Open In Colab 26.11.2025
Magenta RT An open-weights live music model that allows you to interactively create, control and perform music in the moment Chris Donahue Open In Colab 26.11.2025
SHAP SHapley Additive exPlanations is a game theoretic approach to explain the output of any machine learning model
  • arxiv, arxiv, arxiv, arxiv
  • docs
  • neurips
  • pypi
  • yt, yt, yt, yt, yt, yt
Open In Colab 20.11.2025
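The game-theoretic idea behind the SHAP entry can be illustrated by computing exact Shapley values for a tiny model. The sketch below does not use the `shap` package; the function `shapley_values` and the toy value function are assumptions written for this example, and the exhaustive enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values: weighted average marginal contribution of each feature."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = value_fn(set(subset) | {i})
                without_i = value_fn(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

def toy_model(present):
    """Toy value function: features 0 and 1 contribute and interact, feature 2 is ignored."""
    value = 0.0
    if 0 in present:
        value += 2.0
    if 1 in present:
        value += 3.0
    if 0 in present and 1 in present:
        value += 1.0   # interaction term, split evenly between features 0 and 1
    return value

phi = shapley_values(toy_model, n_features=3)
print(phi)
```

The attributions sum to the difference between the model output with all features present and with none, which is the additivity property the method's name refers to.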
Nano Banana An image generation and editing model powered by generative artificial intelligence and developed by Google DeepMind Guillaume Vernade Open In Colab 20.11.2025
NeMo A conversational AI toolkit built for researchers working on automatic speech recognition, natural language processing, and text-to-speech synthesis Open In Colab 17.11.2025
PyTerrier A Python framework for performing information retrieval experiments
  • arxiv
  • docs
  • git, git, git, git, git, git, git
Open In Colab 13.11.2025
Transfer learning and fine-tuning You will learn how to classify images of cats and dogs by using transfer learning from a pre-trained network François Chollet
  • pwc
  • tf
  • wiki
Open In Colab 11.11.2025
Datasets A Community Library for Natural Language Processing
  • arxiv
  • docs
  • hf
  • kaggle
  • pypi
  • yt
Open In Colab 10.11.2025
Agent Starter Pack Collection of production-ready Generative AI Agent templates built for Google Cloud Kristopher Overholt
  • medium, medium
  • pypi
  • reddit
  • yt, yt, yt, yt
Open In Colab 06.11.2025
Google Cloud Text-to-Speech Enables easy integration of Google speech synthesis technologies into developer applications Open In Colab 06.11.2025
Imagen 4 Text-to-image model, with photorealistic images, near real-time speed, and sharper clarity Katie Nguyen Open In Colab 06.11.2025
Lyria 2 Delivers high-fidelity music and professional-grade audio, capturing subtle nuances across a range of genres and intricate compositions Katie Nguyen
  • deepmind
  • docs
  • reddit
  • yt
Open In Colab 06.11.2025
Vertex AI Search brings together the power of deep information retrieval, state-of-the-art natural language processing, and the latest in LLM processing to understand user intent and return the most relevant results for the user Megha Agarwal Open In Colab 06.11.2025
ADK Collection provides ready-to-use agents built on top of the Agent Development Kit, designed to accelerate your development process Open In Colab 05.11.2025
TRL Set of tools to train trans
