Dreamfusion

7.8k 694

04 May, 2024

Python

What is Stable Dreamfusion ?

Stable Dreamfusion is a pytorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model.

Stable Dreamfusion Demo

Colab notebooks:

Instant-NGP backbone (-O): !Instant-NGP Backbone
Vanilla NeRF backbone (-O2): !Vanilla Backbone

Install Stable Dreamfusion

git clone https://github.com/ashawkey/stable-dreamfusion.git

cd stable-dreamfusion

Optional: create a python virtual environment

To avoid python package conflicts, we recommend using a virtual environment, e.g.: using conda or venv:

python -m venv venv_stable-dreamfusion

source venv_stable-dreamfusion/bin/activate # you need to repeat this step for every new terminal

Install with pip

pip install -r requirements.txt

Download pre-trained models

To use image-conditioned 3D generation, you need to download some pretrained checkpoints manually:

Zero-1-to-3 for diffusion backend. We use zero123-xl.ckpt by default, and it is hard-coded in guidance/zero123_utils.py .

    cd pretrained/zero123

    wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt

    ```

* [Omnidata](https://github.com/EPFL-VILAB/omnidata/tree/main/omnidata_tools/torch) for depth and normal prediction.
    These ckpts are hardcoded in `preprocess_image.py` .



```bash

    mkdir pretrained/omnidata

    cd pretrained/omnidata

    # assume gdown is installed

    gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt

    gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # omnidata_dpt_normal_v2.ckpt

    ```

To use [DeepFloyd-IF](https://github.com/deep-floyd/IF), you need to accept the usage conditions from [hugging face](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0), and login with `huggingface-cli login` in command line.

For DMTet, we port the pre-generated `32/64/128` resolution tetrahedron grids under `tets` .

The 256 resolution one can be found [here](https://drive.google.com/file/d/1lgvEKNdsbW5RS4gVxJbgBS4Ac92moGSa/view?usp=sharing).

### Build extension (optional)

By default, we use [ `load` ](https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.load) to build the extension at runtime.

We also provide the `setup.py` to build each extension:

```sh

cd stable-dreamfusion

# install all extension modules

bash scripts/install_ext.sh

# if you want to install manually, here is an example:

pip install ./raymarching # install to python path (you still need the raymarching/ folder, since this only installs the built extension.)

Taichi backend (optional)

Use Taichi backend for Instant-NGP. It achieves comparable performance to CUDA implementation while No CUDA build is required. Install Taichi with pip:

pip install -i https://pypi.taichi.graphics/simple/ taichi-nightly

Trouble Shooting:

we assume working with the latest version of all dependencies, if you meet any problems from a specific dependency, please try to upgrade it first (e.g., pip install -U diffusers). If the problem still holds, reporting a bug issue will be appreciated!
[F glutil.cpp:338] eglInitialize() failed Aborted (core dumped): this usually indicates problems in OpenGL installation. Try to re-install Nvidia driver, or use nvidia-docker as suggested in https://github.com/ashawkey/stable-dreamfusion/issues/131 if you are using a headless server.
TypeError: xxx_forward(): incompatible function arguments： this happens when we update the CUDA source and you used setup.py to install the extensions earlier. Try to re-install the corresponding extension (e.g., pip install ./gridencoder).

Tested environments

Ubuntu 22 with torch 1.12 & CUDA 11.6 on a V100.

Usage

First time running will take some time to compile the CUDA extensions.

#### stable-dreamfusion setting

### Instant-NGP NeRF Backbone

# + faster rendering speed

# + less GPU memory (~16G)

# - need to build CUDA extensions (a CUDA-free Taichi backend is available)

## train with text prompt (with the default settings)

# `-O` equals `--cuda_ray --fp16`

# `--cuda_ray` enables instant-ngp-like occupancy grid based acceleration.

python main.py --text "a hamburger" --workspace trial -O

# reduce stable-diffusion memory usage with `--vram_O`

# enable various vram savings (https://huggingface.co/docs/diffusers/optimization/fp16).

python main.py --text "a hamburger" --workspace trial -O --vram_O

# You can collect arguments in a file. You can override arguments by specifying them after `--file`. Note that quoted strings can't be loaded from .args files...

python main.py --file scripts/res64.args --workspace trial_awesome_hamburger --text "a photo of an awesome hamburger"

# use CUDA-free Taichi backend with `--backbone grid_taichi`

python3 main.py --text "a hamburger" --workspace trial -O --backbone grid_taichi

# choose stable-diffusion version (support 1.5, 2.0 and 2.1, default is 2.1 now)

python main.py --text "a hamburger" --workspace trial -O --sd_version 1.5

# use a custom stable-diffusion checkpoint from hugging face:

python main.py --text "a hamburger" --workspace trial -O --hf_key andite/anything-v4.0

# use DeepFloyd-IF for guidance (experimental):

python main.py --text "a hamburger" --workspace trial -O --IF

python main.py --text "a hamburger" --workspace trial -O --IF --vram_O # requires ~24G GPU memory

# we also support negative text prompt now:

python main.py --text "a rose" --negative "red" --workspace trial -O

## after the training is finished:

# test (exporting 360 degree video)

python main.py --workspace trial -O --test

# also save a mesh (with obj, mtl, and png texture)

python main.py --workspace trial -O --test --save_mesh

# test with a GUI (free view control!)

python main.py --workspace trial -O --test --gui

### Vanilla NeRF backbone

# + pure pytorch, no need to build extensions!

# - slow rendering speed

# - more GPU memory

## train

# `-O2` equals `--backbone vanilla`

python main.py --text "a hotdog" --workspace trial2 -O2

# if CUDA OOM, try to reduce NeRF sampling steps (--num_steps and --upsample_steps)

python main.py --text "a hotdog" --workspace trial2 -O2 --num_steps 64 --upsample_steps 0

## test

python main.py --workspace trial2 -O2 --test

python main.py --workspace trial2 -O2 --test --save_mesh

python main.py --workspace trial2 -O2 --test --gui # not recommended, FPS will be low.

### DMTet finetuning

## use --dmtet and --init_with <nerf checkpoint> to finetune the mesh at higher reslution

python main.py -O --text "a hamburger" --workspace trial_dmtet --dmtet --iters 5000 --init_with trial/checkpoints/df.pth

## init dmtet with a mesh to generate texture

# require install of cubvh: pip install git+https://github.com/ashawkey/cubvh

# remove --lock_geo to also finetune geometry, but performance may be bad.

python main.py -O --text "a white bunny with red eyes" --workspace trial_dmtet_mesh --dmtet --iters 5000 --init_with ./data/bunny.obj --lock_geo

## test & export the mesh

python main.py -O --text "a hamburger" --workspace trial_dmtet --dmtet --iters 5000 --test --save_mesh

## gui to visualize dmtet

python main.py -O --text "a hamburger" --workspace trial_dmtet --dmtet --iters 5000 --test --gui

### Image-conditioned 3D Generation

## preprocess input image

# note: the results of image-to-3D is dependent on zero-1-to-3's capability. For best performance, the input image should contain a single front-facing object, it should have square aspect ratio, with <1024 pixel resolution. Check the examples under ./data.

# this will exports `<image>_rgba.png`, `<image>_depth.png`, and `<image>_normal.png` to the directory containing the input image.

python preprocess_image.py <image>.png

python preprocess_image.py <image>.png --border_ratio 0.4 # increase border_ratio if the center object appears too large and results are unsatisfying.

## zero123 train

# pass in the processed <image>_rgba.png by --image and do NOT pass in --text to enable zero-1-to-3 backend.

python main.py -O --image <image>_rgba.png --workspace trial_image --iters 5000

# if the image is not exactly front-view (elevation = 0), adjust default_polar (we use polar from 0 to 180 to represent elevation from 90 to -90)

python main.py -O --image <image>_rgba.png --workspace trial_image --iters 5000 --default_polar 80

# by default we leverage monocular depth estimation to aid image-to-3d, but if you find the depth estimation inaccurate and harms results, turn it off by:

python main.py -O --image <image>_rgba.png --workspace trial_image --iters 5000 --lambda_depth 0

python main.py -O --image <image>_rgba.png --workspace trial_image_dmtet --dmtet --init_with trial_image/checkpoints/df.pth

## zero123 with multiple images

python main.py -O --image_config config/<config>.csv --workspace trial_image --iters 5000

## render <num> images per batch (default 1)

python main.py -O --image_config config/<config>.csv --workspace trial_image --iters 5000 --batch_size 4

# providing both --text and --image enables stable-diffusion backend (similar to make-it-3d)

python main.py -O --image hamburger_rgba.png --text "a DSLR photo of a delicious hamburger" --workspace trial_image_text --iters 5000

python main.py -O --image hamburger_rgba.png --text "a DSLR photo of a delicious hamburger" --workspace trial_image_text_dmtet --dmtet --init_with trial_image_text/checkpoints/df.pth

## test / visualize

python main.py -O --image <image>_rgba.png --workspace trial_image_dmtet --dmtet --test --save_mesh

python main.py -O --image <image>_rgba.png --workspace trial_image_dmtet --dmtet --test --gui

### Debugging

# Can save guidance images for debugging purposes. These get saved in trial_hamburger/guidance.

# Warning: this slows down training considerably and consumes lots of disk space!

python main.py --text "a hamburger" --workspace trial_hamburger -O --vram_O --save_guidance --save_guidance_interval 5 # save every 5 steps