Sharp (Image-to-3D)

amael-apple committed · Commit e154a2f · 1 Parent(s): bf30f06

Update README.md

Files changed (1): README.md (+71 -71)
README.md CHANGED

---
license: apple-amlr
library_name: sharp
---

# Sharp Monocular View Synthesis in Less Than a Second

[![Project Page](https://img.shields.io/badge/Project-Page-green)](https://apple.github.io/ml-sharp/)
[![arXiv](https://img.shields.io/badge/arXiv-2512.10685-b31b1b.svg)](https://arxiv.org/abs/2512.10685)

This software project accompanies the research paper: _Sharp Monocular View Synthesis in Less Than a Second_
by _Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy,
Tian Fang, Yanghai Tsin, Stephan Richter and Vladlen Koltun_.

![](teaser.jpg)

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25–34% and DISTS by 21–43% versus the best prior model, while lowering the synthesis time by three orders of magnitude.

## Getting started

Please follow the steps in the [code repository](https://github.com/apple/ml-sharp) to set up your environment. Then you can download the checkpoint from the _Files and versions_ tab above, or use the `huggingface-hub` CLI:

```bash
pip install huggingface-hub
huggingface-cli download --local-dir . apple/Sharp
```
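
Alternatively, the checkpoint can be fetched from a script via the `huggingface_hub` Python API; a minimal sketch, equivalent to the CLI call above:

```python
# Minimal sketch: programmatic equivalent of the CLI download above.
from huggingface_hub import snapshot_download

# Download all files of the apple/Sharp repo into the current directory.
snapshot_download(repo_id="apple/Sharp", local_dir=".")
```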

To run prediction:

```bash
sharp predict -i /path/to/input/images -o /path/to/output/gaussians -c sharp_2572gikvuh.pt
```

The results are 3D Gaussian splats (3DGS) written to the output folder. The 3DGS `.ply` files are compatible with various public 3DGS renderers. We follow the OpenCV coordinate convention (x right, y down, z forward), so the 3DGS scene center lies roughly at (0, 0, +z). When using third-party renderers, please scale and rotate the scene to re-center it accordingly.
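
For instance, if a viewer expects the scene centered at the origin, the splat positions can be shifted before loading. Below is a minimal sketch using `numpy` and the `plyfile` package; the file names are placeholders, and the common 3DGS vertex layout with `x`/`y`/`z` position properties is assumed. A full axis-convention change would additionally require rotating the per-splat quaternions, which is omitted here.

```python
# Minimal sketch: re-center a predicted 3DGS .ply at the origin before
# loading it into a third-party renderer. Assumes the common 3DGS vertex
# layout with x/y/z position properties; file names are placeholders.
import numpy as np
from plyfile import PlyData

ply = PlyData.read("gaussians.ply")
vertex = ply["vertex"].data  # numpy structured array of per-splat attributes

# Estimate the scene center (roughly (0, 0, +z) per the convention above)
# and translate it to the origin. A pure translation keeps the per-splat
# scales and rotation quaternions valid, so no other attributes change.
center = np.array([vertex["x"].mean(), vertex["y"].mean(), vertex["z"].mean()])
vertex["x"] -= center[0]
vertex["y"] -= center[1]
vertex["z"] -= center[2]

ply.write("gaussians_centered.ply")
```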

### Rendering trajectories (CUDA GPU only)

Additionally, you can render videos along a camera trajectory. While Gaussian prediction works on CPU, CUDA, and MPS devices, rendering videos via the `--render` option currently requires a CUDA GPU. The gsplat renderer takes a while to initialize on first launch.

```bash
sharp predict -i /path/to/input/images -o /path/to/output/gaussians --render -c sharp_2572gikvuh.pt

# Or from the intermediate gaussians:
sharp render -i /path/to/output/gaussians -o /path/to/output/renderings -c sharp_2572gikvuh.pt
```
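
Because `--render` is CUDA-only, it can save time to confirm that PyTorch sees a CUDA device before launching a long batch; a minimal sketch, assuming the PyTorch environment from the setup steps above:

```python
# Minimal sketch: check for a CUDA device before using --render,
# since the gsplat renderer requires a CUDA GPU.
import torch

if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; prediction still runs on CPU/MPS, but --render will not.")
```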

## Evaluation

Please refer to the paper for both quantitative and qualitative evaluations.
Additionally, please check out this [qualitative examples page](https://apple.github.io/ml-sharp/) containing several video comparisons against related work.

## Citation

If you find our work useful, please cite the following paper:

```bibtex
@article{Sharp2025:arxiv,
  title   = {Sharp Monocular View Synthesis in Less Than a Second},
  author  = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
  journal = {arXiv preprint arXiv:2512.10685},
  year    = {2025},
  url     = {https://arxiv.org/abs/2512.10685},
}
```

## Acknowledgements

Our codebase is built using multiple open-source contributions; please see [ACKNOWLEDGEMENTS](ACKNOWLEDGEMENTS) for more details.