New research model out! I uploaded a new Branchy model based on Phi-2 for faster inference using Early Exit. Check it out: valcore/Branchy-Phi-2. I also uploaded a Hugging Face Space to try it out: valcore/Branchy-phi-2; unfortunately, inference is very slow on the free tier. Let me know what you think about it!
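For readers unfamiliar with Early Exit: the idea is to attach prediction heads to intermediate layers and stop computing as soon as one of them is confident enough. Below is a minimal, hedged sketch with tiny stand-in layers; the confidence threshold, shapes, and head design are illustrative assumptions, not Branchy-Phi-2's actual implementation.

```python
import torch
import torch.nn as nn

# Toy early-exit network: one exit head per layer. All sizes are
# illustrative assumptions, not the real Branchy-Phi-2 architecture.
torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
heads = nn.ModuleList([nn.Linear(16, 10) for _ in range(4)])

def early_exit_forward(x, threshold=0.9):
    """Run layer by layer; return as soon as an exit head is confident."""
    for layer, head in zip(layers, heads):
        x = torch.relu(layer(x))
        probs = torch.softmax(head(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:  # confident enough: skip remaining layers
            return pred, conf
    return pred, conf  # fall back to the deepest head

pred, conf = early_exit_forward(torch.randn(1, 16))
```

The speed-up comes from easy inputs exiting at shallow layers while hard inputs still traverse the full network.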
Explaining a new state-of-the-art monocular depth estimation model: Depth Anything ✨ 🧶 Before we begin: Depth Anything was recently integrated into 🤗 transformers, and you can use it with three lines of code! ✨
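A minimal sketch of those three lines using the transformers depth-estimation pipeline. The checkpoint id and example image URL below are assumptions for illustration; check the Hub for the checkpoint you want.

```python
from transformers import pipeline
from PIL import Image
import requests

# Assumed checkpoint id; larger variants also exist on the Hub.
pipe = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")

# Any RGB image works; this COCO sample is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

result = pipe(image)
depth = result["depth"]  # PIL image holding the predicted depth map
```

The pipeline handles preprocessing and post-processing, so the depth map comes back at the input image's resolution.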
The model's success hinges on unlocking the use of unlabeled datasets, although the authors' initial attempt at self-training failed. What the authors did:
⏰ Train a teacher model on the labeled dataset
⏰ Guide the student with the teacher, while also training on unlabeled images pseudo-labeled by the teacher
However, this naive setup was the cause of the failure: since both architectures were similar, their outputs were nearly identical, so the pseudo-labels taught the student nothing new. The authors therefore added a harder optimization target: the unlabeled images are perturbed with color jittering, Gaussian blurring, and spatial distortions, forcing the student to learn more invariant representations from them. The architecture consists of a DINOv2 encoder to extract features, followed by a DPT decoder. First, they train the teacher model on labeled images; then they jointly train the student model, adding in the dataset pseudo-labeled by the ViT-L teacher. Thanks to this, Depth Anything performs very well! I have also benchmarked the inference duration of the model against different models here. I also ran torch.compile benchmarks across them and got nice speed-ups 🚀 https://huggingface2.notion.site/DPT-Benchmarks-1e516b0ba193460e865c47b3a5681efb?pvs=4
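The teacher-student scheme above can be sketched in a few lines. This is a toy illustration with tiny stand-in convolutions; the perturbation, loss, and shapes are assumptions for clarity, not the paper's exact training recipe.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for the DINOv2-encoder + DPT-decoder models (same architecture
# for teacher and student, as in the paper's setup).
teacher = nn.Conv2d(3, 1, 3, padding=1)
student = nn.Conv2d(3, 1, 3, padding=1)

def perturb(x):
    # Crude stand-in for color jitter / Gaussian blur / spatial distortion.
    return x + 0.1 * torch.randn_like(x)

unlabeled = torch.rand(4, 3, 32, 32)

# Teacher pseudo-labels the CLEAN unlabeled images (no gradients needed).
with torch.no_grad():
    pseudo_depth = teacher(unlabeled)

# Student must match those labels from the PERTURBED view: a harder target
# that pushes it toward perturbation-invariant representations.
pred = student(perturb(unlabeled))
loss = nn.functional.l1_loss(pred, pseudo_depth)
loss.backward()
```

The key point is the asymmetry: the teacher sees clean images while the student sees distorted ones, which is what broke the "identical outputs" failure mode of the first self-training attempt.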