Add pipeline tag, library name and GitHub README content
#1 opened by nielsr (HF Staff)

README.md (changed)
---
license: apache-2.0
pipeline_tag: image-classification
library_name: pytorch
---

Classifiers used in [MeshFleet](https://github.com/FeMa42/MeshFleet) to generate the [MeshFleet Dataset](https://huggingface.co/datasets/DamianBoborzi/MeshFleet). The dataset and its generation are described in [MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling](https://arxiv.org/abs/2503.14002).

# MeshFleet Dataset Implementation and Car Quality Classification

🤗 **Huggingface Dataset**: [MeshFleet](https://huggingface.co/datasets/DamianBoborzi/MeshFleet)

This repository contains the implementation for the generation of the MeshFleet dataset, a curated collection of 3D car models derived from Objaverse-XL, using a car quality classification pipeline.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/634d77ad247c8a2f7bd1e2bb/tCfvmvyzQgZdTuRGFyCWq.png)

**Dataset Overview:**

The MeshFleet dataset provides metadata for 3D car models, including their SHA256 from Objaverse-XL, vehicle category, and size. The core dataset is available as a CSV file: [`meshfleet_with_vehicle_categories_df.csv`](https://huggingface.co/datasets/DamianBoborzi/MeshFleet). You can easily load it using pandas:

```python
import pandas as pd

meshfleet_df = pd.read_csv('./data/meshfleet_with_vehicle_categories_df.csv')
print(meshfleet_df.head())
```

The actual 3D models can be downloaded from [Objaverse-XL](https://github.com/allenai/objaverse-xl.git) using their corresponding SHA256 hashes. Pre-rendered images of the MeshFleet models are also available within the Hugging Face repository in the `renders` directory, organized as `renders/{sha256}/00X.png`.
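
Since the CSV only stores metadata, fetching the meshes comes down to matching SHA256 hashes against the Objaverse-XL annotation table. A minimal sketch of that matching step, assuming both DataFrames expose a `sha256` column (verify the actual column names in the files you download):

```python
import pandas as pd

def select_meshfleet_rows(annotations: pd.DataFrame, meshfleet: pd.DataFrame) -> pd.DataFrame:
    """Keep only the Objaverse-XL annotation rows whose SHA256 appears in MeshFleet.

    Assumes both frames have a 'sha256' column; check against the real files.
    """
    return annotations[annotations["sha256"].isin(meshfleet["sha256"])].reset_index(drop=True)
```

The filtered frame can then be handed to the download utilities described in the Objaverse-XL project's README.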

**Project Structure and Functionality:**

This repository provides code for the following:

1. **Data Preprocessing:** Preparing the labeled objects (a subset of Objaverse) for quality classification.
2. **Car Quality Classifier Training:** Training a model to predict the quality of 3D car models based on rendered images.
3. **Objaverse-XL Processing:** Downloading, rendering, and classifying objects from the larger Objaverse-XL dataset.
4. **Classification of Rendered Objects:** Applying the trained classifier to new renderings.

> Please note: you don't have to clone or install the repository if you just want to use the extracted data. You can download the preprocessed data from the Hugging Face repository: <https://huggingface.co/datasets/DamianBoborzi/MeshFleet>

## Data Preprocessing

The preprocessing pipeline involves several key steps:

1. **Loading Labels:** Load the `car_quality_dataset_votes.csv` file from [CarQualityDataset](https://huggingface.co/datasets/DamianBoborzi/CarQualityDataset). This file contains quality labels (votes) for a subset of Objaverse objects, which serves as our training data.
2. **Loading/Generating Renderings:** Obtain rendered images of the labeled objects. You can either use the pre-rendered images from the same [CarQualityDataset](https://huggingface.co/datasets/DamianBoborzi/CarQualityDataset) repository or generate your own using the Objaverse-XL library.
3. **Embedding Generation:** Create image embeddings using DINOv2 and SigLIP models. These embeddings capture visual features crucial for quality assessment. We provide a notebook (`scripts/objaverse_generate_sequence_embeddings.ipynb`) to guide you through this process. Pre-generated embeddings will also be made available for download.
4. **Classifier Training:** We use the generated embeddings and labels to train a classifier to predict the quality of 3D car models. The training script is `sequence_classifier_training.py`.
5. **Objaverse-XL Processing:** You can load, render, and classify 3D car models from the larger Objaverse-XL dataset using the `oxl_processing/objaverse_xl_batched_renderer.py` script.
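
The `--use_combined_embeddings` option mentioned later suggests the DINOv2 and SigLIP features can be used jointly. A simple way to combine per-view embeddings is concatenation; this is an illustrative sketch, not the repository's actual combination logic:

```python
import numpy as np

def combine_view_embeddings(dino: np.ndarray, siglip: np.ndarray) -> np.ndarray:
    """Concatenate DINOv2 and SigLIP embeddings per rendered view.

    dino: (num_views, d_dino), siglip: (num_views, d_siglip) for one object.
    Concatenation is an assumption here; the repository may combine them differently.
    """
    if dino.shape[0] != siglip.shape[0]:
        raise ValueError("view counts must match")
    return np.concatenate([dino, siglip], axis=1)
```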

### Installation

Before you begin, install the necessary dependencies:

```bash
pip install -r requirements.txt
pip install .
```

## Classifier Training

This section details the steps to train the car quality classifier.

1. **Load and Render Labeled Objects:** Begin by loading the labeled objects from `data/car_quality_dataset_votes.csv`. Render the objects (if you haven't already downloaded pre-rendered images).
2. **Generate Embeddings:** Use the `scripts/objaverse_generate_sequence_embeddings.ipynb` notebook to generate DINOv2 and SigLIP embeddings for the rendered images. Place the generated embeddings in the `data` directory. We will also provide a download link for pre-computed embeddings soon.
3. **Train the Classifier:** Train a classifier using the `quality_classifier/sequence_classifier_training.py` script. This script takes the embeddings as input and trains a model to predict object quality. Make sure to adjust the paths within the script if you're using your own generated embeddings. A pre-trained model will also be made available for download.
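
The actual architecture lives in `quality_classifier/sequence_classifier_training.py`. As a rough illustration of the idea only, a classifier over a sequence of per-view embeddings can be as simple as mean-pooling the views and applying an MLP head (the dimensions and layer sizes below are placeholders, not the repository's):

```python
import torch
import torch.nn as nn

class ViewSequenceClassifier(nn.Module):
    """Illustrative stand-in for a sequence classifier over per-view embeddings:
    mean-pool the views, then apply a small MLP head."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, view_embeddings: torch.Tensor) -> torch.Tensor:
        # view_embeddings: (batch, num_views, embed_dim)
        pooled = view_embeddings.mean(dim=1)
        return self.head(pooled)
```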

## Objaverse-XL Processing (Downloading, Rendering, and Classification)

To process objects from the full Objaverse-XL dataset:

1. **Use the `oxl_processing/objaverse_xl_batched_renderer.py` script:** This script handles downloading, rendering, and (optionally) classifying objects from Objaverse-XL.
2. **Refer to the `oxl_processing` directory:** This directory contains all necessary files and a dedicated README with more detailed instructions. Note that the full Objaverse-XL dataset is substantial, so consider this when planning storage and processing. You can also use the renders of the objects from the [Objaverse_processed](https://huggingface.co/datasets/DamianBoborzi/Objaverse_processed) repository (the renders are around 500 GB).

## Classification of Rendered Objects

To classify new renderings using the trained model:

1. **Download Pre-trained Models:** Download the trained classifier and the PCA model (used for dimensionality reduction of DINOv2 embeddings) from Hugging Face and place them in the `car_quality_models` directory.
2. **Prepare a CSV:** Create a CSV file containing the SHA256 hash and image path (`sha256,img_path`) for each rendered image you want to classify.
3. **Run `reclassify_oxl_data.py`:** Use the following command to classify the images:

```bash
python reclassify_oxl_data.py --num_objects <number_of_objects> --gpu_batch_size <batch_size> [--use_combined_embeddings]
```

* `<number_of_objects>`: The number of objects to classify.
* `<batch_size>`: The batch size for GPU processing.
* `--use_combined_embeddings`: (Optional) Use both DINOv2 and SigLIP embeddings for classification.
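
The input CSV from step 2 can be assembled directly from the `renders/{sha256}/00X.png` layout described above. A small helper (the function name is ours, not part of the repository):

```python
import pandas as pd
from pathlib import Path

def build_classification_csv(render_root: str, out_csv: str) -> pd.DataFrame:
    """Collect (sha256, img_path) rows from a renders/{sha256}/00X.png layout."""
    rows = [
        {"sha256": img.parent.name, "img_path": str(img)}
        for img in sorted(Path(render_root).glob("*/*.png"))
    ]
    df = pd.DataFrame(rows, columns=["sha256", "img_path"])
    df.to_csv(out_csv, index=False)
    return df
```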