ViTModelFT for Skin Cancer Classification

Model Details

  • Model Architecture: Vision Transformer (ViT)
  • Framework: PyTorch
  • Input Shape: 224x224 RGB images
  • Number of Parameters: ~86M (Based on ViT Base Model)
  • Output: Multi-class classification (9 classes)

Model Description

This model uses a Vision Transformer (ViT) as a backbone for skin cancer classification. The ViT backbone is pretrained on ImageNet and then fine-tuned for this task. The original classification layer is replaced with a three-layer fully connected head: hidden layers of 512 and 256 neurons, followed by a 9-class output layer representing different skin cancer types.

All ViT backbone layers are frozen; only the fully connected head is trained. This lets the model adapt to the new classification task while retaining the representations learned from ImageNet.
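The frozen-backbone setup described above can be sketched as follows. This is a minimal illustration, not the card's actual training code: the class name `ViTSkinClassifier`, the feature dimension of 768 (ViT-Base), and the stand-in backbone are assumptions; in practice the backbone would be a pretrained ViT (e.g. torchvision's `vit_b_16` with its head removed).

```python
import torch
import torch.nn as nn

class ViTSkinClassifier(nn.Module):
    """Frozen ViT backbone + trainable 512 -> 256 -> 9 classification head.

    `backbone` is any module mapping (N, 3, 224, 224) images to (N, feat_dim)
    features, e.g. a pretrained ViT with its classification head removed.
    """
    def __init__(self, backbone: nn.Module, feat_dim: int = 768, num_classes: int = 9):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # freeze every backbone weight
            p.requires_grad = False
        self.head = nn.Sequential(            # only these layers receive gradients
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                 # backbone is frozen
            feats = self.backbone(x)
        return self.head(feats)

# Stand-in backbone for demonstration only (replace with a pretrained ViT):
dummy_backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 768))
model = ViTSkinClassifier(dummy_backbone)
logits = model(torch.randn(2, 3, 224, 224))  # shape (2, 9)
```

During fine-tuning, only `model.head.parameters()` would be passed to the optimizer, since the backbone's parameters have `requires_grad=False`.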

Training Details

Metrics (Validation Set)

Class   Precision   Recall   F1-Score
0       0.69        0.56     0.62
1       0.60        0.75     0.67
2       0.90        0.56     0.69
3       0.20        0.06     0.10
4       0.47        1.00     0.64
5       0.63        0.75     0.69
6       0.00        0.00     0.00
7       0.67        0.50     0.57
8       0.60        1.00     0.75
  • Overall Accuracy: 0.59
  • Macro Average Precision: 0.53
  • Macro Average Recall: 0.58
  • Macro Average F1-Score: 0.52
  • Weighted Average Precision: 0.58
  • Weighted Average Recall: 0.59
  • Weighted Average F1-Score: 0.56

License

This model is released under the MIT License.


This model has been pushed to the Hub using the PyTorchModelHubMixin integration:

  • Library: [More Information Needed]
  • Docs: [More Information Needed]
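The mixin integration mentioned above can be sketched as follows, assuming `huggingface_hub` is installed. The class `TinyHead` is a hypothetical stand-in, not the card's actual model; the repo id in the commented line is likewise illustrative of the loading pattern, which requires network access.

```python
import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

# Hypothetical minimal module: inheriting from PyTorchModelHubMixin adds
# save_pretrained / from_pretrained / push_to_hub to any nn.Module.
class TinyHead(nn.Module, PyTorchModelHubMixin):
    def __init__(self, in_dim: int = 768, num_classes: int = 9):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)

model = TinyHead()
# model.push_to_hub("username/repo-name")      # uploads weights + config (requires auth)
# model = TinyHead.from_pretrained("user/repo")  # downloads weights from the Hub
out = model(torch.randn(1, 768))              # shape (1, 9)
```

The mixin serializes the `__init__` arguments to a config file alongside the weights, so `from_pretrained` can rebuild the module without extra code.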