Tokenizer loading?
I want to try the model with the Hugging Face Trainer on a new task, but I'm having trouble loading the tokenizer. (Loading the model with modelgenerator works fine.) Any advice on how to load it?
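from modelgenerator.tasks import SequenceClassification
from transformers import AutoTokenizer

MODEL_PATH = "genbio-ai/AIDO.DNA-300M"
# Loading the model through modelgenerator works
model = SequenceClassification.from_config({"model.backbone": "aido_dna_300m", "model.n_classes": 2})
# Loading the tokenizer through transformers raises the error below
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)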
Error output:
KeyError Traceback (most recent call last)
File ~/anaconda3/envs/dna/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py:1128, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
1127 try:
-> 1128 config_class = CONFIG_MAPPING[config_dict["model_type"]]
1129 except KeyError:
File ~/anaconda3/envs/dna/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py:825, in _LazyConfigMapping.__getitem__(self, key)
824 if key not in self._mapping:
--> 825 raise KeyError(key)
826 value = self._mapping[key]
KeyError: 'rnabert'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[19], line 2
1 # Load the tokenizer
----> 2 tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH,trust_remote_code=True)
File ~/anaconda3/envs/dna/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py:782, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
780 if config_tokenizer_class is None:
781 if not isinstance(config, PretrainedConfig):
--> 782 config = AutoConfig.from_pretrained(
783 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
784 )
785 config_tokenizer_class = config.tokenizer_class
786 if hasattr(config, "auto_map") and "AutoTokenizer" in config.auto_map:
File ~/anaconda3/envs/dna/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py:1130, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
1128 config_class = CONFIG_MAPPING[config_dict["model_type"]]
1129 except KeyError:
-> 1130 raise ValueError(
1131 f"The checkpoint you are trying to load has model type {config_dict['model_type']} "
1132 "but Transformers does not recognize this architecture. This could be because of an "
1133 "issue with the checkpoint, or because your version of Transformers is out of date."
1134 )
1135 return config_class.from_dict(config_dict, **unused_kwargs)
1136 else:
1137 # Fallback: use pattern matching on the string.
1138 # We go from longer names to shorter names to catch roberta before bert (for instance)
ValueError: The checkpoint you are trying to load has model type rnabert but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
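Reading the traceback, the failure happens before any tokenizer code is reached: because no tokenizer class is known up front (the `if config_tokenizer_class is None:` branch), AutoTokenizer first loads the model config, and AutoConfig then looks up `model_type` ("rnabert") in its built-in mapping and raises. The sketch below is a simplified, hypothetical model of that decision path, written only from the branches visible in the traceback. It is not the transformers source; `KNOWN_MODEL_TYPES` and `resolve_tokenizer_source` are made-up names, and the real library has more branches than this.

```python
# Simplified, HYPOTHETICAL model of the lookup order visible in the traceback.
# This is not transformers' code; it only mirrors the branches shown there:
# tokenizer_class -> AutoConfig (auto_map or built-in model_type) -> tokenizer.

# Hypothetical stand-in for the keys of transformers' built-in CONFIG_MAPPING.
KNOWN_MODEL_TYPES = {"bert", "roberta", "gpt2"}


def resolve_tokenizer_source(tokenizer_config: dict, config: dict,
                             trust_remote_code: bool) -> str:
    """Describe where the tokenizer would come from, or raise ValueError the
    way AutoConfig does when model_type is unknown."""
    # 1. An explicit tokenizer_class means the model config is not needed.
    if tokenizer_config.get("tokenizer_class"):
        return "tokenizer_class:" + tokenizer_config["tokenizer_class"]

    # 2. Otherwise the model config must be resolved first.
    auto_map = config.get("auto_map", {})
    if trust_remote_code and "AutoConfig" in auto_map:
        # Custom config class shipped with the checkpoint.
        config_source = "remote:" + auto_map["AutoConfig"]
    else:
        # Fall back to the built-in mapping; unknown types fail here.
        model_type = config.get("model_type")
        if model_type not in KNOWN_MODEL_TYPES:
            raise ValueError(
                f"The checkpoint you are trying to load has model type {model_type} "
                "but Transformers does not recognize this architecture."
            )
        config_source = "builtin:" + model_type

    # 3. With a config in hand, a remote AutoTokenizer entry can be followed.
    if trust_remote_code and "AutoTokenizer" in auto_map:
        return "remote_tokenizer:" + str(auto_map["AutoTokenizer"])
    return "from_config:" + config_source


# The situation in this issue: model_type "rnabert", no tokenizer_class, no auto_map.
try:
    resolve_tokenizer_source({}, {"model_type": "rnabert"}, trust_remote_code=True)
except ValueError as err:
    print(err)
```

Under this reading, `trust_remote_code=True` has nothing to act on unless the checkpoint declares a `tokenizer_class` or an `auto_map` entry, which would explain why passing it does not help here.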