novelai_api.Tokenizer
- class SentencePiece
Bases: sentencepiece.SentencePieceProcessor
Wrapper around sentencepiece.SentencePieceProcessor that adds the encode and decode methods.
- trans_table_ids: Dict[int, str]
- trans_table_str: Dict[str, int]
- trans_regex_str: re.Pattern
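The three attributes above suggest that the wrapper translates special-token strings to ids (and back) itself instead of leaving them to SentencePiece. The sketch below shows one way such tables could be used: a regex with a capturing group splits the text around special tokens, which are looked up directly, while everything else goes to the underlying tokenizer. It is a hypothetical illustration, not the library's code: `SpecialTokenTokenizer`, the two special tokens, their ids, and the character-level `_base_encode`/`_base_decode` stand-ins for SentencePiece are all assumptions.

```python
import re
from typing import Dict, List


class SpecialTokenTokenizer:
    """Hypothetical stand-in for the SentencePiece wrapper, for illustration only."""

    OFFSET = 2  # ids 0 and 1 are reserved for the special tokens in this sketch

    def __init__(self) -> None:
        # Assumed special tokens and ids; a real model file defines its own
        self.trans_table_ids: Dict[int, str] = {0: "<|endoftext|>", 1: "<|pad|>"}
        self.trans_table_str: Dict[str, int] = {v: k for k, v in self.trans_table_ids.items()}
        # Longest tokens first so a token is never shadowed by a prefix of it;
        # the capturing group makes re.split keep the matched tokens
        alternatives = sorted(self.trans_table_str, key=len, reverse=True)
        self.trans_regex_str: re.Pattern = re.compile(
            "(" + "|".join(map(re.escape, alternatives)) + ")"
        )

    def _base_encode(self, s: str) -> List[int]:
        # Toy replacement for SentencePiece: one id per character
        return [ord(c) + self.OFFSET for c in s]

    def _base_decode(self, ids: List[int]) -> str:
        return "".join(chr(i - self.OFFSET) for i in ids)

    def encode(self, s: str) -> List[int]:
        tokens: List[int] = []
        for piece in self.trans_regex_str.split(s):
            if piece in self.trans_table_str:
                tokens.append(self.trans_table_str[piece])  # special token: direct lookup
            elif piece:
                tokens.extend(self._base_encode(piece))  # ordinary text: base tokenizer
        return tokens

    def decode(self, ids: List[int]) -> str:
        parts: List[str] = []
        run: List[int] = []  # consecutive ordinary ids, decoded together
        for i in ids:
            if i in self.trans_table_ids:
                parts.append(self._base_decode(run))
                parts.append(self.trans_table_ids[i])
                run = []
            else:
                run.append(i)
        parts.append(self._base_decode(run))
        return "".join(parts)


tok = SpecialTokenTokenizer()
ids = tok.encode("Hi<|endoftext|>")
print(ids)              # [74, 107, 0]
print(tok.decode(ids))  # Hi<|endoftext|>
```

Decoding consecutive ordinary ids as one run, rather than id by id, matters for real subword tokenizers, where spacing and multi-byte characters depend on neighbouring pieces.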
- class Tokenizer
Bases: object
Abstraction of the tokenizer behind each Model.
- classmethod get_tokenizer_name(model: novelai_api.Preset.Model) → str
Get the name of the tokenizer that a model uses.
- Parameters:
model – Model to get the tokenizer name of
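Conceptually, `get_tokenizer_name` is a lookup from a model to the tokenizer it uses, which lets several models share one tokenizer. A minimal sketch of that idea, assuming a plain dictionary; `ExampleModel`, its members, `ExampleTokenizer`, and the tokenizer names are hypothetical placeholders, not the members of `novelai_api.Preset.Model` or the library's real tokenizer names:

```python
from enum import Enum


class ExampleModel(Enum):
    # Hypothetical model identifiers, not the real novelai_api.Preset.Model members
    ALPHA = "alpha"
    BETA = "beta"
    GAMMA = "gamma"


class ExampleTokenizer:
    # Hypothetical model -> tokenizer-name table; ALPHA and BETA share a tokenizer
    _tokenizer_name = {
        ExampleModel.ALPHA: "tokenizer_a",
        ExampleModel.BETA: "tokenizer_a",
        ExampleModel.GAMMA: "tokenizer_b",
    }

    @classmethod
    def get_tokenizer_name(cls, model: ExampleModel) -> str:
        try:
            return cls._tokenizer_name[model]
        except KeyError:
            # Fail with a clear message instead of a bare KeyError
            raise ValueError(f"No tokenizer registered for {model!r}") from None


print(ExampleTokenizer.get_tokenizer_name(ExampleModel.BETA))  # tokenizer_a
```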
- classmethod decode(model: novelai_api.Preset.Model | novelai_api.ImagePreset.ImageModel, o: List[int]) → str
Decode the provided tokens using the tokenizer of the given model.
- Parameters:
model – Model to use the tokenizer of
o – List of tokens to decode
- Returns:
Text the provided tokens decode into
- classmethod encode(model: novelai_api.Preset.Model | novelai_api.ImagePreset.ImageModel, o: str) → List[int]
Encode the provided text using the tokenizer of the given model.
- Parameters:
model – Model to use the tokenizer of
o – Text to encode
- Returns:
List of tokens the provided text encodes into
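A typical reason to call `encode` is to measure or cap text length in tokens rather than characters. The hypothetical helper below keeps only the last `max_tokens` tokens of a text, for example to fit the most recent part of a story into a context window. To stay self-contained it uses toy character-level `char_encode`/`char_decode` functions; with this library you would presumably pass `lambda s: Tokenizer.encode(model, s)` and `lambda o: Tokenizer.decode(model, o)` instead, following the signatures documented above.

```python
from typing import Callable, List

Encode = Callable[[str], List[int]]
Decode = Callable[[List[int]], str]


def char_encode(s: str) -> List[int]:
    # Toy stand-in for Tokenizer.encode(model, s): one token per character
    return [ord(c) for c in s]


def char_decode(o: List[int]) -> str:
    # Toy stand-in for Tokenizer.decode(model, o)
    return "".join(chr(i) for i in o)


def truncate_to_tokens(text: str, max_tokens: int, encode: Encode, decode: Decode) -> str:
    """Keep only the last max_tokens tokens of text (hypothetical helper)."""
    if max_tokens <= 0:
        return ""  # guard: tokens[-0:] would return the whole list
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return text  # already fits; skip a needless re-decode
    return decode(tokens[-max_tokens:])


print(len(char_encode("hello world")))                                # 11
print(truncate_to_tokens("hello world", 5, char_encode, char_decode))  # world
```

With a real subword tokenizer the cut falls on a token boundary rather than a character or word boundary, so the retained text may begin partway through a word.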