  1. How to do Tokenizer Batch processing? - HuggingFace

    Jun 7, 2023 · In the Tokenizer documentation from huggingface, the call function accepts List[List[str]] and says: text (str, List[str], List[List[str]], optional) — The sequence or batch of sequences to be …
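
The call-signature question above can be sketched as follows. This is a minimal, hedged example, assuming the `transformers` library is installed; `bert-base-uncased` is only an illustrative checkpoint, not one the thread names. A `List[str]` is treated as a batch of raw sentences, while a `List[List[str]]` is a batch of pre-tokenized word lists and needs `is_split_into_words=True`.

```python
# Hedged sketch, assuming the transformers library is installed.
# "bert-base-uncased" is only an illustrative checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# List[str]: a batch of raw sentences.
enc = tokenizer(["first sentence", "a second, longer sentence"], padding=True)

# List[List[str]]: a batch of pre-tokenized word lists.
pre = [["Hello", "world"], ["Batch", "processing", "works"]]
enc_pre = tokenizer(pre, is_split_into_words=True, padding=True)

print(len(enc["input_ids"]), len(enc_pre["input_ids"]))  # one row of ids per input
```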

  2. What does Keras Tokenizer method exactly do? - Stack Overflow

    On occasion, circumstances require us to do the following: from keras.preprocessing.text import Tokenizer tokenizer = Tokenizer(num_words=my_max) Then, invariably, we chant this mantra: …
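
The snippet is cut off before the usual follow-up calls; as a hedged sketch (my assumption about the "mantra", not the snippet's continuation), the legacy Keras `Tokenizer` is typically fitted on a corpus and then used to map texts to integer id sequences. Note that this class was removed in Keras 3.

```python
# Legacy Keras text Tokenizer (removed in Keras 3). num_words keeps only
# the most frequent words when converting texts to sequences.
from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=100)
# Build the word index from a corpus (illustrative sentences)...
tokenizer.fit_on_texts(["the cat sat on the mat", "the dog sat"])
# ...then map new texts to integer ids, ordered by word frequency.
print(tokenizer.texts_to_sequences(["the cat sat"]))  # → [[1, 3, 2]]
```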

  3. Unable to get the tokenizer of Gemma-3 - Stack Overflow

    Mar 22, 2025 · I am trying to get the tokenizer using the huggingface AutoTokenizer library, but I am unable to fetch it. Is there any other way to get it? Where am I going wrong?

  4. Looking for a clear definition of what a "tokenizer", "parser" and ...

    Mar 28, 2018 · A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context to the tokens …
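
The distinction drawn in the answer above can be made concrete in a few lines of plain Python; the token categories below are my own illustrative choices, not from the original answer.

```python
import re

# A tokenizer merely splits text into tokens; a lexer also attaches a
# category (extra context) to each token.

def tokenize(text):
    # The simplest possible tokenizer: split on whitespace.
    return text.split()

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
]

def lex(text):
    # A lexer: tag each token with its kind.
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
    return [(m.lastgroup, m.group()) for m in re.finditer(pattern, text)]

print(tokenize("x = 1 + 2"))  # → ['x', '=', '1', '+', '2']
print(lex("x = 1 + 2"))       # → [('NAME', 'x'), ('OP', '='), ('NUMBER', '1'), ('OP', '+'), ('NUMBER', '2')]
```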

  5. OpenAI API: How do I count tokens before(!) I send an API request?

    Mar 21, 2023 · How do I count tokens before (!) I send an API request? As stated in the official OpenAI article: To further explore tokenization, you can use our interactive Tokenizer tool, which allows you …

  6. How to add all standard special tokens to my hugging face tokenizer …

    Aug 11, 2022 · I want all special tokens to always be available. How do I do this? My first attempt to give it to my tokenizer: def does_t5_have_sep_token(): tokenizer: PreTrainedTokenizerFast = …
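
A hedged sketch of registering the standard special-token slots; `t5-small` is only an illustrative checkpoint, suggested by the thread's does_t5_have_sep_token helper, and the token strings are my own placeholders.

```python
# Hedged sketch, assuming transformers (plus sentencepiece for T5).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Fill the standard special-token slots that this tokenizer leaves unset.
num_added = tokenizer.add_special_tokens(
    {"sep_token": "<sep>", "cls_token": "<cls>", "mask_token": "<mask>"}
)
print(num_added, tokenizer.sep_token)  # how many were newly added; sep is now set
```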

  7. How to add new tokens to an existing Huggingface tokenizer?

    May 8, 2023 · Two comments: 1/ for the two examples above, "Extending existing AutoTokenizer with new bpe-tokenized tokens" and "Direct Answer to OP", you did not resize embeddings; is that an oversight …
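
The commenter's point about resizing embeddings can be sketched like this; `sshleifer/tiny-gpt2` is a tiny public checkpoint used purely for illustration, not one the thread uses, and the added token string is hypothetical.

```python
# Hedged sketch, assuming transformers + torch are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")

# Extend the vocabulary with a (hypothetical) domain token...
num_added = tokenizer.add_tokens(["<my_domain_tok>"])

# ...and grow the embedding matrix to match, otherwise the new id would
# index past the end of the old embedding table.
model.resize_token_embeddings(len(tokenizer))
print(num_added, model.get_input_embeddings().weight.shape[0] == len(tokenizer))
```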

  8. python - AutoTokenizer.from_pretrained fails to load locally saved ...

    from transformers import AutoTokenizer, AutoConfig tokenizer = AutoTokenizer.from_pretrained('distilroberta-base') config = AutoConfig.from_pretrained('distilroberta …
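
A sketch of the save/load round trip, assuming local write access. The guess that the failure in the question comes from pointing `from_pretrained` at an incomplete directory is mine, since the snippet is truncated.

```python
import tempfile

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

# save_pretrained writes the complete file set (tokenizer_config.json,
# special_tokens_map.json, tokenizer.json, ...) into one directory;
# from_pretrained can then load from that same path.
save_dir = tempfile.mkdtemp()
tokenizer.save_pretrained(save_dir)
reloaded = AutoTokenizer.from_pretrained(save_dir)

print(reloaded("hello world")["input_ids"] == tokenizer("hello world")["input_ids"])  # → True
```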

  9. How to add new special token to the tokenizer? - Stack Overflow

    Sep 15, 2021 · How to add new special token to the tokenizer?

  10. json - Tokenizer.from_file() HuggingFace - Stack Overflow

    Nov 1, 2022 · Tokenizer.from_file() HuggingFace: Exception: data did not match any variant of untagged enum ModelWrapper
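
A self-contained round trip with the standalone `tokenizers` library shows what Tokenizer.from_file() expects. The reading that the "untagged enum ModelWrapper" error comes from feeding it something other than a full tokenizer.json (e.g. a bare vocab file) is a common diagnosis, not stated in the truncated snippet.

```python
import os
import tempfile

# Assumes the standalone tokenizers package (pip install tokenizers).
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Train a toy BPE tokenizer entirely in memory.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]"], vocab_size=50)
tokenizer.train_from_iterator(["hello world", "hello tokenizer"], trainer=trainer)

# from_file() expects the full tokenizer.json written by save(); pointing it
# at other files (e.g. a bare vocab.json) is a common way to hit the
# "untagged enum ModelWrapper" exception.
path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tokenizer.save(path)
reloaded = Tokenizer.from_file(path)
print(reloaded.encode("hello world").tokens)
```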