Intl.Segmenter provides a standard API for breaking strings into locale‑specific units such as graphemes, words, or sentences, allowing developers to write language‑agnostic text processing code.
Creating a Segmenter Instance
Instantiate the segmenter with a locale identifier and an options object that defines the desired granularity. This object determines how the input text will be partitioned.
- Use new Intl.Segmenter(locale, { granularity: 'word' }) for word‑level splitting.
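- Pass ja-JP for Japanese, hi for Hindi, etc.
- The constructor throws a RangeError if the locale tag is malformed or the granularity value is invalid. A well-formed but unsupported locale does not throw; the runtime falls back to its default locale instead.
- Store the instance for reuse to avoid repeated construction overhead.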
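As a minimal sketch of the points above, the following creates one word segmenter, inspects the options it resolved to, and shows the constructor rejecting a malformed tag. The variable names and the deliberately invalid tag not_a_locale are illustrative only; the resolved locale depends on the runtime's locale data, so it is logged rather than assumed.

```javascript
// Build the segmenter once and keep it around for reuse.
const wordSegmenter = new Intl.Segmenter('ja-JP', { granularity: 'word' });

// resolvedOptions() reports the locale and granularity actually in effect.
const { locale, granularity } = wordSegmenter.resolvedOptions();
console.log(locale, granularity);

// A malformed tag (underscores are not valid separators) throws a RangeError.
let invalidTagError = null;
try {
  new Intl.Segmenter('not_a_locale');
} catch (err) {
  invalidTagError = err;
}
console.log(invalidTagError instanceof RangeError);
```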
Granularity Options
The granularity field can be set to grapheme, word, or sentence. Choose the level that matches your use case.
- grapheme: splits at user‑perceived characters, useful for character counters.
- word: separates lexical items and marks each segment with isWordLike, which is false for punctuation and whitespace.
- sentence: isolates complete sentences, handling language‑specific end marks.
- Default granularity is grapheme when no option is supplied.
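To compare the three levels side by side, here is a small sketch. The helper segmentsOf and the sample strings are assumptions made for illustration, not part of the API.

```javascript
// Hypothetical helper: split `text` at the given granularity and return the pieces.
function segmentsOf(text, granularity, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text), (item) => item.segment);
}

const sample = 'Hi there. Bye!';

// "e" + combining acute accent, then thumbs-up + skin-tone modifier:
// four code points, but only two user-perceived characters.
const graphemes = segmentsOf('e\u0301\u{1F44D}\u{1F3FD}', 'grapheme');
const words = segmentsOf(sample, 'word');         // includes spaces and punctuation
const sentences = segmentsOf(sample, 'sentence'); // 'Hi there. ' and 'Bye!'

// With no options, the segmenter defaults to grapheme granularity.
const defaultGranularity = new Intl.Segmenter('en').resolvedOptions().granularity;

console.log(graphemes.length, words, sentences, defaultGranularity);
```

Note that the segments always concatenate back to the original string, so nothing is lost at any granularity.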
Filtering Word‑Like Segments
After segmentation, filter out non‑word tokens by checking the isWordLike boolean. This yields a clean array of lexical items.
- Convert the iterable result to an array with Array.from(segmenter.segment(text)).
- Apply .filter(item => item.isWordLike) to drop punctuation and whitespace.
- Map the remaining objects to item.segment to extract plain strings.
- The resulting array can be counted with .length for an accurate word total.
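The steps above can be sketched as a single function. The name extractWords and the English sample sentence are assumptions for illustration; the segmenter is passed in so that one instance can be reused across calls.

```javascript
// Hypothetical helper: return only the word-like segments of `text` as strings.
function extractWords(segmenter, text) {
  return Array.from(segmenter.segment(text))
    .filter((item) => item.isWordLike) // drop punctuation and whitespace
    .map((item) => item.segment);      // keep just the text of each segment
}

const englishWords = new Intl.Segmenter('en', { granularity: 'word' });
const words = extractWords(englishWords, 'Hello, world! It works.');

console.log(words);        // [ 'Hello', 'world', 'It', 'works' ]
console.log(words.length); // 4
```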
Checking Locale Support
Before deploying, verify that the target browsers support the required locales using Intl.Segmenter.supportedLocalesOf. This lets you detect cases where the runtime would otherwise fall back silently to its default locale.
- Provide an array of locale strings to the method.
- The return value contains only the requested locales that the runtime supports without falling back to its default locale.
- If the array is empty, consider a polyfill or an alternative approach.
- Combine this check with feature detection for broader compatibility.
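A hedged sketch of combining the locale check with feature detection follows. The helper name segmenterLocales and the list of wanted locales are assumptions; what to do when the result is empty is left to the application.

```javascript
// Hypothetical helper: which of the `wanted` locales can be segmented
// without falling back to the runtime's default locale?
function segmenterLocales(wanted) {
  // Feature detection: older runtimes have no Intl.Segmenter at all.
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    return [];
  }
  return Intl.Segmenter.supportedLocalesOf(wanted);
}

const wanted = ['ja-JP', 'hi', 'en-US'];
const supported = segmenterLocales(wanted);

if (supported.length === 0) {
  console.warn('No requested locale is supported; consider a polyfill.');
} else {
  console.log('Supported locales:', supported);
}
```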
Practical Example: Interactive Word Counter
This example attaches a mouseup listener to a paragraph, runs a Japanese word segmenter on the selected text, and displays the count in a <pre> element.
- Retrieve the selection with window.getSelection().
- Pass selection.toString() to the segment() method of an Intl.Segmenter('ja-JP', { granularity: 'word' }) instance.
- Filter with .filter(item => item.isWordLike) and map to item.segment.
- Update the UI by setting preElement.textContent = words.length.
- For large‑scale apps, see our guide on real‑time orchestration for patterns that also apply to text pipelines.
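One way the steps above might fit together is sketched below. The function names wordsIn and attachWordCounter and the element IDs #sample-text and #word-count are hypothetical; adapt them to the actual page markup. The browser wiring is guarded so the counting logic can also run outside a browser.

```javascript
// Reuse one Japanese word segmenter for every selection.
const jaWordSegmenter = new Intl.Segmenter('ja-JP', { granularity: 'word' });

// Return the word-like segments of `text` as plain strings.
function wordsIn(text) {
  return Array.from(jaWordSegmenter.segment(text))
    .filter((item) => item.isWordLike)
    .map((item) => item.segment);
}

// Count the words in the current selection whenever the mouse is released
// over `paragraph`, and show the total in `preElement`.
function attachWordCounter(paragraph, preElement) {
  paragraph.addEventListener('mouseup', () => {
    const selection = window.getSelection();
    const words = wordsIn(selection ? selection.toString() : '');
    preElement.textContent = String(words.length);
  });
}

// Browser-only wiring; the element IDs are assumptions.
if (typeof document !== 'undefined') {
  const paragraph = document.querySelector('#sample-text');
  const preElement = document.querySelector('#word-count');
  if (paragraph && preElement) {
    attachWordCounter(paragraph, preElement);
  }
}
```

Passing the elements into attachWordCounter keeps the listener independent of any particular page structure, which also makes it easier to exercise with stand-in objects.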
By following these steps, developers can replace fragile regular‑expression hacks with a reliable, locale‑aware solution that works across modern browsers.