How to Build a Locale‑Aware Word Counter Using Intl.Segmenter

    4 March 2026 by Suraj Barman

    Intl.Segmenter provides a standard API for breaking strings into locale‑specific units such as graphemes, words, or sentences, allowing developers to write language‑agnostic text processing code.
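    To show what those units look like in practice, here is a minimal sketch that segments a short English string at word granularity and prints each segment object. The locale "en" and the sample sentence are arbitrary choices for illustration.

```javascript
// Minimal sketch: inspect the segment objects produced at word granularity.
const segmenter = new Intl.Segmenter("en", { granularity: "word" });
const segments = Array.from(segmenter.segment("Hello, world!"));

for (const { segment, index, isWordLike } of segments) {
  // index is the UTF-16 offset of the segment within the input string
  console.log(index, JSON.stringify(segment), isWordLike);
}
```

    Each item carries segment, index, input and, at word granularity, isWordLike. Here "Hello" and "world" are word‑like, while the comma, the space, and the exclamation mark are not. Together the segments cover the whole input.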

    Creating a Segmenter Instance

    Instantiate the segmenter with a locale identifier and an options object that defines the desired granularity. This object determines how the input text will be partitioned.

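    • Use new Intl.Segmenter(locale, { granularity: "word" }) for word‑level splitting.
    • Pass a BCP 47 language tag such as "ja-JP" for Japanese or "hi" for Hindi.
    • The constructor throws a RangeError if the locale tag is malformed (for example "en_US") or the granularity value is invalid. A well‑formed but unsupported locale does not throw; the segmenter falls back to the best available match, which resolvedOptions().locale reports.
    • Store the instance and reuse it instead of constructing a new segmenter for every call.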
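    The construction rules above can be sketched as follows. The cache and the getSegmenter helper are hypothetical names introduced for illustration, not part of Intl; the sketch only assumes a runtime that ships Intl.Segmenter.

```javascript
// Illustrative cache: one segmenter per locale/granularity pair.
// getSegmenter and segmenterCache are hypothetical names, not part of Intl.
const segmenterCache = new Map();

function getSegmenter(locale, granularity = "word") {
  const key = `${locale}|${granularity}`;
  let segmenter = segmenterCache.get(key);
  if (segmenter === undefined) {
    segmenter = new Intl.Segmenter(locale, { granularity });
    segmenterCache.set(key, segmenter);
  }
  return segmenter;
}

// A malformed language tag (underscore instead of hyphen) throws a RangeError.
try {
  new Intl.Segmenter("en_US", { granularity: "word" });
} catch (err) {
  console.log(err instanceof RangeError); // true
}

// A well-formed but unsupported tag does not throw; inspect what was resolved.
console.log(new Intl.Segmenter("xx-XX").resolvedOptions().locale);
```

    Caching by locale and granularity keeps construction cost to one instance per combination, which matters when segmenting on every keystroke or selection change.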

    Granularity Options

    The granularity field can be set to grapheme, word, or sentence. Choose the level that matches your use case.

    • grapheme: splits at user‑perceived characters, useful for character counters.
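    • word: splits text into words, punctuation, and whitespace; each segment carries an isWordLike flag that is true for words and false for punctuation and whitespace (the flag is only present at this granularity).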
    • sentence: isolates complete sentences, handling language‑specific end marks.
    • Default granularity is grapheme when no option is supplied.
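    A short sketch comparing the granularity levels on sample strings. The splitAt helper and the sample text are illustrative assumptions; the point is that String.prototype.length counts UTF-16 code units, while grapheme segmentation counts user‑perceived characters.

```javascript
// Hypothetical helper: split text at the given granularity (English rules).
function splitAt(text, granularity) {
  const segmenter = new Intl.Segmenter("en", { granularity });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const emoji = "👍🏽"; // thumbs-up followed by a skin-tone modifier
console.log(emoji.length);                      // 4 (UTF-16 code units)
console.log(splitAt(emoji, "grapheme").length); // 1 (user-perceived character)

const prose = "Hi there! How are you?";
console.log(splitAt(prose, "word"));
console.log(splitAt(prose, "sentence")); // two sentences: "Hi there! " and "How are you?"

// With no options, the segmenter resolves to grapheme granularity.
console.log(new Intl.Segmenter("en").resolvedOptions().granularity); // grapheme
```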

    Filtering Word‑Like Segments

    After segmentation, filter out non‑word tokens by checking the isWordLike boolean. This yields a clean array of lexical items.

    • Convert the iterable Segments object returned by segmenter.segment(text) to an array with Array.from(segmenter.segment(text)).
    • Apply .filter(item => item.isWordLike) to drop punctuation and whitespace.
    • Map the remaining objects to item.segment to extract plain strings.
    • The resulting array can be counted with .length for an accurate word total.
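    The filter‑and‑map pipeline above can be sketched as a small reusable function. The countWords name and the fixed "en" locale are assumptions for illustration; note that apostrophes inside words such as "Don't" do not split them, which a naive punctuation regex often gets wrong.

```javascript
// Reuse one English word segmenter (the "en" locale is an assumption).
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });

// Hypothetical helper: returns the word-like segments and their count.
function countWords(text) {
  const words = Array.from(wordSegmenter.segment(text))
    .filter((item) => item.isWordLike) // drop punctuation and whitespace
    .map((item) => item.segment);      // keep only the plain strings
  return { words, count: words.length };
}

const result = countWords("Don't panic: it's only a test.");
console.log(result.count);             // 6
console.log(result.words.join(" | ")); // Don't | panic | it's | only | a | test
```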

    Checking Locale Support

    Verify at runtime that the required locales are supported with Intl.Segmenter.supportedLocalesOf, so that an unsupported locale does not silently fall back to the default.

    • Provide an array of locale strings to the method.
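    • The return value contains only the locales the runtime supports without falling back to its default locale.
    • If the array is empty, consider a polyfill or an alternative approach.
    • Run feature detection first (for example, typeof Intl.Segmenter === "function"), because calling supportedLocalesOf on a runtime without Intl.Segmenter throws a TypeError.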
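    A possible way to combine feature detection with the locale check, sketched under the assumption that returning null is an acceptable signal for "use a fallback". The pickSegmenterLocale helper is hypothetical.

```javascript
// Hypothetical helper: returns the first requested locale that Intl.Segmenter
// supports without falling back, or null if none (or if the API is missing).
function pickSegmenterLocale(requested) {
  // Feature detection first: calling supportedLocalesOf on a runtime without
  // Intl.Segmenter would throw a TypeError.
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    return null; // consider a polyfill or another strategy here
  }
  const supported = Intl.Segmenter.supportedLocalesOf(requested);
  return supported.length > 0 ? supported[0] : null;
}

console.log(pickSegmenterLocale(["ja-JP", "hi"])); // depends on the runtime's locale data
console.log(pickSegmenterLocale(["xx-XX", "en"])); // "en"
```

    The result for Japanese or Hindi depends on the locale data bundled with the browser or runtime, which is exactly why the check is worth doing before relying on locale‑specific rules.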

    Practical Example: Interactive Word Counter

    The following steps attach a mouseup listener to a paragraph, run a Japanese word segmenter on the selected text, and display the count in a <pre> element.

    • Retrieve the selection with window.getSelection().
    • Pass selection.toString() to the segment() method of a segmenter created with new Intl.Segmenter("ja-JP", { granularity: "word" }).
    • Filter with .filter(item => item.isWordLike) and map to item.segment.
    • Update the UI by setting preElement.textContent = words.length.
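    The steps above can be sketched as follows. The element ids "source" and "output" are hypothetical placeholders for the paragraph and the <pre> element; the DOM wiring is guarded so the segmenting function can also run outside a browser.

```javascript
// Sketch of the interactive counter. The element ids "source" and "output"
// are hypothetical placeholders; adapt them to your own markup.
const jaSegmenter = new Intl.Segmenter("ja-JP", { granularity: "word" });

// Returns the word-like segments of a Japanese string.
function segmentJapaneseWords(text) {
  return Array.from(jaSegmenter.segment(text))
    .filter((item) => item.isWordLike) // drop punctuation such as "。"
    .map((item) => item.segment);
}

// Attach the listener only in a browser, where document and window exist.
if (typeof document !== "undefined") {
  const paragraph = document.getElementById("source");
  const preElement = document.getElementById("output");

  paragraph.addEventListener("mouseup", () => {
    const selection = window.getSelection();
    const words = segmentJapaneseWords(selection ? selection.toString() : "");
    preElement.textContent = String(words.length);
  });
}
```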

    By following these steps, developers can replace fragile regular‑expression hacks with a reliable, locale‑aware solution that works across modern browsers.
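    To make the contrast with regular expressions concrete, here is a sketch comparing two common regex approaches with Intl.Segmenter on Japanese text, which does not separate words with spaces. The sample sentence is an arbitrary choice.

```javascript
// Sketch: regex-based counting vs. Intl.Segmenter on Japanese text.
const text = "今日は良い天気です。";

const splitCount = text.split(/\s+/).filter(Boolean).length; // 1: no spaces to split on
const asciiWordCount = (text.match(/\w+/g) || []).length;    // 0: \w matches ASCII only

const segmenter = new Intl.Segmenter("ja-JP", { granularity: "word" });
const segmenterCount = Array.from(segmenter.segment(text)).filter((s) => s.isWordLike).length;

console.log(splitCount, asciiWordCount, segmenterCount); // segmenterCount is several words
```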



    Copyright © 2026 TechStora. All Rights Reserved.