The 30-character button problem: testing strategies for 19 locales

[Figure: split-screen comparison of English vs German button labels showing character overflow]

The 30-character button is a product design constraint that most engineers and localization teams encounter late — at QA, or in production, when a German user reports that the checkout button reads "Weiter zur Zah..." with an ellipsis that the designer never intended. The constraint existed in the design system from the beginning. What failed was the handoff: no one told the translation pipeline about it, and no one tested against it systematically before ship.

This is a solvable problem, but solving it requires moving the character limit check from QA to translation time, and building a test strategy that accounts for all 19 locales rather than just the ones your team speaks.

Why character limits are different for localization than for English copy

English copy has character limits too, but English writers and designers share a language, and a designer can read a button label and say "that's too long." The localization equivalent requires either bilingual reviewers for every target language, or tooling that enforces the constraint mechanically.

The mechanical challenge is that translated text length is not predictable from English length with any reliability. The conventional wisdom is that European languages expand 20–30% on average over English. That's a useful rule of thumb for body copy, but it dramatically understates the problem for short UI strings. A 10-character English button label might translate to 22 characters in German, 18 in French, 8 in Japanese, or 35 in Finnish. The shorter the English string, the more variance you see in the translations, because short strings often have no room to compress and the target language may have no equivalent short form.

German compound nouns are the canonical example, but they're not the only one. Finnish is an agglutinative language that builds words by stacking suffixes — the Finnish equivalent of "Add to your shopping cart" might be a single compound word exceeding 30 characters. Dutch and Norwegian have similar compounding patterns. At the other extreme, CJK languages (Chinese, Japanese, Korean) often produce translations substantially shorter than English in character count, but those characters are typically rendered at larger width per character, so pixel-width constraints may still be violated even when the character count is lower.
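A rough way to account for that width difference before rendering anything is to count wide characters double. The sketch below uses Python's standard unicodedata module. The function name display_width_units and the two-to-one weighting are assumptions for illustration; real pixel widths depend on the font, so treat this as a cheap first filter, not a measurement.

```python
import unicodedata


def display_width_units(text: str) -> int:
    """Estimate width in 'Latin character units'.

    Wide and fullwidth East Asian characters count as 2 units,
    combining marks as 0, everything else as 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        width += 2 if wide else 1
    return width


# Illustrative strings only: two CJK characters occupy roughly
# the space of four Latin characters.
print(len("送信"), display_width_units("送信"))      # 2 4
print(len("Submit"), display_width_units("Submit"))  # 6 6
```

A limit expressed in these units is still an approximation, but it avoids passing a CJK string just because its raw character count is low.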

The three layers of the problem

Character overflow in localized UIs manifests in three different ways, and they require different detection strategies:

Hard character count violations happen when the resource file has an explicit maxlength or max_chars annotation and the translation exceeds it. This is the easiest case to detect — a static analysis of the resource file before export catches it. The tricky part is that many resource file formats don't carry these annotations by default. You have to add them to your source strings as part of the string annotation workflow.
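A static check of this kind can be very small. The sketch below assumes the limits and the translations have already been loaded into plain dictionaries; the key names, the Violation type, and find_length_violations are hypothetical, not part of any particular resource format or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    key: str
    locale: str
    length: int
    limit: int


def find_length_violations(limits, translations):
    """Return every translated string longer than its annotated limit.

    limits:       {string_key: max_chars}
    translations: {locale: {string_key: translated_text}}
    Keys without an annotation are skipped, not treated as errors.
    """
    violations = []
    for locale, strings in translations.items():
        for key, text in strings.items():
            limit = limits.get(key)
            # len() counts Unicode code points, not rendered glyphs.
            if limit is not None and len(text) > limit:
                violations.append(Violation(key, locale, len(text), limit))
    return violations


limits = {"checkout.continue": 30, "nav.settings": 10}
translations = {
    "de": {
        "checkout.continue": "Weiter zur Zahlung",
        "nav.settings": "Einstellungen",
    },
}
for v in find_length_violations(limits, translations):
    print(f"{v.locale}/{v.key}: {v.length} chars, limit {v.limit}")
```

Run before export, a check like this fails the build on the 13-character "Einstellungen" against a 10-character limit while leaving unannotated keys alone.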

Rendered pixel overflow happens when the character count is within limit but the rendered width exceeds the container. This is common when scripts with different glyph widths share one layout (CJK or Arabic text in a container sized for Latin text), and also when a design constraint is expressed in pixels rather than characters. A button designed for 180px might fit "Submit" and its French equivalent but not a longer Polish label such as "Zatwierdź zmiany" (literally "confirm changes") at the same font size. Detecting this requires either a pseudo-localization pass with expanded dummy text, or automated visual testing against a rendered viewport.

Semantic truncation is the most insidious: the translated string fits but was shortened in a way that changes meaning. A translator asked to stay under 30 characters for "Continue to payment" might produce "Bezahlen" (just "Pay") in German, which is shorter and fits the button but communicates differently. This requires a human reviewer who understands both the UI context and the translation, and it's not something static analysis catches.

Building a pre-translation character annotation workflow

The most effective intervention is annotating character limits on source strings before they reach translators, rather than checking translated output after the fact.

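In XLIFF 1.2, the maxwidth attribute on a trans-unit element carries this information and is recognized by many CAT tools. Pair it with size-unit="char": the default unit for maxwidth is pixels, so a bare maxwidth="28" does not mean 28 characters. A translator's workbench that supports the attribute will show a live character count and flag the string as over-limit before the translator submits it. In .po format, an extracted comment like #. max_length: 28 is a convention; some tools read it, some don't. For JSON-based formats (i18next JSON, Flutter's .arb), a parallel annotations file or a custom key-level metadata block is the common pattern; .arb files already pair each key with an "@key" metadata entry that can hold it.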
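For the XLIFF case, the annotation and a check against it might look like the following sketch. It assumes XLIFF 1.2 with limits expressed through maxwidth plus size-unit="char" (the default unit for maxwidth is pixels, so the explicit unit matters). The sample document and the function name over_limit_units are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample file with two annotated strings.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="de"
        datatype="plaintext" original="buttons">
    <body>
      <trans-unit id="checkout.continue" maxwidth="30" size-unit="char">
        <source>Continue to payment</source>
        <target>Weiter zur Zahlung</target>
      </trans-unit>
      <trans-unit id="nav.settings" maxwidth="10" size-unit="char">
        <source>Settings</source>
        <target>Einstellungen</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def over_limit_units(xliff_text):
    """Return (id, length, limit) for targets exceeding a char-based maxwidth."""
    root = ET.fromstring(xliff_text)
    problems = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        # Only character-based limits are checked; pixel limits need rendering.
        if unit.get("size-unit") != "char" or unit.get("maxwidth") is None:
            continue
        target = unit.find("x:target", XLIFF_NS)
        text = "".join(target.itertext()) if target is not None else ""
        limit = int(unit.get("maxwidth"))
        if len(text) > limit:
            problems.append((unit.get("id"), len(text), limit))
    return problems


print(over_limit_units(SAMPLE))
```

The same check run on files coming back from translators acts as a backstop for CAT tools that ignore the attribute.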

The annotation workflow requires one-time setup but pays dividends across all future translation cycles: translators know the budget before they translate, not after. This is a qualitatively different position from "translate freely and then check."

Consider a scenario from a mobile app localization in early 2025: a growing productivity SaaS shipped a nav bar with five buttons, each with a 22-character annotation. The translation team worked in a CAT tool that surfaced the limit as a red indicator. German, French, and Spanish all came in under limit. Japanese required no adjustment — the translations averaged 8 characters. The two problem locales were Finnish and Polish, where three of the five button labels came back over limit. Because the limit was visible to translators, they had already proposed shorter alternatives as comments; the review round was a quick approval cycle rather than a full retranslation cycle. The whole localization workflow completed in one pass rather than three.

Pseudo-localization as a pre-QA length test

Pseudo-localization is a technique where source strings are transformed — typically by replacing characters with lookalike Unicode characters and padding the string to simulate typical translation expansion — and the resulting pseudo-locale is loaded into the application and screened visually. It's a standard practice in mature i18n workflows but underused by product teams that haven't built localization into their engineering culture yet.

A simple pseudo-localization expansion factor of 1.3–1.4x on English strings catches most layout overflow problems before any real translation exists. If "Continue to payment" at 19 characters becomes a padded pseudo string of 25 to 27 characters and that already clips the button, you know that any translation of similar length will have the same problem. You can fix the layout design (wider button, smaller font, two-line wrapping with a max-height constraint) before you've paid for translations that need to be redone.

More sophisticated pseudo-localization also tests RTL layout (wrapping source strings in Unicode right-to-left directional control characters) and CJK character width assumptions (replacing Latin characters with full-width Unicode equivalents). These catch different classes of layout bug than simple length expansion.
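Both variants can be sketched with plain Unicode arithmetic. The function names below are hypothetical. The full-width mapping relies on the fact that the fullwidth forms block mirrors printable ASCII at a fixed offset of 0xFEE0, and the RTL variant uses the right-to-left override character (U+202E) closed by pop directional formatting (U+202C).

```python
FULLWIDTH_OFFSET = 0xFEE0       # distance from "!" (U+0021) to "！" (U+FF01)
IDEOGRAPHIC_SPACE = "\u3000"    # full-width counterpart of the ASCII space
RLO = "\u202e"                  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"                  # POP DIRECTIONAL FORMATTING


def to_fullwidth(text):
    """Replace printable ASCII with full-width forms to stress width assumptions."""
    out = []
    for ch in text:
        if ch == " ":
            out.append(IDEOGRAPHIC_SPACE)
        elif "!" <= ch <= "~":
            out.append(chr(ord(ch) + FULLWIDTH_OFFSET))
        else:
            out.append(ch)  # leave non-ASCII characters untouched
    return "".join(out)


def to_pseudo_rtl(text):
    """Force right-to-left rendering of a left-to-right source string."""
    return f"{RLO}{text}{PDF}"


print(to_fullwidth("Pay now"))
print(repr(to_pseudo_rtl("Pay now")))
```

As with length padding, placeholders and markup would need to be excluded before applying either transform to real resource files.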

We're not saying pseudo-localization replaces visual testing in real locales. It catches systematic layout constraints early and cheaply; it doesn't catch locale-specific rendering quirks, font fallback issues, or the actual appearance of translated text in a real language. Both layers are needed.

Per-locale character limit test matrices in CI

For teams running automated visual regression tests, a per-locale screenshot matrix is the most reliable way to catch character overflow before it ships. The test setup renders every screen in every target locale and compares text containers for overflow, ellipsis, or unexpected word wrap.

The practical challenge is scale: 19 locales × N screens × M breakpoints is a large matrix, and maintaining it requires infrastructure investment. A pragmatic approach is to run the full matrix on the locales with the highest expansion risk (German, Finnish, Polish) and the locales with the highest character-width risk (Chinese, Japanese). The remaining locales get spot-checked on the screens with the tightest character constraints. This isn't a complete test, but it catches the failure modes that cause visible regressions.
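The tiering described above can be expressed as a small matrix builder. This is a sketch under assumed inputs: the locale codes, screen names, and breakpoints are placeholders, and build_matrix is a hypothetical helper whose output would feed whatever screenshot runner the team already uses.

```python
from itertools import product


def build_matrix(locales, high_risk, screens, tight_screens, breakpoints):
    """Return (locale, screen, breakpoint) test cases.

    High-risk locales get every screen; all other locales get only
    the screens with the tightest character constraints.
    """
    cases = []
    for locale in locales:
        selected = screens if locale in high_risk else tight_screens
        cases.extend(product([locale], selected, breakpoints))
    return cases


cases = build_matrix(
    locales=["de", "fi", "pl", "zh", "ja", "es", "fr"],
    high_risk={"de", "fi", "pl", "zh", "ja"},
    screens=["home", "checkout", "settings"],
    tight_screens=["checkout"],
    breakpoints=[375, 1280],
)
# 5 high-risk locales x 3 screens x 2 breakpoints
# + 2 other locales x 1 screen x 2 breakpoints
print(len(cases))  # 34
```

Keeping the tier membership in data rather than in test code makes it cheap to promote a locale to the full matrix after it produces a regression.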

The key strings to include in per-locale CI tests are those with explicit character limit annotations in the source file. If you've done the annotation work upstream, the CI test list writes itself: test every annotated key in every locale, verify the rendered output stays within the annotated limit.

What gets missed anyway

Even a well-implemented character limit workflow has gaps. Locale-specific font size requirements — some languages conventionally use slightly larger base font sizes for readability — can push a string over its pixel budget even when the character count is clean. Right-to-left text in a button that has left-aligned icon assumptions creates unexpected layout bugs that aren't about length at all. Number formatting inside strings (date ranges, price strings) can produce longer output in some locales because the formatted number itself expands.

These edge cases aren't a reason to avoid systematic character limit testing. They're a reason to combine it with live locale QA on real devices before shipping. The systematic checks eliminate the large class of predictable failures; live QA catches the long tail of locale-specific surprises that no static analysis anticipates.