"Chinese character" written in traditional (left) and simplified (right) forms
Time period: c. 13th century BCE – present
Direction: left-to-right; top-to-bottom, columns right-to-left
ISO 15924: Hani (500), Han (Hanzi, Kanji, Hanja)
Unicode range: U+4E00–U+9FFF CJK Unified Ideographs
Chinese name: simplified 汉字; traditional 漢字; literal meaning 'Han characters'
Vietnamese name: chữ Hán, chữ Nho, Hán tự (Vietnamese alphabet); 𡨸漢, 𡨸儒; 漢字 (Chữ Hán)
Zhuang name: 𭨡倱[1], Sawgun

Chinese characters[a] are logographs used to write the Chinese languages and others from regions historically influenced by Chinese culture. Chinese characters have a documented history spanning over three millennia, representing one of the four independent inventions of writing accepted by scholars; of these, they are the only writing system continuously used since its invention. Over time, the function, style, and means of writing characters have evolved greatly. Informed by a long tradition of lexicography, modern states using Chinese characters have standardised their forms and pronunciations: broadly, simplified characters are used to write Chinese in mainland China, Singapore, and Malaysia, while traditional characters are used in Taiwan, Hong Kong, and Macau.

After being introduced to other countries in order to write Literary Chinese, characters were eventually adapted to write the local languages spoken throughout the Sinosphere. In Japanese, Korean, and Vietnamese, Chinese characters are known as kanji, hanja, and chữ Hán respectively. Each of these countries used existing characters to write both native and Sino-Xenic vocabulary, and created new characters for their own use. These languages each belong to language families separate from Chinese, and generally differ from Chinese in structure. This has contributed to Chinese characters being largely replaced with alphabets in Korean and Vietnamese, leaving Japanese as the only major non-Chinese language still written with Chinese characters.

Whereas the letters of an alphabet correspond to a language's units of sound, called phonemes, Chinese characters correspond to morphemes, a language's smallest units of meaning. Morphemes in Chinese are usually a single syllable in length, but characters may represent morphemes comprising multiple syllables as well. Chinese characters are not ideographs: they correspond to the morphemes of a particular language, rather than to abstracted ideas themselves. Most characters are made of smaller components that may provide information regarding the character's meaning or pronunciation.


Chinese characters are accepted as representing one of four independent inventions of writing in human history.[b] In each instance, writing evolved from a system using two distinct types of ideographs. Ideographs could either be pictographs visually depicting objects or concepts, or fixed signs representing concepts only by shared convention. These systems are classified as proto-writing, because the techniques they used were insufficient to carry the meaning of spoken language by themselves.[3]

Various innovations were required for Chinese characters to emerge from proto-writing. Firstly, pictographs became distinct from simple pictures in use and appearance: for example, the pictograph 大, meaning 'large', was originally a picture of a large man, but one would need to be aware of its specific meaning in order to interpret the sequence 大鹿 as signifying 'large deer', rather than as a picture of a large man and a deer next to one another. Due to this process of abstraction, as well as to make characters easier to write, pictographs gradually became more simplified and regularised, often to the extent that the original objects represented are no longer obvious.[4]

This proto-writing system was limited to representing a relatively narrow range of ideas with a comparatively small library of symbols. This limitation compelled innovations that allowed symbols to directly encode spoken language.[5] In each historical case, this was accomplished by some form of the rebus technique, in which the symbol for one word is used, depending on context, to indicate a different word with a similar pronunciation. This allowed words that lacked a plausible pictographic representation to be written down for the first time. This technique, called 假借 (jiǎjiè) in Chinese, preceded more sophisticated methods of character creation that would further expand the lexicon. The process whereby writing emerged from proto-writing took place over a long period; it was complete when the purely pictorial use of symbols disappeared, leaving only those representing spoken words.[6]


Chinese characters have been used in several different writing systems throughout history. The concept of a writing system includes the written symbols that are used, called graphemes—these may include characters, numerals, or punctuation—as well as the rules by which the graphemes are used to record language.[7] Chinese characters are logographs, graphemes that denote words or morphemes in a language. Writing systems that use logographs are contrasted with alphabets and syllabaries, where graphemes correspond to the phonetic units in a language.[8] In special cases characters may correspond to non-morphemic syllables; due to this, written Chinese is often characterised as morphosyllabic.[9][c]

The Sinosphere has a long tradition of lexicography attempting to explain and refine the use of characters; for most of history, analysis revolved around a model first popularised in the 2nd-century Shuowen Jiezi dictionary.[11] Newer models have since appeared, often attempting to describe the methods by which characters were created, the characteristics of their structures, and the way they presently function.[12]

Structural analysis

Most characters can be analysed structurally as compounds made of smaller components (偏旁; piānpáng), which may have their own functions. Phonetic components provide a hint to a character's pronunciation, and semantic components indicate some element of the character's meaning. Components that serve neither function may be classified as pure signs with no particular meaning, other than their presence distinguishing one character from another.[13]

A straightforward structural classification scheme may consist of three pure classes (semantographs, phonographs, and signs, having only semantic, phonetic, and form components respectively), as well as four mixed classes corresponding to each possible combination of two or more of the three component types.[14] According to Yang Runlu, of the 3,500 characters used frequently in Standard Chinese, pure semantographs are the rarest, accounting for about 5% of the lexicon, followed by pure signs with 18%, and semantic–form and phonetic–form compounds together accounting for 19%. The remaining 58% are phono-semantic compounds.[15]
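The arithmetic behind this scheme and Yang's figures can be checked with a short Python sketch. It assumes, for illustration only, that each structural class corresponds to one non-empty set of component types, and it converts the approximate percentages quoted above into rough character counts; because the percentages are rounded, the counts are estimates rather than Yang's own tallies.

```python
from itertools import combinations

COMPONENT_TYPES = ("semantic", "phonetic", "form")

# Each non-empty set of component types is one structural class:
# 3 pure classes (size 1) plus 4 mixed classes (sizes 2 and 3).
CLASSES = [
    combo
    for size in range(1, len(COMPONENT_TYPES) + 1)
    for combo in combinations(COMPONENT_TYPES, size)
]

# Approximate shares (percent) of the 3,500 frequently used characters,
# as quoted from Yang Runlu; the quoted shares sum to 100.
SHARES = {
    "pure semantographs": 5,
    "pure signs": 18,
    "semantic-form and phonetic-form compounds": 19,
    "phono-semantic compounds": 58,
}


def approximate_counts(total, shares):
    """Convert percentage shares into rough character counts (integer arithmetic)."""
    return {name: total * pct // 100 for name, pct in shares.items()}


counts = approximate_counts(3500, SHARES)
print(len(CLASSES))  # 7
for name, count in counts.items():
    print(f"{name}: about {count}")
```

The pure phonograph class receives no separate share in the quoted figures, so it is omitted from SHARES; the four estimates (175, 630, 665, and 2,030) add up to 3,500.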

The Chinese palaeographer Qiu Xigui presents "three principles" of character formation: semantographs, comprising all characters whose forms are wholly related to their meaning, regardless of the method by which the meaning was originally depicted; phonographs, which include a phonetic component; and loangraphs, encompassing existing characters that have been borrowed to write other words. He also acknowledges the existence of character classes that fall outside of these principles, such as pure signs.[16]



Graphical evolution of pictographs

Pictographs (象形; xiàngxíng), representational pictures of physical objects, are relatively few in number, but most of the earliest characters originated as pictographs.[17] In practice, their forms have been regularised and simplified over centuries of iteration in order to make them easier to write. Examples include 日 ('Sun'), 月 ('Moon'), and 木 ('tree').[A]

As character forms developed, distinct depictions of various physical objects within pictographs became reduced to instances of a single written component.[18] As such, what a pictograph depicts is often not immediately evident, and it may be considered a pure sign without regard for its origin in picture-writing. However, if a character's use in compounds, such as that of 日 in 晴 ('clear sky'), still reflects its meaning and is not phonetic or arbitrary, it can still be considered a semantic component.[19]

Due to the regularisation of character forms, a single component may play different roles within compound pictographs. For example, the component 口 'MOUTH' often carries a meaning related to mouths, but within 高 ('tall'), a pictograph of a tall building, it instead depicts a window, contributing to the character's meaning of 'tallness'. In another instance, the same 'MOUTH' component depicts the lip of a vessel in the modern form of the pictograph 畐 ('full').[B]

Pictographs have often been extended from their original concrete meanings to take on additional layers of metaphor and synecdoche, which sometimes even displace the pictograph's original meaning. Historically, this process has sometimes created excess ambiguity between different senses of a character, which is usually then resolved by deriving new compound characters by adding components corresponding to specific senses. This can result in new pictographs, but usually results in other character types.[20]


Indicatives (指事; zhǐshì), also called simple ideographs, represent abstract concepts that lack concrete physical forms but can nonetheless be visually depicted in an intuitive way. Examples include 上 ('up') and 下 ('down'); these characters originally consisted of dots placed above and below a line, and later evolved into their present forms, which are less prone to graphical ambiguity in context.[21] More complex indicatives include 凸 ('convex'), 凹 ('concave'), and ('flat and level').[22]

Compound ideographs

Compound ideographs (會意; huìyì), also called logical aggregates, associative idea characters, or syssemantographs, juxtapose multiple pictographs or indicatives to suggest a new, synthetic meaning. A canonical example is 明 ('bright'), interpreted as the juxtaposition of the two brightest objects in the sky, 日 'SUN' and 月 'MOON', together expressing their shared quality of brightness. Though the historicity of this etymology has been contested in recent scholarship, it remains a widely accepted reading: for example, the common compound word 明白 means 'understanding', touching on the derived association of 明 with 'illumination'. Adding the abbreviated 艹 'GRASS' component on top produces the compound ideograph 萌 ('to sprout'), alluding to the heliotropic behaviour of plant life. Other commonly cited examples include 休 ('rest'), composed of the pictographs 人 'MAN' and 木 'TREE', and 好 ('good'), composed of 女 'WOMAN' and 子 'CHILD'.[C][23]
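Because compounds can themselves serve as components, a traditional decomposition can be nested, as with 萌 built on 明. The Python sketch below models this with a small hypothetical lookup table covering only the examples above. It follows the canonical readings given here, which are not always historically accurate.

```python
# Hypothetical decomposition table for the compound ideographs discussed above.
# Each entry maps a compound to its immediate components, in writing order.
DECOMPOSITIONS = {
    "明": ["日", "月"],  # SUN + MOON -> 'bright'
    "萌": ["艹", "明"],  # abbreviated GRASS + 'bright' -> 'to sprout'
    "休": ["人", "木"],  # MAN (written 亻 on the left) + TREE -> 'rest'
    "好": ["女", "子"],  # WOMAN + CHILD -> 'good'
}


def expand(char):
    """Recursively replace each compound with its components until only
    characters with no listed decomposition remain."""
    parts = DECOMPOSITIONS.get(char)
    if parts is None:
        return [char]
    result = []
    for part in parts:
        result.extend(expand(part))
    return result


print(expand("萌"))  # ['艹', '日', '月']
```

A table of this kind cannot distinguish a true compound ideograph from a phono-semantic compound whose phonetic role has been obscured; it only records the canonical analysis.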

The compound character 好 illustrated as its component characters 女 and 子 positioned side by side

Many traditional examples of compound ideographs are now believed to have actually originated as phono-semantic compounds, made obscure by subsequent changes in form.[24] Peter A. Boodberg and William Boltz go so far as to deny that any compound ideographs were devised in antiquity, maintaining that "secondary readings" that are now lost are responsible for the apparent absence of phonetic indicators,[25] but their arguments have been rejected by other scholars.[26]

An example of a modern compound ideograph used in written Chinese is 砼 ('concrete'), which combines the 人 'MAN', 工 'WORK', and 石 'STONE' components.[D] Compound ideographs are common in kokuji, characters originally coined in Japan.[27]


Phono-semantic compounds

Phono-semantic compounds (形声; 形聲; xíngshēng) are composed of at least one semantic component and one phonetic component.[28] They may be formed by one of several methods, often by adding a phonetic component to disambiguate a loangraph, or by adding a semantic component to represent an extended sense of the original character. A compound's phonetic component may have been selected so as to indicate an additional layer of meaning for the character as a whole. As a result, determining whether a given character is a phono-semantic compound or an ideographic compound is often non-trivial.[29]

Examples of phono-semantic compounds include 河 (hé; 'river'), 湖 (hú; 'lake'), 流 (liú; 'stream'), 沖 (chōng; 'surge'), and 滑 (huá; 'slippery'). Each of these characters has a component on its left-hand side composed of three short strokes: 氵, a reduced form of 水 ('water'). When appearing in this position, this component usually serves a semantic function, indicating that the character has some meaning related to water. Here, the remainder of each character is a phonetic component: 湖 (hú) is pronounced identically to 胡 (hú) in Standard Chinese, 河 (hé) is pronounced similarly to 可 (kě), and 沖 (chōng) is pronounced similarly to 中 (zhōng).[d]
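The difference between an identical and a merely similar reading can be made concrete with a small Python sketch. It assumes that Standard Chinese readings are given in pinyin and can be split into an initial and a final using the conventional pinyin initials (treating y and w as initials is a simplification). The function names and categories are hypothetical, and the check says nothing about historical pronunciation.

```python
import unicodedata

# Combining marks used for the four pinyin tones (macron, acute, caron, grave).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

# Conventional pinyin initials; two-letter initials come first so that "zh"
# is matched before "z". Treating y and w as initials is a simplification.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def strip_tone(syllable):
    """Remove tone marks only, keeping the diaeresis of ü (e.g. 'lǜ' -> 'lü')."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final), ignoring tone."""
    bare = strip_tone(syllable)
    for initial in INITIALS:
        if bare.startswith(initial):
            return initial, bare[len(initial):]
    return "", bare  # e.g. 'ai' has no initial


def compare(compound_reading, phonetic_reading):
    """Classify how a compound's reading relates to its phonetic component's."""
    same = (unicodedata.normalize("NFC", compound_reading)
            == unicodedata.normalize("NFC", phonetic_reading))
    if same:
        return "identical"
    c_initial, c_final = split_syllable(compound_reading)
    p_initial, p_final = split_syllable(phonetic_reading)
    if (c_initial, c_final) == (p_initial, p_final):
        return "same syllable, different tone"
    if c_final == p_final:
        return "shared final"
    if c_initial == p_initial:
        return "shared initial"
    return "nothing shared"


# (compound, phonetic component, compound reading, phonetic reading)
EXAMPLES = [("湖", "胡", "hú", "hú"), ("河", "可", "hé", "kě"),
            ("沖", "中", "chōng", "zhōng")]

for compound, phonetic, c_reading, p_reading in EXAMPLES:
    print(compound, phonetic, compare(c_reading, p_reading))
```

For the three pairs in the text, this prints "identical" for 湖 and 胡, and "shared final" for the other two pairs. Those two pairs share only the final of the syllable, not the initial or the tone.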

While they may sometimes indicate a character's pronunciation exactly, the phonetic components of most compounds only provide an approximation, even before any subsequent sound shifts take place within the spoken language. Some characters may share only the initial or final sound of a syllable with their phonetic components.[32] The table below lists characters that each use 也 for their phonetic part (save the final one, which uses a previous character in the list); none of them share its modern pronunciation. The Old Chinese pronunciation of 也 has been reconstructed by Baxter and Sagart (2014) as /*lAjʔ/, similar to that of each compound.[33] The table illustrates the sound changes that have taken place since the Shang and Zhou dynasties, when most of the characters in question entered the lexicon. The resulting drift is illustrative of the more extreme cases, in which a character's phonetic component no longer provides any hint of its pronunciation.[34]

Phono-semantic compounds sharing phonetic component
Char. Gloss Component OC[α] MC[β] Modern[γ]
Sem. Phon. Mandarin Cantonese Japanese
PTC [e] /*lAjʔ/ yaeX [jè] jaa5 [jaː˩˧] ya [ja̠]
  • 'water'
  • /*lAjʔ/
/*Cə.lraj/ drje chí [ʈʂʰǐ] ci4 [tsʰiː˩] chi [tɕi]
  • 'horse'
  • 'bow'
/*l̥ajʔ/ syeX chí [ʈʂʰǐ]
shǐ [ʂì]
ci4 [tsʰiː˩] chi [tɕi]
shi [ɕi]
'set up'
  • 'flag'
/*l̥aj/ sye shī [ʂí] si1 [siː˥] se [se̞]
shi [ɕi]
  • 'earth'
/*[l]ˤej-s/ dijH [tî] dei6 [tei˨] ji [dʑi]
chi [tɕi]

    • 𠂉
  • 'person'
/*l̥ˤaj/ tha [tʰá] taa1 [tʰaː˥] ta [ta̠]
  • 'female'
[f] [f]
  • 'hand'
  • /*l̥ˤaj/
/*l̥ˤaj/ thaH tuō [tʰwó] to1 [tʰɔː˥] ta [ta̠]
da [da̠]

This method is still used to form new characters: for example (; 'plutonium') is the semantic 'GOLD' plus the phonetic ()—described in Chinese as " gives sound, gives meaning". Many Chinese names for chemical elements and other characters related to chemistry were formed in this way.[35]


The adaptation of an existing character to write another word with a similar pronunciation was necessary to the emergence of the Chinese writing system, and it has remained common ever since. Some loangraphs are introduced to represent words previously lacking another written form, as is often the case with abstract grammatical particles, but this is not always so.[36]

Loangraphs are also used to write words borrowed from other languages, such as Buddhist terminology introduced to China in antiquity, as well as contemporary non-Chinese words and names. For example, in the name 罗马尼亚; 羅馬尼亞 (Luómǎníyà; 'Romania'), each character is commonly used as a loangraph for its respective syllable. However, the boundary between a character's pronunciation and meaning is never absolute: when transcribing into Chinese, loangraphs are often chosen deliberately so as to create certain connotations. This is regularly done with corporate brand names: for example, Coca-Cola's Chinese name is 可口可乐; 可口可樂 (Kěkǒu Kělè; 'delicious enjoyable').[37][38]


Some characters and components are pure signs, whose meaning derives merely from their having a fixed and distinct form. Basic examples of pure signs are found among the numerals beyond four, e.g. 五 ('five') and 八 ('eight'), whose forms give no visual hint of the quantities they represent.[39]

Traditional Shuowen Jiezi classification

The Shuowen Jiezi is a character dictionary authored c. 100 CE by the scholar Xu Shen (c. 58 – c. 148 CE). In its postface, Xu analyses what he sees as all the methods by which characters are created, introducing a categorisation scheme which would later become known as the 'six writings' (六書; 六书; liùshū). Mature formulations of this scheme stated that every character belonged to one of six categories, each mentioned with varying emphasis in the Shuowen Jiezi. For nearly two millennia afterwards, this framework would serve as the traditional lens through which characters were analysed throughout the Sinosphere.[40] Xu based most of his analysis on examples of Qin seal script that were written down several centuries before his time—these were usually the oldest forms available to him, but Xu stated that he was aware of the existence of even older forms.[41]

Modern scholars agree that the theory presented in the Shuowen Jiezi is problematic, failing to fully capture the nature of Chinese writing both in the present and at the time Xu was writing.[42] Traditional Chinese lexicography as embodied in the Shuowen Jiezi presupposes either a phonetic or semantic purpose for every character component, providing implausible etymologies for characters later accepted as being pure signs.[43] However, the model has proven resilient, and it continues to serve as a guide for students in the process of memorising characters. One of the most important innovations in the Shuowen Jiezi is its notion of radicals (部首; bùshǒu; 'section headers'), which are certain visually prominent components by which characters are categorised, allowing for their convenient location in a dictionary. The Shuowen Jiezi uses over 500 radicals; while later dictionaries generally use substantially fewer, the organisational concept itself remains ubiquitous.[44]
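Radicals still underlie dictionary lookup. In modern radical-and-stroke ordering, characters filed under the same radical are typically sorted by the number of strokes written outside the radical. The Python sketch below illustrates that ordering on a hand-picked sample of characters from this article. The entries, the residual stroke counts, and the build_index helper are illustrative assumptions, not data taken from any particular dictionary.

```python
from collections import defaultdict

# Hypothetical sample entries: (character, radical, strokes outside the radical).
# 氵 in 河, 湖, etc. is filed under the full form 水; 亻 in 休 under 人.
ENTRIES = [
    ("湖", "水", 9), ("河", "水", 5), ("滑", "水", 10), ("沖", "水", 4),
    ("流", "水", 7), ("好", "女", 3), ("明", "日", 4), ("休", "人", 4),
]


def build_index(entries):
    """Group characters by radical, each group sorted by residual stroke count.

    Ties are broken by code point, which is an arbitrary choice for this sketch.
    """
    groups = defaultdict(list)
    for char, radical, residual in entries:
        groups[radical].append((residual, char))
    return {radical: [char for _, char in sorted(items)]
            for radical, items in groups.items()}


index = build_index(ENTRIES)
print(index["水"])  # ['沖', '河', '流', '湖', '滑']
```

Real dictionaries add further conventions, such as the form in which a radical is counted and how ties are broken, that this sketch does not model.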


Diagram comparing the abstraction of pictographs in cuneiform, Egyptian hieroglyphs, and Chinese characters – from an 1870 publication by Egyptologist Gaston Maspero[45]

According to Qiu, the broadest trend in the evolution of Chinese characters over their history has been simplification, both in graphical shape (字形; zìxíng), the "external appearances of individual graphs", and in graphical form (字体; 字體; zìtǐ), "overall changes in the distinguishing features of graphic[al] shape and calligraphic style, [...] in most cases refer[ring] to rather obvious and rather substantial changes".[46]

Traditional invention narrative

Several Chinese classics indicate that knotted cords were used to keep records prior to the invention of writing.[47][48] Works that reference the practice include chapter 80 of the Tao Te Ching[49] and the "Xici II" chapter within the I Ching.[50]

According to one tradition, Chinese characters were invented during the 3rd millennium BCE by Cangjie, a scribe of the legendary Yellow Emperor. Cangjie is said to have invented symbols called () due to his frustration with the limitations of knotting, taking inspiration from his study of the tracks of animals, landscapes, and the stars in the sky. On the day that these first characters were created, grain rained down from the sky; that night, the people heard the wailing of ghosts and demons, lamenting that humans could no longer be cheated.[51]


A series of inscribed graphs and pictures have been discovered at Neolithic sites in China, including Jiahu (c. 6500 BCE), Dadiwan and Damaidi from the 6th millennium BCE, and Banpo from the 5th millennium BCE. The marks at these sites appear one at a time, and do not seem to imply any greater context. As such, "we do not have any basis for stating that these constituted writing nor is there reason to conclude that they were ancestral to Shang dynasty Chinese characters."[52] However, they do demonstrate sign use in the Yellow River valley from the Neolithic through to the Shang period.[53] A historical connection with the symbols used by the late Neolithic Dawenkou culture (c. 4300 – c. 2600 BCE) in Shandong has been deemed plausible by palaeographers, with Qiu concluding that they "cannot be definitively treated as primitive writing, nevertheless they are symbols which resemble most the ancient pictographic script discovered thus far in China... They undoubtedly can be viewed as the forerunners of primitive writing."[54]

Oracle bone script

Ox scapula inscribed with characters recording the result of divinations

The earliest attested Chinese writing comprises a body of inscriptions produced during the Late Shang period (c. 1250 – 1050 BCE), with the very earliest examples from the reign of Wu Ding dated between 1250 and 1200 BCE.[55][56][57][58] Many of these inscriptions were made on oracle bones—usually either ox scapulae or turtle shells—and recorded official divinations carried out by the Shang royal house. Contemporaneous inscriptions in a related but distinct style were also made on ritual bronze vessels. This oracle bone script (甲骨文; jiǎgǔwén) was first documented in 1899, after specimens were discovered being sold as "dragon bones" for medicinal purposes, with the symbols carved into them identified as early Chinese character forms. By 1928, the source of the bones had been traced to a village near Anyang in Henan—discovered to be the site of Yin, the final Shang capital—which was excavated by a team led by Li Ji (1896–1979) from the Academia Sinica between 1928 and 1937. To date, over 150,000 oracle bone fragments have been found.[59]

Oracle bone inscriptions recorded divinations undertaken to communicate with the spirits of royal ancestors.[59] The inscriptions range from a few characters in length at their shortest, to around 40 characters at their longest. The Shang king would communicate with his ancestors by means of scapulimancy, inquiring about subjects such as the royal family, military success, and the weather. The answers as interpreted would be inscribed on the divination material itself.[59]

Oracle bone script is the direct ancestor of later forms of written Chinese. The oldest known inscriptions already represent a well-developed writing system,[60][61] which suggests an initial emergence predating the late second millennium BCE. Although written Chinese is first attested in official divinations, it is widely believed that writing was also used for other purposes during the Shang, but that the media used in other contexts—likely bamboo and wooden slips—were less durable than bronzes or oracle bones, and have not been preserved.[62]

Zhou scripts

The Shi Qiang pan, a bronze ritual basin dated c. 900 BCE. Long inscriptions on the surface describe the deeds and virtues of the first seven Zhou kings.

The traditional notion of an orderly procession of scripts, with each suddenly invented and displacing the one previous, has been disproven by more recent scholarship and archaeological work. Instead, scripts evolved gradually, with several coexisting in a given area. As early as the Shang, the oracle bone script existed as a simplified form alongside another that was used in bamboo books, in addition to elaborate pictorial forms often used in clan emblems. These other forms have been preserved in what is called bronze script (金文; jīnwén), where inscriptions were made using a stylus in a clay mould, which was then used to cast ritual bronzes. These different techniques generally resulted in character forms that were less angular in appearance than the forms of oracle bone script.[63]

Study of these bronze inscriptions has revealed that the mainstream script underwent slow, gradual evolution during the late Shang, which continued during the Zhou dynasty (c. 1046 – 256 BCE) until assuming the form now known as small seal script (小篆; xiǎozhuàn) within the Zhou state of Qin.[64][65] Other scripts in use during the late Zhou include the bird-worm seal script (鸟虫篆; 鳥蟲篆; niǎochóngzhuàn), as well as the regional forms used in non-Qin states. Examples of these styles were preserved as variants in the Shuowen Jiezi. Historically, Zhou forms were collectively referred to as large seal script (大篆; dàzhuàn), a term which has fallen out of favour due to its lack of precision.[66]

Qin unification and small seal script

Following the Qin's conquest of the other Chinese states and the founding of the imperial Qin dynasty in 221 BCE, the Qin small seal script was standardised for use throughout the entire country under the direction of Chancellor Li Si (c. 280 – 208 BCE). It was traditionally believed that Qin scribes only used small seal script, and the later clerical script was a sudden invention during the early Han. However, more than one script was used by Qin scribes: a little-known, rectilinear, "vulgar" style had also been in use in Qin for centuries prior to the wars of unification. The popularity of this form grew as writing became more widespread. By the Warring States period (c. 475 – 221 BCE), an immature form of clerical script (隶书; 隸書; lìshū) had emerged based on the vulgar form, often called "early clerical" or "proto-clerical".[67]

Clerical script

The proto-clerical script evolved gradually; by the Han dynasty (202 BCE – 220 CE), it had arrived at a mature form, also called 八分 (bāfēn). Bamboo slips discovered during the late 20th century point to this maturation being completed during the reign of Emperor Wu of Han (r. 141–87 BCE). This process, called libian (隶变; 隸變), involved character forms being mutated and simplified, and the components used in many characters being changed or omitted. In turn, the forms of components themselves were regularised so as to use fewer, straighter, and more well-defined strokes. As a result, clerical forms largely lack the direct pictorial quality of seal script: the process that produced the clerical form of 月 obscured its origins as a picture of the Moon. As in previous eras, multiple styles were used during the Han, though clerical script dominated.[68]

Around the midpoint of the Eastern Han (25–220 CE), a simplified and easier form of clerical script appeared, which Qiu terms 'neo-clerical' (新隶体; 新隸體; xīnlìtǐ).[69] By the end of the Han, this had become the dominant script used by scribes, though clerical script remained in use for formal works, such as engraved stelae. Qiu describes neo-clerical as a transitional form between clerical and regular script which remained in use through the Three Kingdoms period (220–280 CE) and beyond.[70]

Cursive and semi-cursive

By the late Han, an early form of semi-cursive script (行书; 行書; xíngshū; 'running script') had developed from a cursive form of neo-clerical script. Semi-cursive script was traditionally attributed to Liu Desheng (劉德升; c. 147 – 188 CE), although such attributions often refer to early masters, rather than the inventors of a script. Later analysis has lent credence to the script having popular origins rather than being Liu's invention.[71]

An early type of cursive script (草书; 草書; cǎoshū) was also in use as early as 24 BCE, incorporating cursive forms popular at the time, as well as elements from the vulgar writing that originated within Qin. By the Jin dynasty (266–420), the Han cursive style became known as 章草 (zhāngcǎo), sometimes known in English as 'clerical cursive', 'ancient cursive', or 'draft cursive'. Some attribute this name, which uses the character 章 ('orderly'), to the fact that the style was considered more orderly than what would become the modern form of cursive, called 今草 (jīncǎo). This latter form had first emerged during the Jin and was influenced by semi-cursive and regular script; it is exemplified by the work of calligraphers like Wang Xizhi (303–361), often called the "Sage of Calligraphy".[72]

Regular script

A page from a printed Song publication in a regular script typeface, which resembles the handwriting of Tang-era calligrapher Ouyang Xun (557–641)

Regular script (楷书; 楷書; kǎishū), based on clerical and semi-cursive forms, is the predominant form in which characters are written and printed. Its innovations have traditionally been credited to the calligrapher Zhong Yao (c. 151 – 230), who was living in the state of Cao Wei (220–266); he is often called the "father of regular script". The earliest surviving writing in regular script comprises copies of Zhong Yao's work, including at least one copy by Wang Xizhi. Characteristics of regular script include the 'pause' (顿; 頓; dùn) technique used to end horizontal strokes, as well as heavy tails on diagonal strokes made going down and to the right. It developed further during the Eastern Jin (317–420) in the hands of Wang Xizhi and his son Wang Xianzhi (344–386). However, most Jin-era writers continued to use neo-clerical and semi-cursive styles in their daily writing. It was not until the Northern and Southern period (420–589) that regular script became the predominant form.[73] The system of imperial examinations for the civil service established during the Sui dynasty (581–618) required test takers to write in Literary Chinese using regular script, which contributed to the longstanding prevalence of both in later Chinese history.[74]


Structural templates used in compounds, with red marking possible positions for components

As part of the evolution from seal script into clerical script, many bespoke, interlinked character components became discrete and regularised. Moreover, it became typical for each character of a text to be written within a uniform square allotted for it, as a definite series of strokes.[75] Strokes can be considered both the basic unit of handwriting, as well as the writing system's basic unit of graphemic organisation. Individual strokes are generally categorised according to technique and graphemic function.[76]

Characters are constructed according to predictable visual patterns. Some components may have distinct forms when occupying specific positions within a character: for example, the 刀 'KNIFE' component appears as 刂 on the right side of characters, but as ⺈ at the top of characters. The order in which components are drawn within a character is largely fixed. The order in which the strokes of a component are drawn is also set, but may differ by region.[77] In practice, this is summed up by a few rules of thumb: components and characters are generally assembled from left to right and from top to bottom, with 'enclosing' components started before, and closed after, the components they enclose.[78]

For example, 字 is made up of two components, 宀 and 子, each in turn composed of three strokes, drawn in the following order:

Character | Component | Strokes
字 | 宀 | (1) ㇔ (2) ㇔ (3) 乛
字 | 子 | (4) 乛 (5) ㇚ (6) ㇐
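The component-then-stroke ordering described above can be sketched as a small lookup structure. This is a minimal illustration only, assuming a hypothetical hand-entered table (`COMPONENTS`, `CHARACTERS`) rather than any official stroke-order standard: each character lists its components in drawing order, each component lists its strokes in order, and the full stroke sequence is simply their concatenation.

```python
from typing import Dict, List

# Hypothetical hand-entered sample data, not taken from an official standard.
# Each component maps to its strokes in conventional order.
COMPONENTS: Dict[str, List[str]] = {
    "宀": ["㇔", "㇔", "乛"],  # 'ROOF'
    "子": ["乛", "㇚", "㇐"],  # 'CHILD'
}

# Each character maps to its components in drawing order (top before bottom here).
CHARACTERS: Dict[str, List[str]] = {
    "字": ["宀", "子"],
}


def stroke_sequence(char: str) -> List[str]:
    """Concatenate the stroke lists of a character's components in drawing order."""
    strokes: List[str] = []
    for component in CHARACTERS[char]:
        strokes.extend(COMPONENTS[component])
    return strokes


seq = stroke_sequence("字")
for number, stroke in enumerate(seq, start=1):
    print(number, stroke)
print("total strokes:", len(seq))
```

Because stroke order can differ by region, a fuller version would presumably key these tables by regional standard as well.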

Variants and allographs

Variants of the Chinese character for 'turtle', collected c. 1800 from printed sources. The traditional form 龜 (left) is used in Taiwan and Hong Kong. The simplified form 龟 is used in mainland China, and the simplified form 亀 is used in Japan.

Over a character's history, graphical variants called allographs emerge via several processes while retaining the semantics of previous forms. This is comparable to the visually distinct double-storey ⟨a⟩ and single-storey ⟨ɑ⟩ forms equally representing the Latin letter A. Character variants also emerge for aesthetic reasons, to make handwriting easier, or to correct what the writer perceives to be errors in a character's form.[79] For example, individual components may be replaced with visually, phonetically, or semantically similar alternatives.[80] The boundary between character structure and style, and thus between allographs of the same character versus semantically distinct characters, is often non-trivial or unclear.[81]


From the earliest inscriptions until the 20th century, texts were generally laid out vertically—with characters written from top to bottom in columns, arranged from right to left. A horizontal writing direction—with characters written from left to right in rows, arranged from top to bottom—only became predominant in the Sinosphere during the 20th century, as a result of Western influence.[82]

Methods of writing

Ordinary handwriting on a lunch menu in Hong Kong

The earliest attested Chinese characters were carved into bone, or marked using a stylus in clay moulds used to cast ritual bronzes. They were also written in ink on silk and on slips of wood and bamboo. Paper was invented for use as a writing medium during the 1st century CE; its invention is traditionally credited to Cai Lun (d. 121 CE).[83] There are numerous styles, or scripts (书; 書; shū), in which characters can be written, including historical forms such as seal script and clerical script. Most styles used throughout the Sinosphere originated within China, though they may display regional variation. Styles that have been created outside of China tend to remain localised in their use: these include the Japanese edomoji and Vietnamese lệnh thư scripts.[84]


Chinese calligraphy of mixed styles by Song poet Mi Fu (1051–1107)

Calligraphy was traditionally one of the four arts to be mastered by Chinese scholars, and was considered an artful means of expressing their thoughts and teachings. Chinese calligraphy typically uses an ink brush in accordance with a deliberately minimalist set of rules. Strict regularity is not required, as character forms may be accentuated to evoke a variety of aesthetic effects.[85]

Printing and typefaces

Woodblock printing was invented in China between the 6th and 9th centuries, followed by the invention of movable type by Bi Sheng (972–1051) during the 11th century.[86] The increasing use of print during the Ming (1368–1644) and Qing dynasties (1644–1912) led to considerable standardisation in character forms, which prefigured later script reforms during the 20th century. This print orthography, exemplified by the 1716 Kangxi Dictionary, was later dubbed the jiu zixing ('old character shapes').[87] Chinese characters may be printed or displayed using different typefaces,[88] of which there are four broad classes in use:[89]

  • Song typefaces (宋体; 宋體), also called "Ming" (明体; 明體), broadly correspond to Western serif styles; the name "Song" is generally used for simplified Chinese typefaces, and "Ming" for others. Song typefaces are in the tradition of historical Chinese print; both names refer to eras in which printing is considered to have flourished in the Sinosphere. While most typefaces during the Song dynasty (960–1279) resembled the regular script of a particular calligrapher, most modern Song typefaces are designed for general-purpose use, with an emphasis on neutrality.
  • Sans-serif typefaces are called 'black form' (黑体; 黑體; hēitǐ) in Chinese and 'Gothic' (ゴシック体) in Japanese. Sans-serif strokes are rendered as simple lines of even thickness.
  • "Kai" typefaces (楷体; 楷體) directly imitate handwritten regular script.
  • Fangsong typefaces (仿宋体; 仿宋體)—called "Song" in Japan—correspond to semi-script styles in the Western paradigm.

Use with computers

The first four characters of the Thousand Character Classic in different typefaces and historical styles. From right to left: seal script, clerical script, regular script, Ming, and sans-serif

Before computers became ubiquitous, earlier electro-mechanical communications devices like telegraphs and typewriters were originally designed for use with alphabets, often by means of alphabetic text encodings like Morse code and ASCII. Adapting these technologies for use with a writing system comprising thousands of distinct characters was non-trivial.[90]

Input methods

Predominantly, Chinese characters are input using methods which enable the use of a standard computer keyboard. Phonetic encodings are usually based on existing transcription schemes, such as pinyin or bopomofo for Mandarin, and Jyutping for Cantonese. To write a given character using a phonetic encoding, one types out its phonetic transcription, possibly followed by a number representing the tone: for example, 香港 ('Hong Kong') could be input as xiang1gang3 using pinyin, or as hoeng1gong2 using Jyutping.[91]
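The phonetic approach described above can be sketched in a few lines. This is a minimal illustration, assuming that the user types a tone digit after every syllable (segmenting toneless input is harder) and using a hypothetical three-entry lexicon (`LEXICON`); it is not the design of any particular input method.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: (syllable, tone) -> candidate characters.
# A real input method uses a far larger dictionary plus frequency data.
LEXICON: Dict[Tuple[str, int], List[str]] = {
    ("xiang", 1): ["香", "相", "箱"],
    ("gang", 3): ["港"],
}

# One syllable = a run of letters (ü is commonly typed as 'v') followed by a tone digit 1-5.
SYLLABLE = re.compile(r"([a-z]+)([1-5])")


def parse(keys: str) -> List[Tuple[str, int]]:
    """Split numbered-pinyin keystrokes such as 'xiang1gang3' into (syllable, tone) pairs."""
    return [(m.group(1), int(m.group(2))) for m in SYLLABLE.finditer(keys.lower())]


def candidates(keys: str) -> List[List[str]]:
    """Look up the candidate characters for each typed syllable."""
    return [LEXICON.get(pair, []) for pair in parse(keys)]


choices = candidates("xiang1gang3")
print(choices)                         # [['香', '相', '箱'], ['港']]
print("".join(c[0] for c in choices))  # 香港
```

Jyutping input such as hoeng1gong2 could be handled the same way with a Cantonese lexicon and tone digits 1 to 6.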

Input codes for characters may also be based on their form. Using the existing rules of stroke order and of how components are assembled into whole characters,[92] characters may be assigned a shorthand code under one of several methods, potentially increasing typing speed. Popular form-based encoding methods include Wubi on the mainland, and Cangjie (named after the mythological inventor of writing) in Taiwan and Hong Kong. For example, 疆 ('border') is encoded as NGMWM using the Cangjie method, with each letter corresponding to one of the components 弓土一田一; other components are omitted according to predictable rules.[93]
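The Cangjie example of 疆 ('border') encoded as NGMWM can be made concrete with a small sketch. It assumes the commonly published assignment of radicals to the keys A to Y (the special key Z is omitted) and a hypothetical one-entry code table (`CODE_TABLE`); the decomposition rules that decide which components are kept are not modelled.

```python
from typing import Dict

# Commonly published Cangjie radical keys A-Y; the special key Z is left out.
KEY_TO_RADICAL: Dict[str, str] = dict(
    zip("ABCDEFGHIJKLMNOPQRSTUVWXY", "日月金木水火土竹戈十大中一弓人心手口尸廿山女田難卜")
)

# Hypothetical one-entry code table; a real table maps codes of
# one to five keys to thousands of characters.
CODE_TABLE: Dict[str, str] = {"NGMWM": "疆"}


def spell_out(code: str) -> str:
    """Show which component each key of a Cangjie code stands for."""
    return "".join(KEY_TO_RADICAL[key] for key in code.upper())


def lookup(code: str) -> str:
    """Return the character for a code, or an empty string if it is unknown."""
    return CODE_TABLE.get(code.upper(), "")


print(spell_out("NGMWM"))  # 弓土一田一
print(lookup("ngmwm"))     # 疆
```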

Contextual constraints may be used to improve the selection of candidate characters. For example, when tones are ignored, both 大学; 大學 ('university') and 大雪 ('heavy snow') are transcribed as daxue, so the system may use the surrounding context to decide which candidate to list first.[94]
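One simple way to use context is to rank candidates by how often each follows the preceding character. The sketch below shows such a bigram-frequency ranking; it is only one of several possible approaches, the source does not say which method any particular system uses, and the counts in `BIGRAM_COUNTS` are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical candidates for the toneless keystrokes "daxue".
CANDIDATES: Dict[str, List[str]] = {"daxue": ["大学", "大雪"]}

# Invented counts of how often a candidate follows a preceding character,
# standing in for statistics a real system would gather from a corpus.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("上", "大学"): 120,  # 上大学 'to attend university'
    ("下", "大雪"): 95,   # 下大雪 'to snow heavily'
    ("下", "大学"): 2,
}


def rank(previous: str, keys: str) -> List[str]:
    """Order the candidates for `keys` by how often they follow `previous`."""
    options = CANDIDATES.get(keys, [])
    # sorted() stays stable with reverse=True, so ties keep the default order.
    return sorted(options, key=lambda w: BIGRAM_COUNTS.get((previous, w), 0), reverse=True)


print(rank("上", "daxue"))  # ['大学', '大雪']
print(rank("下", "daxue"))  # ['大雪', '大学']
```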

Encoding and interchange

The Unicode Standard is the predominant text encoding worldwide. In keeping with the philosophy of the Unicode Consortium, each distinct grapheme is assigned a number in the standard, while the choice of a particular allograph is left to the typeface rendering the text.[95] Unicode's Basic Multilingual Plane (BMP) comprises the standard's first 2¹⁶ (65,536) code points.[96] Of these, 20,992 (about 32%) are assigned to "CJK Unified Ideographs" (U+4E00–U+9FFF), a designation comprising characters used in each of the Chinese family of scripts. As of version 15.1, Unicode defines a total of 97,670 Chinese characters.[97]
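The arithmetic above, and a basic membership test for the main CJK Unified Ideographs block, can be checked with a short sketch. It assumes only the block bounds U+4E00 to U+9FFF; ideographs in the extension blocks (for example the 84-stroke 𱁬, encoded outside the BMP) are deliberately not counted, so this is not a general "is this a Chinese character" test.

```python
# Bounds of the main "CJK Unified Ideographs" block in the BMP.
CJK_UNIFIED_START = 0x4E00
CJK_UNIFIED_END = 0x9FFF
BMP_SIZE = 2 ** 16  # code points U+0000..U+FFFF


def in_cjk_unified_block(ch: str) -> bool:
    """True if the single character `ch` lies in U+4E00..U+9FFF."""
    return CJK_UNIFIED_START <= ord(ch) <= CJK_UNIFIED_END


block_size = CJK_UNIFIED_END - CJK_UNIFIED_START + 1
print(block_size, BMP_SIZE, f"{block_size / BMP_SIZE:.0%}")  # 20992 65536 32%

for ch in "漢字A𱁬":
    # Python strings hold code points, so 𱁬 (outside the BMP) is one character here.
    print(ch, f"U+{ord(ch):04X}", in_cjk_unified_block(ch))
```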

Vocabulary and adaptation

Chinese writing first emerged during the historical stage of the spoken language known as Old Chinese. Most characters correspond to a morpheme that historically functioned as a stand-alone Old Chinese word.[98] Classical Chinese is the form of written Chinese used in the classic works of Chinese literature from roughly the 5th century BCE until the 2nd century CE.[99] This prestige form was imitated in the writing of later authors even as their spoken languages began to diverge across the country. This later form, referred to as "Literary Chinese", remained the predominant written language in China until the 20th century. Its use in the Sinosphere was loosely analogous to that of Latin in pre-modern Europe. While not static over time, Literary Chinese retained many properties of spoken Old Chinese. Texts were read aloud using literary and colloquial readings that varied by region, informed by the local spoken vernacular. With numerous sound mergers occurring in different varieties over time, polysyllabic words increasingly served to reduce ambiguity between words that had become homophonic.[100] Much of the modern vocabulary used in vernacular Chinese varieties consists of compound words comprising multiple morphemes.[101] It has been estimated that over two-thirds of the 3,000 most common words in modern Standard Chinese are polysyllabic, with the vast majority of these being two-syllable words.[102] While written vernacular Chinese existed in various historical forms following the Classical period, it was only widely adopted as a replacement for Literary Chinese in the early 20th century, as a part of the New Culture Movement.[103]

Over time, writing was introduced to surrounding countries alongside other facets of Chinese culture. The non-Sinitic-speaking elites of areas including Vietnam, Korea, Japan, and the Ryukyu Islands[104] each embraced writing for record-keeping, histories, and official communications, forming what is now called the Sinosphere.[105] Chinese, Japanese, Korean, and Vietnamese each belong to a different language family, and differ considerably in structure. Reading systems were devised that enabled non-Chinese speakers to interpret texts in terms of their native language, a phenomenon that has been described variously as either a form of diglossia, or as a process of translation into and out of Chinese. The literary culture that developed in this context was less directly tied to a specific spoken language than others that wrote using phonetic scripts. This comparative lack of phonocentrism is exemplified by the cross-linguistic phenomenon of brushtalk, where mutual literacy allowed speakers of different languages to engage in face-to-face conversations.[106][107]

After the introduction of Literary Chinese, populations throughout the Sinosphere also began using characters to write local languages directly, though in each case Literary Chinese generally remained dominant until the modern era. Characters were used to represent both native vocabulary as well as the loanwords each language borrowed from Chinese, referred to as Sino-Xenic vocabulary. Characters may have native readings, Sino-Xenic readings, or both.[108] Comparison of Sino-Xenic vocabulary across the Sinosphere has been useful in the reconstruction of Middle Chinese phonology.[109] Literary Chinese was used in Vietnam during the millennium of Chinese rule that began in 111 BCE. Around the 13th century, characters were adapted to write Vietnamese, creating the chữ Nôm script. Writing also arrived in Korea during the 2nd century BCE and spread throughout the country over the following three centuries. From Korea, writing then spread to Japan during the 5th century CE.[110] Both Korean and Japanese were being written with characters by the 6th century.[111] By the late 20th century, Vietnam and Korea had largely replaced both Literary Chinese and Chinese characters with alphabets designed to write their local languages, leaving Japanese as the only major non-Sinitic language normally written with Chinese characters.[112]

Old Chinese

Excerpt from a 1436 primer on Chinese characters, with line drawings of ordinary objects such as books, baskets, buildings, and musical instruments beside their corresponding characters

Words in Old Chinese were generally monosyllabic; as such, each character denoted an independent word.[113] Affixes could be added to form a new word, which was often written with the same single character. In many cases, the pronunciations then diverged due to the systematic sound changes caused by the affixes. For example, many modern readings reflect the departing tone present in Middle Chinese, which many scholars now believe is a reflex of a derivational suffix /*-s/ that served a range of semantic functions in Old Chinese—possibly the only example of inflectional morphology extant in what was otherwise an analytic language:[114][115]

Evolution of character senses caused by the Old Chinese *-s suffix, reflected in the Middle Chinese departing tone (去聲; qùshēng)
Character | OC[δ] | MC[β] | Modern | Gloss
傳[116] | *drjon | drjwen' | chuán | 'to transmit'
傳 | *drjons | drjwenH | zhuàn | 'a record'
磨[116] | *maj | ma | mó | 'to grind'
磨 | *majs | maH | mò | 'grindstone'
宿[117] | *sjuk | sjuwk | sù | 'to stay overnight'
宿 | *sjuks | sjuwH | xiù | 'celestial mansion'
說[118] | *hljot | sywet | shuō | 'speak'
說 | *hljots | sywejH | shuì | 'exhort'

Another common alternation was between voiced and voiceless initials, though the phonemic voicing distinction has disappeared in most modern varieties. This is believed to reflect an Old Chinese de-transitivising prefix, but scholars disagree on whether the voiced or voiceless form reflects the original root. Each pair of examples below reflects two words of opposite transitivity.

Evolution of character transitivity pairs from Old Chinese
Character | OC[δ] | MC[β] | Modern | Gloss
見[119] | *kens | kenH | jiàn | 'to see'
見 | *gens | henH | xiàn | 'to appear'
敗[119] | *prats | pæjH | bài[g] | 'to defeat'
敗 | *brats | bæjH | bài | 'to be defeated'
折[120] | *tjat | tsyet | zhé | 'to bend'
折 | *djat | dzyet | shé | 'to be broken by bending'

Vernacular Chinese varieties

Multi-syllable words began entering the Chinese language during the Western Zhou dynasty (c. 1046 – 771 BCE). An estimated 25–30% of the vocabulary used in Warring States period (c. 475 – 221 BCE) texts consists of multi-syllable words. Over time, the introduction of multi-syllable vocabulary into vernacular varieties of Chinese has accelerated—necessitated by phonetic changes that have increased the number of homophones.[121] The most common process of Chinese word formation after the Classical period has been to create compounds of existing words. Words have also been created by appending affixes to words, by reduplication, and by borrowing words from other languages.[122] While polysyllabic words are generally written with one character per syllable, abbreviations are occasionally used.[34]

There are a number of 'dialect characters' (方言字; fāngyánzì) that are not used in standard written vernacular Chinese (the form corresponding to spoken Standard Chinese, itself based on the Beijing dialect of Mandarin) but are found in other spoken varieties. The most complete example of an orthography based on a variety other than Standard Chinese is Written Cantonese. However, for other varieties this level of completeness has been described by Victor H. Mair as being "almost unthinkable".[123] It is common to use standard characters to transcribe previously unwritten words in Chinese dialects when obvious cognates exist. When no obvious cognate exists due to factors like irregular sound changes, semantic drift, or an ultimate origin in a non-Chinese language substratum or loanword, characters are borrowed to transcribe the word, either ad hoc or according to the rebus principle.[124] Newly coined characters are generally phono-semantic compounds, although there are examples of compound ideographs, e.g. 孬 ('bad').[E] In Taiwan, there is also a body of official characters used to represent Taiwanese Hokkien and Hakka. An example of a Hakka vernacular character is (cii11; 'kill').[F]


In Japanese, Chinese characters are referred to as kanji. Historically, Japanese speakers would adapt the syntax and vocabulary of Literary Chinese texts to reflect their Japanese-language equivalents while reading. Writing essentially involved the inverse of this process, and resulted in an ordinary Literary Chinese text. Beginning in the Nara period (710–794), a system of annotations called kanbun was often employed to aid readers.[125] When adapted to write Japanese, characters were used to represent both Sino-Japanese vocabulary loaned from Chinese, as well as the corresponding native synonyms. Most kanji were subject to both borrowing processes, and as a result have both Sino-Japanese and native readings, known as on'yomi and kun'yomi respectively. Moreover, many kanji were borrowed multiple times from different varieties of Chinese, resulting in multiple distinct on'yomi readings.[126]

The Japanese writing system is a mixed script, and has also incorporated syllabaries called kana to represent phonetic units called moras, rather than morphemes. Prior to the Meiji era (1868–1912), writers used certain kanji to represent their sound values instead, in a system known as man'yōgana. Starting in the 9th century, specific man'yōgana were graphically simplified to create two distinct syllabaries called hiragana and katakana, which slowly replaced the earlier convention. Modern Japanese retains the use of kanji to represent most word stems, while kana syllabograms are generally used for grammatical affixes, particles, and loanwords. The forms of hiragana and katakana are visually distinct from one another, owing in large part to different methods of simplification: katakana were derived from smaller components of each man'yōgana, while hiragana were derived from the cursive forms of man'yōgana in their entirety. In addition, the hiragana and katakana for some moras were derived from different man'yōgana.[127]

Due to Japanese being a synthetic language, many kanji have multi-syllable readings. For example, the kanji 刀 has a native kun'yomi reading of katana. In different contexts, it can also be read with the on'yomi reading tō, such as in the Chinese loanword 日本刀 (nihontō; '[Japanese] sword'), with a pronunciation corresponding to that in Chinese at the time of borrowing. Before katakana became the usual way of writing loanwords, loanwords were typically written with unrelated kanji whose on'yomi readings matched the syllables in the loanword. These spellings are called ateji: for example, 亜米利加 was the ateji form of modern アメリカ (Amerika; 'America'). Unlike man'yōgana, which were used solely for their pronunciation, ateji still corresponded to specific Japanese words. Some are still in use: the official list of jōyō kanji includes 106 ateji readings.[128]


In Korean, Chinese characters are known as hanja. Literary Chinese was used in Korea as early as the 2nd century BCE. During the Three Kingdoms period (57 BCE – 668 CE), a form of Korean-language writing called idu, comprising mostly Sino-Korean vocabulary, was also written with characters. Similarly to kanbun in Japan, writers in Korea also developed a system of phonetic annotations for Literary Chinese called gugyeol, which was devised during the Goryeo period (918–1392) but entered widespread use during the later Joseon period (1392–1897).[129] Although the hangul alphabet was invented by the Joseon king Sejong (r. 1418–1450) in 1443, it was not taken up by the Korean literati, and its use did not become widespread until the late 19th century.[130][131] Much of the Korean lexicon, especially technical and academic vocabulary, consists of Chinese loanwords. While hanja were usually only used to write this Sino-Korean vocabulary, there is evidence that vernacular readings were sometimes used.[109] However, due to the lack of tones in spoken Korean, there are many Sino-Korean words that are homophones with identical hangul spellings. For example, looking up 기사 (gisa) in a phonetic dictionary yields more than 30 different entries. This ambiguity was historically resolved by also including the associated hanja. While still sometimes used for Sino-Korean vocabulary, it is much rarer for native Korean words to be written using hanja.[132] When learning new characters, Korean students are instructed to associate each one with both its Sino-Korean pronunciation and a native Korean synonym.[133] Examples include:

Example Korean dictionary listings
Hanja | Hangul: native translation | Hangul: Sino-Korean | Gloss
水 | 물; mul | 수; su | 'water'
人 | 사람; saram | 인; in | 'person'
大 | 큰; keun | 대; dae | 'big'
小 | 작을; jakeul | 소; so | 'small'
下 | 아래; arae | 하; ha | 'down'
父 | 아비; abi | 부; bu | 'father'
韓 | 나라 이름; nara ireum | 한; han | 'Korea'

South Korea

Hanja are still used in South Korea, though not to the extent that kanji are used in Japan. They remain in use in place names, newspapers, and to disambiguate homophones. They are also used in the practice of calligraphy. Use of hanja in South Korea generally retains connotations of classical Confucian education, and knowledge of characters is considered a marker of class in Korean society. Their use is politically contentious, with official policies regarding their prominence in education having vacillated since independence. In general, there is a trend toward the exclusive use of hangul in ordinary contexts.[134] Some see the total abandonment of hanja as a "purification" of the national language and culture, while others support an increase in use to levels previously seen during the 1970s and 1980s. Students in grades 7–12 are presently taught 1,800 characters,[135] albeit with a principal focus on simple recognition and attaining sufficient literacy to read a newspaper.[131] Hanja are also comparatively prominent in Korean academia, as the vast majority of Korean documents, history, and literature—such as the Veritable Records of the Joseon Dynasty (1392–1865)—were written in Literary Chinese. Therefore, a working knowledge of Chinese characters is still important for anyone wishing to interpret and study older Korean texts, or anyone who wishes to read scholarship in the humanities. Working knowledge of hanja is also useful for understanding the etymology of Sino-Korean vocabulary.[136]

North Korea

A 1949 law in North Korea apparently banned the use of all so-called foreign languages, which has been interpreted as including hanja by outside observers.[132] Due to the country's isolation, accurate reports about its use of hanja are difficult to obtain. A North Korean textbook for university history departments published in 1971 contained 3,323 distinct characters, and in the 1990s North Korean school children were still expected to learn 2,000 characters.[137] A 2013 textbook appears to integrate the use of hanja in secondary school education.[138] It has been estimated that North Korean students learn around 3,000 hanja by the time they graduate university. In some cases, characters appear in advertisements and newspapers, but their cultural use is narrower than in the South, and mostly restricted to dictionaries and textbooks.[139]


The first two lines of the classic Vietnamese epic poem The Tale of Kieu, written in both chữ Nôm and the Vietnamese alphabet
  Borrowed characters representing Sino-Vietnamese words
  Borrowed characters representing native Vietnamese words
  Invented chữ Nôm representing native Vietnamese words

Chinese characters are called chữ Hán (𡨸漢), chữ Nho (𡨸儒), or Hán tự (漢字) in Vietnamese. Literary Chinese was used for all formal writing in Vietnam until the early 20th century. The country's oldest attested writing is a Literary Chinese inscription in Thanh Hóa dated to 618, erected by local officials of the Chinese Sui dynasty (581–618).[140] A script using characters to write Vietnamese called chữ Nôm emerged around the 13th century, and was initially used to record Vietnamese folk literature. The oldest attested chữ Nôm dates to 1209, comprising half of a bilingual Buddhist inscription alongside the corresponding Literary Chinese. Some newly coined chữ Nôm characters are phono-semantic compounds corresponding to spoken Vietnamese syllables.[141] The resulting system was highly complex, and literacy was limited to a small segment of the Vietnamese population, never more than 5%.[142] Both Literary Chinese and chữ Nôm fell out of use during the French colonial period, and were gradually replaced by the Latin-based Vietnamese alphabet, now the country's primary writing system.[143]

A page from a bilingual copy of the Sutra of Filial Piety, with Literary Chinese alongside an early form of chữ Nôm representing Old Vietnamese pronunciation. Pairs of characters are sometimes used to represent the consonant clusters present in Old Vietnamese.

Other languages

Several minority languages of South and Southwest China have been written with scripts using both borrowed and locally created characters. The most extensive of these is sawndip, a script created to write the Zhuang languages of Guangxi. Sawndip is still used, despite the Chinese government's encouragement of its replacement with a Latin-based Zhuang alphabet. One survey estimated sawndip as having twice as many users as the official alphabet.[144] Other non-Sinitic languages of China written with Chinese characters include Miao, Yao, Bouyei, Mulam, Kam, Bai, and Hani. Each of these languages is now written with a Latin-based alphabet in official contexts.[145]

Graphically derived scripts

Between the 10th and 13th centuries, dynasties founded by non-Han peoples in northern China also created scripts for their languages that were inspired by Chinese characters, but did not use them directly: these included the Khitan large script, Khitan small script, Tangut script, and Jurchen script.[145] Nüshu was a script used by Yao women to write the Xiangnan Tuhua language,[146] and bopomofo is a semi-syllabary invented during the 20th century to phonetically represent Standard Chinese;[147] both use forms graphically derived from Chinese characters. Other scripts within China that have adapted some characters but are otherwise distinct include the Geba syllabary used to write the Naxi language, the script for the Sui language, the script for the Yi languages, and the syllabary for the Lisu language.[145]

Excerpt from the Secret History of the Mongols featuring Chinese characters used to transcribe Mongolian, with glosses to the right of each row

Chinese characters have also been repurposed phonetically to transcribe the sounds of non-Chinese languages. For example, in addition to the Persian and Arabic scripts, the Mongolian language was also written with Chinese characters. The only manuscripts of the 13th-century Secret History of the Mongols to have survived from the medieval era use this system.[148] According to the 19th-century missionary John Gulick:

The inhabitants of other Asiatic nations, who have had occasion to represent the words of their several languages by Chinese characters, have as a rule used unaspirated characters for the sounds g, d, b. The Muslims from Arabia and Persia have followed this method ... The Mongols, Manchu, and Japanese also constantly select unaspirated characters to represent the sounds g, d, b, and j of their languages. These surrounding Asiatic nations, in writing Chinese words in their own alphabets, have uniformly used g, d, b, etc., to represent the unaspirated sounds.[149]

Special cases

Contractions and abbreviations

Some compound words and set phrases have been represented by single-character contractions, often considered ligatures instead of characters representing a single morpheme. They are often used in handwriting or for decorative purposes, but are sometimes seen in print. They are sometimes called 合文 (héwén) in Chinese. An example is the 'double happiness' character (囍), formed as a ligature of 喜喜 and referred to by its disyllabic name 双喜; 雙喜 (shuāngxǐ).[G] Numerals are also sometimes written as ligatures, such as 廿 (niàn; 'twenty'), normally read as 二十 (èrshí) in Standard Chinese.[H] In oracle bone script, personal names, ritual items, and even whole phrases were contracted into single characters: for example, 受又 (shòu yòu; 'receive blessings') becomes (yòu). An example found in medieval manuscripts writes 'bodhisattva' (菩薩; púsà) as a contracted character, composed of four arranged in a 2×2 grid, derived from the 'GRASS' components within the original characters. Another example is 𱕸; (tuān), a contraction of 图书馆; 圖書館 (túshūguǎn; 'library').[150]

Multi-syllable morphemes

A handful of native Chinese morphemes are two syllables in length; some of these date back to the Classical period.[151] They are often written with a pair of phono-semantic compounds that share a common semantic component. For example, the first character of 蝴蝶 (húdié; 'butterfly') and the second character of 珊瑚 (shānhú; 'coral') use 'INSECT' and 'JADE' for their respective semantic components, while sharing the phonetic component 胡 (hú). None of these characters are used independently, except as poetic abbreviations.[I][152] Another example is the name for the pipa, a type of lute. The characters 批把 were originally used to write the names of both the instrument and the loquat, a fruit named for its shape's similarity to the instrument. These characters use the semantic 'HAND' component, referencing the upward and downward strokes made while playing the instrument. For the fruit's name, the semantic component was later switched to 'TREE' to form 枇杷, while the instrument's name became 琵琶, incorporating the top half of 琴 ('guqin') into both characters.[J]

The erhua phenomenon in some varieties of Mandarin is reflected in writing by means of a 儿; 兒 (ér) suffix. As such, some monosyllabic words may be written with two characters, such as huār (花儿; 花兒; 'flower').[153]

Rare and complex characters

Extremely stroke-rich characters tend to be rare. One of the most complex characters included in modern Chinese dictionaries is 齉 (nàng; 'snuffle'), with 36 strokes.[K] Stroke-rich characters are often composed of other characters in triplicate or quadruplicate, such as the triplicated 靐 (bìng), with 39 strokes, and the quadruplicated form (bèng), with 52, both meaning 'the loud noise of thunder'. 龘 (dá; 'appearance of a dragon in flight') consists of the 'DRAGON' radical in triplicate, for a total of 48 strokes. In Japanese, an 84-stroke kokuji exists: 𱁬, normally read taito. It is composed of three 'cloud' characters (雲) atop the aforementioned triple-'dragon' character, and also means 'appearance of a dragon in flight'.[154][155]


Each modern polity that writes with Chinese characters has standardised their forms, pronunciation, and stroke orders. Most characters have a single standard stroke order, but some may differ by region, occasionally resulting in different stroke counts. The latest published standards for character forms are:

Polity | Standard | Characters | Latest revision
China | Table of General Standard Chinese Characters | 8105 | 2013[156]
Hong Kong | List of Graphemes of Commonly-Used Chinese Characters[h] | 4762 | 2012[157]
Hong Kong | Reference Glyphs for Chinese Computer Systems in Hong Kong[i] | | 2016[158]
Taiwan[j] | Chart of Standard Forms of Common National Characters | 4808 | 1983[160]
Taiwan | Chart of Standard Forms of Less-Than-Common National Characters | 6341 | 1983[161]
Taiwan | Chart of Rarely-Used National Characters | 18388 | 2017[159]
Japan | Jōyō kanji | 2136 | 2010[162]
South Korea | Basic Hanja for Educational Use | 1800 | 2000[163]

Simplified characters

Regional forms of the character in the Noto Serif typeface family. From left to right: forms used in mainland China, Taiwan, and Hong Kong (top), and in Japan and Korea (bottom)
The first official list of simplified characters, published in 1935 and consisting of 324 characters[164]

Though most closely associated with the People's Republic, the idea of a mass simplification of character forms first gained traction in China during the early 20th century. In 1909, the educator and linguist Lufei Kui (1886–1941) formally proposed the use of simplified characters in education for the first time. Over the following years—marked by the 1911 Xinhai Revolution that toppled the Qing dynasty, followed by social and political discontent that erupted in the 1919 May Fourth Movement—many anti-imperialist intellectuals in China began to see the country's writing system as a serious impediment to its modernisation. Many began advocating for the script to be reformed, or even entirely replaced by an alphabet, with the goal of increasing literacy rates throughout the country.[165]

During the Republican era (1912–1949), discussions on script reform took place within both the ruling Kuomintang (KMT) party, as well as the Chinese Communist Party (CCP). In 1935, the Republican government published the first official list of simplified forms, which consisted of 324 characters collated by Peking University professor Qian Xuantong (1887–1939). However, strong opposition within the party resulted in the list being rescinded in 1936.[166]

Comparison of strokes between traditional and simplified character forms,[k] showing systematic simplification of the component 'GATE'

Cursive script served as a source for many simplified character forms; others had already been used in print, though usually not in formal works. The broader initiative of script reform was ultimately inherited by the Communists, who began work on it in earnest after the proclamation of the People's Republic of China in 1949. Since simplified characters were introduced in the 1950s, the PRC has officially encouraged their use on the mainland. The reforms did not apply in the Republic of China, nor in Hong Kong and Macau, which were still under British and Portuguese colonial rule respectively at the time.[167]

People's Republic of China

Simplified forms were collated and standardised by the PRC during the 1950s and 1960s. In 1958, Premier Zhou Enlai (1898–1976) announced the government's intent to focus on character simplification rather than on replacing characters with Hanyu Pinyin, which had been introduced that same year.[165] The PRC formed a Script Reform Committee, which pursued official character simplification during the 1950s, starting with a list of 2,238 characters published in 1956 that was largely affirmed, with minor corrections, in 1965. The majority of these characters were drawn from conventional abbreviations or ancient forms.[168] For example, the orthodox character ('to come') was written as in the earlier clerical script. The latter form used one fewer stroke, and was thus adopted as a simplification. Besides simplifying the forms of many commonly used characters, the reforms reduced the total number of distinct characters by merging some forms together.[169] The ('cloud') character was written as in the ancient oracle bone script. The simpler form remained in use as a phonetic loan meaning 'to say'; in its original sense of 'cloud', it was replaced by a form that added a semantic 'RAIN' component. The simplified forms of these two characters have been merged into .[L]

A second round of simplified characters was promulgated in 1977, but it was poorly received by the public and quickly fell out of official use; it was formally rescinded in 1986. The second round was unpopular in large part because the vast majority of its forms were entirely new, in contrast to the many familiar variants in the first round. Two revised lists of simplified forms were published in 1988: the List of Commonly Used Characters in Modern Chinese, with 2,500 common characters and 1,000 less common characters, and the Chart of Generally Utilised Characters of Modern Chinese, with 7,000 characters, including those in the smaller list. In 2013, the revised Table of General Standard Chinese Characters, with a total of 8,105 characters, supplanted the 1988 lists as the standard.[170][171] The Chinese Proficiency Test (HSK) covers 2,663 characters and 5,000 words at its highest level, while the Chinese Proficiency Grading Standards for International Chinese Language Education cover 3,000 characters and 11,092 words at their highest level.[172][173][174]

Southeast Asia

Singapore underwent three successive rounds of character simplification promulgated by the Ministry of Education, with the first two including some forms that differed from those promoted in mainland China. The first round was published in 1969 and consisted of 502 simplified characters. The second round, in 1974, consisted of 2,287 simplified characters, including 49 that differed from those in the PRC; these were removed in the final round in 1976.[175] In 1993, Singapore adopted the revisions made in mainland China in 1986. Unlike on the mainland, where personal names can only be registered using simplified characters, parents in Singapore have the option of registering their children's names using traditional characters.[176]

Malaysia uses simplified characters in Chinese-language schools. Chinese-language newspapers in the country are published in either simplified or traditional characters—often, headlines are printed with traditional forms, and the body with simplified forms.[177]

Most Chinese-language schools and businesses in the Philippines use traditional characters. Recently, more Chinese-language schools have switched to simplified characters, with many schools using some combination of the two. The country's Chinese-language newspapers largely retain traditional characters, as their readership mostly draws from the older generations.[178]

Traditional characters

Regional allographs of in Chinese, Japanese, Korean, and Vietnamese styles

Taiwan

In Taiwan, the Ministry of Education's Chart of Standard Forms of Common National Characters lists 4,808 characters; the Chart of Standard Forms of Less-Than-Common National Characters lists another 6,341 characters. The Test of Chinese as a Foreign Language (TOCFL) covers 8,000 words at its highest level. The Taiwan Benchmarks for the Chinese Language (TBCL), a guideline designed to describe levels of Chinese language proficiency, covers 3,100 characters and 14,425 words at the highest level.[179][180]

Hong Kong

Hong Kong uses traditional characters; the Education and Manpower Bureau's List of Graphemes of Commonly-Used Chinese Characters contains 4,762 characters, and is intended for use in elementary and junior secondary education.[157]

North America

Most Chinese-language newspapers and signage in the United States and Canada use traditional characters.[181] Owing to recent immigration from mainland China, there have been some efforts to get municipal governments to add more signage in simplified characters.[182]

Japan

After World War II, the Japanese government instituted its own program of orthographic reforms. Some characters were assigned simplified forms called shinjitai; the older forms were then labelled kyūjitai. Inconsistent use of variant forms was discouraged, and lists of characters to be taught to students at each grade level were developed. The first of these was the 1,850-character tōyō kanji list of 1946, later replaced by the 1,945-character jōyō kanji list in 1981. In 2010, the jōyō kanji list was revised and expanded to a total of 2,136 characters. The Japanese government also restricts the characters that may be used in personal names: in addition to the jōyō kanji, names may include the jinmeiyō kanji, a supplementary list of 983 characters historically prevalent in names.[183]

South Korea

The South Korean Basic Hanja for Educational Use is a set of 1,800 characters standardised in 1972, with the first 900 hanja taught to middle school students, and the rest taught to high school students.[163] In March 1991, the Supreme Court of Korea published the 2,854-character Table of Hanja for Use in Personal Names.[184] The list expanded gradually: by 2015 it included a total of 8,142 hanja.[185]


Cumulative frequency of simplified Chinese characters in modern text[186]

Dozens of schemes have been devised for indexing Chinese characters and sorting them into dictionaries. Most of these are specific to the dictionary for which they were invented, and relatively few have seen widespread use. Often, character dictionaries incorporate several mechanisms by which users may locate entries. Methods for arranging Chinese dictionaries are divided into form-based orders that sort by visual properties, sound-based orders usually corresponding to an extant transliteration system, and meaning-based orders.[187] The Erya dictionary (c. 3rd century BCE) predated the graphical approach found in the Shuowen Jiezi, and instead organised characters based on their meaning.[188]

Many character dictionaries are indexed using radical-and-stroke sorting, where characters are grouped by radical, and each group is in turn sorted by the number of strokes in the rest of the character. Classification by radical was introduced by the Shuowen Jiezi, which used 540 radicals. The set of 214 Kangxi radicals was popularised by the 1716 Kangxi Dictionary, but had originally been introduced in the Zihui in 1615. Another form-based system is the four-corner method, where characters are classified according to the shapes at each of their four corners. Stroke-based sorting techniques include sorting by stroke count and by stroke order. In modern Chinese, characters and words are also ordered by their frequency of use within a given corpus. Some modern Chinese dictionaries arrange character entries alphabetically according to their pinyin spelling, while also providing a traditional radical-based index.[189]
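Radical-and-stroke ordering amounts to sorting on a two-part key: first the radical, then the strokes remaining outside it. The Python sketch below is illustrative only. The `Entry` type and the `radical_stroke_order` and `radical_index` functions are hypothetical names, and the five sample entries use Kangxi radical numbers and residual stroke counts as recorded in the Unicode Unihan database's kRSUnicode field. Real dictionaries apply further tie-breaking rules that are not modelled here; Python's stable sort simply keeps ties in input order.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Entry:
    char: str
    radical: int           # Kangxi radical number, 1-214
    residual_strokes: int  # strokes outside the radical


# Hypothetical sample data: radical number and residual stroke count per character.
ENTRIES = [
    Entry("河", 85, 5),   # radical 85 (water) + 5 strokes
    Entry("好", 38, 3),   # radical 38 (woman) + 3 strokes
    Entry("木", 75, 0),   # radical 75 (tree) on its own
    Entry("媽", 38, 10),  # radical 38 (woman) + 10 strokes
    Entry("江", 85, 3),   # radical 85 (water) + 3 strokes
]


def radical_stroke_order(entries):
    """Sort by radical number, then by residual stroke count within each radical."""
    return sorted(entries, key=lambda e: (e.radical, e.residual_strokes))


def radical_index(entries):
    """Group the sorted entries under their radical, as a printed index would."""
    ordered = radical_stroke_order(entries)
    return {
        radical: "".join(e.char for e in group)
        for radical, group in groupby(ordered, key=lambda e: e.radical)
    }


if __name__ == "__main__":
    print("".join(e.char for e in radical_stroke_order(ENTRIES)))  # 好媽木江河
    for radical, chars in radical_index(ENTRIES).items():
        print(radical, chars)  # 38 好媽, then 75 木, then 85 江河
```

Sorting on the tuple `(radical, residual_strokes)` mirrors how a reader uses such an index: find the radical first, then count the remaining strokes. A pinyin-ordered dictionary would instead sort on a romanised reading, keeping the radical index as a secondary way in.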

Studies suggest that literate individuals in China have an active vocabulary of three to four thousand characters, while specialists in fields like classical literature or history may have a working vocabulary of five to six thousand.[190] Estimates of the total number of characters in use can be derived through analysis of encoding schemes and dictionaries: according to mainland Chinese, Taiwanese, Hong Kong, Japanese, and Korean sources, there are around 15,000 characters in the modern lexicon.[191] Among characters invented specifically for regional use, there are roughly 1,500 Japanese kokuji,[192] around 200 Korean gukja,[193] over 10,000 sawndip used to write Zhuang, and almost 20,000 Nôm characters created in Vietnam.[194]


Notes

  1. ^ 漢字; simplified as 汉字.
    Chinese pinyin: Hànzì; Wade–Giles: Han4-tzŭ4; Jyutping: Hon3 zi6.
    Japanese rōmaji: kanji; Korean romanisation: Hanja; Vietnamese: Hán tự.
  2. ^ Zev Handel lists the following independently invented writing systems:[2]
    1. Sumerian cuneiform emerging c. 3200 BCE
    2. Egyptian hieroglyphs emerging c. 3100 BCE
    3. Chinese characters emerging c. 13th century BCE
    4. Maya script emerging c. 1 CE
  3. ^ According to Handel: "While monosyllabism generally trumps morphemicity—that is to say, a bisyllabic morpheme is nearly always written with two characters rather than one—there is an unmistakable tendency for script users to impose a morphemic identity on the linguistic units represented by these characters."[10]
  4. ^ Baxter provides the reconstructed Old Chinese pronunciations of this pair as /*ɡ-ljuŋ/[30] and /*k-ljuŋ/[31] respectively.
  5. ^ Originally a pictograph of a vulva. The Shuowen Jiezi gives the origin of as 女陰也; 'female yin [organ]'. By the 6th century BCE, the original definition had fallen into disuse. The use of the character in the definition itself is as a declarative sentence-final particle, and all appearances of the character in Classical texts from that time forward use it as a phonetic loan for the grammatical particle. In addition to being a Classical particle, in modern vernacular Chinese has acquired a meaning of 'also'.
  6. ^ a b was originally the third-person personal pronoun regardless of gender or animacy in Chinese. The feminine-specific form only emerged in the early 20th century, after the bulk of Japanese orthographic borrowing had already occurred.
  7. ^ In this case, the pronunciations have converged in Standard Chinese, but they have not in other varieties.
  8. ^ Reference for education
  9. ^ Reference for font foundries
  10. ^ Collectively the Standard Form of National Characters, which has been published online in full by Taiwan's Ministry of Education since 2017.[159]
  11. ^ The character ; is a plural suffix particle for pronouns.
  1. ^ Baxter–Sagart (2014) reconstruction of Old Chinese.
  2. ^ a b c Baxter's transcription for Middle Chinese.
  3. ^ Standard Chinese and Cantonese readings are given in pinyin and Jyutping, respectively. Japanese on'yomi readings are given in rōmaji.
  4. ^ a b Baxter (1992) reconstruction of Old Chinese.


External links

  • Chinese Text Project Dictionary – Comprehensive character dictionary, including data for all Chinese characters in Unicode and examples of their use in Classical Chinese texts
  • – Character lookup by component description, character etymology, phonology, orthography, and dictionary
  • Chinese Etymology by Richard Sears
