The Fandom Coders Encyclopædia

text

Computing concept and file format

Text, in computing terms, is a sequence of bytes that has an unambiguous mapping to units of human writing. These units are called characters, although their precise definition varies. In programming languages, segments of text are represented by data structures known as strings. Text may also be saved to the file system as a text file.

§ Text Files

Text files are exactly those files whose bytes are intended to be directly interpreted as symbols of human writing. Not all files which represent human writing are text files; for example, a PNG of street graffiti contains human writing, but because the bytes of the file represent image data (and not letters or other symbols), it is not a text file. Additionally, the textual representation of some text files may not be their primary or most useful one; for example, an SVG file is a kind of text file, but it is usually rendered as an image.

Because all text files may be represented as a linear stream of human writing (even if they have other representations), any program which knows how to translate bytes into writing can open any kind of text file. When the goal of this program is to display and edit the text file for a human user, the program is called a text editor.
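As a minimal sketch of what "translating bytes into writing" means in practice, the following Python snippet tries to decode a byte sequence under a given encoding. The helper name `decode_text` and the sample byte strings are illustrative assumptions, not part of any standard API. A clean decode is only a heuristic: it shows the bytes can be read as text, not that they were intended to be.

```python
from typing import Optional


def decode_text(data: bytes, encoding: str = "utf-8") -> Optional[str]:
    """Return the decoded text, or None if the bytes are not valid in `encoding`.

    `decode_text` is a hypothetical helper used only for illustration.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# An SVG file is a text file: its bytes decode cleanly into characters.
svg_bytes = b'<svg xmlns="http://www.w3.org/2000/svg"></svg>'

# The first eight bytes of every PNG file; 0x89 is not a valid UTF-8 lead byte.
png_signature = b"\x89PNG\r\n\x1a\n"

svg_text = decode_text(svg_bytes)      # a str containing the SVG markup
png_text = decode_text(png_signature)  # None: not valid UTF-8
```

Note that some encodings, such as Latin‐1, assign a character to every possible byte, so a check like this can only rule a file out under a particular encoding.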

Numerous text editors exist for every platform, and their accessibility and ease‐of‐use have made human‐readable text files a core part of both the Unix philosophy and the Web.

§ Text Encodings

The mapping of bytes into writing used for a piece of text is its encoding. Today, most text is encoded as either UTF‐8 or UTF‐16, both of which map bytes to the set of characters defined by Unicode. If a program tries to read text but is incorrect about the encoding, the result is often an illegible string of characters known as mojibake. For example, if a computer accidentally tries to read the UTF‐8 string ‹ Hello world! › as big‐endian UTF‐16, each pair of bytes is treated as a single character, and the result is ‹ 䡥汬漠睯牬搡 ›.
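The mojibake above can be reproduced with a short Python sketch. It assumes the misreading uses the big‐endian form of UTF‐16 (Python's `utf-16-be` codec), which is the byte order that yields the characters shown; the variable names are illustrative.

```python
original = "Hello world!"

# Twelve ASCII characters become twelve bytes in UTF-8.
utf8_bytes = original.encode("utf-8")

# Reading the same bytes as big-endian UTF-16 pairs them up,
# so twelve bytes become six unrelated characters.
misread = utf8_bytes.decode("utf-16-be")

# Reading them with the correct encoding recovers the original text.
correct = utf8_bytes.decode("utf-8")

print(misread)  # 䡥汬漠睯牬搡
print(correct)  # Hello world!
```

The bytes themselves never change; only the mapping applied to them does.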

§ Plain & Rich Text

Text comes in two varieties: plain and rich. In plain text, every character in the text is expected to hold its literal meaning, and no information about the semantics, formatting, or presentation of the text is provided. In contrast, rich text uses certain sequences of characters to imbue the text with additional meaning or properties, for example by annotating that a given span is emphasized, or that it should appear in the colour blue. These sequences are known as markup, and defined collections of markup symbols together form a markup language.
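The distinction can be illustrated with a small Python sketch that separates markup from the text it annotates. HTML is used here purely as an example of a markup language; the class `TextExtractor` and the function `strip_markup` are hypothetical names built on the standard library's `html.parser` module.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data and ignores tags (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list = []

    def handle_data(self, data: str) -> None:
        # Called for runs of text that lie between markup sequences.
        self.parts.append(data)


def strip_markup(rich: str) -> str:
    """Return the text of an HTML fragment with its markup removed."""
    parser = TextExtractor()
    parser.feed(rich)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


rich = "The sky is <em>very</em> blue."

# Read as plain text, every character is literal, including "<em>".
print(rich)                # The sky is <em>very</em> blue.

# Read as rich text, "<em>" and "</em>" are markup rather than content.
print(strip_markup(rich))  # The sky is very blue.
```

The same sequence of characters is therefore plain or rich depending on how the reader chooses to interpret it.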

This article was written by kibigo!.