Special thanks to DJB for comments
All files (normal or special) are represented in the system's filesystem implementation as an 'inode' (usually expanded as 'index node').
Perhaps slightly curiously, directories are not represented as files, although they too *are* represented as inodes within the filesystem implementation.
Directory inodes implement the abstraction needed to create the filesystem hierarchy. Essentially they just keep the information about the collection of files (and other directories) represented by a given directory, at some level in the directory hierarchy.
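As a rough illustration of the inode idea, the sketch below asks the system for the inode number and type of a path. It uses Python's os.stat, which wraps stat(2); the helper name `describe` and the temporary file names are made up for this example.

```python
import os
import stat
import tempfile


def describe(path):
    """Return (inode number, kind) for path, as reported by stat(2)."""
    info = os.stat(path)
    kind = "directory" if stat.S_ISDIR(info.st_mode) else "file"
    return info.st_ino, kind


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")
        with open(path, "w") as handle:
            handle.write("hello\n")
        # Both the directory and the file inside it have an inode.
        print(describe(tmp))
        print(describe(path))
```

The inode numbers printed will differ from system to system; the point is only that a directory and a plain file are both backed by an inode.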
Contents of all files are a stream of bytes
This simple abstraction allows every command to operate on any file (more or less).
Semantics are imposed on the data within a file by the particular command(s) that you apply to read it. Such commands assume the type of data the file contains.
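To make this concrete, here is a small sketch in which the same five bytes are read three different ways. The byte string and the function names are invented for illustration; nothing here is taken from an actual utility's source.

```python
import struct

# Five bytes on disk; the file itself carries no notion of "type".
DATA = b"ABCD\n"


def as_text_lines(raw):
    """Read the bytes the way a text tool such as wc or grep would."""
    return raw.decode("ascii").splitlines()


def as_hex_dump(raw):
    """Read the bytes the way a dump tool such as od would."""
    return " ".join(f"{byte:02x}" for byte in raw)


def as_uint32_le(raw):
    """Read the first four bytes as one little-endian 32-bit integer."""
    return struct.unpack("<I", raw[:4])[0]


if __name__ == "__main__":
    print(as_text_lines(DATA))   # ['ABCD']
    print(as_hex_dump(DATA))     # 41 42 43 44 0a
    print(as_uint32_le(DATA))    # 1145258561
```

None of the three readings is more "correct" than the others; each reader simply assumes a type.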
Many of the UNIX core utilities operate on files that contain readable text (readable ASCII), e.g. grep, wc, sort.
Some are more generic and can deal with any byte stream, e.g. od, sed, cat.
If a command expects a certain type of data in the file and the file actually contains something else, the command may complain or exit with an error. In the limit, it might just crash (though this is not desirable behavior for a well-implemented command or utility).
For example, the tar(1) command will complain if the file you point it at is not a tar file; gzip(1) and various others behave similarly. See man file.
The type of data contained in a file can be indicated in several ways. By convention the suffix of the filename is quite often used to indicate this (e.g. .txt, .jpg, .odt), but this is not mandatory.
For files containing binary data, there is also the convention of the magic number: a pattern of bytes at the head of the file which indicates the type of data it contains. See man magic. Some file formats don't do this. Plain text files are an example. You can most often only recognize these by examining the content and knowing about the various formats. This makes the implementation of the file(1) command rather tricky!
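A minimal sketch of magic-number detection follows. The table is a tiny hand-picked sample of well-known leading byte patterns and the function name `sniff` is hypothetical; this is not how file(1) is actually implemented, which consults a much larger database and falls back to heuristics.

```python
# Hypothetical, deliberately tiny table of well-known leading byte patterns.
MAGIC_NUMBERS = [
    (b"\x1f\x8b", "gzip"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF", "PDF document"),
    (b"\x7fELF", "ELF binary"),
]


def sniff(head):
    """Guess a type from the first bytes of a file; 'unknown' if no match."""
    for magic, name in MAGIC_NUMBERS:
        if head.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(sniff(b"\x1f\x8b\x08\x00"))    # gzip
    print(sniff(b"just some text\n"))    # unknown
```

The second call shows the hard case: content with no marker at all can only be classified by looking at more of the data.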
Most commands also follow a common convention: read from stdin, write to stdout, and return 0 for success or a nonzero value on error. (The error numbers reported by system calls are listed in man errno.)
This not only allows you to run almost any command or utility on any file, it also forms the basis for the most basic form of programming using the shell. You can string together a number of commands in a 'pipe':
grep -v "^#" myfile.txt | sort | tee commentfree.txt | wc -l
This says: remove all lines that begin with a '#' character, sort the remaining lines into order, put the resulting output in a file named 'commentfree.txt' (as well as passing it on to stdout), and report the resulting number of lines.
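The stages of that pipe can be sketched as ordinary function steps. This is only an approximation: a real pipe runs its stages concurrently and sort(1) honors the locale, whereas this sketch runs step by step and sorts by code point. The function name `run_pipeline` and the sample lines are invented for the example.

```python
import io


def run_pipeline(lines, tee_target):
    """Mimic: grep -v "^#" | sort | tee FILE | wc -l."""
    kept = [line for line in lines if not line.startswith("#")]  # grep -v "^#"
    kept.sort()                                                   # sort
    tee_target.writelines(kept)                                   # tee: keep a copy
    return len(kept)                                              # wc -l


if __name__ == "__main__":
    sample = ["# a comment\n", "pear\n", "apple\n", "#another\n"]
    copy = io.StringIO()  # stands in for commentfree.txt
    print(run_pipeline(sample, copy))   # 2
    print(copy.getvalue(), end="")      # apple, then pear
```

Each step does one small job and hands its output to the next, which is the whole idea behind the pipe.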
In UNIX, everything is a file
'Normal' files just contain data.
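Directories are also files in this broad sense: they essentially just keep the information about what is contained in a collection of files at some level in the directory hierarchy. On early UNIX systems you could dump one directly with od -c /somedir; most modern systems refuse to read a directory as a byte stream and report an 'Is a directory' error instead.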
Then, there are a number of other 'special' files:
[hardware] devices or more specifically the device 'drivers' that provide access to them: (/dev/net /dev/hd0 ...) and various other system aspects.
/proc files that provide per-process information and controls
/sys/* ways of viewing various system information: (/sys/mem, ...)
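One way to see which kind of file a path refers to is to inspect the mode bits returned by stat(2). The sketch below does this through Python's os and stat modules; the function name `file_kind` is made up, and /dev/null is used only as an example of a device node that is assumed to exist on the system at hand.

```python
import os
import stat


def file_kind(path):
    """Classify path by the file-type bits in its inode (no symlink following)."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"


if __name__ == "__main__":
    # Paths are examples only; skip any that do not exist on this system.
    for path in ("/", "/dev/null"):
        if os.path.lexists(path):
            print(path, "->", file_kind(path))
```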
A number of UNIX core commands and utilities are provided to manipulate directories. Here are a few:
cp (1) - copy files and directories
dir (1) - list directory contents
find (1) - search for files in a directory hierarchy
ls (1) - list directory contents
mkdir (1) - make directories
mv (1) - move (rename) files
mountpoint (1) - see if a directory or file is a mountpoint
pwd (1) - print name of current/working directory
rm (1) - remove files or directories
rmdir (1) - remove empty directories
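Directories cannot simply be written to like ordinary files; they can only be changed through dedicated interfaces. This was clearly an explicit design choice of Dennis's and Ken's, as terrible accidents would otherwise easily occur. UNIX hence provides a set of specific system calls and programmatic interfaces to manipulate directories: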
chdir (2) - change working directory
chmod (2) - change permissions of a file (or directory)
chroot (2) - change root directory
execveat (2) - execute program relative to a directory file descriptor
fchdir (2) - change working directory
futimesat (2) - change timestamps of a file relative to a directory fi...
getcwd (2) - get current working directory
getdents (2) - get directory entries
getdents64 (2) - get directory entries
lookup_dcookie (2) - return a directory entry's path
mkdir (2) - create a directory
mkdirat (2) - create a directory
readdir (2) - read directory entry
rename (2) - change the name or location of a file
rmdir (2) - delete a directory
alphasort (3) - scan a directory for matching entries
bindtextdomain (3) - set directory containing message catalogs
closedir (3) - close a directory
dirfd (3) - get directory stream file descriptor
fdopendir (3) - open a directory
get_current_dir_name (3) - get current working directory
getcwd (3) - get current working directory
getdirentries (3) - get directory entries in a filesystem-independent format
getwd (3) - get current working directory
mkdtemp (3) - create a unique temporary directory
opendir (3) - open a directory
readdir (3) - read a directory
readdir_r (3) - read a directory
remove (3) - remove a file or directory
rewinddir (3) - reset directory stream
scandir (3) - scan a directory for matching entries
scandirat (3) - scan a directory for matching entries
seekdir (3) - set the position of the next readdir() call in the dir...
telldir (3) - return current location in directory stream
versionsort (3) - scan a directory for matching entries
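A few of these interfaces can be exercised through Python's os module, which wraps them fairly directly. The sketch below creates a directory, lists it, tries to read it as a plain byte stream, and removes it again. The function name `directory_demo` and the file names are invented, and the failed read is what one would expect on Linux and similar systems; other platforms may differ.

```python
import os
import tempfile


def directory_demo(parent):
    """Create, list, try to read, and remove a directory under parent."""
    target = os.path.join(parent, "demo")
    os.mkdir(target)                                  # mkdir(2)
    member = os.path.join(target, "a.txt")
    open(member, "w").close()

    with os.scandir(target) as entries:               # opendir(3)/readdir(3)
        names = sorted(entry.name for entry in entries)

    try:
        with open(target, "rb") as handle:            # treat it as a byte stream
            handle.read()
        readable = True
    except OSError:                                   # typically EISDIR
        readable = False

    os.remove(member)                                 # unlink(2)
    os.rmdir(target)                                  # rmdir(2)
    return names, readable


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(directory_demo(tmp))
```

The listing works through the directory-specific calls, while the plain read is refused, which matches the design choice described above.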
A big part of what you'll want to do when you write commands or scripts (command line programs which use commands) is to search for certain *patterns* within the data in a file and then act on what was found.
Regular expressions are essentially the way you *describe* a pattern that indicates what you are searching for (i.e. what is to be matched).
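As a small sketch of "match a pattern, then do something with it", the example below reuses the `^#` pattern from the grep pipeline earlier and adds a second pattern that picks out name = value lines. The name = value format, the pattern names, and the function names are all assumptions made for illustration. Python's re syntax differs in details from the basic regular expressions grep uses by default, but these two patterns read the same in both.

```python
import re

COMMENT = re.compile(r"^#")                   # a line starting with '#'
SETTING = re.compile(r"^(\w+)\s*=\s*(.*)$")   # hypothetical "name = value" line


def strip_comments(lines):
    """Keep only the lines that do not match the comment pattern."""
    return [line for line in lines if not COMMENT.search(line)]


def parse_settings(lines):
    """Collect name/value pairs from the non-comment lines that match."""
    settings = {}
    for line in strip_comments(lines):
        match = SETTING.match(line.strip())
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


if __name__ == "__main__":
    sample = ["# comment", "name = demo", "", "size=42"]
    print(parse_settings(sample))   # {'name': 'demo', 'size': '42'}
```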