Music theory basics published: 22nd November 2024 updated: 11th December 2024

Not all vibrations that produce sound are intelligible or pleasing to the ear. We hear sounds around us all the time yet most of it is not really pleasing to the ear as compared to hearing musical sound. Why is that? the sound of music (organised sound) or tone has regular vibrations while noise (not organised) has irregular pattern of vibrations. नाद (Nāda) is the musical sound in the Indian classical music but also has spiritual and philosophical connotations. The reason why music sounds pleasing but not noise is due to Math. Human brains are pattern finding machines. Our brains evolved to reward us for the simple frequency ratios likely due to survival mechanism to detect regular non-threatening patterns in the environment. But sometimes we apply the same preference for mathematical symmetry and consonance into the realm of music for purely aesthetic purposes. The pleasure we derive from harmonic sounds is an example of taking an adaptive trait (pattern recognition) and redirecting it towards non-essential, cultural pursuits like music composition and appreciation. There is a popular theory is that music is a spandrel of language.

pattern brain meme

Musical notes and consonance:

Music judges two individual notes only by their mutual relations. Consonance is the perceived pleasantness to the certain musical relations. It refers to the phenomenon of identifying one note or pitch being related to or in harmony to each other. Consonant intervals have a high degree of frequency matching which creates a smooth and a stable sound. Two tones with lower frequency matching results to dissonance which is not pleasant to the ears. The mismatch in the frequencies is what gives them their unstable quality. The ratio of the matching frequencies must be as close to a whole number. Our brains are less attuned to perceiving the dissonant frequencies as pleasant.
From mathematical perspective simple frequency ratios like 2:1 for an octave create more overlapping peaks in the sound waves resulting in less [interference] and more stable vibration patterns. In a 2:1 octave ratio every 2 vibrations of the higher note and 1 vibration of the lower note align regularly. This creates less [beating] or interference between the frequencies. More complex ratios like 16:15 align less frequently and create more complex patterns that our auditory system finds as rough or instable - what we perceive as dissonance. Consonance is a matter of degree than absolutes as some sounds have a greater degree of matching than others. Unison or the perfect identities in music occur if both the notes (sounds) have the same frequency or the frequency of the second note is a multiple of the reference note. The second note is an octave to the reference note. However, there are notes other than octaves that have some degree of identity with the reference note. Ancient civilisations of Egypt, Greece, China, India, etc. knew about this relationship and used them in their musical systems.

Consonance occurs when waves of sound created by two notes combine with each other or oppose each other. If they oppose each other something called beats which cause disturbance of uneven nature. The number of beats is the difference between thr frequencies of the two notes. Low beats are not disturbing to the ear, as the difference increases do does the disturbance until it reaches a maximum point after which it decreases with the beat now as a secondary note which is in consonance with the original note. Sounds produced by the voice or other sources like stretched strings they contain besides their fundamental frequency or ptime note the overtones or harmonic upper partials. The overtones have a simple relationship with the fundamental frequency or note. Hence they do bot generate any beats and gives a note that blends with the fundamental note enriching its quality or the tonal colour timbre.

Consonance is of varying degrees and can range from perfect consonance to imperfect consonance. If two notes are not consonance in simple ratios to each other like an octave but are related through a third note which is then the common root the consonance is perfect but not absolute and is of the second order.
To sum it up, the more direct the relationship between two notes the greater the consonance and the relationship can range from absolute consonance to perfect dissonance with degrees such as perfect consonance, imperfect consonance and imperfect dissonance lying in between.

Having only two notes from an octave which has a perfect consonance creates a large gap between the two notes but also it's no possible to use just these two notes to create music. Hence we need to introduce more notes in the system and make a scale to facilitate musical compositions and this is where the varying degree of consonance comes in picture. How to use the varying degree of consonance to make most of the notes consonant with each other is a fascinating mathematical problem. Greek philosopher and mathematician [Pythagoras] too tried to solve this problem. This scale can be constructed into many unique ways and hence each culture has its own unique sound because each had their own musical scale built between the two octaves. However, broadly speaking there are two types of musical systems based on whether melody or harmony pre-dominates the music. In melodic musical systems the appeal and the emotion is conveyed through a succession of notes in relation to each other based on a fundamental frequency. In harmonic musical systems it is the relationship between different notes simultaneously played which is of importance. The sequential relationship matters here. Sound is temporal. Sound occurs over time, its properties are perceived and experienced as a series of changes that unfold in time. Sound has a beginning and an end. The duration of the sound is one of its key temporal aspects. Each note has a time value, the period over which it is sounded and this time value need not be absolute but rather relative to the entire piece as the whole piece can be speeded up or down.
Another thing that arises out of the regular strong and weak accents or pulsations in a piece of music which is referred to as meter. Each pulsation either strong or weak is called a beat. The duration of a note refers to the length of the sound in terms of time as measured by the number of beats. The beats are in a group of 2 or 3 and are arranged into a uniform measure. Time is an extremely important aspect of music and appears as the regular reoccurrence of strong and weak pulsations known as meter and as a result of arrangements of various note length referred to as the rhythm. This is the basis of time and rhythmical structuring in music in general.

Melodic vs Harmonic Musical Systems Melodic System (Sequential Notes) 1st harmonic (Sa) t₁ t₂ t₃ t₄ t₅ • Notes played in sequence • Emotional impact through relative pitch movement • Common in Indian Classical, Arabic Music Harmonic System (Simultaneous Notes) C E G F A C G B D I IV V • Multiple notes played together (chords) • Emotional impact through vertical relationships • Common in Western Classical, Jazz, Pop

Music has some (of many more) basic properties:

  • Pitch: It is a purely psychological construct. It answers us the question "What note is that?" "Is it high or low?" The terms note and tone refer to the same thing in the abstract sense but note is written and tone is heard.
  • Rhythm: It refers to the duration of a series of notes and the way the series groups in a particular way.
  • Tempo: This refers to the overall speed or pace of the music. If you were to tap your foot to it how fast or slow these regular the taps would be.
  • Timbre: It is the tonal colour of the sound. This is what differentiates a trumpet from a flute when both are playing the same note. Timbre arises from overtones present in the sound. Read more about it here
  • Loudness: It is psychological construct relating to how much energy an instrument produces or the air it displace or how intense the sound is.

The production of natural sounds:

When a string is plucked it will vibrate over the whole length of the string. At the same time the string vibrates over fractional divisions of its length (1/2, 1/3, 1/4, 1/5, 1/6, etc.) producing a series of harmonics (overtones) whose frequencies are inversely proportional (2x, 3x, 4x, 5x, 6x, etc., where x is the fundamental frequency of the string) to those fractional divisions. Theoretically an infinite number of these multiple modes of vibration exist, each mode producing its own harmonic. As one ascends the series, the amplitude, or loudness of each harmonic tends to diminish, so the higher modes produce harmonics that are usually too soft for us to hear.
In other words, when a string is plucked, it vibrates over its entire length, which is 1, and produces a fundamental tone. This tone is the loudest and the one you tend to hear. However, at the same time, the same string also divides itself into two equal parts, 1/2, and they vibrate at the same frequency which is twice that of the fundamental frequency. This frequency is not as loud as that of the fundamental, but your brain still picks up on it. At the same time, the string also divides itself into three equal parts, 1/3, and they vibrate at the same frequency which is three times that of the fundamental. This frequency is even softer than that of the 2nd harmonic, but your brain will still pick up on it. The string keeps dividing itself in this harmonic pattern over and over again creating an unending amount of frequencies. Since they get softer and softer they eventually become negligible to the ear.
A demo of standing waves on a string

First harmonic - षड्ज (Shadaj). Note: find the etymology of each harmonic
When a string is pluck it starts to vibrates in its full length and makes a sound. This is called षड्ज (Shadaj) or fundamental tone or 1st harmonic. Let's say it's 100Hz.

Second harmonic - तार षड्ज (Tara Shadaj).
Soon after the energy spreads and reduces through the string it also starts to vibrate in 2 parts and giving rise to the 2nd harmonic or तार षड्ज (Tara Shadaj). The frequency is 200Hz twice the fundamental frequency.

Third harmonic - पंचम (Pancham).
As the energy reduces further string starts to vibrate in 3 parts and gives rise to the 3rd harmonic पंचम (Pancham). The frequency is 300Hz thrice the frequency of the fundamental.

Fourth harmonic - अति-तार षड्ज (Ati-Tara Shadaj).
As the energy reduces further string starts to vibrate in 4 parts and gives rise to the 4th harmonic अति-तार षड्ज (Ati-Tara Shadaj). The frequency is 400Hz quadruple the frequency of the fundamental.

Fifth harmonic - गांधार (Gāndhār).
As the energy reduces further string starts to vibrate in 5 parts and gives rise to the 5th harmonic गांधार (Gāndhār). The frequency is 450Hz quintuple the frequency of the fundamental.

After a certain threshold the energy put in the string reduces so much that further harmonics are barely heard. The amplitude of each harmonic is inversely proportional to the square of its harmonic number, if the fundamental is at relative amplitude A, then 3f is at A/9, 5f is at A/25, 7f is at A/49. This can also be expressed by saying that the harmonics decrease by 12 dB per octave. [more on this in modular synth]

The non-linearities of the human ears

Linear Scale        0   1   2   3   4   5   6   7   8   9   10   (+1)
Logarithmic Scale   1   2   4   8  16  32  64  128 256 512 1024  (x2)
						

In a logarithmic scale numbers are placed by a multiplicative factor while in a linear scale they are placed by an additive factor. In the series above we have the same space but more numbers being represented through the scale. Even though the scale represent large value it still maintains accuracy at small levels. In logarithmic scale each value represents a multiplication of the base number rather than a linear addition.
log10(1) = 0
log10(2) ≈ 0.3010
log10(3) ≈ 0.4771
log10(10) = 1
log10(100) = 2
log10(1000) = 3
The human sound perception works the same way. It can respond to higher inputs ranging from 20Hz to 20,000 Hz while also responding to subtle differences around small inputs.

The human ear is logarithmic.

Linear Series       200 300 400 500 600  700  800  900  1000 1100    (+100)
Logarithmic Series  200 300 450 675 1012 1519 2278 3417 5126 7689    (x1.5)
						

The logarithmic sound is more evenly spaced than the linear sound. There is another interesting feature of the human ears: increased sensitivity of of certain frequencies. Human ears are more sensitive to frequencies around 2000Hz-5000Hz. The ear canal behaves like a resonator of and its resonance is around 3000Hz. The tympanic membrane and the ossicles also respond efficiently to the frequencies around this range. The 2000Hz-5000Hz range is also crucial for speech. Many important consonant sounds in human speech such as s, sh, t, and f, lie in this frequency band. The human ear is naturally tuned to be most sensitive to these frequencies because they are vital for understanding speech.
A breakdown of consonant sounds and their typical frequency ranges:

  • Fricatives (f, s, sh, v, z): 3-7 kHz
  • Plosives (p, t, k, b, d, g): High-frequency bursts around 4-8 kHz
  • Affricates (ch, j): Similar to plosives, around 4-7 kHz
  • Nasals (m, n, ng): Lower frequencies, mostly below 3 kHz

This means that sound around these frequencies are perceived louder than even if their sound pressure level [more on this modular synth] is the same. The difference in frequency sensitivity actually depends on sound levels. We hear mid-range frequencies (around 3-4 kHz) most clearly.
The ear perceives higher harmonics in relation to the fundamental frequency. Harmonic tones contain multiple frequency components but the brain perceives it has a coherent whole where the main pitch is of the fundamental frequency. The auditory system tends to group or bind components that are harmonically related. The harmonics evoke a pitch that is related to the fundamental frequency of the complex tone. Interestingly, this virtual pitch is perceived even when the energy at the fundamental frequency of the complex tone is absent or masked or hidden. The brain is so attuned to the harmonic series that if we hear a sound that has all the components except the fundamental frequency the brain fills it for us. This phenomenon is called restoration of the missing fundamental. A sound energy consisting of frequencies 100Hz, 200Hz, 300Hz, 400Hz, etc. is perceived as having a pitch of 100Hz. We don't perceive it as having a pitch of 200Hz because our brains know that a normal harmonic sound with a pitch of 200Hz would have frequency components 400Hz, 600Hz, 800Hz, etc. in its harmonic series. The brain is extremely attuned to the harmonic structure of sound. It understands that the harmonic series is built with fundamental frequencies and overtones frequencies as a multiple of the fundamental frequency. The brain uses the harmonic series to deduct the pitch of the sound along with the fundamental frequency. When it hears 200Hz, 300Hz, 400Hz, 500Hz it knows that 100Hz fits in the relationship of the harmonics of the sound since the ears are logarithmic and vibrating sound (eg. vibrating string) vibrates along its full length with multiple vibrations at the same time. So this tells us that pitch perception is not always based on the fundamental frequency, the brain uses the overtones too. Small speakers may not be able to produce the lower frequencies like 100Hz but our brains can still "hear" the pitch. Petr Janata a graduate student in biology in 1996 for his dissertation conducted a really interesting experiment on barn owls. He placed electrodes in the inferior colliculus [more on this later] of the barn owl which is a part of auditory system. Then he played the owls a version of Staruss' "The Blue Danube Waltz" made up of tones from which the fundamental frequency had been removed. He hypothesised that if the missing fundamental is restored at early levels of auditory processing, neurons in the owl's inferior colliculus should fire at the rate of the missing fundamental frequency and this is exactly what he found. The electrodes put out a small electrical signal with each firing and since the firing rate is same as frequency of firing, when Petr sent the output of these electrodes to a small amplifier, played back the sound of owl's neurons through a loudspeaker he could literally hear the melody of The Blue Danube Waltz clearly from the speakers. The firing rate of neurons were identical to the frequency of the missing fundamental frequency.

temporal cochlea spectral analysis graph

Auditory spectrum of the response to a musical note composed of all the harmonics of 62.5Hz at three points in time 200ms, 300ms, 400ms after the onset of the notes along the length of the cochlea. As the frequency increases the resolution of the spectral analysis decreases and becomes smooth after the eighth harmonic. The eight harmonics are said to be resolved i.e. they produce individual peaks in the spectrum.

question: if the human ears are logarithmic and for eg. when the ears hear a sound with frequency 500Hz how does the brain map the frequency to an interval with the fundamental frequency of let's say 100Hz? [need more research on how the brain works with harmonics in the sound]

The lower frequencies are individually identified by the brain and their spacing is calculated to extract the pitch of sound [need research on this]. The human brain is a pattern finding machine which extracts the most common relationship (interval) between each of the frequency component in the harmonic series. Even if the lowest frequency is missing the brain can calculate that frequency because of the consistent relationship between the other set of frequencies in the series. The sounds created which are in multiple proportions to the fundamental frequency or Shadaj (100, 200, 300, 400, 500) are called harmonics because they are in harmony with Shadaj. They sound pleasant to our ears. Pancham (3rd harmonic) and Gandhar (5th harmonic) are most pleasant. Tanpura a drone instrument creates an atmosphere of dominant Shadaj-Gāndhār-Pancham.
When the source of the sound vibrates again and again different harmonics/partials mix and this mixture has a particular quality. This is the timbre or नादगुण (Nādaguṇa). The harmonics of string instruments which have long strings like the Violin and Sarangi have a harmonic profile similar to that of human voice and hence are considered as best for accompaniment to human singing.

The pentatonic scale: [needs its own article]

The interval (relationship) 3:2 known as the fifth is a consonant interval. Known as षड्ज पंचम भाव (Shadaj Pancham Bhāva) in Indian music theory and is the basis of the music scale. This primitive scale was based on octave and fifth intervals having the ratio 1:2 and 3:2 respectively, both of which were known to ancient people.
Let's take the base frequency as f and its octave as F. If we add another note as fifth of the base frequency we end up with f C F. These notes are still not enough for a scale so let's add another perfect fifth note of C that is G and C octave C'. The resultant scale now is f C F G C'. Now D is the fifth of g which is an octave below G, one of the consonant notes in the scale derived above, including this we have f C D F G C'. Once again A' is the fifth of D we arrive at the scale C D F G A', the notes arranged in the ascending order.

This is called the pentatonic scale of ancient music you might see in Chinese and African musical systems. If we keep adding notes to an octave with fifth as the basis for the new notes we'll reach to a scale starting which would look like this if started with C: C D E F G A B also known as scale of C in Western musical systems. This is the scale [Pythagoras] explained both musically and mathematically. Although named after him in the West the scale was being used by ancient civilisations in Asia. The pentatonic scale, as a five-note scale, is foundational to many ancient musical traditions across different cultures, and it can be traced back to even earlier than the time of Pythagoras. Early forms of the pentatonic scale appear in the music of ancient Mesopotamia. Archaeological evidence, such as Babylonian cuneiform tablets, suggests that the early Mesopotamians used a five-note scale in their music. The Hurrian songs (or Hurrian Hymns) are a collection of music inscribed in cuneiform on clay tablets excavated from the ancient Amorite-Canaanite city of Ugarit, a headland in northern Syria, which date to approximately 1400 BCE. One of these tablets which is nearly complete contains the Hurrian Hymn to Nikkal (also known as the Hurrian cult hymn or a zaluzi-prayer to the gods), making it the oldest surviving substantially complete work of notated music in the world. In ancient Chinese music, the pentatonic scale was used in the "Gongche" notation system. The traditional Chinese pentatonic scale is built on five notes, and it has been a central element of Chinese music theory for thousands of years. Various African music traditions use pentatonic scales, particularly in sub-Saharan music. The use of the pentatonic scale in Indian music is not only a technical element of राग संगीत (Rāga Saṅgīta) but also has deep cultural and emotional significance. The five-note structure often evokes specific emotional states known as रस (Rasa - "essence", "flavor", or "emotion") and moods, which are central to the Indian music aesthetic.

The seven notes in Pythagorean scale were used in Greek music and formed the basis of 6 modes which were adopted by medieval European music. Out of the 6, 2 were adopted by Western music in the 17th century and now are called as Major and Minor scales. The scale that Indian classical music adopted is the diatonic scale. The prefix "dia-" in "diatonic" comes from the Greek word "διά" (dia), which means "through" or "across". In the case of "diatonic", it refers to a scale or system that spans across or through a full octave using a series of specific intervals, typically of whole and half steps. In Indian classical music, the concept of a diatonic scale is not as explicitly defined as it is in Western music.