Music theory basics published: 22nd November 2024 updated: 11th December 2024
Not all vibrations that produce sound are intelligible or pleasing to the ear. We hear sounds around us all the time yet most of it is not really pleasing to the ear as compared to hearing musical sound. Why is that? the sound of music (organised sound) or tone has regular vibrations while noise (not organised) has irregular pattern of vibrations. नाद (Nāda) is the musical sound in the Indian classical music but also has spiritual and philosophical connotations. The reason why music sounds pleasing but not noise is due to Math. Human brains are pattern finding machines. Our brains evolved to reward us for the simple frequency ratios likely due to survival mechanism to detect regular non-threatening patterns in the environment. But sometimes we apply the same preference for mathematical symmetry and consonance into the realm of music for purely aesthetic purposes. The pleasure we derive from harmonic sounds is an example of taking an adaptive trait (pattern recognition) and redirecting it towards non-essential, cultural pursuits like music composition and appreciation. There is a popular theory is that music is a spandrel of language.
Musical notes and consonance:
Music judges two individual notes only by their mutual relations.
Consonance is the perceived pleasantness to the certain musical
relations. It refers to the phenomenon of identifying one note or
pitch being related to or in harmony to each other. Consonant
intervals have a high degree of frequency matching which creates a
smooth and a stable sound. Two tones with lower frequency matching
results to dissonance which is not pleasant to the ears. The
mismatch in the frequencies is what gives them their unstable
quality. The ratio of the matching frequencies must be as close to a
whole number. Our brains are less attuned to perceiving the
dissonant frequencies as pleasant.
From mathematical perspective simple frequency ratios like 2:1 for
an octave create more overlapping peaks in the sound waves resulting
in less [interference] and more stable vibration patterns. In a 2:1
octave ratio every 2 vibrations of the higher note and 1 vibration
of the lower note align regularly. This creates less [beating] or
interference between the frequencies. More complex ratios like 16:15
align less frequently and create more complex patterns that our
auditory system finds as rough or instable - what we perceive as
dissonance. Consonance is a matter of degree than absolutes as some
sounds have a greater degree of matching than others. Unison or the
perfect identities in music occur if both the notes (sounds) have
the same frequency or the frequency of the second note is a multiple
of the reference note. The second note is an octave to the reference
note. However, there are notes other than octaves that have some
degree of identity with the reference note. Ancient civilisations of
Egypt, Greece, China, India, etc. knew about this relationship and
used them in their musical systems.
Consonance occurs when waves of sound created by two notes combine with each other or oppose each other. If they oppose each other something called beats which cause disturbance of uneven nature. The number of beats is the difference between thr frequencies of the two notes. Low beats are not disturbing to the ear, as the difference increases do does the disturbance until it reaches a maximum point after which it decreases with the beat now as a secondary note which is in consonance with the original note. Sounds produced by the voice or other sources like stretched strings they contain besides their fundamental frequency or ptime note the overtones or harmonic upper partials. The overtones have a simple relationship with the fundamental frequency or note. Hence they do bot generate any beats and gives a note that blends with the fundamental note enriching its quality or the tonal colour timbre.
Consonance is of varying degrees and can range from perfect
consonance to imperfect consonance. If two notes are not consonance
in simple ratios to each other like an octave but are related
through a third note which is then the common root the consonance is
perfect but not absolute and is of the second order.
To sum it up, the more direct the relationship between two notes the
greater the consonance and the relationship can range from absolute
consonance to perfect dissonance with degrees such as perfect
consonance, imperfect consonance and imperfect dissonance lying in
between.
Having only two notes from an octave which has a perfect consonance
creates a large gap between the two notes but also it's no possible
to use just these two notes to create music. Hence we need to
introduce more notes in the system and make a scale to facilitate
musical compositions and this is where the varying degree of
consonance comes in picture. How to use the varying degree of
consonance to make most of the notes consonant with each other is a
fascinating mathematical problem. Greek philosopher and
mathematician [Pythagoras] too tried to solve this problem. This
scale can be constructed into many unique ways and hence each
culture has its own unique sound because each had their own musical
scale built between the two octaves. However, broadly speaking there
are two types of musical systems based on whether melody or harmony
pre-dominates the music.
In melodic musical systems the appeal and the emotion is conveyed
through a succession of notes in relation to each other based on a
fundamental frequency. In harmonic musical systems it is the
relationship between different notes simultaneously played which
is of importance. The sequential relationship matters here. Sound
is temporal. Sound occurs over time, its properties are perceived
and experienced as a series of changes that unfold in time. Sound
has a beginning and an end. The duration of the sound is one of
its key temporal aspects. Each note has a time value, the period
over which it is sounded and this time value need not be absolute
but rather relative to the entire piece as the whole piece can be
speeded up or down.
Another thing that arises out of the regular strong and weak accents
or pulsations in a piece of music which is referred to as
meter. Each pulsation either strong or
weak is called a beat. The duration of a note refers to the length
of the sound in terms of time as measured by the number of beats.
The beats are in a group of 2 or 3 and are arranged into a uniform
measure. Time is an extremely important aspect of music and appears
as the regular reoccurrence of strong and weak pulsations known as
meter and as a result of arrangements of various note length
referred to as the rhythm. This is the basis of time and rhythmical
structuring in music in general.
Music has some (of many more) basic properties:
- Pitch: It is a purely psychological construct. It answers us the question "What note is that?" "Is it high or low?" The terms note and tone refer to the same thing in the abstract sense but note is written and tone is heard.
- Rhythm: It refers to the duration of a series of notes and the way the series groups in a particular way.
- Tempo: This refers to the overall speed or pace of the music. If you were to tap your foot to it how fast or slow these regular the taps would be.
- Timbre: It is the tonal colour of the sound. This is what differentiates a trumpet from a flute when both are playing the same note. Timbre arises from overtones present in the sound. Read more about it here
- Loudness: It is psychological construct relating to how much energy an instrument produces or the air it displace or how intense the sound is.
The production of natural sounds:
When a string is plucked it will vibrate over the whole length of
the string. At the same time the string vibrates over fractional
divisions of its length (1/2, 1/3, 1/4, 1/5, 1/6, etc.) producing a
series of harmonics (overtones) whose frequencies are inversely
proportional (2x, 3x, 4x, 5x, 6x, etc., where x is the fundamental
frequency of the string) to those fractional divisions.
Theoretically an infinite number of these multiple modes of
vibration exist, each mode producing its own harmonic. As one
ascends the series, the amplitude, or loudness of each harmonic
tends to diminish, so the higher modes produce harmonics that are
usually too soft for us to hear.
In other words, when a string is plucked, it vibrates over its
entire length, which is 1, and produces a fundamental tone. This
tone is the loudest and the one you tend to hear. However, at the
same time, the same string also divides itself into two equal parts,
1/2, and they vibrate at the same frequency which is twice that of
the fundamental frequency. This frequency is not as loud as that of
the fundamental, but your brain still picks up on it. At the same
time, the string also divides itself into three equal parts, 1/3,
and they vibrate at the same frequency which is three times that of
the fundamental. This frequency is even softer than that of the 2nd
harmonic, but your brain will still pick up on it. The string keeps
dividing itself in this harmonic pattern over and over again
creating an unending amount of frequencies. Since they get softer
and softer they eventually become negligible to the ear.
A demo of standing waves on a string
First harmonic -
षड्ज (Shadaj).
Note: find the etymology of each harmonic
When a string is pluck it starts to vibrates in its full length and
makes a sound. This is called षड्ज (Shadaj) or fundamental tone or
1st
harmonic. Let's say it's 100Hz.
Second harmonic -
तार षड्ज (Tara Shadaj).
Soon after the energy spreads and reduces through the string it also
starts to vibrate in 2 parts and giving rise to the 2nd
harmonic or तार षड्ज (Tara Shadaj). The frequency is 200Hz twice the
fundamental frequency.
Third harmonic - पंचम (Pancham).
As the energy reduces further string starts to vibrate in 3 parts
and gives rise to the 3rd harmonic पंचम (Pancham). The
frequency is 300Hz thrice the frequency of the fundamental.
Fourth harmonic -
अति-तार षड्ज (Ati-Tara Shadaj).
As the energy reduces further string starts to vibrate in 4 parts
and gives rise to the 4th harmonic अति-तार षड्ज (Ati-Tara
Shadaj). The frequency is 400Hz quadruple the frequency of the
fundamental.
Fifth harmonic -
गांधार (Gāndhār).
As the energy reduces further string starts to vibrate in 5 parts
and gives rise to the 5th harmonic गांधार (Gāndhār). The
frequency is 450Hz quintuple the frequency of the fundamental.
After a certain threshold the energy put in the string reduces so much that further harmonics are barely heard. The amplitude of each harmonic is inversely proportional to the square of its harmonic number, if the fundamental is at relative amplitude A, then 3f is at A/9, 5f is at A/25, 7f is at A/49. This can also be expressed by saying that the harmonics decrease by 12 dB per octave. [more on this in modular synth]
The non-linearities of the human ears
Linear Scale 0 1 2 3 4 5 6 7 8 9 10 (+1) Logarithmic Scale 1 2 4 8 16 32 64 128 256 512 1024 (x2)
In a logarithmic scale numbers are placed by a multiplicative
factor while in a linear scale they are placed by an additive
factor. In the series above we have the same space but more
numbers being represented through the scale. Even though the scale
represent large value it still maintains accuracy at small levels.
In logarithmic scale each value represents a multiplication of the
base number rather than a linear addition.
log10(1) = 0
log10(2) ≈ 0.3010
log10(3) ≈ 0.4771
log10(10) = 1
log10(100) = 2
log10(1000) = 3
The human sound perception works the same way. It can respond to
higher inputs ranging from
20Hz to 20,000 Hz while also
responding to subtle differences around small inputs.
The human ear is logarithmic.
Linear Series 200 300 400 500 600 700 800 900 1000 1100 (+100) Logarithmic Series 200 300 450 675 1012 1519 2278 3417 5126 7689 (x1.5)
The logarithmic sound is more evenly spaced than the linear sound.
There is another interesting feature of the human ears:
increased sensitivity of of certain frequencies.
Human ears are more sensitive to frequencies around
2000Hz-5000Hz. The ear canal behaves
like a resonator of and its resonance is around 3000Hz. The
tympanic membrane and the ossicles also respond efficiently to the
frequencies around this range. The 2000Hz-5000Hz range is also
crucial for speech. Many important consonant sounds in human
speech such as s, sh, t, and f, lie in this frequency band. The
human ear is naturally tuned to be most sensitive to these
frequencies because they are vital for understanding speech.
A breakdown of consonant sounds and their typical frequency
ranges:
- Fricatives (f, s, sh, v, z): 3-7 kHz
- Plosives (p, t, k, b, d, g): High-frequency bursts around 4-8 kHz
- Affricates (ch, j): Similar to plosives, around 4-7 kHz
- Nasals (m, n, ng): Lower frequencies, mostly below 3 kHz
This means that sound around these frequencies are perceived
louder than even if their sound pressure level [more on this
modular synth] is the same. The difference in frequency
sensitivity actually depends on sound levels. We hear mid-range
frequencies (around 3-4 kHz) most clearly.
The ear perceives higher harmonics in relation to the fundamental
frequency. Harmonic tones contain multiple frequency components
but the brain perceives it has a coherent whole where the main
pitch is of the fundamental frequency. The auditory system tends
to group or bind components that are harmonically related. The
harmonics evoke a pitch that is related to the fundamental
frequency of the complex tone. Interestingly, this
virtual pitch is perceived even when
the energy at the fundamental frequency of the complex tone is
absent or masked or hidden. The brain is so attuned to the
harmonic series that if we hear a sound that has all the
components except the fundamental frequency the brain fills it for
us. This phenomenon is called
restoration of the missing fundamental. A sound energy consisting of frequencies 100Hz, 200Hz, 300Hz,
400Hz, etc. is perceived as having a pitch of 100Hz. We don't
perceive it as having a pitch of 200Hz because our brains know
that a normal harmonic sound with a pitch of 200Hz would have
frequency components 400Hz, 600Hz, 800Hz, etc. in its harmonic
series. The brain is extremely attuned to the harmonic structure
of sound. It understands that the harmonic series is built with
fundamental frequencies and overtones frequencies as a multiple of
the fundamental frequency. The brain uses the harmonic series to
deduct the pitch of the sound along with the fundamental
frequency. When it hears 200Hz, 300Hz, 400Hz, 500Hz it knows that
100Hz fits in the relationship of the harmonics of the sound since
the ears are logarithmic and vibrating sound (eg. vibrating
string) vibrates along its full length with multiple vibrations at
the same time. So this tells us that pitch perception is not
always based on the fundamental frequency, the brain uses the
overtones too. Small speakers may not be able to produce the lower
frequencies like 100Hz but our brains can still "hear" the pitch.
Petr Janata a graduate student in biology in 1996 for his
dissertation conducted a really interesting experiment on barn
owls. He placed electrodes in the inferior colliculus [more on
this later] of the barn owl which is a part of auditory system.
Then he played the owls a version of Staruss'
"The Blue Danube Waltz"
made up of tones from which the fundamental frequency had been
removed. He hypothesised that if the missing fundamental is
restored at early levels of auditory processing, neurons in the
owl's inferior colliculus should fire at the rate of the missing
fundamental frequency and this is exactly what he found. The
electrodes put out a small electrical signal with each firing and
since the firing rate is same as frequency of firing, when Petr
sent the output of these electrodes to a small amplifier, played
back the sound of owl's neurons through a loudspeaker he could
literally hear the melody of The Blue Danube Waltz clearly from
the speakers.
The firing rate of neurons were identical to the frequency of
the missing fundamental frequency.
Auditory spectrum of the response to a musical note composed of all the harmonics of 62.5Hz at three points in time 200ms, 300ms, 400ms after the onset of the notes along the length of the cochlea. As the frequency increases the resolution of the spectral analysis decreases and becomes smooth after the eighth harmonic. The eight harmonics are said to be resolved i.e. they produce individual peaks in the spectrum.
question: if the human ears are logarithmic and for eg. when the ears hear a sound with frequency 500Hz how does the brain map the frequency to an interval with the fundamental frequency of let's say 100Hz? [need more research on how the brain works with harmonics in the sound]
The lower frequencies are individually identified by the brain and
their spacing is calculated to extract the pitch of sound [need
research on this]. The human brain is a pattern finding machine
which extracts the most common relationship (interval) between each
of the frequency component in the harmonic series. Even if the
lowest frequency is missing the brain can calculate that frequency
because of the consistent relationship between the other set of
frequencies in the series. The sounds created which are in multiple
proportions to the fundamental frequency or Shadaj (100, 200, 300,
400, 500) are called
harmonics because they are in harmony
with Shadaj. They sound pleasant to our ears. Pancham (3rd harmonic)
and Gandhar (5th harmonic) are most pleasant. Tanpura a drone
instrument creates an atmosphere of dominant Shadaj-Gāndhār-Pancham.
When the source of the sound vibrates again and again different
harmonics/partials mix and this mixture has a particular quality.
This is the timbre or नादगुण (Nādaguṇa).
The harmonics of string instruments which have long strings like the
Violin and Sarangi have a harmonic profile similar to that of human
voice and hence are considered as best for accompaniment to human
singing.
The pentatonic scale: [needs its own article]
The interval (relationship) 3:2 known as the fifth is a consonant
interval. Known as षड्ज पंचम भाव (Shadaj Pancham Bhāva) in Indian
music theory and is the basis of the music scale. This primitive
scale was based on octave and fifth intervals having the ratio 1:2
and 3:2 respectively, both of which were known to ancient people.
Let's take the base frequency as f and its octave as F. If we add
another note as fifth of the base frequency we end up with
f C F. These notes are still not enough
for a scale so let's add another perfect fifth note of C that is G
and C octave C'. The resultant scale now is
f C F G C'. Now D is the fifth of g
which is an octave below G, one of the consonant notes in the scale
derived above, including this we have
f C D F G C'. Once again A' is the fifth
of D we arrive at the scale C D F G A', the notes arranged in the
ascending order.
This is called the pentatonic scale of ancient music you might see in Chinese and African musical systems. If we keep adding notes to an octave with fifth as the basis for the new notes we'll reach to a scale starting which would look like this if started with C: C D E F G A B also known as scale of C in Western musical systems. This is the scale [Pythagoras] explained both musically and mathematically. Although named after him in the West the scale was being used by ancient civilisations in Asia. The pentatonic scale, as a five-note scale, is foundational to many ancient musical traditions across different cultures, and it can be traced back to even earlier than the time of Pythagoras. Early forms of the pentatonic scale appear in the music of ancient Mesopotamia. Archaeological evidence, such as Babylonian cuneiform tablets, suggests that the early Mesopotamians used a five-note scale in their music. The Hurrian songs (or Hurrian Hymns) are a collection of music inscribed in cuneiform on clay tablets excavated from the ancient Amorite-Canaanite city of Ugarit, a headland in northern Syria, which date to approximately 1400 BCE. One of these tablets which is nearly complete contains the Hurrian Hymn to Nikkal (also known as the Hurrian cult hymn or a zaluzi-prayer to the gods), making it the oldest surviving substantially complete work of notated music in the world. In ancient Chinese music, the pentatonic scale was used in the "Gongche" notation system. The traditional Chinese pentatonic scale is built on five notes, and it has been a central element of Chinese music theory for thousands of years. Various African music traditions use pentatonic scales, particularly in sub-Saharan music. The use of the pentatonic scale in Indian music is not only a technical element of राग संगीत (Rāga Saṅgīta) but also has deep cultural and emotional significance. The five-note structure often evokes specific emotional states known as रस (Rasa - "essence", "flavor", or "emotion") and moods, which are central to the Indian music aesthetic.
The seven notes in Pythagorean scale were used in Greek music and formed the basis of 6 modes which were adopted by medieval European music. Out of the 6, 2 were adopted by Western music in the 17th century and now are called as Major and Minor scales. The scale that Indian classical music adopted is the diatonic scale. The prefix "dia-" in "diatonic" comes from the Greek word "διά" (dia), which means "through" or "across". In the case of "diatonic", it refers to a scale or system that spans across or through a full octave using a series of specific intervals, typically of whole and half steps. In Indian classical music, the concept of a diatonic scale is not as explicitly defined as it is in Western music.