DNA, RNA, genes and chromosomes[2]

2.2 Every cell in the human body contains a nucleus, with the exception of red blood cells, which lose this structure as they mature. Within the nucleus are tightly coiled threadlike structures known as chromosomes (see Figure 1). Every chromosome has a long arm and a short arm, with a pinch point known as a ‘centromere’. Humans normally have 23 pairs of chromosomes, one member of each pair derived from the mother and one from the father (see Figure 2). One of those pairs consists of the sex chromosomes: two X chromosomes determine femaleness, and one X and one Y determine maleness. The other 22 pairs of chromosomes are known as ‘autosomes’.

2.3 Each chromosome has within it, arranged end-to-end, hundreds or thousands of genes (see Figure 3),[3] each with a specific location and each consisting of the inherited genetic material known as DNA. Some chromosomes are significantly larger than others, and some are more densely packed with genes. Under the standard system of identification, scientists have numbered the autosomes from 1 to 22 in descending order of size (that is, by number of base pairs),[4] with chromosome 1 being the largest (279 million base pairs, and an estimated 2968 genes).[5] Of the sex chromosomes, the X (163 million base pairs and an estimated 1184 genes) is similar in size to chromosome 7, while the Y is the smallest chromosome (51 million base pairs and an estimated 231 genes).[6]
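
The figures quoted in this paragraph allow a rough comparison of how densely genes are packed on different chromosomes. The following Python sketch is illustrative only: the names `CHROMOSOMES` and `genes_per_million_bp` are hypothetical, and the numbers are simply the estimates given above.

```python
# Estimates quoted in the text: (size in million base pairs, estimated genes).
CHROMOSOMES = {
    "1": (279, 2968),
    "X": (163, 1184),
    "Y": (51, 231),
}


def genes_per_million_bp(name):
    """Return the estimated number of genes per million base pairs."""
    size_mbp, genes = CHROMOSOMES[name]
    return genes / size_mbp


for name in CHROMOSOMES:
    # Prints roughly 10.6 for chromosome 1, 7.3 for X and 4.5 for Y.
    print(name, round(genes_per_million_bp(name), 1))
```

On these estimates, chromosome 1 carries more than twice as many genes per million base pairs as the Y chromosome.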

2.4 DNA (deoxy-ribo-nucleic acid) is so called because it consists of a large acid molecule mainly found in the nucleus (nucleic) to which many sugar groups (ribo) that are missing an oxygen atom (deoxy) are attached. DNA contains a code that directs the ‘expression’ or production of proteins, which form much of the structure of the cell and control the chemical reactions within it. The DNA of each gene is characterised by a unique sequence of bases which, when arranged in triplets (codons) in various orders, represent the ‘genetic code’.[7]

2.5 There are many different definitions for a ‘gene’, but one of the most commonly accepted is that a gene contains all of the information required to determine the expression of a specific protein or a chain of amino acids (a ‘polypeptide’). Sometimes a polypeptide can form a complete protein on its own (as in the case of insulin), but in most cases a number of polypeptides combine to create a single functional protein (as in the case of collagen and globin).

2.6 There are four basic building blocks (nucleotides) for DNA: Adenine (A) and Guanine (G), which are known as ‘purines’; and Thymine (T) and Cytosine (C), which are known as ‘pyrimidines’. These nucleotides link together to form long polynucleotide chains (see Figure 4). A DNA molecule consists of two of these chains, linked together by hydrogen bonds, running in opposite directions. Linkage of the chains follows a strict rule, known as ‘complementary base pairing’:

  • the base A can only pair with the base T, and vice versa; and

  • the base G can only pair with the base C, and vice versa.
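
The pairing rule above can be expressed as a simple lookup. The following Python sketch is illustrative only; the names `PAIRS`, `complement` and `reverse_complement` are hypothetical and chosen for this example.

```python
# Complementary base pairing: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-by-base partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the partner strand read in its own direction.

    The two chains run in opposite directions, so the partner
    strand is the complement read backwards.
    """
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Because each base has exactly one partner, knowing the sequence of one chain is enough to reconstruct the other.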

2.7 There are over three billion of these base pairs of DNA making up the human genome. The two chains link together in a ladder-like shape, twisted into the now famous ‘double helix’ shape first described by Watson and Crick in 1953,[8] with sugars and phosphates forming the sides or backbone of the ladder and the base pairs forming the rungs (see Figure 5).

2.8 In humans, genes comprise only a small proportion of the DNA in a cell. Up to 98% of DNA consists of ‘non-coding’ regions (popularly, but incorrectly, referred to as ‘junk DNA’), which are full of repeat sequences, pseudogenes and retroviruses. Bacteria, by contrast, have very little non-coding DNA: their genomes consist almost entirely of genes, each one expressing a specific protein. In recent years, genetic scientists have increasingly come to believe that it is non-coding DNA that may be the basis for the complexity and sophistication of the human genome, which permits only 30,000 genes to produce about 200,000 proteins. A leader in this field, Professor John Mattick AO, Director of the Institute for Molecular Biology at the University of Queensland (and a Member of the Australian Health Ethics Committee), has surmised that non-coding DNA forms:

a massive parallel processing system producing secondary signals that integrate and regulate the activity of genes and proteins. In effect, they co-ordinate complex programs involved in the development of complex organisms.[9]

2.9 Proteins are critical components of all cells, determining colour, shape and function. Proteins can have a structural role (such as keratin, from which hair is made), or a functional role in regulating the chemical reactions that occur within each cell (such as the enzymes involved in producing energy for the cell). Proteins are themselves made up of a chain of amino acids. Within the DNA there is a code that determines which amino acids will come together to form a particular protein. The genetic code for each amino acid, consisting of a sequence of three bases, is virtually identical across all living organisms.[10]
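
With four bases and three positions in each triplet, there are 4 × 4 × 4 = 64 possible codons, which is more than the 20 types of amino acids they encode. The following Python sketch simply enumerates the triplets to confirm the count; the variable names are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triplet that can be formed from the four DNA bases.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # 64
```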

2.10 Different genes are switched on and off in different cells, so that different proteins are made or expressed, with different structures, appearances and functions. This leads to the production of brain cells, nerve cells, blood cells, and so on. Contemporary stem cell research is based around the idea that it should be possible to learn how to use gene switches to coax stem cells into developing into the specialised cell or tissue needed for therapeutic purposes. Recent research has also begun to focus on ‘epigenetic’ changes to the human genome, that is, subtle modifications to the genome that do not alter the DNA sequence, but may play a role in modulating gene expression. This may explain, for example, why many diseases appear only later in life, and why one twin may develop a genetically linked disease while the other does not.[11]

2.11 When the instructions in a gene are to be read, the DNA comprising that gene unwinds and the two strands separate. An enzyme called RNA polymerase allows a complementary copy of one strand of the DNA to be made. This copy is made from RNA nucleotides, and is called ‘messenger RNA’ (or mRNA) because it serves to carry the coded genetic information to the protein-producing units in the cell, called ribosomes.[12] This process of reading the message in the DNA is called ‘transcription’. On the ribosomes, the amino acids are assembled in the precise order coded for in the mRNA.[13] The process of converting the message encoded in the RNA (mRNA) to protein using the ribosome is called ‘translation’. When the whole message has been translated, the long chain of amino acids folds itself up into a distinctive shape that depends upon its sequence, and is now known as a ‘protein’ (see Figure 6).[14]
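
The two steps described above, transcription and translation, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a model of the cell: the function names and the example template strand are hypothetical, the template is assumed to be written in the direction that yields the mRNA directly, and `CODON_TABLE` is deliberately partial, containing only the codons needed for the example.

```python
# RNA pairing rule when copying a DNA template: RNA uses U where DNA uses T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table (assumption: only the entries this example needs).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(template):
    """Transcription: build the complementary mRNA copy of a DNA template."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


def translate(mrna):
    """Translation: read the mRNA three bases at a time until a stop codon."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        chain.append(amino_acid)
    return chain


mrna = transcribe("TACAAACCGATT")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly']
```

The sketch stops at the chain of amino acids; the folding of that chain into a functional protein is not modelled.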

[1] Chapter 2 of IP 26 also contained a genetics primer to assist public understanding of the issues involved in this Inquiry.

[3] As noted above, recent work by the Human Genome Project and related research mapping the human genome suggests that human beings have about 30,000–40,000 genes.

[4] However, this numbering system contains an historical anomaly: chromosome 22 originally was thought to be smaller than chromosome 21, but it is now known to be slightly larger.

[5] See the US Department of Energy’s ‘Gene Gateway’ website: US Department of Energy, Chromosome FAQs, <www.ornl.gov/hgmis/posters/chromosome/faqs.html>, 13 February 2003.

[6] It is believed that 150–200 million years ago, the X and Y chromosomes were about the same size, but that during evolution the Y chromosome has shrunk at the rate of five genes per million years: see S Jones, Y: The Descent of Men (2002) Time Warner Books, London.

[7] For an excellent popular account of modern genetics, see M Ridley, Genome: The Autobiography of a Species in 23 Chapters (1999) Fourth Estate, London.

[8] James Watson and Francis Crick, building upon work by Linus Pauling and RB Corey, and ‘stimulated by a knowledge of the general nature of the unpublished experimental results and ideas of Dr MHF Wilkins, Dr RE Franklin and their co-workers at King’s College, London’: see J Watson and F Crick, ‘A Structure for Deoxyribose Nucleic Acid’ (1953) 171 Nature 737. In 1962 Watson and Crick were awarded the Nobel Prize for this work.

[9] G O’Neill, ‘Ghost in the machine’, The Bulletin, 11 March 2003, 55.

[10] There are 64 different possible codons (given the four letters in the building blocks), and no codon can code for more than one amino acid. As there are only 20 different types of amino acids, some codons must encode the same amino acid. See R Hawley and C Mori, The Human Genome: A User’s Guide (1999) Harcourt Academic Press, Burlington, 32.

[11] See C Dennis, ‘Altered states’ (2003) 421 Nature 686, for a summary of recent research into epigenetics.

[12] RNA also carries the linear code and employs the same building block letters as DNA, except that it uses U (for uracil) in place of T (for thymine).

[13] Transfer RNA molecules (tRNA) also play a key role in carrying specific amino acids to the ribosome to be linked to the growing polypeptide or protein.

[14] See M Ridley, Genome: The Autobiography of a Species in 23 Chapters (1999) Fourth Estate, London, 9.