What is Knowledge? Plato’s Theaetetus

In Theaetetus Plato plunges into epistemology, the theory of knowledge. What is knowledge? How do we know things? How do we really know what reality is? These and similar questions are the topics of discussion.

The dialog opens with a flashback to Socrates, at the end of his life shortly before his trial, talking to the geometer Theodorus about his current pupils. Theodorus tells Socrates that one, Theaetetus, shows particular promise. The youth arrives on the scene and Socrates decides to test him dialectically. After some preliminary banter, he asks Theaetetus to define knowledge. Theaetetus starts by listing different kinds of knowledge such as geometry, calculation, shoemaking, and others. After some prompting from Socrates, he states that knowledge is perception. Socrates accepts this provisionally but points out that different observers perceive things differently. Does this mean that knowledge is relative? This begins a section which introduces the debate between objectivism and subjectivism, a perennial question of Western philosophy. According to most of the pre-Socratic philosophers, including Protagoras, Heraclitus, and Empedocles, reality is not only relative but in a state of constant flux, so knowledge about reality is also relative and mutable.

"Theaetetus crater 4103 h1 4103 h2" by James Stuby based on NASA Images [CC0 via Wikipedia]

This argument is not only the precursor to later Epicurean theories of the universe, such as we read about in Lucretius, but also a foreshadowing of both postmodern theories of subjectivism and much of modern physics, including thermodynamics, relativity, and quantum theory. For example, the passage,

Yes, Theaetetus, and there are plenty of other proofs which will show that motion is the source of what is called being and becoming, and inactivity of not-being and destruction; for fire and warmth, which are supposed to be the parent and guardian of all things, are born of movement and of friction, which is a kind of motion;–is not this then the origin of fire?

expresses at least the glimmerings of an intuitive understanding of the laws of thermodynamics.

Socrates leads Theaetetus to explore the implications of these ideas further. He also introduces the point that madmen and dreamers see things that are not there, so perception does not necessarily lead to true knowledge. This struck me as interesting, since at the same time I was reading Hermann Hesse’s novel Steppenwolf, in which the main character certainly seems to gain knowledge from events which occur only in his own mind. Be that as it may, Socrates’ arguments are more than enough to force Theaetetus to rescind his definition of knowledge as perception.

Socrates then steers the dialog into a discussion of the nature of memory. What about a man who has obtained true knowledge but then forgets or makes mistakes based on that knowledge?

At this point, Theaetetus needs a little help and, at the urging of Theodorus, begins to argue the opposite position. Among other things, he mentions the point that while all men might perceive things differently, wiser men, and those with reason and specialized training in the subject at hand, are more likely to perceive correctly and gain true knowledge.

Socrates has another trick to spring, though, when he leads Theaetetus to explore the question “What is perception?” and points out that sensory inputs are meaningless until processed by the mind (soul). He then asks about abstract concepts which cannot be perceived with the senses. Theaetetus soon agrees that “[K]nowledge does not consist in impressions of sense, but in reasoning about them; in that only, and not in the mere impression, truth and being can be attained[.]”

If knowledge is actually a sort of opinion, though, what about false opinions? This line of inquiry leads them back into an exploration of memory, since failure of memory is one possible cause of false opinions, and so back into deep waters. Socrates then points out that they have not adequately defined the verb “to know”, nor the concept of “false opinion”. With these points clarified he leads the boy on another discussion in which they define knowledge as “a way of understanding something by understanding its component parts.”

From this position, Socrates introduces yet a third conception of knowledge, understanding things by understanding the differences between them. But after discussing this, they conclude,

But how utterly foolish, when we are asking what is knowledge, that the reply should only be, right opinion with knowledge of difference or of anything! And so, Theaetetus, knowledge is neither sensation nor true opinion, nor yet definition and explanation accompanying and added to true opinion?

The definition itself includes the word “knowledge” and is thus circular. However, even though they have failed to provide the definition they were seeking, they are better for the attempt itself. Socrates then leaves for his indictment before the king archon, possibly never to see Theaetetus again.

Never mentioned explicitly, yet ever present in the dialog, is Plato’s theory of ideas. Plato believed that even though the material world is indeed mutable, imperfect, and subjective, everything in it partakes of perfect ideals or ideas which are unchanging and objective. Perhaps Socrates doesn’t bring up the theory of ideas because he wants to see if Theaetetus will discover the concept on his own. Perhaps Plato is simply trying to keep the dialog to a manageable size and complexity. Or perhaps the state of epistemology conveyed in the dialog is actually as far as Socrates ever got on the subject, and this particular dialog is meant to express the master’s views without Plato’s later additions.

Steppenwolf by Hermann Hesse

Steppenwolf is an easy book to write about; the semiotics are so strong, the tropes are so plentiful, and the plot so powerful, that a critic has a wealth of material to seize on. At the same time, like all great works, it contains paradoxes and ambiguities which make it difficult or impossible to sum up the “meaning” or main idea of the book. Hesse himself, who lived to see the age of postmodern criticism, wrote that it was a “poetic work” in which the reader should find his own meanings. In the same author’s note, however, he states,

Yet it seems to me that of all my books Steppenwolf is the one that was more frequently and more violently misunderstood than any other, and frequently it is actually the affirmative and enthusiastic readers, rather than those who rejected the book, who have reacted to it oddly.

By these words we can infer that he did indeed have an objective message in mind and that he felt that at least some readers would be able to discover it.

The book is set in Weimar Republic Germany between the World Wars. It is a time of political and cultural uncertainty in which many of the old ideals and cultural norms no longer seem relevant. While the bourgeoisie, the class least affected by ideals and culture, continue their stolid lives, relatively unaffected, the rest of society is adrift and devotes itself to vice and transient material pleasures–living for the day because they know that the next war will start soon and be even more horrible than the last one. In this setting we find Harry Haller, a middle-aged intellectual who is almost completely alienated from his society, thoroughly lonely, and deeply depressed. Unable to form lasting relationships and convinced that the high culture he loves is dead, Haller repeatedly considers suicide but lacks the courage to go through with it. Haller’s life changes when he meets a “courtesan of moderately good taste” named Hermine. Hermine makes it her project to teach Haller to enjoy life, forcing him to learn to dance and engage himself with the beau monde of the city, associating with party girls, jazz musicians, and others whom he would never have approached on his own.

The book can be understood in various ways. Most literally, it is the story of a man’s mid-life crisis. Like all of Hesse’s novels, it is partially autobiographical. Hesse wrote the book when he was fifty years old and “dealing with the problems of that age”. And so, the book is at least partially the story of a man approaching fifty who feels that his life has been wasted and compensates by dating younger women and trying to fit in on the modern musical scene. On a slightly deeper level, it is a case study in paranoid schizophrenia. Haller is far from “sane” in the conventional sense. Right up to the end of the book it is unclear which characters and events are real and which exist in his own mind. The word “schizomania” appears several times in the text and Haller excuses himself at one point by explaining that he is a “schizomaniac”. The parallels with other works dealing with schizophrenia, such as A Beautiful Mind, are quite obvious.

Beyond these interpretations, however, Steppenwolf is fundamentally an investigation into the concept of personality. All men, particularly men of genius, have personalities made up of many facets or aspects. Haller, who is still dealing with his divorce, is having trouble with his long distance relationship, has recently been fired from his writing job for his political views, and has moved to a city where he has no close friends, is clearly under massive stress. In this situation he is forced to integrate the various aspects of his personality or go mad.

This is far from easy, because his mind is occupied by a number of “people” who aren’t necessarily compatible. One of these is Harry Haller The Man, who is the somewhat artificial personality that Haller tries to present to the world. Shaped by bourgeois norms and long education, The Man is the least flexible (and likable) of Haller’s personalities. Opposed to The Man is The Steppenwolf, which represents both Haller’s animal nature and his individuality. The Steppenwolf is dangerous, because there is no place for him in civilized society, but he gives Haller the strength to stand up for his convictions about the war and other issues. The beautiful and sensual Maria represents the part of Haller that loves freely and lives for pleasure, as well as the feminine part of his nature. This personality is initially completely suppressed, but waxes stronger as the book goes on. The wise and androgynous Hermine is Haller’s aspirational self. She represents mature sexuality and a balance of sensuality in which one pays for one’s pleasures but enjoys them unreservedly. She also represents religion, which used to be a factor in his life and will be again. Pablo, the brilliant young jazz musician who never talks about music but only plays it, represents Haller’s artistic soul–true art, not The Man’s dry intellectual analysis of art–and his emotions. He is the keeper of the “Magic Theater”, i.e. Haller’s subconscious mind. Only by “meeting” each of these aspects and following the relationships between them through to their conclusions can Haller integrate the best parts of each of them into his core personality.

Max von Sydow and Dominique Sanda as Haller and Hermine in the 1974 movie adaptation of Steppenwolf

Major Tropes and Themes

Unification of Eastern and Western Thought – Like most of Hesse’s middle and late works, Steppenwolf is infused with several ideas from Buddhism. Haller himself is represented as being a scholar of Eastern religion and it is implied that the ultimate end of his process of self-discovery is to extinguish the self so as to become one with the all, a very Eastern concept.

Man is Never the Same Over Time – Haller at first seems like a static character, who has always been as he is now. It soon becomes apparent that he has changed greatly over time and is still changing. As Horace said, “Non sum qualis eram” (I am not what I once was). His quest for self-awareness and actualization is thus a never-ending process.

The “Real Man” – Society creates artificial men who are conformist and hypocritical. Real men are individualist and pursue their drives, particularly sexual drives, naturally and without guilt. Compare D.H. Lawrence’s Lady Chatterley’s Lover, from the same period, particularly the character Oliver Mellors. The real men often feel that they should have been born in a different time and place.

Conflict Between Intellectuals and the Bourgeoisie – Intellectuals can see the way things “ought to be” but the bourgeoisie won’t listen. As much as the intellectuals rebel, they can never completely escape their own bourgeois roots.

The Fine Line Between Genius and Madness – One of the most memorable monologues of the book contains the lines, “[M]any persons pass for normal, and indeed for highly valuable members of society, who are incurably mad; and many, on the other hand, are looked upon as mad who are geniuses.”

Substance Abuse to Suppress the Personality – Many of us writers, especially, have chosen to sedate our personalities with alcohol and other drugs rather than dealing with them, as Haller does early in the book.

The Inevitability of Death (and War) – Everyone dies, and all societies eventually go to war. It is useless for men to try to oppose these forces; they must learn to accept them.

Suicide as an Ongoing Process – Suicide has bad and good sides. On the one hand it can be a cowardly escape. On the other, it can represent killing the ego to seek enlightenment. In either case, it represents an ongoing decision or commitment to kill one’s personality, and actual physical death is strictly ancillary.

Parmenides of Plato

The Parmenides is Plato’s account of a meeting between the Italian-Greek philosopher Parmenides, then a venerable sage of sixty-five, and a young Socrates. Unlike earlier dialogues, which tended to focus on a particular topic, usually some aspect of virtue, it is not completely clear what Plato was trying to accomplish in the Parmenides. It is clear that Plato held Parmenides himself in high esteem, even though he would have been too young to have met him personally. A simple wish to memorialize him would not have been enough reason for him to write a dialog, however–particularly one as technical as this one. It is possible that he meant it as some sort of teaching aid for the dialectical method, but this also seems unlikely since there are so many good examples of dialectic in his earlier works. In addition, this is not the best written or most dramatic of Plato’s works. It flows awkwardly and the characters are poorly developed. It is easy to see why this has traditionally been one of the less popular Platonic dialogues with readers. So is this just an example of a mid-career slump, or is there a deeper message to be gained?

Excavation area of Velia (Elea), Parmenides' Birthplace, near Ascea in Campania, Italy [Wikimedia Commons user AlMare CC BY-SA 2.5]

The dialog begins in a somewhat convoluted manner as a recollection by Cephalus of a recollection by Adeimantus (Plato’s older half-brother) of the meeting. Parmenides and Zeno are in town and Socrates and his friends have gone to see Zeno recite one of his own dialogues. Afterward, Socrates begins asking questions and, in the course of the conversation, starts arguing for an early version of his (Plato’s?) theory of ideas. Parmenides breaks in and offers several objections which Socrates is unable to answer. He then advises the young philosopher to be more rigorous in exploring all the implications of his hypotheses to their ultimate conclusion. Socrates then convinces Parmenides to provide a demonstration of his dialectical methods, which he does, with Adeimantus as interlocutor.

Zeno has been speaking about “the One” (as opposed to “the Many”) so Parmenides chooses to examine the null hypothesis “The One does not exist”. There follows a rather long and tortuous dialog, the main structure of which is summarized in Jowett’s preface to his translation of the dialog:

1. One is.
2. One is not.
If one is, it is nothing.
If one is not, it is everything.

But is and is not may be taken in two senses:

Either one is one,
Or, one has being,

from which opposite consequences are deduced,
1.a. If one is one, it is nothing.
1.b. If one has being, it is all things.

To which are appended two subordinate consequences:
1.aa. If one has being, all other things are.
1.bb. If one is one, all other things are not.

The same distinction is then applied to the negative hypothesis:
2.a. If one is not one, it is all things.
2.b. If one has not being, it is nothing.

Involving two parallel consequences respecting the other or remainder:
2.aa. If one is not one, other things are all.
2.bb. If one has not being, other things are not.

This is barely easier to follow than the dialog itself. In the end, though, Parmenides proves that the one must exist:

Parmenides: Then may we not sum up the argument in a word and say truly: If one is not, then nothing is?

Adeimantus: Certainly.

Parmenides: Let thus much be said; and further let us affirm what seems to be the truth, that, whether one is or is not, one and the others in relation to themselves and one another, all of them, in every way, are and are not, and appear to be and appear not to be.

Adeimantus: Most true.

We are told that after this Adeimantus gave up philosophy and focused on training horses.

The existence of “The One” or Unity is certainly an important part of number theory and (much later) abstract algebra, but I don’t think Plato had math on his mind when he decided to write this dialog. We don’t know as much about Parmenides as we would like, but it seems from what we do know that his primary interest was cosmology. “The One” has a central place in Platonic and neo-Platonic cosmology as the First Hypostasis of the godhead: The One, The First Existent, The Unknowable, Infinite Unity which Christian theologians would eventually equate with The Father.

The first half of the dialog might have been a chance to memorialize a respected philosopher, but I think that Plato, who was himself turning more to metaphysics and cosmology in his later career, deliberately used the second half of the dialog to record an important proof that he knew would be useful in later work by himself and his students.

Perils of Reading Great Books out of Order (Pre-Plotinus)

I am now more than a year into my program of reading the Great Books to improve myself as a writer. At the outset I promised myself that, as much as was practical, I would try to read the books in the order they were written. This is the advice that Grand Great Books Guru Mortimer Adler gives in How to Read a Book and elsewhere, since going in order allows you to trace the development of the “great conversation” of Western thought.

I was doing pretty well until I began working my way through Plato, but then I got bogged down. After reading seven dialogues plus the book-length Republic and writing seven blog posts on Platonic philosophy, I decided to skip ahead–surely eight works were enough to give me a taste of Plato’s work, and the dialogues would still be there when I got back to them, right?

All was well until I went to read Plotinus’ Enneads. I had been looking forward to Plotinus: not only was he the greatest of the neo-Platonists, and a fundamental influence on early Christian philosophy, but he was the last important pagan philosopher. I knew that as soon as I finished his works I could sail merrily into the Middle Ages. I knew he had a reputation as a tough author, but I didn’t see how much worse he could be than those I had already read.

Unfortunately, Plotinus is not only hard to read, his work is heavily based on that of Plato and Aristotle. By the time I had made it through the introductory matter in the Penguin edition, I realized that I had gone too far too fast. Plotinus continually references The Republic, Phaedo, and The Nicomachean Ethics–all of which I had read quickly without bothering to study them deeply or write blog posts–as well as Timaeus, Parmenides, The Sophist, The Categories, De Anima, and The Metaphysics–all of which I had skipped in my impatience. Therefore, regretfully, I am now putting my Plotinus aside for a few weeks and going back to classical Greece. Look for more Plato and Aristotle posts in the near future.

Automatic Indexing of LaTeX Documents

A couple of weeks ago I mentioned in a post that I was working on a Python script to automatically generate indexes of books written in the LaTeX typesetting system. At the time I promised to post the script in “a couple of days”. Predictably, weeks have passed, my little script has ballooned into a full-on open-source software project, and the code is now too long to post (or explain) in a single blog article. If you’re interested, however, you can now download my alpha release from SourceForge.

The package includes two Python programs. Indexmeister is a console utility which reads a file (in several formats, not just LaTeX) and suggests terms for indexing. It uses three different methods to figure out which terms are important. Imbrowse is a Curses program which helps you interactively browse multi-file LaTeX books and quickly insert the right tags to generate an index.
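To give a flavor of the general idea, here is a minimal sketch of one way a script could suggest index terms and tag them. It is not the Indexmeister code and it does not reproduce the three methods mentioned above; it simply ranks words by raw frequency after stripping LaTeX markup. The function names, the tiny stopword list, and the thresholds are all hypothetical choices made for illustration. The only real LaTeX feature it relies on is the standard \index{...} command.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real tool would need far more.
STOPWORDS = {"the", "and", "that", "this", "with", "from", "have", "been",
             "which", "their", "there", "were", "into", "than", "then"}


def strip_latex(source):
    """Crudely remove comments, environments markers, and command names."""
    source = re.sub(r"(?<!\\)%.*", "", source)               # drop % comments
    source = re.sub(r"\\(begin|end)\{[^}]*\}", " ", source)  # drop \begin{..}/\end{..}
    source = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?", " ", source)  # drop \command[opts]
    return re.sub(r"[{}$&~^_\\]", " ", source)               # drop leftover markup


def suggest_terms(source, top_n=5, min_length=4):
    """Return the top_n most frequent words (assumed heuristic: raw frequency)."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", strip_latex(source).lower())
    counts = Counter(w for w in words
                     if len(w) >= min_length and w not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]


def tag_first_occurrence(source, term):
    """Append \\index{term} after the first whole-word match of term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0) + r"\index{" + term + "}",
                       source, count=1)


if __name__ == "__main__":
    sample = r"""
\section{Atoms}
Lucretius argues that atoms move in the void. % a comment
The atoms collide, and the void has no limit.
"""
    for term in suggest_terms(sample, top_n=2):
        print(term)
    print(tag_first_occurrence(sample, "void"))
```

Raw frequency is the bluntest possible measure of importance, so a sketch like this will happily suggest common words that no reader would look up. That is a good part of why an interactive step, where a human accepts or rejects each suggestion, is worth having.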

I made this video tutorial to show how the system works:

In the future I am thinking of adding a plug-in for LibreOffice, and possibly a graphical interface (probably using GTK bindings). Porting it to Windoze is not a priority, however.

Handyman Kevin Series II Premiere

I rarely post updates here for my YouTube show, Handyman Kevin–mainly because it has its own dedicated blog. I thought I should mention, however, that the first episode of my second season premiered a few minutes ago:

 

The first season focused mainly on general handyman skills. This season will have more of a focus on workshop tools and techniques. As before, we are planning to release thirteen episodes of fifteen to twenty-five minutes each, every one with an accompanying blog post.

Hesse’s Siddhartha

Siddhartha is Hermann Hesse’s best known novel in the English speaking world. Unlike his earlier works, which are semi-autobiographical and describe young men dealing with crises of faith in contemporary Europe, Siddhartha is set in ancient India during the lifetime of the Buddha. When the book came out in 1922 it gave many Westerners their first exposure to Eastern philosophy and religion. It is frequently included on lists of influential books of the 20th century and is a good candidate for inclusion on a Great Books reading list.

The full name of the “Supreme Buddha”, the founder of Buddhism, was Siddhārtha Gautama. In Hesse’s book, however, he is represented by two discrete characters: Siddhartha, the protagonist, and Gotama, the founder of the religion.

Please note that the remainder of this post contains spoilers.

Siddhartha is a gifted son of a brahmin who is being groomed for a career in the ancient Vedic religion. In his twenties he becomes disillusioned with his father’s faith, which he believes is unlikely to lead to enlightenment. He and his friend Govinda leave their village and join a band of Samanas, wandering ascetic holy men who reject the teachings of the brahmins. Historically, by the time of the Buddha, there were numerous Samana sects with widely differing philosophies and practices. As portrayed by Hesse, they are very similar to the Cynic philosophers of the ancient world, who rejected all materialism and lived in voluntary poverty under a strict moral code. This is only one of the points where syncretism creeps in between Hesse’s “Eastern” novel and the Western philosophy of his literary background.

After three years Siddhartha and Govinda become frustrated with the Samanas’ program. Hearing that a new spiritual leader, Gotama, has achieved enlightenment, they decide to seek him out and hear his teachings. Govinda is soon convinced and becomes a Buddhist monk. Siddhartha finds he has tremendous respect for Gotama Buddha and truly believes he is enlightened. However, he concludes that it is not possible to learn wisdom from a teacher, but only through personal experience. The split between organized religion and received authority, symbolized by Govinda, and individual spiritualism and inquiry, symbolized by Siddhartha, becomes the most important theme for the rest of the book. Readers of my blog will also recall that the question of whether virtue (wisdom) can be taught was of preeminent importance to Socrates and Plato–another instance of Hesse’s syncretism.

After taking leave of Gotama and Govinda, Siddhartha has an epiphany in which he decides to embrace materialism and accept the beauty of the universe in all its myriad forms, rejecting the idealistic philosophy of the Vedic and Buddhist religions, in which the world is seen as illusion. The parallels between his internal dialogue and the writings of the Epicureans, like Lucretius, are obvious. The practices that Siddhartha adopts are more like the bourgeois Epicureanism of Claudian Rome than the pure philosophy of Epicurus; he follows his new acceptance of materialism to the nearest city. Here he immediately embarks on a love affair with a high-profile courtesan, goes into business, and spends the next couple of decades making himself a wealthy man. In the process he picks up a drinking problem and a gambling addiction. Finally, disgusted with himself, he walks away from everything and becomes a simple ferryman on the banks of a river. Here, under the tutelage of a wise older ferryman, he finally achieves inner peace.

The idea that philosophers should experience the world in their youth also shows up frequently in Plato, particularly in The Republic where the Guardians were not to be taught philosophy until they were thirty, and afterwards were to be turned adrift to make their way in the world for fifteen years, at which time they could assume their roles as philosopher-rulers.

Statue of Hermann Hesse in Calw, Germany [public domain via Wikipedia]

It is natural that Hesse, who was raised in the Western tradition and educated in a European seminary (until he suffered a crisis of faith and dropped out), would interpret Eastern philosophy through the lens of his own background. It is also probable that I, raised in the same tradition, would criticize his work through a similar lens–particularly since I have been working with Plato and Lucretius recently and their writings are fresh in my mind. It is also true that authors, once they have created an individual style and enjoyed some commercial success, tend to follow it in subsequent works. So is this just a “typical” Hermann Hesse novel, simply told in a new setting? I thought so until I read the final two chapters, in which Siddhartha’s personal philosophy reaches an ultimate formulation which is distinctly, unarguably Asian.

The opposite of every truth is just as true! That’s like this: any truth
can only be expressed and put into words when it is one-sided.
Everything is one-sided which can be thought with thoughts and said with
words, it’s all one-sided, all just one half, all lacks completeness,
roundness, oneness. When the exalted Gotama spoke in his teachings of
the world, he had to divide it into Sansara and Nirvana, into deception
and truth, into suffering and salvation. It cannot be done differently,
there is no other way for him who wants to teach. But the world itself,
what exists around us and inside of us, is never one-sided. A person or
an act is never entirely Sansara or entirely Nirvana, a person is never
entirely holy or entirely sinful. It does really seem like this,
because we are subject to deception, as if time was something real.
Time is not real, Govinda, I have experienced this often and often
again. And if time is not real, then the gap which seems to be between
the world and the eternity, between suffering and blissfulness, between
evil and good, is also a deception.

The acceptance of paradox is one of the major traits which sets Eastern thought apart from Western thought. Westerners have always sought to categorize the universe, to break it down into ideas which are either one thing or another. Easterners accept that a concept can be two, apparently contradictory, things at once. Even the most famous and enduring paradox in Western thought, the doctrine of the Trinity, was a product of Eastern thinkers and has never sat entirely comfortably with the West.

Likewise, the acceptance of nonlinear time is a hallmark of Eastern thinking. In the East, time can be circular if not completely illusory:

The sinner, which I am and which you are, is a sinner, but in times to come he will be Brahma again, he will reach the Nirvana, will be Buddha–and now see: these ‘times to come’ are a deception, are only a parable! The sinner is not on his way to become a Buddha, he is not in the process of developing, though our capacity for thinking does not know how else to picture these things. No, within the sinner is now and today already the future Buddha, his future is already all there, you have to worship in him, in you, in everyone the Buddha which is coming into being, the possible, the hidden Buddha. The world, my friend Govinda, is not imperfect, or on a slow path towards perfection: no, it is perfect in every moment, all sin already carries the divine forgiveness in itself, all small children already have the old person in themselves, all infants already have death, all dying people the eternal life. It is not possible for any person to see how far another one has already progressed on his path; in the robber and dice-gambler, the Buddha is waiting; in the Brahman, the robber is waiting. In deep meditation, there is the possibility to put time out of existence, to see all life which was, is, and will be as if it was simultaneous, and there everything is good, everything is perfect, everything is Brahman.

When I read this last chapter I realized that everything which preceded it was part of Hesse’s design to lead his Western readers, masterfully, to a place where they might be able to appreciate these viewpoints.

On The Nature of the Universe by Titus Lucretius Carus

On the Nature of the Universe (aka Of the Nature of Things) book cover, Penguin edition

The entire universe is constantly moving. Objects, images, even souls are really unending streams of atoms, eternally reconfiguring themselves. Everything contains the seeds of its own creation and destruction. No sooner have the atoms assumed a form than it starts to decay–whether that thing is a person, a world, or a universe. This is the world view of first-century BCE Epicureanism, which the poet Lucretius tried to spread to the masses by casting it in the form of a book-length philosophical poem called De Rerum Natura. As a poem, it was apparently a hit when it was published posthumously about 55 BCE (possibly after having been edited by Cicero, although this story is usually considered apocryphal). Nonetheless, Epicureanism never really took off in the Roman Empire. The claims that there was no afterlife, nothing except matter, and that the gods, if they existed, had no interaction with the world of men, held no resonance with the people. The takeaway point, that the philosopher should live simply, enjoying simple pleasures and avoiding ambition and the pursuit of wealth, was anathema to Roman society, which was, if possible, even more bourgeois than our own. Stoicism and Neoplatonism were the dominant philosophies of Rome, until both were replaced by (and largely absorbed into) Christianity. Epicurus, Lucretius, and their fellows were centuries before their time; it was not until Spinoza, their natural scion, rediscovered and built upon their ideas in the 17th century that Western Civilization began to seriously incorporate these ideas in its main stream of thought.

In Three Philosophical Poets: Lucretius, Dante and Goethe, George Santayana writes that Epicurus was primarily a moral philosopher who adopted and adapted the natural philosophy of Democritus to support his moral platform: “Epicurus, the Herbert Spencer of antiquity, was in his natural philosophy an encyclopaedia of second-hand knowledge.” Lucretius, on the other hand, puts the natural philosophy in the foreground of his poem, striving to present a well-justified, internally consistent system (a grand unified theory, if you will). When I read it, I was surprised how many things he got right, well before his time. For instance, his understanding of air resistance is fairly sophisticated. He also correctly identifies smells as being composed of tiny particles which slowly diffuse through the air. He is half right when he advances a similar explanation for light (photons sometimes behave like particles and sometimes like waves, depending on circumstances) but makes up for it by correctly arguing that light will move faster in a vacuum than in a medium like air or water. He also correctly identifies the shapes of particles as an important determinant of the physical properties of substances. At times he brushes tantalizingly close to a notion of entropy.

Of course he gets plenty of things wrong, mainly because he is mistaken about some of his fundamental axioms. For instance, his anatomy suffers from the fact that he thinks the mind is lodged in the upper abdomen. He does not question that the earth is the center of the solar system. Most importantly, because he feels everything is made up of matter, he advances completely erroneous explanations for many phenomena which really involve energy. For example, he believes that lightning is a concentrated form of the kind of matter which is found in fire. He sees the human brain as being composed of a multitude of microscopic moving particles which shift around rapidly, sort of like a very complex pachinko machine. He believes that magnets extrude microscopic fibers of iron to entangle other iron pieces. He believes that what we would call chemical bonds are caused by a physical hooking together of the shapes of atoms. Many of these errors were unavoidable, however, since he had no instruments with which to detect energy or fundamental forces. And on one level he was absolutely correct: Einstein would eventually prove, with his famous E=mc², that everything is matter, or at least convertible into matter.

Despite these occasional quaint misconceptions, On the Nature of Things is a fascinating piece of work. To me, the Epicurean viewpoint is much more intuitive than that of Plato and Aristotle, whose books I have recently been studying. I attribute this to the fact that, since my early training was in engineering, I have taken quite a few science classes in my life, so it is very easy for me to slip into the materialist/naturalist viewpoint. Then again, Spinoza, who, as I said, is the Epicureans’ philosophical heir, has long been one of my favorite philosophers. That being said, I find that I just can’t accept Lucretius’ contention that there is nothing beyond the material world. As fabulous and infinite as the universe (multiverse?) is, I just can’t accept that this is all there is. Lucretius seems to have been unquestioning in his atheism. For myself, even if I were not a Christian, I just find it hard to be that sure about anything.

Note About Editions:

Lucretius’ original poem was written in Latin in dactylic hexameter, a meter which isn’t compatible with English (or with Latin, really: Lucretius literally couldn’t use certain words and phrases because they wouldn’t fit). English translations are either in verse or prose. The poetry translations give more of a sense of the original experience, but the prose translations are much easier to read. Project Gutenberg has William Ellery Leonard’s blank verse translation. Penguin’s prose translation (by Ronald Latham) is sold as On the Nature of the Universe. It would be preferable, of course, to read the poem in the original language, but that would require a better recollection of high school Latin than I can boast.

Off to Grad School Again: The Second Essay

The other day I posted the first of the essays I had to write for my application to CSUDH’s Humanities Master of Arts External (HUX) program. As promised, here is the second, longer essay. The prompt asked me to describe two to three events, works, or people which inspired my interest in the humanities. I chose to write about two professors I worked under as a teaching assistant the last time I was in graduate school who made particularly effective use of the Great Books in their courses.

Two professors, Dr. Sean Jasso and Dr. Paul Beehler, did more to inspire my interest in studying and teaching the humanities than anyone else I have met. Ironically, I met both of them not by taking humanities courses, but by being assigned as their teaching assistant in business school. Each of them, however, is serious about integrating the humanities in their undergraduate business classes and expects their assistants to do the same. While working for them I learned more about writing, criticism, and the great authors of the Western canon than I did in my entire undergraduate career.

Dr. Sean Jasso’s background is in hospitality management but his research is in public policy and corporate ethics. For several years he has been fine-tuning a class titled “Business Ethics and Law in Society”. The main text for the course is Michael Sandel’s Justice, which uses real world examples to illustrate the ideas of ethical philosophers such as Aristotle, Kant, Rawls, and Mill. All of these authors were new to me. I nearly panicked the first time a student appeared in my office saying that she “didn’t really understand Kant’s theory of categorical imperatives,” and could I explain it for her. As every teacher knows, however, teaching a subject is the best way to understand it. My own pedagogical style relies heavily on Socratic questions to encourage students to think critically and make connections, so my weekly discussion sections became a shared journey of inquiry with my students as we found new ways to apply the teachings of these philosophers to weekly case studies.

With Dr. Jasso’s help, I soon found ways to apply the philosophy we were teaching to situations in my professional life. One ethical issue that affects everyone in higher education is academic integrity. Catching a student cheating or plagiarizing creates an ethical dilemma for any teacher, especially an overworked graduate assistant. To simply ignore the offense and pass the student is easy, but is a betrayal of one’s duty and, in utilitarian terms, hurts the whole society by lessening the value of a university education for all students. Failing the offender and turning them over for disciplinary action is nearly as easy and can be justified on the grounds that cheating is categorically wrong and that punishing cheaters rewards those students who do not offend. Dr. Jasso believes, however, that because a teacher’s purpose is to educate, a cheating incident needs to be used as an additional opportunity to teach the student. He expects his assistants to call a meeting with the student and himself. In this meeting the teaching assistant confronts the student, who is given an opportunity to confess. Students who come clean are then prompted to explain why their actions were wrong and allowed to write an essay titled “Why Cheating is Wrong and I Won’t do it Again”, supporting their points with material from the class. If the teaching assistant is satisfied with the essay, the student is not referred for disciplinary action (though they still have to repeat the course). These “cheater meetings” were emotionally exhausting for the teaching assistant and created extra grading work, but Dr. Jasso convinced me that they were the right thing to do.

Dr. Paul Beehler is an English professor who teaches “Business Writing and Communications” for the School of Business Administration. One of the texts for his course is Machiavelli’s The Prince. As their term project students are required to write a research paper analyzing the strategy of a real corporation in terms of Machiavellian philosophy. When grading papers and exam blue books I found that I usually knew within a few paragraphs whether I was looking at ‘B’ or ‘C’ work (there were very few ‘A’s), but a letter grade is almost useless to a student because it doesn’t tell them what they are doing right and wrong. Dr. Beehler pushed me to become not only an editor, but a critic: deconstructing a student’s work and offering comments on their style, logical reasoning, creativity, and use of semiotics. This was a painful process for me, because Dr. Beehler spot checks his assistants’ grading work and often returns papers to be regraded. I was frequently frustrated when his opinion of a paper differed widely from my own. As time went on, however, I realized that my criticism tended to be fairly shallow and he was teaching me to read at a deeper level: to go beyond mechanics and rhetorical flourishes and assess the sophistication of a student’s thoughts. I soon realized that I was applying a deeper level of analysis to everything I read, including my own work. I was also able to give much better comments to students who brought in their work in progress to show me during office hours. This made me a better critic and editor, which in turn made me a better writer.

Another benefit of teaching the class under Dr. Beehler is that it introduced me to Machiavelli’s work, which I now understand represents a watershed in Western philosophy. Machiavelli stands upon the divide between the Renaissance and the Enlightenment and represents one of the first articulations of the basically humanistic path which Western thought has followed for the past five centuries. His decision to embrace republican political philosophy over the traditional divine right of kings not only influenced all of the Enlightenment authors who followed him, but eventually led the way to the liberal democracies in which we now live.

Even though I never took a course of theirs, nor did research under them, Dr. Jasso and Dr. Beehler taught me more than any of the professors I knew in professional school. Dr. Jasso introduced me to the great ethical philosophers and showed me how to integrate their theories into my professional life. Dr. Beehler pushed me to a higher level of writing and textual criticism, making me a better writer. Both inspired what I suspect will be a lifelong interest in the Western canon and the humanities in general, and teaching under them was one of the most valuable aspects of my professional school experience.

Word Frequency Analysis – The Most Common Words

There are any number of reasons why you might need a list of the most common words in the language. In my case, I was working on a piece of software to speed the process of building indexes for my print books. My program reads the book and suggests a list of words that the author might want to include in the index. It needed a list of the most common words so it would know not to bother suggesting them. I’ll post that script in a couple of days. For now, though, I thought I would give you a very simple piece of Python code that reads a directory full of text files, counts how many times each word occurs, and prints a list of those which show up most often. I set it to give me the most common 1000 words. You could generate a list of any length, though, just by changing one number in the code.

If you don’t care to look behind the curtain and just want to cut and paste my word list, feel free to scroll down to the bottom of the post.

For raw data, I used a sample of 37,358 Project Gutenberg texts. PG is kind enough to offer an interface for researchers like me to harvest books. Note that this would work nearly as well with a much smaller sample. But I had already downloaded the books for another project, so I figured I might as well use them. If you use a PG harvest for your data set, make sure to remove the Human Genome Project gene sequence files (a full dump contains at least three copies of the full human genome). Otherwise, this script will have major grief when it tries to count each gene as a word.

Note that, as currently written, this script requires GNU Aspell and a system that works correctly with pipes. This means it should run fine on nearly any Unix-like system, but you Windoze people are on your own.

The first part of the script loads a few standard modules. Then it gets a listing of the current directory and starts looping through each text file in it. With each iteration it prints a status message with the file name and percent completion. With scripts like this that take a day or two to run, I like to be able to see at a glance how far along I am. As an aside, if you access your computer through a terminal like I do, you will probably want to use GNU Screen or a similar utility to protect yourself from accidental disconnects while it’s still running.

#! /usr/bin/env python

'''Word frequency analyzer'''

import os, stat, string, subprocess

wordcounts={}

filelist = os.listdir('.')

counter = 0
for f in filelist:
    
    counter += 1
    
    if os.path.splitext(f)[1] == '.txt':
        print f+'\t', str(counter)+' of '+str(len(filelist))+'\t', 
        print str((float(counter)/float(len(filelist)))*100)[:6]+'%'
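
For anyone on Python 3, the same status line could be sketched roughly as follows. This is an illustration only, not the script used for this post: the function name progress_line is made up for the example, and it assumes, like the loop above, that the position is counted over every directory entry rather than over text files alone.

```python
import os


def progress_line(name, position, total):
    """Return a status line: file name, 'i of n', and percent complete."""
    percent = 100.0 * position / total
    return "{}\t{} of {}\t{:.2f}%".format(name, position, total, percent)


if __name__ == "__main__":
    filelist = os.listdir(".")
    for position, name in enumerate(filelist, start=1):
        # Only text files are processed; other entries still advance the count.
        if os.path.splitext(name)[1] == ".txt":
            print(progress_line(name, position, len(filelist)))
```

Building the line in a small function keeps the formatting in one place, and enumerate does the counting that the counter variable does by hand.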

The next portion opens each book file and reads it in. Next, because I’m using PG books as a data set, I need to strip off all of the boilerplate license text which occurs at the beginning and end of the files. Otherwise, because similar text appears in every file, it will skew the word distributions. Luckily, PG marks the actual text of the book by bracketing it in the words “START OF THIS PROJECT GUTENBERG EBOOK” and “END OF THIS PROJECT GUTENBERG EBOOK”. The front part is easy: we just do a string find to get the location of the first line-feed character after the start text appears. The end part is a little trickier; the easiest way to get it is to reverse the whole book. This means, however, that we also need to flip the search text. Pretty neato, huh?

        with open(f, "rb") as infile:  book=infile.read()
    
        #try to determine if this is a Gutenberg ebook.  If so, attempt
        #to strip off the PG boilerplate 
        if "PROJECT GUTENBERG EBOOK" in book:
    
            #if a marker is missing, keep that end of the text as it is
            b = 0; e = 0

            a = book.find("START OF THIS PROJECT GUTENBERG EBOOK")
            if a <> -1:
                b = book.find('\n', a)
            #reverse the book so that find() locates the last end marker
            c = list(book); c.reverse()
            book = string.join(c, '')
            
            d = book.find('KOOBE GREBNETUG TCEJORP SIHT FO DNE')
            if d <> -1:
                e = book.find('\n', d)
            #flip the text back and keep only what lies between the markers
            c = list(book); c.reverse()
            book=string.join(c, '')
            book = book[b:len(book)-e]
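
The same trimming can also be sketched in Python 3 without reversing the text, by letting str.rfind search backwards from the end. This is a swapped-in technique shown for comparison, not the method used in the script: the function name strip_boilerplate and the two constant names are hypothetical, and the sketch assumes the book has already been decoded to a str.

```python
START_MARKER = "START OF THIS PROJECT GUTENBERG EBOOK"
END_MARKER = "END OF THIS PROJECT GUTENBERG EBOOK"


def strip_boilerplate(book):
    """Return the text between the PG start and end marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = 0
    end = len(book)

    a = book.find(START_MARKER)
    if a != -1:
        newline = book.find("\n", a)
        # Skip past the end of the marker line, if the line is terminated.
        start = newline + 1 if newline != -1 else len(book)

    d = book.rfind(END_MARKER)
    if d != -1:
        # Back up to the start of the line that holds the end marker.
        end = book.rfind("\n", 0, d) + 1

    return book[start:end]
```

Like the reversed search, rfind picks up the last occurrence of the end marker, so a stray mention of the phrase earlier in the file does not cut the book short.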

The next step is to check the book text for words that aren’t in the dictionary, simply because there is no reason to count words that aren’t part of Standard English. The easiest way to do this on a Linux system like mine is to run the system’s spellcheck, Aspell, on the file. We also want to eliminate duplicate words from this list, since it will save iterations later.

        #see which words aren't in the dictionary
        oddwords = subprocess.check_output(
                    "cat "+f+" | aspell list", shell=True).split()

        #find unique words (a set drops duplicates and is quick to search)
        u_oddwords = set(oddwords)
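
As a hedged alternative for Python 3.7 or later, the spellcheck step could feed the text to Aspell on standard input instead of building a shell pipeline. The sketch below is an assumption-laden illustration, not the code behind this post: the function names odd_words and parse_aspell_output are invented here, and it assumes GNU Aspell is installed, on the PATH, and configured with a default dictionary.

```python
import subprocess


def parse_aspell_output(output):
    """Turn the output of `aspell list` (one word per line) into a set."""
    return set(output.split())


def odd_words(text):
    """Pipe text to `aspell list` on stdin and return the unknown words.

    Requires GNU Aspell to be installed and on the PATH.
    """
    result = subprocess.run(
        ["aspell", "list"],   # argument list, so no shell is involved
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_aspell_output(result.stdout)
```

Passing the text itself means the book can be checked after its boilerplate has been stripped, and file names containing spaces are no longer a concern.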

Next, we go through the book text and strip out most of the punctuation. The string containing the punctuation to be removed looks a lot like the string you get by calling string.punctuation. Note, though, that I left in the “‘” and “-” characters because they are actually a part of contractions and compound words, respectively. I also split the book text, which at this point is one big string, into a list of words and fold them to lowercase (calling capitalize() on one big string upper-cases only its first character and lower-cases everything else).

        #strip out most of the punctuation
        book2=''
        for i in range(len(book)):
            if book[i] not in '!"#$%&()*+,./:;<=>?@[\]^_`{|}~':  
                book2=book2+str(book[i])
                
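        #capitalize() upper-cases only the first character of the whole
        #string and lower-cases the rest, so the words come out lowercase
        book=str(book2).capitalize().split()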
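
In Python 3 the character-by-character loop could be replaced with a translation table. The following is a minimal sketch under that assumption; the names PUNCTUATION, STRIP_TABLE, and to_words are hypothetical, and the character set is copied from the script, so the apostrophe and hyphen are still kept.

```python
# Same characters as in the script: string.punctuation minus ' and -
PUNCTUATION = '!"#$%&()*+,./:;<=>?@[\\]^_`{|}~'

# A translation table that maps each of those characters to None (delete).
STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def to_words(text):
    """Delete punctuation, fold to lowercase, and split on whitespace."""
    return text.translate(STRIP_TABLE).lower().split()
```

For example, to_words("Don't stop, well-known Things!") gives the four words don't, stop, well-known, and things.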

In the final segment of the script we count how many times the words occur and update the counters, which are kept as a dictionary object. Then we convert the dictionary to a list, sort it, and print the 1000 most common words to a CSV data file. If you need a different number of words, just change the 1000 to another value.

        for w in book:  
            if w not in u_oddwords:
                if w not in wordcounts:
                    wordcounts[w] = 1
                else:
                    wordcounts[w] += 1
                    

final_list = []
for w in wordcounts:
    final_list.append([wordcounts[w], w])

final_list.sort()
final_list.reverse()

                    
with open('wordcounts_pg', 'w') as wc_output:
    
    #write at most 1000 rows; fewer if the corpus has fewer distinct words
    for i in range(min(1000, len(final_list))):
        wc_output.write(final_list[i][1]+', '+str(final_list[i][0])+'\n')
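
The counting and sorting steps map closely onto collections.Counter from the standard library. Here is a short Python 3 sketch of that approach; it is offered as an assumption about how one might rewrite this part, not as the code that produced the list in this post, and the function names count_words and write_top_words are invented for the example.

```python
import csv
from collections import Counter


def count_words(words, skip=()):
    """Count each word in `words`, ignoring anything in `skip`."""
    skip = set(skip)
    return Counter(w for w in words if w not in skip)


def write_top_words(counts, path, limit=1000):
    """Write the `limit` most common words and their counts as CSV rows."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        for word, n in counts.most_common(limit):
            writer.writerow([word, n])
```

Counts from several books can be merged by calling update() on one running Counter, and most_common(limit) simply returns fewer rows when there are fewer distinct words than the limit.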
        

That’s all there is to it. Pretty easy, huh? Now set it to run, detach the terminal, and ignore it until this time tomorrow. My machine can count words in about 1500 books per hour, so it takes about 25 hours to make it through the full sample.
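
As a quick check of that estimate, using only the two figures quoted in this post (the variable names are just for illustration):

```python
books = 37358          # size of the Project Gutenberg sample
books_per_hour = 1500  # rough throughput quoted above

hours = books / books_per_hour
print(round(hours, 1))  # a little under 25 hours
```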

And now, finally, here is the list of words. Feel free to cut and paste it to use for your own projects:

Word Occurrences
the 149164503
of 81154540
and 73797877
to 60771291
a 47925287
in 41773446
that 26590286
was 24584688
he 24462836
i 24025629
it 22795878
his 20173668
is 18378165
with 18081192
as 17645451
for 17473870
had 14408612
you 13939609
be 13252982
on 13207285
not 13181744
at 13015022
but 12718486
by 12438046
her 11878371
which 10826405
this 10263128
have 10196168
from 10088968
she 9778689
they 9715080
all 8819085
him 8771048
were 8314601
or 8143254
are 7787136
my 7572900
we 7412199
one 7373621
so 7203582
their 7018823
an 6518028
me 6419080
there 6267776
no 6185033
said 5938853
when 5899530
who 5878132
them 5808758
been 5787319
would 5689624
if 5655080
will 5166315
what 4895509
out 4556168
more 4440752
up 4416055
then 4222409
into 4129481
has 4000893
some 3929663
do 3914008
could 3749041
now 3747314
very 3630489
time 3571298
man 3559452
its 3544086
your 3522411
our 3517346
than 3494543
about 3349698
upon 3337366
other 3316391
only 3285019
any 3236410
little 3183383
like 2993385
these 2979508
two 2943507
may 2934056
did 2915540
after 2853393
see 2852408
made 2842273
great 2839852
before 2774768
can 2746279
such 2734113
should 2708032
over 2672597
us 2651042
first 2553483
well 2517899
must 2484839
mr 2465607
down 2433044
much 2428947
good 2376889
know 2372135
where 2353232
old 2291164
men 2286995
how 2261780
come 2217201
most 2188746
never 2160804
those 2135489
here 2122731
day 2071427
came 2061124
way 2042813
own 2037103
go 2009804
life 2007769
long 1992150
through 1989883
many 1982797
being 1976737
himself 1941387
even 1915129
shall 1890432
back 1865988
make 1852069
again 1848115
every 1845835
say 1817170
too 1810172
might 1807261
without 1781441
while 1759890
same 1701541
am 1696903
new 1687809
think 1665563
just 1660367
under 1649489
still 1643537
last 1616539
take 1614771
went 1595714
people 1593685
away 1582685
found 1574065
yet 1563963
thought 1556184
place 1543300
hand 1500131
though 1481938
small 1478723
eyes 1469270
also 1467931
house 1438223
years 1435529
1433313
another 1415606
don’t 1381480
young 1379348
three 1378462
once 1377940
off 1376942
work 1375035
right 1360201
get 1345597
nothing 1344419
against 1325938
left 1289397
ever 1269433
part 1261573
let 1260289
each 1258840
give 1258179
head 1254870
face 1253762
god 1249406
0 1239969
between 1225531
world 1219519
few 1213621
put 1200519
saw 1190392
things 1188437
took 1172602
letter 1167755
tell 1160034
because 1155609
far 1154860
always 1152942
night 1152416
mrs 1137055
love 1121812
both 1111644
sir 1100855
why 1097538
look 1095059
having 1069812
mind 1067461
father 1062643
called 1062190
side 1053255
looked 1051044
home 1036554
find 1036485
going 1034663
whole 1033731
seemed 1031466
however 1027701
country 1026854
got 1024945
thing 1022424
name 1020634
among 1019175
seen 1012779
heart 1011155
told 1004061
done 1000189
king 995498
water 994392
asked 993082
heard 983747
soon 982546
whom 979785
better 978434
something 957812
knew 956448
lord 956398
course 953585
end 947889
days 929530
moment 926478
enough 925144
almost 916006
general 903316
quite 902582
until 902333
thus 900738
hands 899106
nor 876106
light 869941
room 869532
since 864596
woman 864072
words 858824
gave 857475
b 853639
mother 852308
set 851757
white 850183
taken 848343
given 838078
large 835292
best 833941
brought 833270
does 826725
next 823345
whose 821731
state 820812
yes 817047
oh 815302
door 804702
turned 804433
others 800845
poor 800544
power 797133
present 792424
want 791194
perhaps 789201
death 788617
morning 786748
la 783512
rather 775384
word 774340
miss 771733
less 770410
during 763957
began 762442
themselves 762418
felt 757580
half 752587
lady 742708
full 742062
voice 740567
cannot 738450
feet 737299
order 736997
near 736832
true 735006
1 730887
it’s 727886
matter 726818
stood 725802
together 725703
year 723517
used 723293
war 720950
till 720824
use 719314
thou 714663
son 714275
high 713720
round 710093
above 709745
certain 703716
often 698006
kind 696975
indeed 696469
i’m 690646
along 688169
case 688098
fact 687334
myself 684387
children 683334
anything 682888
four 677704
dear 676320
keep 675722
nature 674055
known 671288
point 668710
p 668356
friend 666493
says 666011
passed 665792
within 665633
land 663605
sent 662540
church 659035
believe 656459
girl 652783
city 650397
times 649022
form 647388
herself 646989
therefore 644835
hundred 640059
john 639007
wife 636379
fire 632762
several 632704
body 630129
sure 629252
money 629251
means 627640
air 626921
open 626306
held 625660
second 622526
gone 614808
already 613870
least 609236
alone 606078
hope 602206
thy 599253
chapter 597339
whether 596307
boy 596048
english 594784
itself 591838
2 591413
women 589579
hear 587189
cried 586705
leave 586112
either 581618
number 576685
rest 575648
child 574531
behind 572007
read 571445
lay 571286
black 569530
government 567320
friends 567282
became 564384
around 559161
river 556286
sea 552753
ground 550622
help 549284
c 548349
i’ll 546929
short 546465
question 545629
reason 545464
become 544896
call 544576
replied 544286
town 543694
family 542309
england 542109
lost 537241
speak 537188
answered 536154
five 535088
coming 534713
possible 534639
making 530530
hour 530471
dead 529575
really 528631
looking 528622
law 528248
captain 525928
different 522269
manner 519256
business 516115
states 511757
earth 511042
st 510820
human 510666
early 508769
sometimes 507383
spirit 506297
care 505984
sat 505109
public 504862
close 503948
towards 503262
kept 502051
french 501813
party 500749
truth 500365
line 498822
strong 498492
book 496520
able 494330
later 494101
return 492237
hard 490701
mean 489853
feel 487798
story 486538
m 485841
received 483744
following 481558
fell 480591
wish 480562
person 480508
beautiful 479656
seems 477423
dark 476293
history 475744
followed 474307
subject 473058
thousand 470929
ten 469675
returned 469387
thee 467513
age 466838
turn 466674
fine 466630
across 466545
show 465685
arms 465504
character 464946
live 464642
soul 463939
met 463300
evening 463176
die 462851
common 459553
ready 457764
suddenly 456627
doubt 455415
bring 453346
ii 453190
red 450793
free 447675
that’s 445572
account 444530
cause 444403
necessary 444147
can’t 443812
need 443326
answer 442440
miles 441924
carried 438793
although 438423
fear 437796
hold 437493
interest 437382
force 436993
illustration 436577
sight 435854
act 435269
master 433105
ask 432510
idea 432424
ye 432036
sense 430693
an’ 430321
art 430226
position 429722
rose 428624
3 427441
company 427142
road 425669
further 425131
nearly 424118
table 424064
everything 423740
brother 423088
sort 422809
south 421800
reached 420190
london 418755
six 418131
didn’t 416216
cut 412716
taking 412571
continued 411607
understand 411326
appeared 409564
sun 407584
none 407168
else 406851
big 406799
o 406388
longer 406382
deep 406170
army 405897
beyond 405580
view 404378
strange 400814
natural 400483
talk 399814
north 398556
suppose 396693
court 396267
service 393925
bed 393878
past 393609
ought 393331
street 392970
cold 391836
hours 391460
toward 390231
added 389818
spoke 389420
seem 388757
neither 388355
late 388105
probably 387568
real 386926
clear 385649
chief 385350
run 385269
certainly 385179
est 384982
united 384930
stand 384385
forward 384028
front 383866
purpose 382457
sound 382443
feeling 382032
eye 380164
happy 378251
i’ve 377633
except 374853
knowledge 374155
blood 373563
low 373268
remember 373173
pretty 372548
change 372221
living 371264
american 369773
bad 369425
horse 369396
peace 369168
meet 366864
effect 365907
boys 364460
en 364172
school 362681
comes 362575
france 360771
fair 359826
forth 359249
died 359161
fall 358176
placed 357047
note 354944
led 354740
saying 354703
length 354502
pass 353234
gold 350268
entered 349397
doing 348304
latter 347844
written 347699
laid 346808
4 344382
according 343990
daughter 343682
opened 343526
dr 340867
trees 339826
distance 339817
office 339771
attention 339722
hair 337835
n 337111
prince 335635
wild 335514
wanted 335167
society 335139
husband 332251
play 331807
wind 330079
green 329633
greater 329453
tried 328784
west 328702
important 327851
ago 327793
bear 325469
various 325246
especially 324511
mine 321967
paper 320046
island 320002
glad 319989
makes 319717
instead 319188
faith 318882
lived 318731
pay 318090
heaven 316878
ran 315958
s 315761
blue 315697
minutes 315172
duty 315065
foot 314708
ship 314700
fellow 314523
letters 313624
persons 311105
action 310840
below 309831
heavy 309808
york 309749
strength 308836
pleasure 307965
immediately 307823
remained 307750
save 306991
standing 306911
whatever 306070
won’t 305381
trouble 305338
e 305293
window 305257
object 305202
try 304928
parts 304007
period 303992
desire 303985
beauty 303513
opinion 303459
arm 303347
system 302641
third 302389
chance 301890
books 301331
george 300975
doctor 300779
british 300353
silence 300238
he’s 300053
enemy 298899
hardly 298533
5 296045
greek 295622
exclaimed 294602
send 293592
food 293239
happened 293092
lips 292334
sleep 291632
influence 290698
slowly 290590
works 289252
months 288930
generally 288629
gentleman 287966
beginning 287473
tree 287341
boat 286781
mouth 285685
there’s 285569
sweet 285425
drew 284944
deal 284389
v 284339
future 284186
queen 284002
yourself 283364
condition 283335
figure 283153
single 283016
smile 282793
places 282793
besides 281838
girls 281703
rich 281130
afterwards 281017
battle 280676
thinking 280651
footnote 280245
presence 279893
stone 279829
appearance 279691
follow 279498
iii 279239
started 278072
caught 277993
ancient 277595
filled 277238
walked 276882
impossible 276720
broken 276365
former 276016
century 275990
march 275880
274800
field 274479
horses 274255
stay 274139
twenty 273187
sister 272290
getting 271641
william 270478
knows 269506
afraid 269150
result 268749
seeing 268724
you’re 268500
hall 267020
carry 266780
arrived 266706
easy 266309
lines 265956
wrote 265929
east 265852
top 265242
wall 264942
merely 264898
giving 264484
raised 264154
appear 264015
simple 263923
thoughts 263760
struck 263694
moved 263492
mary 263463
direction 263444
christ 263262
wood 263260
born 263084
quickly 262966
paris 262393
man’s 262105
visit 261882
outside 260418
holy 260348
entirely 259045
somewhat 259020
week 258960
laughed 258562
secret 258198
village 257758
henry 257557
christian 257504
danger 257486
wait 257012
wonder 256770
learned 256420
stopped 256191
tom 256117
covered 256117
6 255876
bright 255349
walk 255090
leaving 254851
experience 254763
unto 254610
particular 254564
loved 254479
usual 254307
plain 253867
to-day 253804
seven 253567
wrong 253172
easily 252954
occasion 252780
formed 252707
ah 252144
uncle 252120
quiet 252035
write 251743
scene 251380
evil 250993
married 250965
please 250781
fresh 250507
camp 249947
german 248539
beside 248522
mere 248276
fight 247957
showed 247904
grew 247866
expression 247804
scarcely 247641
board 247578
command 247398
language 247302
considered 247260
regard 247101
hill 246854
finally 246533
national 246452
paid 246364
joy 246060
worth 245352
piece 244733
religion 244677
perfect 244671
royal 244615
tears 244448
president 244135
value 244084
dinner 243572
spring 242721
produced 242576
middle 242282
charles 242134
brown 241885
expected 241668
lower 241299
circumstances 241150
remain 241102
wide 240773
political 240686
charge 240464
success 240254
per 240083
officers 239806
hath 239618
indian 239572
observed 239548
lives 239448
respect 238787
greatest 238784
w 238776
cases 238527
tone 238005
america 237215
youth 236992
summer 236698
garden 236552
music 236354
waiting 236223
due 236178
modern 235763
jack 235557
unless 235428
study 235093
allowed 234852
leaves 234652
bit 233774
race 233156
military 232907
news 232435
meant 232274
afternoon 232063
winter 231867
picture 231735
houses 231575
goes 231281
sudden 230675
proper 230476
justice 230410
difficult 229784
changed 229658
grace 229281
chair 228931
10 228875
private 228392
eight 228222
hot 227873
reach 226608
silent 226552
‘i 226540
flowers 226379
laws 226197
noble 225931
watch 225328
floor 225326
killed 225020
built 224484
declared 224477
judge 224393
colonel 224303
members 224213
broke 224166
fast 223897
duke 223481
o’ 223293
shot 223105
sit 222222
usually 222162
step 222119
speaking 222101
attempt 221687
marriage 221054
walls 220575
stop 220466
special 220316
religious 220300
discovered 220260
beneath 219894
supposed 219260
james 219013
gives 218988
forms 218743
turning 218692
authority 218686
original 218519
straight 218414
property 218393
page 218233
plan 218185
drawn 217873
personal 217458
l 217130
cry 217022
passing 216926
class 216527
likely 216216
sitting 215841
cross 215821
spot 215719
soldiers 215683
escape 215311
complete 215288
eat 215120
bound 214985
conversation 214895
trying 214332
meeting 213898
determined 213756
simply 213506
shown 213457
bank 213261
shore 212917
running 212509
corner 212507
soft 212163
journey 212007
isn’t 211316
i’d 211132
reply 210852
author 210827
believed 210653
rate 210607
prepared 210558
lead 210548
existence 210220
enter 209851
indians 209589
troops 209398
wished 209068
glass 208986
notice 208859
higher 208770
social 208685
iron 208019
rule 207943
orders 207856
building 207813
madame 207780
mountains 207700
minute 207575
receive 207440
offered 207306
h 206821
names 206725
learn 206618
similar 206437
closed 206419
considerable 206102
lake 206017
wouldn’t 206012
8 205864
pleasant 205487

And here is the complete script:

#! /usr/bin/env python

'''Word frequency analyzer'''

import os, stat, string, subprocess

wordcounts={}

filelist = os.listdir('.')

counter = 0
for f in filelist:
    
    counter += 1
    
    if os.path.splitext(f)[1] == '.txt':
        print f+'\t', str(counter)+' of '+str(len(filelist))+'\t', 
        print str((float(counter)/float(len(filelist)))*100)[:6]+'%' 
    
        with open(f, "rb") as infile:  book=infile.read()
    
        #try to determine if this is a Gutenberg ebook.  If so, attempt
        #to strip off the PG boilerplate 
        if "PROJECT GUTENBERG EBOOK" in book:
    
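            #if a marker is missing, keep that end of the text as it is
            b = 0; e = 0

            a = book.find("START OF THIS PROJECT GUTENBERG EBOOK")
            if a <> -1:
                b = book.find('\n', a)
            #reverse the book so that find() locates the last end marker
            c = list(book); c.reverse()
            book = string.join(c, '')
            
            d = book.find('KOOBE GREBNETUG TCEJORP SIHT FO DNE')
            if d <> -1:
                e = book.find('\n', d)
            #flip the text back and keep only what lies between the markers
            c = list(book); c.reverse()
            book=string.join(c, '')
            book = book[b:len(book)-e]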
                
        
        #see which words aren't in the dictionary
        oddwords = subprocess.check_output(
                    "cat "+f+" | aspell list", shell=True).split()

        #find unique words (a set drops duplicates and is quick to search)
        u_oddwords = set(oddwords)
            
        
        #strip out most of the punctuation
        book2=''
        for i in range(len(book)):
            if book[i] not in '!"#$%&()*+,./:;<=>?@[\]^_`{|}~':  
                book2=book2+str(book[i])
                
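        #capitalize() upper-cases only the first character of the whole
        #string and lower-cases the rest, so the words come out lowercase
        book=str(book2).capitalize().split()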
        
        for w in book:  
            if w not in u_oddwords:
                if w not in wordcounts:
                    wordcounts[w] = 1
                else:
                    wordcounts[w] += 1
                    

final_list = []
for w in wordcounts:
    final_list.append([wordcounts[w], w])

final_list.sort()
final_list.reverse()

                    
with open('wordcounts_pg', 'w') as wc_output:
    
    #write at most 1000 rows; fewer if the corpus has fewer distinct words
    for i in range(min(1000, len(final_list))):
        wc_output.write(final_list[i][1]+', '+str(final_list[i][0])+'\n')