Hello Readers,
Here we begin exploring Natural Language Processing in Python using the nltk module. Previously, I posted a Text Mining blog series, specifically with Twitter data. With Python we progress one step further into Text Analysis: language processing.
Let us get started. Start up Python (I am using 2.7), and install the nltk module with "pip install nltk" from the command line or use the installer from their website.
Starting nltk
nltk bundles lexical resources, corpora (collections of structured text), and text processing libraries. You first need to download the text sources: run "import nltk" and then "nltk.download()", after which a download box should appear. Choose the destination folder, select the book option in Collections, and click download.
Figure 1. nltk Download
To check the installation, run "from nltk.book import *", which imports all the texts in the book collection. In your command prompt, it might look similar to this:
Figure 2. IPython in the Command Prompt with nltk Books
Note that it might take more than a few seconds for the large texts to load. Below (and above in Figure 2), you can view the names of the nine loaded texts, and you can print them again with "texts()". To gather information about a specific text, just type its name, such as "text1", which turns out to be Moby Dick, written by Herman Melville in 1851.
Code:
In [2]: from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

In [3]: text1
Out[3]: <Text: Moby Dick by Herman Melville 1851>
Looking through the texts, there is considerable variety among them. From Jane Austen to the Book of Genesis, the collection provides quite a selection, including the Inaugural Address Corpus, the Chat Corpus, which covers uncensored chat room dialogue, and text from the Wall Street Journal. Typing the text name, such as "text1", will display its title.
Searching Text
Now that we have access to the text, we can begin exploring the contents. One useful method of the Text object is ".concordance()". It prints every occurrence of a particular word together with its surrounding context. This is an excellent way to understand how the author uses a specific word. For example, below we search for the word "monstrous" in the first text, Moby Dick. The method finds 11 matches of "monstrous" and shows the words around each occurrence.
Code:
In [4]: text1.concordance('monstrous')
Building index...
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u

In [5]: text2
Out[5]: <Text: Sense and Sensibility by Jane Austen 1811>

In [6]: text2.concordance('affection')
Building index...
Displaying 25 of 79 matches:
, however , and , as a mark of his affection for the three girls , he left them
t . It was very well known that no affection was ever supposed to exist between
deration of politeness or maternal affection on the side of the former , the tw
d the suspicion -- the hope of his affection for me may warrant , without impru
hich forbade the indulgence of his affection . She knew that his mother neither
rd she gave one with still greater affection . Though her late conversation wit
 can never hope to feel or inspire affection again , and if her home be uncomfo
m of the sense , elegance , mutual affection , and domestic comfort of the fami
, and which recommended him to her affection beyond every thing else . His soci
ween the parties might forward the affection of Mr . Willoughby , an equally st
 the most pointed assurance of her affection . Elinor could not be surprised at
he natural consequence of a strong affection in a young and ardent mind . This
 opinion . But by an appeal to her affection for her mother , by representing t
 every alteration of a place which affection had established as perfect with hi
e will always have one claim of my affection , which no other can possibly shar
f the evening declared at once his affection and happiness . " Shall we see you
ause he took leave of us with less affection than his usual behaviour has shewn
ness ." " I want no proof of their affection ," said Elinor ; " but of their en
onths , without telling her of his affection ;-- that they should part without
ould be the natural result of your affection for her . She used to be all unres
distinguished Elinor by no mark of affection . Marianne saw and listened with i
th no inclination for expense , no affection for strangers , no profession , an
till distinguished her by the same affection which once she had felt no doubt o
al of her confidence in Edward ' s affection , to the remembrance of every mark
 was made ? Had he never owned his affection to yourself ?" " Oh , no ; but if
Additionally, we can search the second text, Jane Austen's Sense and Sensibility, for the word "affection". ".concordance()" displays the first 25 of the 79 matches in the text, each with its surrounding context.
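To make the idea behind a concordance concrete, here is a minimal pure-Python sketch. It is not nltk's implementation: the function name "concordance_lines" and its parameters are my own illustrative choices, and it assumes the text is already a list of tokens. It finds every case-insensitive match and pads the left context so the keyword lines up in one column, much like the output above.

```python
def concordance_lines(tokens, word, width=79):
    """Return one context line per occurrence of `word` (hypothetical helper).

    `tokens` is assumed to be a list of word/punctuation strings.
    Matching is case-insensitive; the keyword is aligned in a fixed column.
    """
    # Characters of context to show on each side of the keyword.
    half = (width - len(word) - 2) // 2
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        # Every token is at least one character, so `half` tokens on each
        # side always supply at least `half` characters of context.
        left = " ".join(tokens[max(0, i - half):i])[-half:]
        right = " ".join(tokens[i + 1:i + 1 + half])[:half]
        # Right-align the left context so all keywords share a column.
        lines.append("{0:>{w}} {1} {2}".format(left, token, right, w=half))
    return lines


sample = "the whale was of a most monstrous size and a Monstrous bulk".split()
for line in concordance_lines(sample, "monstrous", width=31):
    print(line)
```

Slicing the result, for example "concordance_lines(tokens, word)[:25]", would mimic the "Displaying 25 of 79 matches" behavior.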
What about other words that appear in the same contexts as a certain word? We would use ".similar()", which lists the words that occur in contexts like those shown in our ".concordance()" output. For the word 'monstrous' in text1, some words used in the same context are "imperial", "subtly", "impalpable", etc. In Moby Dick, Melville uses 'monstrous' quite differently from how Austen uses it in Sense and Sensibility.
Code:
In [9]: text1.similar('monstrous')
Building word-context index...
imperial subtly impalpable pitiable curious abundant perilous
trustworthy untoward singular lamentable few determined maddens
horrible tyrannical lazy mystifying christian exasperate

In [10]: text2.similar('monstrous')
Building word-context index...
a great good very amazingly exceedingly as so sweet heartily
remarkably extremely vast

In [12]: text2.common_contexts(['monstrous','very'])
a_pretty is_pretty a_lucky am_glad be_glad
Austen uses 'monstrous' in the same contexts as words such as 'great', 'good', 'amazingly', and 'sweet'. So Melville writes 'monstrous' with sometimes a positive and sometimes a negative connotation, whereas Austen gives it a positive connotation in Sense and Sensibility. You can also see the contexts that two or more words share with the ".common_contexts()" method. An entry such as "a_pretty" means that both words fill the blank in "a ___ pretty". The words 'monstrous' and 'very' both appear before 'pretty', 'lucky', and 'glad', all positive adjectives.
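If you are curious how a word-context index can produce results like these, the following is a simplified pure-Python sketch. It is an assumption-laden illustration, not nltk's actual code: the helper names are hypothetical, a "context" is taken to be just the pair of neighboring words, and similarity is scored by a plain count of shared contexts, which is cruder than what nltk does.

```python
from collections import defaultdict


def build_context_index(tokens):
    """Map each lowercased word to the set of (left, right) neighbor pairs."""
    words = [t.lower() for t in tokens]
    index = defaultdict(set)
    for i in range(1, len(words) - 1):
        index[words[i]].add((words[i - 1], words[i + 1]))
    return index


def similar_words(index, word):
    """Rank other words by how many contexts they share with `word`."""
    word = word.lower()
    target = index.get(word, set())
    scores = {}
    for other, contexts in index.items():
        shared = len(contexts & target)
        if other != word and shared:
            scores[other] = shared
    # Most shared contexts first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))


def shared_contexts(index, words):
    """Return the contexts common to all `words`, formatted as left_right."""
    sets = [index.get(w.lower(), set()) for w in words]
    common = set.intersection(*sets) if sets else set()
    return sorted("{0}_{1}".format(left, right) for left, right in common)


sample = ("I am monstrous glad . I am very glad . "
          "a monstrous pretty girl . a very pretty house").split()
index = build_context_index(sample)
print(similar_words(index, "monstrous"))
print(shared_contexts(index, ["monstrous", "very"]))
```

In this toy sample, 'monstrous' and 'very' sit between the same neighbors, which is exactly the kind of overlap that ".similar()" and ".common_contexts()" surface in the real texts.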
Inaugural Address Corpus
Now we move on to the large corpus composed of Presidential inaugural addresses. Looking at a keyword such as 'freedom', ".concordance()" prints the first 25 of 183 matches. Many of the early mentions have the structure "freedom of ___", where the blank is religion, the press, etc.
Code:
In [14]: text4
Out[14]: <Text: Inaugural Address Corpus>

In [15]: text4.concordance('freedom')
Building index...
Displaying 25 of 183 matches:
s at the bar of the public reason ; freedom of religion ; freedom of the press
blic reason ; freedom of religion ; freedom of the press , and freedom of perso
ligion ; freedom of the press , and freedom of person under the protection of t
e instrumental to the happiness and freedom of all . Relying , then , on the pa
s of an institution so important to freedom and science are deeply to be regret
 be fairly and fully made , whether freedom of discussion , unaided by power ,
te and personal rights , and of the freedom of the press ; to observe economy i
rdinary lot of humanity secured the freedom and happiness of this people . We n
s inseparable from the enjoyment of freedom , but which have more than once app
 the abuse of power consists in the freedom , the purity , and the frequency of
ation to the civil power ; that the freedom of the press and of religious opini
 own ; to cherish the principles of freedom and of equal rights wherever they w
l Government into conflict with the freedom of elections , and the counteractio
f appointment more independence and freedom from such influences might be expec
e it might exert in controlling the freedom of the elective franchise . If such
of the Executive in controlling the freedom of the elective franchise through t
 from the mother country that " the freedom of the press is the great bulwark o
the people as the genuine spirit of freedom , and , like the false Christs whos
by the gifts of civil and religious freedom , who watched over and prospered th
 , which have been the offspring of freedom , and not of power . This most admi
ween church and state , and perfect freedom of opinion is guaranteed to all sec
o circumscribed the limits of human freedom and planted their standard , where
encouragement and potent appeal for freedom will be its own history -- its tria
ur sympathies in the cause of human freedom and universal advancement . But the
this Constitution the boundaries of freedom have been enlarged , the foundation

In [16]: text4.similar('freedom')
Building word-context index...
aided practicability dissolution all obstruction protest aegis
integrity assembled aggression votes paris boastfulness refunding
impoverishment under pride merchant lure risk

In [17]: text4.common_contexts(['freedom','terror'])
of_and the_of of_that
Which other words appear in contexts like those of 'freedom'? Again, with ".similar()" we see "aided", "practicability", "dissolution", "all", "obstruction", and many other terms. What about the contexts shared by 'freedom' and 'terror'? With the recent War on Terror, do any of the word choices reflect the events of the times? Not really. The only shared contexts are "of_and", "the_of", and "of_that", generic frames built from function words, so the two words do not appear to be used in similar ways in the addresses.
Dispersion Plot
Since the Inaugural Address Corpus contains addresses in chronological order, we can use a dispersion plot to uncover any temporal trends in word usage. With ".dispersion_plot()" we pass a list of the terms we want to visualize. Let's look at the terms 'citizens', 'democracy', 'freedom', 'duties', and 'America'.
Code:
In [19]: text4.dispersion_plot(['citizens','democracy','freedom','duties','America'])
The result is this plot:
Figure 3. Dispersion Plot, Text4
We can see that 'citizens' is the most evenly distributed term, so it has been used consistently in addresses through the years. 'freedom' and 'America' see quick increases in usage around the 80,000 word offset. Recent addresses use 'freedom' and 'America' more often, and 'duties' less often, than earlier addresses. This could be explained by the World Wars and the subsequent emergence of the US as a world power, when fighting for 'freedom' and 'America' became relevant political keywords. We can see that some words come into favor while others, such as 'duties', fall from favor.
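The data behind a dispersion plot is simply the position (word offset) of each occurrence of a term in the text. Here is a small pure-Python sketch of that idea. The helper names and the segment-counting step are my own additions for illustration, and I am assuming exact, case-sensitive matching on a list of tokens, so treat it as a rough model rather than what nltk does internally.

```python
def word_offsets(tokens, words):
    """Return {word: [offsets]} for each requested word (case-sensitive)."""
    offsets = {w: [] for w in words}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets


def counts_per_segment(positions, total_tokens, segments=4):
    """Count occurrences in equal-sized slices of the text, first to last."""
    counts = [0] * segments
    for p in positions:
        # Integer arithmetic maps offset p to its slice index.
        counts[p * segments // total_tokens] += 1
    return counts


sample = ("citizens have duties . citizens seek freedom . "
          "America loves freedom . freedom").split()
offsets = word_offsets(sample, ["citizens", "freedom", "duties"])
for word in ["citizens", "freedom", "duties"]:
    print(word, offsets[word], counts_per_segment(offsets[word], len(sample), 2))
```

Plotting each offset as a tick on a horizontal line, one row per word, gives a picture like Figure 3. The per-segment counts are a quick numeric check on what the eye sees: a word whose counts grow toward the end of the list is gaining favor over time.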
Okay folks, you have just read a post on natural language processing using the nltk module in Python. You learned how to search for word contexts, compare the contexts, and find similar words with different word contexts. Also, when the text is organized by time, we can use the dispersion plot to visualize word frequency and usage by time. Stay tuned for the next post in this series!
Thanks for reading,
Wayne
@beyondvalence
Text Analysis Series:
1. Natural Language Processing in Python: Part 1. Texts
2. Natural Language Processing in Python: Part 2. Counting Vocabulary
3. Natural Language Processing in Python: Part 3. Indexing Lists
4. Natural Language Processing in Python: Part 4. Frequency Distributions, Word Selections, & Collocations