Python Programming Glossary: corpus
Python random.sample with a generator http://stackoverflow.com/questions/12581437/python-random-sample-with-a-generator I am trying to get a random sample from a very large text corpus. The problem is that random.sample raises the following error...
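The error in that question arises because random.sample needs a population with a known length, which a generator lacks. A minimal stdlib-only sketch of one standard workaround, reservoir sampling (Algorithm R); the function name and the sample generator are illustrative, not taken from the question:

```python
import random

def reservoir_sample(iterable, k, seed=None):
    """Pick k items uniformly at random from an iterable of unknown
    length (Algorithm R), without materializing it in memory."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i survives with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample

# A generator with no len(), like lines streamed from a large corpus.
lines = (f"line {n}" for n in range(1_000_000))
picked = reservoir_sample(lines, 5, seed=42)
print(picked)
```

This streams the corpus once and keeps only k items in memory, which is usually the point when the corpus is too large to hold as a list.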
Python Multiprocessing storing data until further call in each process http://stackoverflow.com/questions/14437944/python-multiprocessing-storing-data-until-further-call-in-each-process code sample: tfidf_vect = ftext.TfidfVectorizer(...); N = 100000; corpus = ['This is the first document.', 'This is the second second document.', ...]; report_memory('Before fit_transform'); X = tfidf_vect.fit_transform(corpus); model = lm.LogisticRegression(); model.fit(X, y); report_memory('After..
How to calculate cosine similarity given 2 sentence strings? - Python http://stackoverflow.com/questions/15173225/how-to-calculate-cosine-similarity-given-2-sentence-strings-python In order to use tf-idf, you need to have a reasonably large corpus from which to estimate tf-idf weights. You can also develop it..
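As the excerpt notes, tf-idf weights need a large corpus to estimate; with only two sentences, a plain term-count cosine similarity is a common fallback. A stdlib-only sketch (the function name and tokenization by whitespace are illustrative assumptions):

```python
import math
from collections import Counter

def cosine_sim(s1, s2):
    """Cosine similarity between two sentences using raw term counts.
    A tf-idf weighting would need a larger corpus to estimate idf."""
    v1 = Counter(s1.lower().split())
    v2 = Counter(s2.lower().split())
    # Dot product over the shared vocabulary only.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

print(cosine_sim("this is a test", "this is"))
```

The result is 1.0 for identical bags of words, 0.0 for disjoint vocabularies, and something in between otherwise.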
How to get the wordnet sense frequency of a synset in NLTK? http://stackoverflow.com/questions/15551195/how-to-get-the-wordnet-sense-frequency-of-a-synset-in-nltk According to the documentation I can load a sense-tagged corpus in NLTK as such: from nltk.corpus import wordnet_ic; brown_ic = wordnet_ic.ic('ic-brown.dat'); semcor_ic.. But how can I get the frequency of a synset from a corpus? To break down the question: first, how to count how many times did..
POS tagging in German http://stackoverflow.com/questions/1639855/pos-tagging-in-german You'll need to tell NLTK about some German corpus to help it tokenize German correctly. I believe the EUROPARL corpus might help get you going. See nltk.corpus.europarl.german; this is what you're looking for. Also consider..
Feedparser - retrieve old messages from Google Reader http://stackoverflow.com/questions/1676223/feedparser-retrieve-old-messages-from-google-reader My intent is to do Natural Language Processing over this corpus, and I would like to be able to retrieve many past entries from..
How is it that json serialization is so much faster than yaml serialization in python? http://stackoverflow.com/questions/2451732/how-is-it-that-json-serialization-is-so-much-faster-than-yaml-serialization-in-p orders of magnitude without some profiling data and a big corpus of examples. In any case be sure to test over a large body of..
Iterating through String word at a time in Python http://stackoverflow.com/questions/2768628/iterating-through-string-word-at-a-time-in-python I tried using re module matches, but as I have a huge text corpus that I have to search through, this is taking a large amount of..
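For a huge corpus, the usual fix for that slowness is to iterate lazily rather than building a full match list up front. A minimal sketch using re.finditer, which scans on demand (the generator wrapper and word pattern are illustrative):

```python
import re

def iter_words(text):
    """Yield words one at a time. re.finditer scans lazily, so no
    large intermediate list is built the way re.findall would."""
    for m in re.finditer(r"\w+", text):
        yield m.group(0)

words = iter_words("The quick brown fox")
print(next(words))  # 'The'
```

Because nothing is materialized, memory stays flat no matter how large the input text is, and you can stop early once you find what you need.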
How do I count words in an nltk plaintextcorpus faster? http://stackoverflow.com/questions/3902044/how-do-i-count-words-in-an-nltk-plaintextcorpus-faster I have a set of documents and I want to return a list.. this project done faster: def searchText(searchword): counts = ..; corpus_root = 'some_dir'; wordlists = PlaintextCorpusReader(corpus_root, '.*'); for id in wordlists.fileids(): date = id[4:12]; month = date..
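The typical speed-up for this kind of task is to tally every word in a single pass with collections.Counter, instead of re-scanning each document once per search word. A generic stdlib sketch of the idea, not the question's actual searchText or corpus layout (the function name and sample documents are illustrative):

```python
from collections import Counter

def count_words(texts, searchwords):
    """One Counter pass per document instead of one full scan per
    search word; lookups afterwards are O(1)."""
    totals = Counter()
    for text in texts:
        totals.update(text.lower().split())
    # Missing words come back as 0, since Counter defaults to 0.
    return {w: totals[w] for w in searchwords}

docs = ["the cat sat", "the dog ran the race"]
print(count_words(docs, ["the", "cat", "missing"]))
# {'the': 3, 'cat': 1, 'missing': 0}
```

With an NLTK corpus reader, the same pattern would apply per fileid: build one frequency table per file, then read counts off it for every search word.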
Creating a new corpus with NLTK http://stackoverflow.com/questions/4951751/creating-a-new-corpus-with-nltk I reckoned that often the answer to my title is to.. I have a bunch of .txt files and I want to be able to use the corpus functions that NLTK provides for the corpus nltk_data. I've tried PlaintextCorpusReader but I couldn't get..
custom tagging with nltk http://stackoverflow.com/questions/5919355/custom-tagging-with-nltk
How do I find the frequency count of a word in English using WordNet? http://stackoverflow.com/questions/5928704/how-do-i-find-the-frequency-count-of-a-word-in-english-using-wordnet corpora/wordnet/cntlist.rev. Code example: from nltk.corpus import wordnet; syns = wordnet.synsets('stack'); for s in syns: for.. It is not stated in the source file or in the documentation which corpus was used to create this data. So it's probably best to choose the corpus that fits your application best and create the data yourself..
Fast n-gram calculation http://stackoverflow.com/questions/7591258/fast-n-gram-calculation I'm using NLTK to search for n-grams in a corpus but it's taking a very long time in some cases. I've noticed.. there's a potentially faster way of finding n-grams in my corpus if I abandon NLTK. If so, what can I use to speed things up? python..
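If abandoning NLTK is on the table, a common pure-Python alternative is to build n-grams by zipping shifted slices of the token list, which avoids per-gram function-call overhead. A minimal sketch (the function name and sample tokens are illustrative):

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list via zip over n shifted
    views of the list -- plain Python, no NLTK required."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "to be or not to be".split()
print(ngrams(tokens, 2))
# [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]
```

zip stops at the shortest slice, so the last incomplete window is dropped automatically, which matches the usual definition of contiguous n-grams.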