Learning the 1000 to 2000 Most Common Words in Languages

I’ve found that working through my two Esperanto vocabulary decks in Anki has been incredibly helpful with my Esperanto. I’ve been reading Alice in Wonderland in Esperanto, and though I don’t have a dictionary and can’t understand everything, I’m at least able to understand enough to actually read it.

Next Language:
Since I’ve finished all of my Esperanto flash cards, and I think the next step for me is to just read an Esperanto book with a dictionary, I’m looking for a new language to memorize.

I was thinking about memorizing Spanish or Portuguese vocabulary, since I know the basic grammar in those languages, and conversational fluency isn’t so far away.

Another idea was to take a language that I don’t know at all, and memorize the 1,000 to 2,000 most common words in the language. Some possibilities are Hawaiian, spoken Japanese, or spoken Mandarin. The grammar in those languages isn’t extremely complex. If I don’t know the vocabulary in advance, then I can keep track of exactly how long it takes to memorize a certain number of words. I have opportunities to practice Japanese and Mandarin in California, so either would be put to practical use in conversation. :slight_smile:

The Project
While looking through the available shared decks on Anki, I couldn’t find the perfect deck in any language. I thought that maybe we could put together some decks for various languages based on word frequency.

  • the most common 1000-2000 words in Spanish (not conjugated)
  • the most common 1000-2000 words in Portuguese (not conjugated)
  • the most common 1000-2000 words in Mandarin (Pinyin or other Latin system)
  • the most common 1000-2000 words in Japanese (Romaji)
  • Etc.

Does anyone have decks like this or know an easy way to make them? My thought is that we could collaboratively create these resources in spreadsheet or CSV format and then release them into the public domain on the Mnemotechnics.org wiki.
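One rough way to build such a list is to count words in any plain-text corpus. The sketch below is only an illustration: the function name `most_common_words` and the sample sentence are made up for the example, and it counts surface forms as they appear, so conjugated verbs would still need to be reduced to their dictionary forms by hand.

```python
import re
from collections import Counter


def most_common_words(text, limit=1000):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    # Lowercase the text and keep runs of letters (accented letters included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(limit)


# Hypothetical sample text; a real list would need a much larger corpus.
sample = "La casa es grande. La casa es azul y la puerta es roja."
for word, count in most_common_words(sample, limit=3):
    print(word, count)
```

The quality of the result depends entirely on the corpus: subtitles or transcripts would lean toward spoken vocabulary, while books and news would lean toward written vocabulary.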

If anyone is interested in this project, leave a comment below! :slight_smile:

Hi Josh,

I’m about to start something like this for Italian.

I seem to recall that there is a distinction between the most common written words in a language and the most common spoken words. Which will you be using?

A couple of interesting links along the lines we are thinking of:
http://www.towerofbabelfish.com/Tower_of_Babelfish/Learn_Italian.html

Either would be fine, but my goal is speaking… :slight_smile:

If you have an Italian vocabulary list in spreadsheet or CSV format that you would be willing to share, let me know. Maybe I’ll join you in memorizing the first 1000 Italian words.

I met him in Vienna. Very smart guy. He is releasing a book on learning languages that should be great.

I was just browsing his site and found a link to this word frequency list:
http://jbauman.com/aboutgsl.html

I just remembered that I used Google Translate to compare 1000 common words in multiple languages:

I just put those other 2000+ words into Google Translate to make this spreadsheet that can be edited and exported into Anki or another flash card system:

It’s automatic translation, so there may be mistakes, but it could be a starting point… :slight_smile:

That’s fantastic, Josh. Thanks for sharing. (I assume that you were able to automate the process somehow rather than typing in all those entries and translating one word at a time?!).

I must admit that I am a little daunted by the volume of information - the biggest chunk of data I’ve taken on before was the capitals of all the countries in the world. Do you work through from most frequently used to least, alphabetically, or just by whatever you fancy doing on a particular day?

I also notice that you used a ‘loose’ version of the memory town when studying Esperanto. I can certainly see the appeal with some words, but where do you store words like ‘the’, ‘be’, ‘of’, ‘and’, etc.? Also, in your Anki decks have you gone for English/Esperanto or Picture/Esperanto (on the basis that the latter helps you to think in the target language rather than thinking in English and then translating)?

The translations were done by Google Translate using this function that I learned from Dale:

=GoogleTranslate(A2,"en","es")

That translates cell A2 from English (en) to Spanish (es).
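To fill in several languages at once, the formula text can be generated rather than typed. This is only a sketch: `translate_formulas` is a hypothetical helper, the language codes are examples, and it assumes the English words sit in column A and that the tab-separated output is pasted into the spreadsheet starting at column B.

```python
def translate_formulas(first_row, last_row, source="en", targets=("es", "pt", "it")):
    """Build one list of GoogleTranslate formulas per spreadsheet row."""
    rows = []
    for row in range(first_row, last_row + 1):
        # One formula per target language, each reading the word in column A.
        rows.append(
            [f'=GoogleTranslate(A{row},"{source}","{target}")' for target in targets]
        )
    return rows


# Print tab-separated formulas for rows 2 to 4, ready to paste.
for formulas in translate_formulas(2, 4):
    print("\t".join(formulas))
```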

The two Esperanto decks I’m using are “Esperanto 101” and “626 common Esperanto words”. You can find them in the shared decks. If you give Esperanto a try for a few weeks, it might even help you learn Italian more quickly. :slight_smile:

I set Anki to load new cards in a random order. An ideal deck would use tags to separate the parts of grammar – for example, to mark all the “feminine nouns” or “-ar verbs” with tags. Then all the parts of speech could be put into different sections of a memory town.
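As a rough sketch of that tagging idea, the code below builds a tag string for each word and writes tab-separated lines with the tags in a third column. It assumes the part of speech and gender are already recorded in the spreadsheet; the tag names and the `build_tags` helper are made up for illustration, and the Spanish rows are just sample data.

```python
def build_tags(word, part_of_speech, gender=""):
    """Return space-separated tags for one vocabulary entry."""
    tags = [part_of_speech]
    # Group Spanish verbs by infinitive ending (-ar, -er, -ir).
    if part_of_speech == "verb" and word[-2:] in ("ar", "er", "ir"):
        tags.append(f"{word[-2:]}-verb")
    # Group nouns by grammatical gender when it is known.
    if part_of_speech == "noun" and gender:
        tags.append(f"{gender}-noun")
    return " ".join(tags)


# Sample rows: (English, Spanish, part of speech, gender)
rows = [
    ("to speak", "hablar", "verb", ""),
    ("house", "casa", "noun", "feminine"),
]
for english, spanish, pos, gender in rows:
    print(f"{english}\t{spanish}\t{build_tags(spanish, pos, gender)}")
```

When importing a text file like this, the third column can be mapped to the tags field, and each tag could then correspond to one section of the memory town.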

Since I studied some Esperanto before I started with these cards, I already knew “the” (la), “be” (esti), “of” (de, da), and “and” (kaj). I think that simple words can just be memorized by repetition. For a language like German, one might need mnemonics for words like “the”.

Esperanto’s grammar is so simple that I didn’t feel like I needed to separate the parts of speech as much. There are just a few groups of words that don’t have cognates with English, German, or Romance languages. One example group includes words like preskaux, baldaux, anstataux, ankoraux – those all got placed in one section of my memory town.

In Esperanto any root word can change its meaning with prefixes and suffixes. So a word like zorgi (to care for) might also be an adverb (zorge – carefully), an adjective (zorga – careful), and a noun (zorgo – a care). All you have to do is memorize “zorg-” and you know all the others as well as the opposites like malzorge, malzorga, etc. Creating a more precise memory palace didn’t seem to help much.
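The pattern is regular enough to write down as a few lines of code. This sketch covers only the four endings and the mal- prefix mentioned above; the function name `derive` is made up for the example, and it does not check whether a particular derived form is actually in common use.

```python
# Part-of-speech endings: -i (verb infinitive), -e (adverb), -a (adjective), -o (noun)
ENDINGS = {"verb": "i", "adverb": "e", "adjective": "a", "noun": "o"}


def derive(root, opposite=False):
    """Build the basic forms of an Esperanto root, optionally with mal- (opposite)."""
    stem = ("mal" if opposite else "") + root
    return {part: stem + ending for part, ending in ENDINGS.items()}


print(derive("zorg"))
print(derive("zorg", opposite=True))
```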

I really like Gabriel Wyner’s suggestion to use pictures and not words on the flash cards, but I don’t have a lot of free time, so I just used someone else’s Esperanto decks, and there were only words on them. :slight_smile:

I go through the Esperanto cards daily as Anki suggests them to me, and most words are instinctual now. I know the answer before I have time to think about mnemonic images or translations. When I’m reading, I do find myself recalling some of the images in order to verify the meaning.

Wow, I need to get back up to speed with my linking technique. This is going to be slow progress coming up with the initial links - hopefully the payoff will come in being able to more easily remember the vocabulary.

If you guys don’t mind (I know it’s better if I come up with my own links), how would you tackle something like this?

English : except
Italian : ad eccezione di

To Richie:
ad eccezione di

ad- like advertisement
ec- like economy
cezi- cesar (Julius)
one, di - like numbers 1, 2

So: an advertisement about the economy featuring Julius Cesar (Cezi), who is counting 1, 2, except for number 3.

Would that work?

I’d just memorize eccezione, which might be “Chet Atkins eating a calzone”.

Aren’t “ad” and “di” prepositions? Can’t you also phrase it like “ad eccezione del…” or “ad eccezione degli…”?

You could put all the words that take that preposition in one location in your memory town. The “ad” (a) could be memorized with brute force or repetition, since it’s a simple, common word.

I am.
I actually do this. I use Memrise to find interesting lists of words, which I download as spreadsheets.
I upload those spreadsheets, sometimes merging them together into one, into Anki, then use AnkiDroid on my phone while on the train, metro, etc.
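Merging can be scripted if the lists are saved as CSV. The sketch below is an illustration only: it assumes each list has the prompt in the first column and the answer in the second, and the helper names and sample lists are made up.

```python
import csv
import io


def merge_word_lists(*csv_texts):
    """Merge CSV texts with (front, back) columns, keeping the first copy of each front."""
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.reader(io.StringIO(text)):
            if len(row) < 2:
                continue  # skip blank or incomplete rows
            front, back = row[0].strip(), row[1].strip()
            key = front.lower()  # treat "Dog" and "dog" as the same entry
            if key not in seen:
                seen.add(key)
                merged.append((front, back))
    return merged


def to_tab_separated(pairs):
    """Format (front, back) pairs as tab-separated lines for a text import."""
    return "\n".join(f"{front}\t{back}" for front, back in pairs)


# Hypothetical sample lists standing in for two downloaded spreadsheets.
list_a = "house,casa\ndog,perro\n"
list_b = "Dog,perro\ncat,gato\n"
print(to_tab_separated(merge_word_lists(list_a, list_b)))
```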

So this is an excellent project and I can help you with lists etc.

Doh! Of course, the words I used there mean ‘with the exception of’ - I’ll lose the clutter and focus on eccezione for now. Thanks for the suggestions.

I have downloaded the following on my second-generation Kindle: A Frequency Dictionary of Spanish by Mark Davies.
It’s a Kindle book that lists the 5,000 most common words, drawn from both speech and written material, with each word numbered by frequency rank. If I did not already know most of the words, I would have considered associating the words with the pegs for their numbers. I don’t know how to get it into a usable file format, however, and it was not free, though as I recall it was inexpensive. But one can always pay for it and download it from Kindle. The verbs are listed together in the main corpus and separately as well. Real conversation in a language requires quick, natural access to verb conjugations in one’s head, which comes through practice; the verbs given are, of course, only in infinitive form. I was actually hoping there might be such a frequency list for 30K Spanish words, which is about what one needs to converse at a college level in Spanish. But 5K is the best I could find.

Great… what is the best method? Should I make a Google Spreadsheet for each language and then add some editors? What languages should we do?

I think these would be useful for starters:

  • Spanish
  • French
  • Portuguese
  • German
  • Modern Greek
  • Japanese (Romaji and Kana if possible – that way people can memorize the words without necessarily having to learn Kanji)
  • Chinese (Pinyin as well as characters)
  • Arabic

Add others if you have suggestions…

Does anyone here know a way to scrape text out of a Kindle book? :slight_smile:

I have developed a kanji system. Here is the Anki file (the deck is nowhere near finished; the finished cards have red and blue color coding): http://www.fileden.com/files/2012/8/25/3340838/Kanji%20and%20Algebra%20Acrostics%20book.zip
The system uses kanjiabc etymology. Here is a pic:
kanji_small.png

Hi Drsleep,

30k sounds very challenging! Do you have some good Internet sources for a stat like that - or similar?

Thanks

Gavino

Download the Kindle app for your PC, open the book, copy/paste?

The Kindle itself also gives you an option to highlight things. I believe it stores all these highlights in a folder, so you can plug the Kindle into the PC and copy the folder over.

https://ankiweb.net/shared/decks/spanish
I think I saw a deck in there that was based on frequency, but I’m not sure.

I found this article and the study it includes rather interesting:

Gavino

Hi all. Does anyone know a good site for starting to learn German, one that covers the common words? Thanks a lot for your suggestions.

http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

Gracias, Carboneum. I don’t care where they come from; 10K words is great!