Stopwords in Multiple/Other Languages

Life is slowly returning to order once again. I am attempting to slog through almost 1,000 messages in my MySQL folder, most of which are list questions that have already been answered, so it does not take long to get through them. However, occasionally I find a question that has not been answered, or a gem of a question that I want to expose to a wider audience.

This question fell under both categories. Basically, someone wanted stopwords in other languages, and wondered if there was a place to get them. (English stopwords can be found at http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html.)

I did a quick web search and found a site that has a bunch of language stopwords:
http://www.ranks.nl/stopwords/

Catalan stopwords
Czech stopwords
Danish stopwords
Dutch stopwords
French stopwords
English stopwords (default)
German stopwords
Hungarian stopwords
Italian stopwords
Norwegian stopwords
Polish stopwords
Portugese stopwords
Spanish stopwords
Turkish stopwords

I hope this helps some folks…..

And now, a tricker question — if there are folks doing fulltext matching in other languages, what is your list of stopwords or where did you get it from? (I am very sure there are tons of sites in the native langauge that lists stopwords, but for admins that do not speak every language their application supports, they can be hard to find!)

What about folks doing searches within a field that may contain multiple languages? Have you created a file to include the stopwords all languages that your application supports? If you have not, should you?

(The documentation on how to change the stopword file parameter is at http://dev.mysql.com/doc/refman/5.0/en/fulltext-fine-tuning.html)

Life is slowly returning to order once again. I am attempting to slog through almost 1,000 messages in my MySQL folder, most of which are list questions that have already been answered, so it does not take long to get through them. However, occasionally I find a question that has not been answered, or a gem of a question that I want to expose to a wider audience.

This question fell under both categories. Basically, someone wanted stopwords in other languages, and wondered if there was a place to get them. (English stopwords can be found at http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html.)

I did a quick web search and found a site that has a bunch of language stopwords:
http://www.ranks.nl/stopwords/

Catalan stopwords
Czech stopwords
Danish stopwords
Dutch stopwords
French stopwords
English stopwords (default)
German stopwords
Hungarian stopwords
Italian stopwords
Norwegian stopwords
Polish stopwords
Portugese stopwords
Spanish stopwords
Turkish stopwords

I hope this helps some folks…..

And now, a tricker question — if there are folks doing fulltext matching in other languages, what is your list of stopwords or where did you get it from? (I am very sure there are tons of sites in the native langauge that lists stopwords, but for admins that do not speak every language their application supports, they can be hard to find!)

What about folks doing searches within a field that may contain multiple languages? Have you created a file to include the stopwords all languages that your application supports? If you have not, should you?

(The documentation on how to change the stopword file parameter is at http://dev.mysql.com/doc/refman/5.0/en/fulltext-fine-tuning.html)

Comments are closed.