| PeterMollenburg Senior Member Australia Joined 5485 days ago 821 posts - 1273 votes Speaks: English* Studies: French B1
| Message 1137 of 1317 24 September 2014 at 2:31am | IP Logged |
Thanks for posting that vocab test here emk. I found it useful myself. Great results
for you btw!
Edit:
My results of the receptive French test:
Les 1000 mots les plus fréquents du français: 29/30
Les 2000 mots les plus fréquents du français: 25/30
Les 3000 mots les plus fréquents du français: 27/30
Les 4000 mots les plus fréquents du français: 25/30
Les 5000 mots les plus fréquents du français: 22/30
On the whole a pretty ordinary result. I did take too long on the first level and was
running out of time more and more towards the end of the test. I must admit that in
the 5000 range I was a little lost at times and guessed a bit, so 22/30 is probably
more than I deserved.
Edited by PeterMollenburg on 25 September 2014 at 12:37am
1 person has voted this message useful
| Mork the Fiddle Senior Member United States Joined 3978 days ago 86 posts - 159 votes Speaks: English* Studies: Norwegian, Latin, Ancient Greek
| Message 1138 of 1317 24 September 2014 at 11:35pm | IP Logged |
Thanks to emk for posting the reference to this very interesting and informative test for French vocabulary. I took the test this afternoon, and my results were as follows:
Level 1 - 1000 words - 29/30
Level 2 - 2000 words - 24/30
Level 3 - 3000 words - 27/30
Level 4 - 4000 words - 28/30
Level 5 - 5000 words - 24/30
So I "passed" levels 1, 3 and 4 (passing means getting at least 27 correct out of 30). Unsurprisingly, I "failed" level 5, but oddly I "failed" level 2.
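As a quick check of that pass rule, here is a minimal Python sketch. The names PASS_MARK and passed_levels are made up for illustration and are not part of the test itself; the scores are the ones listed above.

```python
# Pass mark as described in the post: at least 27 correct out of 30.
PASS_MARK = 27

# Scores out of 30 for levels 1-5, copied from the results listed above.
scores = {1: 29, 2: 24, 3: 27, 4: 28, 5: 24}


def passed_levels(results, pass_mark=PASS_MARK):
    """Return the levels whose score reaches the pass mark, in order."""
    return [level for level, score in sorted(results.items()) if score >= pass_mark]


print(passed_levels(scores))  # [1, 3, 4]
```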
I can read modern French novels such as L'Etranger, La condition humaine and even Proust's A la recherche without much (lexical) difficulty, but the only novel I have any numbers for is Octave Mirbeau's Le journal d'une femme de chambre.
Based on a copy from Gutenberg.org, the novel has 109,611 words (660,000 characters), counted by LibreOffice. According to MyClippings from Kindle, on which I read the novel, the Kindle dictionary was consulted 240 times for Le journal, so 240 lookups work out to 0.22% of all the words.
I should say that I did not look up ALL the words I did not know, just those that tripped up my reading. How many extra words would that be? Really, I have no way of estimating, but probably no more than another 240. So make my "unknowns rate" = 0.44%. Certainly less than 1.0%, and good enough for me.
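For anyone who wants to redo the arithmetic with their own numbers, a minimal Python sketch follows. The word count and lookup count are the ones quoted above; the function name unknown_rate is just an illustrative choice, and doubling the lookups is the same rough guess made in the post, not a measurement.

```python
def unknown_rate(lookups: int, total_words: int) -> float:
    """Return dictionary lookups as a percentage of all words in the text."""
    return 100.0 * lookups / total_words


TOTAL_WORDS = 109_611  # word count reported by LibreOffice
LOOKUPS = 240          # Kindle dictionary lookups recorded for the novel

looked_up = unknown_rate(LOOKUPS, TOTAL_WORDS)
# Rough upper guess: as many unknown words again that were never looked up.
doubled = unknown_rate(2 * LOOKUPS, TOTAL_WORDS)

print(f"{looked_up:.2f}%")  # 0.22%
print(f"{doubled:.2f}%")    # 0.44%
```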
I have no idea how "typical" or "difficult" in vocabulary Le journal is, so I am not going to extrapolate anything of significance from the numbers for this one novel. It does feel to me that the other French fiction I read is in line with this result, and I do feel a high comfort level reading French fiction. When I was getting up to speed reading French a couple of years ago, French seemed like a foreign language to me, but it no longer does. I don't read French nearly as fast as I do English, but currently I am tackling Murakami's 1Q84 without feeling any need to look up anything in the English translation.
* Edited to correct the spelling of 'chambre.'
Edited by Mork the Fiddle on 24 September 2014 at 11:44pm
1 person has voted this message useful
| Sizen Diglot Senior Member Canada Joined 4348 days ago 165 posts - 347 votes Speaks: English*, French Studies: Catalan, Spanish, Japanese, Ukrainian, German
| Message 1139 of 1317 25 September 2014 at 4:55am | IP Logged |
I definitely feel like something went wrong:
Les 1000 mots les plus fréquents du français: 29/30
Les 2000 mots les plus fréquents du français: 27/30
Les 3000 mots les plus fréquents du français: 30/30
Les 4000 mots les plus fréquents du français: 30/30
Les 5000 mots les plus fréquents du français: 30/30
Something just feels... wrong... about these results. I can't help but notice that emk
and I got the same results for the first 2 levels and that most everybody got 29 on
the first level. I wonder if there are just some questions that are deceptively
tricky.
On the other hand, the active test is just plain hard.
Les 1000 mots les plus fréquents du français: 16/18
Les 2000 mots les plus fréquents du français: 18/18
Les 3000 mots les plus fréquents du français: 16/18
Les 4000 mots les plus fréquents du français: 15/18
Les 5000 mots les plus fréquents du français: 11/18
A few French words regarding administration and business were in there, which I'm
not too worried about, but the rest I honestly just didn't know or couldn't think of.
I kind of feel like doing the English test now to see if some of the questions are
just tricky or if I really just don't know my basic vocabulary! Maybe tomorrow though.
Edited by Sizen on 25 September 2014 at 5:21am
2 persons have voted this message useful
| Elenia Diglot Senior Member United Kingdom lilyonlife.blog Joined 3865 days ago 239 posts - 327 votes Speaks: English*, French Studies: German, Swedish, Esperanto
| Message 1140 of 1317 25 September 2014 at 1:48pm | IP Logged |
Sizen wrote:
Something just feels... wrong... about these results. I can't help but notice that emk
and I got the same results for the first 2 levels and that most everybody got 29 on
the first level. I wonder if there are just some questions that are deceptively
tricky.

Now I feel worse about only getting 15 for the first level. Strangely enough, that was
the level I thought I did best in. For levels 3 and 5, I got 27/30, and for level four I
got 28/30. These were my best marks, although I found these levels the most difficult and
definitely made a few educated guesses.
1 person has voted this message useful
| akkadboy Triglot Senior Member France Joined 5417 days ago 264 posts - 497 votes Speaks: French*, English, Yiddish Studies: Latin, Ancient Egyptian, Welsh
| Message 1141 of 1317 25 September 2014 at 5:58pm | IP Logged |
I took the French active test and scored 16/18 three times, 17/18 once and 18/18 once, not so good for a (fairly educated) native speaker...
But some sentences look like they haven't been written by a native speaker:
Quote:
- Ne disant pas la vérité, c'est une forme de tro.... une personne.
- On lui a choisi comme la ve.... du film.
- (...) pour qu'il leurs aide (...)

Others sound very weird to me as a French speaker from France (the presence of "teenager" may point to Quebec):
Quote:
- notre fils est un teenager, hier il est devenu qua... ans
- (...) recueillis par les associations d'animaux
- Après avoir fait du sport, il est toujours baigné en sueur

As for this one, it's tricky: "barreau" may be among the 3000 most common words, but "membre du barreau" is a pretty technical term in which "barreau" doesn't have the same meaning as the standalone word.
Quote:
- (...) membre du barreau

2 persons have voted this message useful
| Arnaud25 Diglot Senior Member France Joined 3851 days ago 129 posts - 235 votes Speaks: French*, English Studies: Russian
| Message 1142 of 1317 25 September 2014 at 8:24pm | IP Logged |
If you want to score 100% on the 1000 most frequent Russian words, select the first answer to the first question, the second answer to the second question and the third answer to the third question: 100%.
A very scientific test, indeed...
1 person has voted this message useful
| geoffw Triglot Senior Member United States Joined 4697 days ago 1134 posts - 1865 votes Speaks: English*, German, Yiddish Studies: Modern Hebrew, French, Dutch, Italian, Russian
| Message 1143 of 1317 28 September 2014 at 3:36pm | IP Logged |
One more data point of getting 29/30 on the first section and having my worst section be the second one. As my
score for section 2 was 21, I was unsurprised at the following message:
"You know the 1000 most frequent French words, but still have some work to do on the next level (1001-2000)."
I WAS rather surprised, however, when after taking the German test and getting 29 or 30 on each of the five
sections I got the following message:
"You know the 1000 most frequent German words, but still have some work to do on the next level (1001-2000)."
This was especially odd, given that I got 29 on the first section, but a perfect 30 on the second. Oh you quirky
internet tests...
EDIT: after careful parsing, I think that message is meant as an example, and not as a description of my own result. However,
since no other text is given to interpret my result, it looks very much like it's meant to be an
assessment of my result.
Edited by geoffw on 28 September 2014 at 3:45pm
1 person has voted this message useful
| emk Diglot Moderator United States Joined 5541 days ago 2615 posts - 8806 votes Speaks: English*, French B2 Studies: Spanish, Ancient Egyptian
| Message 1144 of 1317 29 September 2014 at 4:01pm | IP Logged |
A good performance all around on the vocabulary test! It looks like almost everybody had a little trouble with the 2000 band. And I agree that the questions required deep knowledge of the words, and the scoring system was pretty unforgiving.
French Natural Language Processing
For some reason, online language learning communities tend to include a large number of programmers. And of course, we occasionally feel the urge to build our own tools. And some of these tools would work better if they had detailed knowledge of how the language worked.
If you want to do natural language processing, it turns out that there are some pretty awesome resources out there. But you need to know where to look for them. Let me provide a list of some things I've found.
First, some general-purpose collections of data:
- Lexique (a nice, basic lexicon of French with frequency data, under a Creative Commons non-commercial license).
- Lefff lexicon (a massive, LGPLed dictionary of French inflectional forms).
- Opus (multilingual parallel corpus, mostly open source, includes UN data and Open Subtitles).
Some unprocessed raw materials, and conversion tools:
- Wiktionnaire (online, human-readable dictionary, also downloadable).
- Gutenberg project French books (public domain books in French, also downloadable).
- Calibre (converts DRM-free ebooks to high-quality text).
Next, an actual French parsing toolchain:
- French Treebank, enhanced (collection of structurally-annotated sentences in French, not actually required to use tools below).
- Description of the Treebank tagging system, including parts of speech.
- MElt (part-of-speech tagger using above conventions, with included tokenizer).
- melt2connlx (converts MElt output into maltparser input).
- maltparser (natural language parser with a trained French model).
Using MElt and melt2connlx, I can produce a file which looks like:
When I feed this through maltparser, it produces pretty interesting output:
Column 2 is the surface form that appears in the text. Column 3 is the underlying lemma. Column 4 is the "coarse-grained" part of speech, and column 5 is the "fine-grained" part of speech. Column 7 points towards the "parent node", and column 8 gives the nature of that relationship.
For example, on line 3, the verb avoir is the root of the sentence. Line 2 has a relation of "suj" to line 3, which indicates that it is the subject. Line 1 has a relation of "det" to line 2, showing it's the determinative. And so on.
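To make the column layout concrete, here is a minimal Python sketch that reads rows of this shape and prints each word with its relation and parent. It assumes the tab-separated, 10-column CoNLL-X layout, where a parent index of 0 marks the root. The SAMPLE rows are hand-written for illustration (the sentence "Le chat a faim" and its tags are my own guess at plausible values), not real MElt or maltparser output, and the names Token, parse_conll and describe are hypothetical helpers.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int       # column 1: position in the sentence
    form: str        # column 2: surface form
    lemma: str       # column 3: underlying lemma
    coarse_pos: str  # column 4: coarse-grained part of speech
    fine_pos: str    # column 5: fine-grained part of speech
    head: int        # column 7: index of the parent node (0 = root)
    relation: str    # column 8: nature of the relationship


# Hand-written illustration only, NOT actual parser output.
SAMPLE = "\n".join([
    "1\tLe\tle\tD\tDET\t_\t2\tdet\t_\t_",
    "2\tchat\tchat\tN\tNC\t_\t3\tsuj\t_\t_",
    "3\ta\tavoir\tV\tV\t_\t0\troot\t_\t_",
    "4\tfaim\tfaim\tN\tNC\t_\t3\tobj\t_\t_",
])


def parse_conll(text: str) -> List[Token]:
    """Parse one sentence of tab-separated rows into Token records."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        tokens.append(Token(int(cols[0]), cols[1], cols[2],
                            cols[3], cols[4], int(cols[6]), cols[7]))
    return tokens


def describe(tokens: List[Token]) -> List[str]:
    """Render each token as 'form --relation--> parent form'."""
    by_index = {t.index: t for t in tokens}
    lines = []
    for t in tokens:
        parent = by_index[t.head].form if t.head != 0 else "ROOT"
        lines.append(f"{t.form} --{t.relation}--> {parent}")
    return lines


for line in describe(parse_conll(SAMPLE)):
    print(line)
```

Walking the parent indices this way is enough to pull out things like subject-verb pairs for a flashcard or frequency tool, without needing anything beyond the standard library.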
The parser is decent—it gets confused on a semi-regular basis, but if it were a human, I'd say it's a strong B1 with a huge vocabulary and weak common sense about the real world. Natural language parsing has gotten surprisingly good in the last 15 years. Next thing you know, the computers will be signing up for HTLAL and asking us language learning questions. :-)
5 persons have voted this message useful