Lingocracy Learn by reading what you like (Learning Techniques, Methods & Strategies) Language Learning Forum

Tags: Reading

134 messages over 17 pages

Yaan
Triglot
Groupie
France
Joined 3886 days ago
61 posts - 88 votes

Speaks: French*, English, Mandarin
Studies: Spanish, Esperanto

Message 121 of 134

21 April 2014 at 7:22pm

"About Chinese, the word detection works really well"
Thanks, but I'm afraid that's not entirely the case :)

Actually, at the moment we use a very large dictionary of words (plus your own words) to segment the text into words, and the rule is quite simple: form the longest possible word, reading from left to right.
This means that when a text contains 不知道, it will be split as 不知/道, even though the correct segmentation is 不/知道.
Here we assume that 不知 and 知道 are both in the dictionary list (but 不知道 itself is not).
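The "longest word from left to right" rule is commonly known as forward maximum matching. Below is a minimal Python sketch of that rule, assuming the dictionary (site words plus the user's own words) is a plain set of strings; the function name and the `max_word_len` cap are hypothetical and not taken from the site's actual code. It reproduces the 不知/道 mis-split described above.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted
        # so that unknown characters never stall the loop.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


dictionary = {"不", "知", "道", "不知", "知道"}
print(forward_max_match("不知道", dictionary))  # ['不知', '道']
```

Because the greedy choice at position 0 never looks ahead, it cannot see that taking 不 alone would leave the more useful word 知道.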

MDBG is doing much better than us: http://goo.gl/WWjlGR

Some good Chinese segmenters exist (http://goo.gl/jWjJM5), but we are looking for something that is cross-language and dictionary-based.

Do you have any ideas about how we could improve this? Sorry for the rather technical question.
1 person has voted this message useful

Crush
Tetraglot
Senior Member
China
Joined 5677 days ago
1622 posts - 2299 votes

Speaks: English*, Spanish, Mandarin, Esperanto
Studies: Basque

Message 122 of 134

22 April 2014 at 4:54am

My idea would just be to give you alternatives. When you hover over a word, perhaps you could press the +/- keys to cycle through other possibilities, though since the definitions take a little while to load, it might be better to have a key that opens a list of the possible words that character could belong to. For example, take 不知道: if you hover over 不, the automatic segmentation would show you 不知. However, if you press +, other possible words would appear: 不 and 知道. Hover over one of those to see its translation; clicking on it would separate the word for you.

You could also make it search based on the actual character the mouse is hovering over, so that 好久不见 would be kept together, but if you hovered over 不, you would see a box showing 不见 and 不 as possible alternative words. With my first idea, hovering would let you search for 好久, 不见, 好, 久, 不, 见, and any other possibilities depending on what came before and after it. Or maybe it would show you 好久不见 and 好久不见. It might not be the best option, but it seems to be the simplest.
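The hover idea above amounts to listing every dictionary word that covers the hovered character. A minimal sketch, assuming the same set-of-strings dictionary as before; `candidates_at` and `max_word_len` are hypothetical names, and ordering longest-first is just one possible choice for which alternative to show first:

```python
def candidates_at(text, index, dictionary, max_word_len=4):
    """Return all dictionary words in `text` that contain the character at `index`."""
    found = []
    # Any covering word must start at or before `index` and end after it.
    for start in range(max(0, index - max_word_len + 1), index + 1):
        for end in range(index + 1, min(len(text), start + max_word_len) + 1):
            word = text[start:end]
            if word in dictionary:
                found.append(word)
    # Longest first, so the widest reading is offered before its parts.
    return sorted(found, key=len, reverse=True)


words = {"好", "久", "不", "见", "好久", "不见", "好久不见"}
print(candidates_at("好久不见", 2, words))  # ['好久不见', '不见', '不']
```

A +/- key could then simply step through this list, and clicking an entry would re-split the text at that word's boundaries.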

I'm not sure how MDBG's works, but one idea worth looking into is to use some kind of frequency data and give more frequent words a higher priority. 知道 is almost certainly much more common than 不知. 成语 (set idioms) are the exception, though: they are almost always used as a unit, so while 好久 and 不见 might each be more common than 好久不见, when the whole phrase appears it is almost certainly being used together rather than as individual words.
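One standard way to use frequency data (a swapped-in technique, not a description of how MDBG actually works) is a unigram model: choose the segmentation whose words have the highest combined probability, found by dynamic programming over character positions. The counts below are made-up illustration values, and the 0.01 floor for unknown single characters is a hypothetical choice.

```python
import math


def segment_by_frequency(text, counts, max_word_len=4):
    """Pick the segmentation maximising the sum of log word probabilities.

    `counts` maps words to positive frequencies (hypothetical corpus counts).
    """
    total = sum(counts.values())
    # best[i] = (best log-probability for text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            count = counts.get(word)
            if count is None:
                if len(word) > 1:
                    continue  # unknown multi-character strings are not words
                count = 0.01  # hypothetical floor so unknown characters still segment
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk back through the stored start indices to recover the words.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


counts = {"不": 1000, "知": 50, "道": 200, "不知": 20, "知道": 800}
print(segment_by_frequency("不知道", counts))  # ['不', '知道']
```

With these counts, 不/知道 beats 不知/道 because 知道 is far more frequent than 不知. An idiom such as 好久不见 would need a large enough count of its own to beat its parts; the model has no separate rule for 成语.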

As I've mentioned before, another option is simply to use user input to segment texts after an initial run-through of your algorithm. I imagine this would increase the load on the server a bit, but I'm pretty sure it would speed up text loading times, and as more people went through a text it would become more and more accurate.

EDIT: And just to add, I don't really think a wiki-style system is necessary. I'm not sure it's that important to keep track of change history, at least not for now, and a small diff/patch-style system would probably work just fine and save a lot of space.
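To illustrate how small such a patch could be: if a segmentation is stored as the set of character offsets where words end, a user's correction only needs the offsets to add and remove relative to the automatic result. This is a hypothetical sketch (`boundaries`, `make_patch` and `apply_patch` are invented names), and it assumes the automatic words always join back into the original text.

```python
def boundaries(words):
    """Character offsets where one word ends and the next begins."""
    positions, offset = set(), 0
    for word in words[:-1]:
        offset += len(word)
        positions.add(offset)
    return positions


def make_patch(auto_words, corrected_words):
    """Store only the boundary differences between automatic and corrected output."""
    auto, fixed = boundaries(auto_words), boundaries(corrected_words)
    return {"add": sorted(fixed - auto), "remove": sorted(auto - fixed)}


def apply_patch(text, auto_words, patch):
    """Rebuild the corrected segmentation from the automatic one plus a patch."""
    if not text:
        return []
    cuts = (boundaries(auto_words) | set(patch["add"])) - set(patch["remove"])
    words, start = [], 0
    for cut in sorted(cuts) + [len(text)]:
        words.append(text[start:cut])
        start = cut
    return words


auto = ["不知", "道"]
patch = make_patch(auto, ["不", "知道"])
print(patch)                              # {'add': [1], 'remove': [2]}
print(apply_patch("不知道", auto, patch))  # ['不', '知道']
```

The patch holds two short lists of integers rather than a full copy of the text, which is where the space saving would come from.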

There are also some interesting links in the Wikipedia article on Text Segmentation, such as the Word Split overview (pdf). I don't see why a general text segmentation algorithm would need to differ between Chinese and other languages, since the idea is to take a text without spaces (just like Chinese) and break it down into meaningful words.

Edited by Crush on 22 April 2014 at 5:12am
2 persons have voted this message useful

Yaan
Triglot
Groupie
France
Joined 3886 days ago
61 posts - 88 votes

Speaks: French*, English, Mandarin
Studies: Spanish, Esperanto

Message 123 of 134

25 April 2014 at 1:57pm

Yes, it's a good idea to show the other possible combinations; that way, when the segmentation is wrong, you still have the alternatives.
As for MDBG, yes, I also think it includes some sort of algorithm based on word frequency.

Thanks for the resources on text segmentation :)

Quote:

As I've mentioned before, another option is simply to use user input to segment texts after an initial run-through of your algorithm. I imagine this would increase the load on the server a bit, but I'm pretty sure it would speed up text loading times, and as more people went through a text it would become more and more accurate.

Yes, we also thought about that, and we are working on implementing something like this, which will make word lookup much snappier.

1 person has voted this message useful

ihoop
Newbie
United States
Joined 4422 days ago
29 posts - 66 votes

Speaks: English*
Studies: Spanish, Mandarin

Message 124 of 134

22 May 2014 at 4:06pm

Still no option for Traditional Chinese characters?
1 person has voted this message useful

Yaan
Triglot
Groupie
France
Joined 3886 days ago
61 posts - 88 votes

Speaks: French*, English, Mandarin
Studies: Spanish, Esperanto

Message 125 of 134

29 May 2014 at 11:20am

Languages like Chinese or Japanese are trickier to add. Traditional Chinese is planned, but we need a little more time :)
1 person has voted this message useful

Gustavo Russi
Tetraglot
Newbie
Brazil
Joined 3655 days ago
9 posts - 16 votes
Speaks: Portuguese*, English, French, Italian

Message 126 of 134

29 May 2014 at 10:21pm

Very interesting website! I'm reading some tales in German and understanding most of them.
Thank you!
2 persons have voted this message useful

mercutio
Newbie
United Kingdom
Joined 3655 days ago
19 posts - 26 votes
Speaks: English*

Message 127 of 134

11 June 2014 at 10:45pm

OK, here is some feedback. I am using a Chromebook and have used Flewent and Lingualy, and your new site could potentially be better than them. I tried to get Learning with Texts working but gave up, as it seemed totally confusing even to set up!

So here's some feedback: there needs to be some form of instructions! I just kind of guessed how to use the site. It looks good and was effective, but I was largely guessing how to use it. For example, it said something about selecting words?! Like drag and click? I was just hovering over words and it was giving me word translations, but then I selected one and it underlined it in red?! Why? These little things need explaining a bit on the site. I did like the way I could read things and get tricky words instantly translated; it was a very useful experience.

I think you definitely need lots of A1-A2 material, because I have found that sort of thing very hard to find! I was pleased to see things at my level on there.

Also, it said I was "learning 3 words", but when I clicked that button nothing happened. It wasn't clear where or what these three words were, or why and how they had suddenly been labelled as words I was learning. All in all, I think the idea and the site are really well done, but it needs a little explanation of how it works.
2 persons have voted this message useful

mercutio
Newbie
United Kingdom
Joined 3655 days ago
19 posts - 26 votes
Speaks: English*

Message 128 of 134

11 June 2014 at 11:07pm

I've just thought of something that, as far as I know, no other "reading with texts" kind of site has done, or done well. I will try to explain. I am TERRIBLE at the past tense; I am only around A1 level in Spanish, so while testing the site I saw past-tense words that I DIDN'T know, but when I hovered over them I saw they came from verbs I did know. So it would be cool if there were some way to help people like me learn verb conjugations in context. For example, the site shows a literal translation of a word, like "ate", which is correct, but it doesn't link it back in any way to the Spanish verb "to eat". Since Spanish verb endings change, it would be great to have some association between the direct translation and its root verb, so I can start recognising conjugation patterns. I hope I explained that well.
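One simple way to link a conjugated form back to its root verb is a reverse index built from conjugation tables: generate each verb's forms, then map every form to its infinitive and grammatical description. The sketch below is a hypothetical illustration covering only the regular Spanish preterite; it ignores spelling changes (busqué), irregular verbs (ser, ir, tener) and forms shared with other tenses (vivimos is also present tense), which a real lemmatiser or full conjugation table would have to handle.

```python
import unicodedata
from collections import defaultdict

# Regular preterite endings only (simplified, hypothetical table).
PRETERITE_ENDINGS = {
    "ar": ["é", "aste", "ó", "amos", "asteis", "aron"],
    "er": ["í", "iste", "ió", "imos", "isteis", "ieron"],
    "ir": ["í", "iste", "ió", "imos", "isteis", "ieron"],
}
PERSONS = ["yo", "tú", "él/ella/usted", "nosotros", "vosotros", "ellos/ellas/ustedes"]


def build_reverse_index(infinitives):
    """Map each generated preterite form to (infinitive, tense, person) entries."""
    index = defaultdict(list)
    for infinitive in infinitives:
        stem, verb_class = infinitive[:-2], infinitive[-2:]
        for person, ending in zip(PERSONS, PRETERITE_ENDINGS[verb_class]):
            form = unicodedata.normalize("NFC", stem + ending)
            index[form].append((infinitive, "preterite", person))
    return index


def look_up(index, word):
    """Return all known analyses of `word` (empty list if unknown)."""
    return index.get(unicodedata.normalize("NFC", word.lower()), [])


index = build_reverse_index(["hablar", "comer", "vivir"])
print(look_up(index, "comió"))  # [('comer', 'preterite', 'él/ella/usted')]
```

A hover box could then show "comió → comer (to eat), preterite, él/ella/usted" next to the translation "ate".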

2 persons have voted this message useful
