
Note to self: here the system has split the text as 用法 文 instead of 用 法文

Just out of curiosity, will it be possible to manually fix this kind of misinterpretation in the future?

Yup, it will be. I already know how to code it, so it's just a question of time now :)
Actually, the problem here is that I chose a quick and dirty way to split Chinese sentences into words: the software reads from left to right, tries to find the longest string it knows, then continues from where that string ends, and so on.
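To make the left-to-right, longest-match idea concrete, here is a minimal sketch in Python. It is not the actual romanization code: the function name `greedy_segment`, the `max_len` limit, the tiny lexicon, and the sample string are all made up for illustration.

```python
def greedy_segment(text, lexicon, max_len=4):
    """Split text left to right, always taking the longest known word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a known word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical lexicon and sample string, for illustration only.
lexicon = {"我", "用", "用法", "法文", "文", "說"}
print(greedy_segment("我用法文說", lexicon))
```

With this toy lexicon the greedy pass grabs 用法 first, which leaves 文 stranded on its own. That is the same kind of wrong cut as the one noted above.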
When I have some free time, I will replace that with something smarter, based on statistics, so that it knows that 用法 + 文 is far less probable than 用 + 法文,
and eventually, one day (we're working on that), with something even smarter based on sentence patterns and the grammatical classes of words (and still statistics).
Anyway, Tatoeba is already a great "real world" test for this kind of software :)
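As a rough illustration of what a statistics-based splitter could look like, here is a small sketch that scores every possible split by the product of per-word probabilities and keeps the best one. This is one common approach (a unigram model with dynamic programming), not necessarily the one that will end up in the software. The name `best_segmentation`, the `unknown_prob` fallback, and all the probability values are invented for the example.

```python
import math


def best_segmentation(text, word_prob, max_len=4, unknown_prob=1e-8):
    """Return the split of text with the highest product of word probabilities."""
    # best[i] holds (log-probability, words) for the best split of text[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_prob:
                p = word_prob[word]
            elif len(word) == 1:
                # Unknown single characters get a tiny fallback probability.
                p = unknown_prob
            else:
                continue
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[len(text)][1]


# Invented probabilities, for illustration only (not measured from any corpus).
word_prob = {"我": 0.02, "用": 0.01, "用法": 0.0005,
             "法文": 0.001, "文": 0.0002, "說": 0.01}
print(best_segmentation("我用法文說", word_prob))
```

With these made-up numbers, 用 + 法文 scores higher than 用法 + 文, so the statistical version picks the intended split where the greedy longest-match approach would not.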

Wow, it would be wonderful to see how smart a system the romanization software will develop into. Once again thanks for all your work :-) !
License: CC BY 2.0 FR

Logs
This sentence was initially added as a translation of sentence #971754
added by nickyeow, September 6, 2011
linked by nickyeow, September 6, 2011
linked by sysko, September 6, 2011
linked by Yorwba, July 10, 2021