Note to self: here the system has split the text as 用法 文 instead of 用 法文
Just out of curiosity, will it be possible to manually fix this kind of misinterpretation in the future?
Yup, it will be. I already know how to code it, so it's just a question of time now :)
Actually, the problem here is that I chose a quick-and-dirty way to split Chinese sentences into words: the software reads from left to right, tries to find the longest string it knows, then continues from the end of that string, and so on.
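For readers who want to see that left-to-right, longest-match approach concretely, here is a minimal sketch. It is not the actual Tatoeba code: the function name `greedy_segment`, the tiny `LEXICON`, and the sample string are all made up for illustration.

```python
# Toy lexicon, invented for this example (not the real dictionary).
LEXICON = {"用", "用法", "法文", "文", "说"}


def greedy_segment(sentence, lexicon):
    """Split `sentence` by always taking the longest known word at the
    current position (forward maximum matching)."""
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted so the loop cannot stall.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


print(greedy_segment("用法文说", LEXICON))  # ['用法', '文', '说']
```

Because 用法 is in the lexicon and is longer than 用, the greedy pass commits to it immediately and never considers 用 + 法文, which is the kind of mistake noted in this thread.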
When I have some free time I will replace that with something smarter, based on statistics, so that it knows that 用法 + 文 is far less probable than 用 + 法文,
and eventually, one day (we're working on it), have something even smarter based on sentence patterns and the grammatical classes of words (and still statistics).
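One possible shape for the statistical version is a unigram model with dynamic programming: score every way of cutting the sentence by the product of word probabilities and keep the best one. This is only a sketch under assumptions. The probabilities in `WORD_PROB`, the `UNKNOWN_PROB` fallback, and the name `best_segmentation` are invented here, and the real implementation may work quite differently.

```python
import math

# Hypothetical word probabilities, invented for illustration only.
WORD_PROB = {"用": 0.05, "法文": 0.01, "用法": 0.002, "文": 0.001, "说": 0.04}
# Assumed fallback probability for a single character missing from the table.
UNKNOWN_PROB = 1e-8


def best_segmentation(sentence, word_prob, max_len=4):
    """Return the split of `sentence` with the highest product of word
    probabilities, found by dynamic programming over end positions."""
    n = len(sentence)
    # best[end] = (best log-probability of sentence[:end], start of last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            if word in word_prob:
                prob = word_prob[word]
            elif len(word) == 1:
                prob = UNKNOWN_PROB  # keeps every position reachable
            else:
                continue
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the sentence to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(sentence[start:end])
        end = start
    return words[::-1]


print(best_segmentation("用法文说", WORD_PROB))  # ['用', '法文', '说']
```

With these made-up numbers, 用 + 法文 + 说 scores 0.05 × 0.01 × 0.04 = 2e-5, while 用法 + 文 + 说 scores 0.002 × 0.001 × 0.04 = 8e-8, so the more probable split wins even though it starts with a shorter word.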
Anyway, Tatoeba is already a great "real world" test for this kind of software :)
Wow, it would be wonderful to see how smart a system the romanization software will develop into. Once again thanks for all your work :-) !
License: CC BY 2.0 FR
Logs
This sentence was initially added as a translation of sentence #971754.
added by nickyeow, September 6, 2011
linked by nickyeow, September 6, 2011
linked by sysko, September 6, 2011
linked by Yorwba, July 10, 2021