These are live blog notes from the Unicode 2013 conference in Santa Clara, Day 2.
Usual disclaimer for live blogging: These are informal notes taken by me, Dave Crossland, at the event, and may or may not be similar to what was said by the people who spoke on these topics. Probably if something here is incorrect it is because I mistyped it or misunderstood, and if anyone wants corrections, they should email me (firstname.lastname@example.org) – or post a comment. Thanks!
In fact I didn’t even read this document once before posting it, so it’s probably FULL of errors. What do you want for free? 🙂
(Slides are using Merriweather)
Wikipedia scale! It started as a microcosm and has become a macrocosm. There is a lot of activity from users around the world. 30m articles, 4.3m in English. 287 languages, and more in incubators. 796 production websites, 567 incubator.
500M unique monthly visitors. 24bn monthly page views. 3.7bn mobile page views, and that is growing very fast.
30m articles, it’s the world of content. How much is English? Not even 15%. 1m-100k articles in 37 languages: Arabic, European ones, Cyrillic ones. It’s a big drop, but the level of activity is different. 99k – 10k, 73 languages. These are vibrant communities; we don’t push a language to production unless there is an active global community contributing regularly. These have crossed the bar in terms of maintaining regular contributions and adding content to the web and Wikipedia. 10k-1k articles, 102 languages. These are new language communities coming online.
It’s the power of the web, the net, and broadband, fuelled by multilingual support in software, that makes this happen.
The engagement comes from billions of users globally who access the web on a desktop and can afford a smartphone or tablet. That’s the activity here.
Who are our users?
Early adopters: large languages, Europeans.
Next generation: those underrepresented on the web, with mobile access, and where the language tech isn’t good. What languages have 100Ms of users and poor tech? 1.5Bn people in India, 1.5Bn+ in CJK. Arabic, other RTL languages from India to the Middle East? Those are the next generation. That’s where the growth of the web is happening.
We see the long tail languages. Native American languages have tiny but passionate communities. In Latin America (Venezuela, Brazil, Mexico) there are indigenous languages that ‘didn’t matter’ but now emerge online, as their speakers can come together freely, address topics they are interested in, and share and learn.
Wikipedia is seeing the next generation of web users come online. 1Bn is a small number in that.
What are the factors igniting this growth?
The European languages were early adopters. What is missing in the space beyond that? Look at a content world (Wikipedia, Twitter, Facebook, Quora): in these content communities, do we have growth igniters?
1st class user experience
broadband net connection
abundance of devices to access the net – in Japan it’s common to have 3+ devices, this room averages 2…
seamless language support – we take it for granted in English.
high quality fonts to read with – you can’t imagine how frustrating it is to have poor quality fonts to read with. We take it for granted.
input keyboards to write with – you don’t think twice about text entry.
search – this drives the web today. With content in English, we assume that Bing or Google or Yahoo will find the things we want.
Where are we headed?
Wikipedia is becoming a teenager. It could become a pleasant person, or a pest.
Given the level of engagement by users around the world, we could see a content commons for the web emerging, one that makes online education ubiquitous and enables all people to generate rich, high quality content, creating a cycle of bringing people online, often with mobile computing (Android, iOS, others).
Commoditization of language software is something I want to see – I want to see libre and open language software. We must facilitate every user of the web and mobile platforms. There is no room for proprietary software in this space. It must be as seamless as buying a smartphone, for everyone.
We must keep the web open and free, or we will not see content grow equally in new languages on the web.
It is a world of transitions, things are changing.
Keeping things libre and open licensed. The licenses must allow people to use their own languages on the web.
Supporting 287 languages, we must have high quality language assets for web and mobile. Adobe did a good job for the desktop, but we need web fonts and input tools. These are SERIOUSLY LACKING today. Which Wikipedia languages have under 100k articles correlates strongly with language support out of the box in the most popular OS.
We lack libre language tools: spellcheckers, suggestion engines, content development tools, machine translation tools (terrible), multilingual search (it’s poor). Overall we have a broken multilingual UX.
The MAIN Wikipedia user base’s experience is TERRIBLY broken. We are so dismayed to still be saying, after the last 15 years of computing, that we can’t take language support for granted. Wikipedia has to go find things to make itself possible, and find libre software.
Unifying the language selector. Wikipedia added a universal language selector, so a German reading Japanese can happen.
Smart handling of scripts. How do I type and search in my own script?
We have called out for collaborators across the world for libre fonts. We have released libre fonts; you can download and use them any time. There are 63 languages with 83 styles.
We have 139 input methods for 64 languages contributed by our users. We want onscreen key maps for every mobile device. We need these libre licensed, so any user can easily type any language. I am happy to work with anybody on that 🙂
JavaScript i18n support needs to be improved for grammar, plurals, gender. We do this for 287 languages today, but we can do more.
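(Editor’s sketch, not from the talk: a minimal illustration of why plurals need per-language rules. The category names "one", "few", "many", "other" follow the CLDR convention. The function names, the `MESSAGES` table, and the sample strings are hypothetical. The rules are hand-coded for whole numbers in English and Russian only, and this is not how MediaWiki’s own i18n code works.)

```python
# Hand-coded plural categories for whole numbers, illustration only.
# Real systems take these rules from CLDR data instead of hard-coding them.

def plural_category(lang: str, n: int) -> str:
    """Return the plural category for a whole number n in the given language."""
    n = abs(n)
    if lang == "en":
        # English: singular for exactly 1, plural for everything else.
        return "one" if n == 1 else "other"
    if lang == "ru":
        # Russian: the form depends on the last digit(s), with 11-14 as exceptions.
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    raise ValueError(f"no plural rule in this sketch for language: {lang}")


# Hypothetical message table: one template per plural category.
MESSAGES = {
    "en": {"one": "{n} article", "other": "{n} articles"},
    "ru": {"one": "{n} статья", "few": "{n} статьи", "many": "{n} статей"},
}


def format_count(lang: str, n: int) -> str:
    """Pick the template matching n's plural category and fill in the number."""
    template = MESSAGES[lang][plural_category(lang, n)]
    return template.format(n=n)


if __name__ == "__main__":
    for n in (1, 2, 5, 11, 21):
        print(format_count("en", n), "|", format_count("ru", n))
```

English needs two forms here and Russian needs three, which is why a single "add an s" rule in the UI code breaks for most of the 287 languages.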
Translation tools, side by side proofreading for software UI and message localization.
We want translation tools and language aids.
Google Noto Fonts, Web Fonts
Red Hat Indic Fonts and IME support – the first libre fonts project, since MS and Adobe didn’t want to do libre fonts at that time
Adobe and Google’s libre font rendering initiative is great, thanks for that collaboration, it’s an important project and I hope Wikipedia can help with it
I applaud the open standards efforts for non-Latin language support. W3C, Unicode Consortium, these really matter in this space. The W3C Japanese layout specification led by Richard Ishida is a great example. Wikipedia and Red Hat are working on Indic layout specs. There is not enough information for a font designer to build a font for another language from the OT spec alone. We need language specifications that any technical developer or font designer can use to start supporting languages. Please contact me about that.
We are working now on better onscreen keymaps, and will welcome collaboration on that.
The multilingual UX must be first class.
www.github.com/wikimedia – collaborate and contribute here.
I understand the need to monetize, I come from industry, but libre software is the way of the web, you need to be with this. If we don’t have fundamental building blocks to seed the market, who will use the fonts you are trying to sell? They will pirate them anyway. Who cares?
We need to create open dictionaries and glossaries for people to use. We need to make machine translation work for all language pairs.
The BiDi work that the Unicode Consortium has taken on is great!
We need to be able to share our knowledge and contribute together to seed the platform that we all see will change the world more and more.
It is important because the energy of the few hundred people working to make multilingual support better will help so many other people come online and contribute.
We need touch input and other features on devices that we take for granted to work worldwide.
We are doing a collaboration summit with Red Hat, and Google and Mozilla, KDE and GNOME all come to our language summit. Every 2 years, we do it at Red Hat in Pune, India. We do one in the valley too.
We work on cross-platform language assets: fixing bugs, triaging issues.
IMAGINE A WORLD IN WHICH EVERY SINGLE HUMAN BEING CAN FREELY SHARE IN THE SUM OF ALL KNOWLEDGE.
A: We raise $40m annually in $10-20 payments from the global public. We are a distributed, community supported platform. We never felt the problem was money. It was mindshare, getting the best language engineers to work on our problems.
Q: Biggest obstacle for very small language communities? Those with 2k articles? These language communities with 100s of native speakers. Are we the main obstacles for them?
A: In one language, there are 10 people who are superusers. They make it for their kids to learn from. If there is interest from a community that is driven and passionate, they can make more content than a much larger community that is complacent. Population affects getting to 1M articles, but getting to 10k is something a small group can do. But lack of language assets is a technical issue. It is a real issue.
Q: You mentioned what Wikimedia is doing. The Wikidata effort, to make structured information, can help machine translation.