PINGDINGSHAN, China (Reuters) - In a village in central China’s Henan province, amid barking dogs and wandering chickens, villagers gather along a dirt road to trade images of their faces for kettles, pots and tea cups.
At the front of the line, a woman stands in front of a camera zip-tied to a tripod. Holding up in front of her face a photograph of her own head with the eyes and nose cut out, she slowly rotates from side to side.
Villagers waiting their turn take a numbered ticket. Some of them say it’s the third or fourth time they’ve come to do this sort of work.
The project, run out of a sleepy courtyard village house adorned with posters of former Chinese leader Mao Zedong, is collecting material that could train AI software to distinguish between real facial features and still images.
“The largest projects have tens of thousands of people, all of whom live in this area,” said Liu Yangfeng, CEO at Qianji Data Co Ltd, which collects and labels data for several of China’s largest tech firms and is based in the nearby city of Pingdingshan.
“We are creating more data sets to serve more AI algorithm companies, so they can serve the development of artificial intelligence in China,” said Liu, declining to disclose his clients.
The boom in demand for data to train AI algorithms is feeding a new global industry that gathers information such as photos and videos, which are then labelled to tell the machines what they are seeing.
Companies involved in data labelling, or data annotation as it is also called, include crowdsourcing platforms such as Amazon.com’s Mechanical Turk, which offer users small amounts of money in return for simple tasks; outsourcing firms such as India’s Wipro Ltd; and professional labellers like Qianji.
Cognilytica, a U.S. research firm specialising in AI, estimates the global market for machine-learning related data annotation grew 66% to $500 million in 2018 and is set to more than double by 2023. Some industry insiders say, however, that much of the work done is not disclosed, making accurate estimates difficult.
China has emerged as a key hub for data collection and labelling thanks to insatiable demand from a burgeoning artificial intelligence sector backed by the ruling Communist Party, which sees AI as an engine of economic growth and a tool for social control.
A plethora of firms have invested heavily in an area of AI known as machine learning, which is at the core of facial recognition technology and other systems based on finding patterns in data.
These include tech giants Alibaba Group Holding Ltd, Tencent Holdings Ltd and Baidu Inc, as well as younger companies such as AI specialist SenseTime Group Ltd and speech recognition firm Iflytek Co Ltd.
The result has been a proliferation of AI products and services in China, from facial recognition-based payment systems to automated surveillance and even AI-animated state media news anchors. Chinese consumers mostly see these technologies as novel and futuristic, despite concerns raised by some over more invasive applications.
Weak data privacy laws and cheap labour have also been a competitive advantage for China as it races to become a global leader in AI. The Henan villagers were happy to trade several sessions in front of a camera for a tea cup, or several hours for a stove-top pot.
Beijing-based BasicFinder, a leading data labelling firm with locations across Hebei, Shandong and Shanxi provinces, boasts a robust mix of domestic and overseas clients.
During a recent visit to its Beijing offices, some staff were labelling images of sleepy people that will be used by an autonomous driving project to identify drivers who might be falling asleep at the wheel.
Others were labelling British documents from the 1800s for a Western online ancestry service, marking fields for dates, names and genders on birth and death certificates.
According to BasicFinder Chief Executive Du Lin, hiring trained labellers in China is cheaper than using Western crowdsourcing marketplaces.
A Princeton University project related to autonomous driving initially put a task on Amazon’s Mechanical Turk, but as the task became more complicated, people began making mistakes, and BasicFinder was brought in to help correct the results, said Du.
In that project, one trained BasicFinder labeller was able to do the work of three crowdsourced labellers, he added.
“Gradually they saw they were paying less for labelling from us, so they hired us to label all the works from the very beginning,” said Du.
Princeton declined to comment.
For labelling employees, the reasons for joining China’s data industry are straightforward. The work, though sometimes tedious, is an upgrade on other jobs available to young workers who want to return home to small Chinese cities and villages.
Labellers at Qianji make roughly 100 yuan ($14.50) a day marking data points on photographs of people, surveillance footage and street images.
The work is usually simple, according to the employees, though some overseas content poses a challenge.
“One time we thought we were classifying Europe-style cooker machines that have a washer attached,” said Jia Yahui, a labeller at Qianji. “Later we were told it’s actually two separate things, a stove and a dishwasher.”
The labelling work brings some of the employment benefits of the tech sector to rural areas, but those benefits may prove short-lived if AI improves enough to perform many of the tasks labellers do.
“We think this industry will still exist in three to five years. It may not be a long-term career - we can only think of the five-year plan for now,” said Qianji CEO Liu.
Reporting by Cate Cadell; Editing by Jonathan Weber and Edwina Gibbs