An investigation of unpaid crowdsourcing

Borromeo, Ria Mae; Toyama, Motomichi

doi:10.1186/s13673-016-0068-z

Review
Open access
Published: 05 August 2016


Ria Mae Borromeo¹ &
Motomichi Toyama¹

Human-centric Computing and Information Sciences volume 6, Article number: 11 (2016)


Abstract

The continual advancement of internet technologies has changed how individuals and organizations operate. For example, through the internet, we can now tap a remote workforce to help us accomplish certain tasks, a phenomenon called crowdsourcing. Crowdsourcing is an approach that relies on people to perform activities that are costly or time-consuming using traditional methods. Depending on the incentive given to the crowd workers, crowdsourcing can be classified as paid or unpaid. In paid crowdsourcing, the workers are incentivized financially, enabling the formation of a robust workforce, which allows fast completion of tasks. Conversely, in unpaid crowdsourcing, the lack of financial incentive potentially leads to an unpredictable workforce and indeterminable task completion time. However, since payment to workers is not necessary, it can be an economical alternative for individuals and organizations who are more concerned about the budget than the task turnaround time. In this study, we explore unpaid crowdsourcing by reviewing crowdsourcing applications where the crowd comes from a pool of volunteers. We also evaluate its performance in sentiment analysis and data extraction projects. Our findings suggest that for such tasks, unpaid crowdsourcing completes more slowly but yields results of similar or higher quality compared to its paid counterpart.

Background

Computer technologies, which were once just used to aid individuals and organizations in their operations, have become a necessity. Due to the ability of computers to perform high speed and exact calculations, many human processes have been automated. Nevertheless, humans still perform better than computers in areas such as ideation, judgment, and perception. To maximize the strengths of both humans and computers, human computation, a computer science technique that is involved in the design or analysis of information processing systems in which humans participate as computational elements [1], has been widely studied.

Crowdsourcing is a form of human computation defined as the practice of obtaining information or services by soliciting input from a large number of people via the internet [2]. It has gained popularity over the past decade due to its ability to provide relatively cheap and fast solutions.

Based on the crowdsourcing taxonomy deduced by Hosseini et al. [3] there are four pillars of crowdsourcing: the requester, the crowd, the task, and the platform. The requester is the entity that has a set of tasks to crowdsource. The task is the activity to be done by the crowd, who is a group of people from the general public willing to participate with or without financial incentive. The crowdsourcing platform is the system used by requesters to publish tasks and by crowd workers to complete tasks.

Crowdsourcing platforms can be either paid or unpaid. In paid crowdsourcing platforms such as Amazon Mechanical Turk (MTurk) [4], CrowdFlower [5], and Microworkers [6], the crowd comes from a pool of paid workers. By contrast, in unpaid crowdsourcing platforms such as Crowd4U [7] and Zooniverse [8], the crowd comes from volunteers who receive no financial incentive.

Paid crowdsourcing platforms provide requesters with tools that help their tasks get completed in a timely fashion, at equal or better quality than a full-time workforce would deliver. They enable the provision of monetary incentives along with other incentives, such as recognition, which improves the likelihood of getting satisfactory results promptly [9].

Unpaid crowdsourcing platforms provide functionalities similar to paid platforms but without a built-in provision for monetary incentives. They employ volunteers as workers, so the completion time of tasks can be difficult to estimate. Nevertheless, since running a task on unpaid platforms does not require payment to workers, it may be an economical alternative for individuals and organizations who are more concerned about the budget than the task turnaround time.

In this study, we explore unpaid crowdsourcing by reviewing crowdsourced applications used in disaster response and relief, traffic management, and education that are powered by volunteer workers. We also implement sentiment analysis and data extraction applications in an unpaid crowdsourcing platform and compare their performance to their counterparts in paid platforms.

This paper is organized as follows. First, we briefly review the basics of crowdsourcing and introduce crowdsourcing platforms. Second, we discuss unpaid crowdsourcing and examples of its applications in various fields. Third, we report the results of our sentiment analysis and data extraction studies. After that, we share insights on our findings, and finally we conclude and present our future work.

Crowdsourcing essentials

The term crowdsourcing first appeared in print in 2006, in Jeff Howe’s Wired Magazine article entitled The Rise of Crowdsourcing [10]. He further defined it as the act of taking a job traditionally performed by an employee and outsourcing it to an undefined, generally large group of people in the form of an open call [11]. Since then, varying definitions of crowdsourcing have emerged. In 2012, Estellés-Arolas et al. attempted to come up with an integrated definition of crowdsourcing. They analyzed 40 original definitions from research papers in the databases of ACM, IEEE, ScienceDirect, SAGE and Emerald, and came up with a definition that covers any crowdsourcing initiative. Their definition is as follows: “Crowdsourcing is a type of participative online activity in which an individual, an institution, a non-profit organization, or company proposes to a group of individuals of varying knowledge, heterogeneity, and number, via a flexible open call, the voluntary undertaking of a task. The undertaking of the task, of variable complexity and modularity, and in which the crowd should participate bringing their work, money, knowledge and/or experience, always entails mutual benefit. The user will receive the satisfaction of a given type of need, be it economic, social recognition, self-esteem, or the development of individual skills, while the crowdsourcer will obtain and utilize to their advantage what the user has brought to the venture, whose form will depend on the type of activity undertaken” [12]. In addition to the four pillars of crowdsourcing, they pointed out two other elements: the incentive and the solution. The descriptions of the elements they identified are as follows.

Requester An individual, an institution, a non-profit organization, or a company that has a problem to be solved or tasks to be completed
Task The work to be done which is of variable complexity and modularity
Crowd A group of individuals of varying knowledge, heterogeneity, and number who provides solutions or completes tasks; also known as workers
Platform An application that provides crowdsourcing functionalities related to crowd and task management
Incentive Satisfaction of a given type of need, be it economic, social recognition, self-esteem, or the development of individual skills
Solution What the user has brought to the venture

A popular crowdsourcing example is the search for Jim Gray. Gray was a computer scientist who received a Turing Award for his seminal contributions to database and transaction processing research and his technical leadership in system implementation [13]. Early in 2007, he failed to return from a sailing trip around the Farallon Islands. One of the efforts made to find him and his boat was to capture satellite images of the ocean and then ask volunteers from the general public to identify which images should be further examined. His colleagues posted tasks on MTurk that asked volunteers to compare a few 300-by-300-pixel image sub-tiles to a template and then score each sub-tile for evidence of features similar to the ones provided in the template [14]. In this example, the requesters are Gray’s colleagues; the task is to find images that should be further examined; the crowd consists of MTurk workers; the platform is MTurk; and the solution is the aggregate of results received for each sub-tile. There is no explicit incentive or extrinsic reward, but the workers were likely driven by an intrinsic motivation to help.

Asking people to count malaria-infected blood cells on images of a Petri dish using public displays is another example of crowdsourcing, specifically called situated crowdsourcing [15]. Situated crowdsourcing is a unique form of crowdsourcing, typically unpaid, where workers who happen to be available perform tasks that are associated with a particular context. The context, which may be a single place, a single event or a single event in a single place, offers a resource for structuring the activity of the crowd [16]. In this example, the requester is a group of researchers; the task is to count malaria-infected blood cells on images of a Petri dish; the crowd consists of people with physical access to the public displays; the platform is a public display terminal; and the solution is the count of malaria-infected blood cells. Similar to the previous example, there is no explicit incentive or extrinsic reward, but the crowd could have been motivated by the perceived altruism involved in the task.

Crowdsourcing platforms

A crowdsourcing platform is a system that connects requesters and workers. It is commonly a web application that provides functionalities for task and crowd management. For requesters, the choice of a crowdsourcing platform may depend on the nature of the projects they want to crowdsource and the incentive they are willing to provide to the workers. Some crowdsourcing platforms specialize in particular kinds of projects, e.g. InnoCentive [17] for research and development and ClickWorker [18] for managing e-commerce data. Other platforms, such as MTurk, Microworkers, and CrowdFlower, are general-purpose.

General purpose platforms usually specialize in handling simple tasks or microtasks. Microtasks are tasks that require minimal time and cognitive effort but when combined can result in major accomplishments [19]. They are typically simple, repetitive, independent, and short. Examples of microtasks include labeling, transcription, text tagging, and sentiment analysis. However, there are more complex tasks that are context-heavy, interdependent, require more cognitive effort, and may take many hours to complete [20]. These tasks are called macrotasks. Examples of macrotasks include programming, document editing, and sentence translation. Macrotasks may be deployed on general purpose platforms in their original form or decomposed into microtasks. Larger tasks with more complex requirements such as software engineering and journalism are typically published in generic online outsourcing marketplaces or global online work platforms such as Upwork [21] and Freelancer [22].

We can also classify crowdsourcing platforms as either paid or unpaid. In paid platforms, requesters can tap into a large population of workers around the globe to accomplish tasks in a fraction of the time and money of more traditional methods [23]. On the other hand, requesters rely on volunteers or other crowd gathering techniques in unpaid platforms. In the following subsections, we will discuss examples of paid and unpaid platforms.

Paid platforms examples

Amazon Mechanical Turk, or MTurk for short, is one of the most popular crowdsourcing platforms, available only to requesters in the United States. It has been used in studies ranging from human linguistic annotation to image classification and has been a topic of interest for researchers in human–computer interaction (HCI), information retrieval, computer science, economics, and data mining [24]. As a marketplace for work, it provides requesters with an on-demand and scalable workforce and allows fast turnaround time and relatively low costs of setting up a project [4]. It also implements a mechanism for filtering workers based on their qualifications to maintain the quality of work. According to Ipeirotis, approximately 80% of the Mechanical Turk workers are from the US and 20% are from India [25].

CrowdFlower is another general-purpose crowdsourcing platform that distributes tasks to its own workers and to workers from different partner labor channels [26]. Partnering with a multitude of labor channels with workers from all over the world (e.g. ClixSense [27] and NeoBux [28]) significantly increases the scalability of the platform and diversifies the workforce [29]. For quality assurance, CrowdFlower screens workers by testing them using questions whose answers are already known (gold questions), by tracking contributor response velocity and answer distribution, and by assigning a confidence score to every unit of completed work [5].
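The gold-question idea can be illustrated with a minimal sketch. The class name, the function names, and the thresholds (`min_gold`, `min_trust`) below are hypothetical and chosen for illustration only; the text does not describe how CrowdFlower actually computes or applies its screening rule.

```python
from dataclasses import dataclass


@dataclass
class WorkerRecord:
    """Running tally of one worker's answers to gold questions."""
    gold_seen: int = 0
    gold_correct: int = 0

    @property
    def trust(self) -> float:
        # Fraction of gold questions answered correctly; 0.0 before any are seen.
        return self.gold_correct / self.gold_seen if self.gold_seen else 0.0


def record_gold_answer(record: WorkerRecord, answer: str, expected: str) -> None:
    """Update the tally after the worker answers one gold question."""
    record.gold_seen += 1
    if answer.strip().lower() == expected.strip().lower():
        record.gold_correct += 1


def is_trusted(record: WorkerRecord, min_gold: int = 4, min_trust: float = 0.7) -> bool:
    """Hypothetical rule: enough gold questions seen and accuracy above a threshold."""
    return record.gold_seen >= min_gold and record.trust >= min_trust


if __name__ == "__main__":
    worker = WorkerRecord()
    gold_answers = [
        ("positive", "positive"),
        ("negative", "negative"),
        ("neutral", "negative"),   # the only wrong answer
        ("positive", "positive"),
    ]
    for given, expected in gold_answers:
        record_gold_answer(worker, given, expected)
    print(worker.trust, is_trusted(worker))
```

A per-worker score of this kind is one plausible input to the trust-weighted aggregation that the platform applies to completed work.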

Another platform similar to MTurk and CrowdFlower is Microworkers. Like CrowdFlower, it is open to requesters and workers regardless of geographic location. Its task management and verification measures include text CAPTCHA tests, gold questions, system and employer task verification, and individual or mass task rating [6].

Unlike the platforms mentioned above, where all crowdsourcing processes can be done remotely through the internet, Bazaar is a paid platform for situated crowdsourcing. Requesters create and manage tasks using a web application, while workers complete tasks using the touch screens of physical kiosks that can be placed in different locations [30]. Workers are paid with HexaCoins, a virtual currency that they can exchange for money or goods.

Unpaid platforms examples

Crowd4U is a volunteer-based microtasking platform developed by the University of Tsukuba, Japan, whose crowd members consist of volunteers from universities or other research institutions. It provides a form-based rapid application development (RAD) tool to allow easy creation of commonly implemented tasks such as majority voting and translation [31]. To create more complex tasks, requesters must be knowledgeable in CyLog, a Datalog-like language that incorporates a proper feedback system for humans at the language level [32]. Tasks may be published and completed on the official Crowd4U website or embedded in other websites [33]. Tasks may also be performed on mobile devices by installing an application that requires a user to perform a microtask to unlock the screen.

Bossa is an open source software framework for volunteer-based crowdsourcing developed by the University of California, Berkeley. A Bossa instance must be hosted on a Linux server. Within the instance, requesters can generate, present and manage tasks by using PHP scripts while volunteers can perform tasks and interact with other volunteers. Bossa also provides support for dealing with the differences in the skills of volunteers by maintaining estimates of the skill level of volunteers and ensuring that for each task, there is a consensus of compatible results among a sufficient set of volunteers [34]. Bossa also integrates with Bolt, a web-based training framework, which could be used to train volunteers on how to perform tasks.

PyBossa is a free, open-source framework for crowdsourcing [35], developed by SciFabric, a company that develops open source software for crowdsourcing research. As its name suggests, it is based on Bossa but rewritten in the Python programming language. Similar to Bossa, programming skills are required to create and manage tasks thus making it not as user-friendly as the popular paid crowdsourcing platforms. Nevertheless, it is equipped with features that help in the management of tasks and analysis of results. Requesters have the option to deploy their instance of PyBossa or publish tasks in Crowdcrafting [36], a live instance of PyBossa, where tasks may be created, managed, and completed.

Zooniverse describes itself as the world’s largest and most popular platform for people-powered research [8]. It started with Galaxy Zoo, a website where volunteers can participate in research by classifying galaxies from the Sloan Digital Sky Survey. Similar projects followed and, as of this writing, Zooniverse hosts dozens of projects in which anyone can participate in crowdsourced scientific research. Additionally, anyone can create a project in Zooniverse and tap into its community of volunteers.

CrowdButton [37] is a work-in-progress volunteer-based platform for situated crowdsourcing. The platform is made up of a server and multiple Wi-Fi enabled reporting devices. Each device consists of one replaceable question area and four arcade game buttons with built-in LED lights. The initial implementation supports multiple-choice questions, which volunteers can answer by pressing a physical button in a specific location. The answers are sent to the server, where they are aggregated.

Unpaid crowdsourcing

In general, the use of unpaid crowdsourcing platforms is referred to as unpaid crowdsourcing. More specifically, unpaid crowdsourcing is a type of crowdsourcing wherein the workers are not incentivized by money but are motivated by other factors such as reputation, status, peer pressure, fame, community identification, and fun [38]. However, some non-monetary incentives depend on the nature of the task or the identity of the requester [39]. For example, unpaid sentiment analysis and data extraction crowdsourcing projects may not attract volunteers because they can be tedious and would primarily benefit the requester. To recruit volunteers in unpaid crowdsourcing, requesters must turn to other crowd-gathering techniques such as requiring users to participate, making users work in return for a service, and piggybacking on the user traces of a well-established system [40]. This may appear tedious compared to paid crowdsourcing, but through this approach requesters can take advantage of a certain level of crowd control, an opportunity to get high-skilled volunteers, and a workforce without labor cost.

Crowd control may be implemented by making tasks visible only to certain people. For example, a crowdsourcing task within an educational institution can be made available only to students who fit certain criteria. On the other hand, requesters have the opportunity to get high-skilled volunteers if a task is available to the public, because all sorts of workers can work on it, including high-skilled workers who would typically charge a high fee. It can be argued that these advantages have corresponding disadvantages, such as not getting enough workers and getting spammed by malicious workers. Nonetheless, there have been many successful unpaid crowdsourcing projects, which we discuss in the following subsections. We cover citizen science and several projects in disaster response and relief, traffic management, and education that relied on volunteer workers.

Citizen science

Citizen science is a form of research that involves the participation of the general public to aid in carrying out scientific research [41]. It is very similar to crowdsourcing as they both involve tasks to be accomplished and workers to accomplish the tasks through a platform. However, their objectives are different. The objective of crowdsourcing projects could be personal, business or scientific, while the primary objective of citizen science projects is to advance scientific research [42].

Typically, the workers, who are volunteers with or without specific scientific training, perform or manage research-related tasks such as observation, measurement or computation [43]. Many projects involve both citizen science and crowdsourcing. Nevertheless, not all citizen science projects include crowdsourcing, and not all crowdsourcing projects include citizen science [42].

While the paid and unpaid crowdsourcing platforms discussed earlier can be used for citizen science, there are other specialized platforms where one can create citizen science projects, such as CitSci.org [44], Zooniverse [8], and iNaturalist [45].

Disaster response and relief

Volunteer-based crowdsourcing has been used in disaster response and relief projects. Disaster response and relief is a unique application of volunteer-based crowdsourcing because the volunteers may not only solve computational problems but also help in the actual response and relief operations. In the CROSS system of Chu et al. [46], disaster surveillance data collection is crowdsourced through volunteers who explore threatened areas. CROSS works by calling for volunteers through social networks such as Facebook and Twitter. It then plans exploration routes for the volunteers who respond, based on their locations. CROSS continuously interacts with volunteers throughout their exploration by collecting reports, integrating messages and displaying useful information on a map.

Starbird’s Tweak the Tweet (TtT) [47] was used in the aftermath of the Haiti earthquake in 2010. It is an idea for utilizing the Twitter platform for crowdsourcing information provisions during disasters and mass emergency events. Unlike other more formalized crowdsourcing systems deployed on crowdsourcing platforms, TtT operates completely within the existing functionality of Twitter. The idea is to ask volunteers who are exploring affected areas to add special hashtags into their crisis-related tweets to make the tweets machine-readable. Other crisis-related tweets without the proper hashtags are formatted by another group of volunteers into tweets with the special hashtags. As a result, the volunteers were able to provide a self-activated, self-organized layer of human computation that provides a mechanism for searching and filtering disaster-related information on social media.
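To make the notion of machine-readable tweets concrete, the following is a minimal sketch of how a hashtag-structured tweet could be split into fields. The tag vocabulary in `FIELD_TAGS` and the sample tweet are hypothetical; the text above does not list the actual hashtags that TtT prescribes.

```python
import re

# Hypothetical field tags, chosen for illustration; not the documented TtT syntax.
FIELD_TAGS = {"need", "offer", "loc", "contact"}

HASHTAG = re.compile(r"#(\w+)")


def parse_tagged_tweet(text: str) -> dict:
    """Map each recognized field hashtag to the text that follows it.

    A field's value runs from the end of its hashtag to the start of the
    next hashtag (or the end of the tweet). Unrecognized hashtags, such as
    an event tag, are skipped.
    """
    fields = {}
    matches = list(HASHTAG.finditer(text))
    for i, match in enumerate(matches):
        tag = match.group(1).lower()
        if tag not in FIELD_TAGS:
            continue
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[tag] = text[match.end():end].strip()
    return fields


if __name__ == "__main__":
    tweet = "#haiti #need water and tents #loc Jacmel #contact @example"
    print(parse_tagged_tweet(tweet))
```

Once tweets follow such a convention, searching and filtering reduce to simple lookups on the extracted fields, which is the benefit the volunteers' reformatting work provides.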

Like TtT, the artificial intelligence for disaster response (AIDR) [48] system takes its input data from Twitter. The tweets are classified into a set of user-defined categories using machine learning techniques and crowdsourcing. It works by allowing the user to input keywords, geographic locations or languages to filter tweets. Based on the filtering criterion, it harvests relevant tweets. It crowdsources the labeling of the harvested tweets using the PyBossa platform. The labels from volunteers are then used as input to the system’s machine learning algorithm to be able to classify tweets automatically. AIDR was successfully tested to classify informative tweets posted during the 2014 Pakistan earthquake.

Traffic management

In traffic management, the crowd is primarily used for collecting data regarding traffic conditions in specific geographic locations. One of the most used traffic applications is Waze, a community-based traffic and navigation application [49]. It is a data collection platform that allows users to post observations of traffic incidents such as road construction, hazards, or accidents [50]. When the application is running on a mobile device, the user passively contributes traffic and other road data but can also voluntarily send more details about his or her location. The information collected from Waze users enables the application to give other users real-time traffic updates and suggest optimal routes to their destinations.

Another application of crowdsourcing in traffic management is in the improvement of traffic light timing in intersections. Riley et al. introduced a tool that allows mobile phone users to play a game challenging users to find the optimal light timing for a simulated traffic intersection [51]. The crowd is challenged to create new configurations for optimal timing or improve configurations from the high score list. The authors further plan to model an actual road intersection and see if the application can find an improved signal configuration.

Artikis et al. [52] proposed an intelligent urban traffic management system that uses fixed sensors positioned at intersections and mobile sensors installed in public transport vehicles. Sensors provide information to the system, while humans serve as judges to resolve conflicts in the data collected by the sensors. The system queries human volunteers or workers close to the location of the conflict to determine which data are correct.

Education

The use of crowdsourcing in education has also been explored. In 2011, Bow et al. [53] developed a crowdsourcing model for creating tools to aid the study of preclinical medicine at the Johns Hopkins University School of Medicine. They developed a simple Java program and used Google Drive to enable the crowd to create and edit flashcards simultaneously. The crowd, which consisted of medical students in the class of 2014, populated a database with more than 16,000 flashcards. An analysis of the students’ exam scores revealed that the students in the class of 2014 outperformed the students in the class of 2013, who did not have access to the system.

Aside from the creation of study tools, crowdsourcing has been used in grading or evaluation of students’ requirements. CrowdGrader [54] is a system available to the general public that lets students submit and collaboratively review and grade homework [55]. Within the application, students can submit homework and evaluate several submissions of other students. The students are given an overall crowd score, which reflects the quality of their homework and the quality of reviews they give. The system is beneficial to students as they can benefit from the feedback of their peers and learn from the solutions submitted by others. Aside from those, CrowdGrader also helps instructors facilitate student learning and handle grading and evaluation of large classes.

In the two previous illustrations of crowdsourcing in education, the crowd consists of students. However, Dow et al. [56] studied how input from an anonymous online crowd affects student learning and motivation for project-based innovation work. The crowd’s role was to provide students with authentic users’ opinions and realistic market forces in their projects. According to the students, the online crowd helped them to identify needs quickly and inexpensively in the early stages of the innovation process. However, in the later stages, the students received a large quantity of low-quality feedback.

Comparison of paid and unpaid crowdsourcing

In the previous sections, we differentiated paid and unpaid crowdsourcing based on the incentives they provide to workers and the platforms on which they are deployed. Only a few studies have compared the two based on the quality of results. In one of these studies, Mao et al. [57] adapted an annotation task that was originally performed by volunteers in the Planet Hunters citizen science project to an experiment in MTurk with paid workers. They investigated how three types of payment schemes (pay per task, pay for time, and pay per annotation) influenced the behavior of paid workers compared to volunteers. Their findings show that, given appropriate incentives, paid crowd workers might work at a faster rate and achieve similar accuracy compared to volunteers who are working on the same task [57]. Goncalves et al. [15] compared the performance of unpaid situated crowdsourcing for counting malaria-infected blood cells with the performance of the same task deployed in MTurk. They observed that in unpaid crowdsourcing through public displays, the accuracy of results was lower but the rate of uptake of tasks was higher compared to MTurk.

Since related studies are still inconclusive, we perform experiments to investigate further the quality of results produced by paid and unpaid crowdsourcing. In the following subsections, we will discuss two crowdsourcing projects that are commonly done in MTurk and CrowdFlower: sentiment analysis and data extraction. We deployed these projects in both paid and unpaid platforms then compared their results based on completion time, crowd costs and quality of results. We used PyBossa as the unpaid crowdsourcing platform and CrowdFlower as the paid platform.

Case study 1: sentiment analysis

Sentiment analysis, which is also referred to as opinion mining, is defined as the task of finding opinions of authors about specific entities [58]. It may be conducted manually by experts determining the sentiment of a given text, automatically using machine learning algorithms and statistical methods, or by crowdsourcing. In this study, we performed sentiment analysis on student evaluation comments using paid and unpaid crowdsourcing. The comments, which were written in English, were collected from the 2006 to 2012 Student Evaluation of Teaching (SET) comments [59] for Professor Jonathan Cox, a Mathematics professor at the State University of New York at Fredonia. From the unstructured document that contained the comments, we were able to extract 418 comments with an average length of 46.52 words each and a standard deviation of 35.48.

Paid version

We first created the sentiment analysis project in CrowdFlower. CrowdFlower provides a template for sentiment analysis, which includes default settings such as the payment per task and the number of judgments per task. The default values are 0.01 USD and 3, respectively. Using the template, we only had to provide instructions and specify the input data source. Fig. 1 shows the instructions and Fig. 2 shows a sample task. The cost of the project totaled 15.35 USD, which included the 0.01 USD payment per task and the platform service charges. The project was completed in 2.90 h by 86 workers. Each worker performed an average of 14.58 tasks with a standard deviation of 7.64.
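The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that one task corresponds to one judgment of one comment and that every comment received exactly three judgments; any gold or test questions are ignored, and the fee amount is inferred by subtraction rather than reported in the text.

```python
# Figures reported in the text.
comments = 418
judgments_per_comment = 3
payment_per_judgment_usd = 0.01
total_cost_usd = 15.35
workers = 86
mean_tasks_per_worker = 14.58

# Assumption: every comment is judged exactly three times.
total_judgments = comments * judgments_per_comment

# Payments to workers, rounded to cents to avoid floating-point noise.
worker_payments_usd = round(total_judgments * payment_per_judgment_usd, 2)

# Inferred, not reported: what remains of the total after worker payments.
implied_service_charges_usd = round(total_cost_usd - worker_payments_usd, 2)

# Cross-check: workers times mean tasks per worker should be close to the judgment count.
tasks_from_worker_stats = round(workers * mean_tasks_per_worker)

print(total_judgments)              # 1254
print(worker_payments_usd)          # 12.54
print(implied_service_charges_usd)  # 2.81
print(tasks_from_worker_stats)      # 1254
```

Under these assumptions, the worker statistics and the judgment count agree, and roughly 2.81 USD of the 15.35 USD total would be attributable to platform service charges.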

For every task, CrowdFlower chooses the response that has the highest confidence score. The confidence score is based on each worker’s trust score, a value that ranges from 0 to 1, where 0 is the lowest and 1 is the highest. For a given response, the confidence score is calculated by adding the trust scores of the contributors who gave that response, then dividing the result by the sum of the trust scores of all contributors to that task [60]. The summary generated by CrowdFlower showed that the paid crowd detected 141 positive, 231 negative, and 46 neutral comments.
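A minimal sketch of this aggregation follows. It assumes that the confidence of a response is the summed trust of the contributors who chose it, divided by the summed trust of all contributors to the task; the function names and the sample trust values are illustrative and are not CrowdFlower's API or data.

```python
from collections import defaultdict


def confidence_scores(judgments):
    """Return {response: confidence} for one task.

    `judgments` is a list of (response, trust) pairs, one per contributor.
    """
    total_trust = sum(trust for _, trust in judgments)
    if total_trust == 0:
        raise ValueError("at least one contributor must have non-zero trust")
    trust_by_response = defaultdict(float)
    for response, trust in judgments:
        trust_by_response[response] += trust
    return {resp: t / total_trust for resp, t in trust_by_response.items()}


def best_response(judgments):
    """Return the response with the highest confidence and that confidence."""
    scores = confidence_scores(judgments)
    winner = max(scores, key=scores.get)
    return winner, scores[winner]


if __name__ == "__main__":
    # Three judgments for one comment, with made-up trust scores.
    task = [("negative", 0.9), ("negative", 0.8), ("positive", 0.95)]
    print(best_response(task))
```

When every trust score is equal, the confidence of a response is simply the share of contributors who chose it, so the rule reduces to a plain vote.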

Unpaid version

We used the same settings, instructions, and input data as the paid version to create a crowdsourced sentiment analysis in PyBossa. Fig. 3 shows the landing page and Fig. 4 shows a sample task. We advertised the project on Twitter and Facebook and sent personal email messages to invite certain people to participate. Forty-six volunteers completed all the tasks in 44.8 h. Each volunteer performed an average of 28.50 tasks with a standard deviation of 60.65.

The output of the project was a CSV file containing the task runs. Post processing had to be performed to derive the sentiment of each comment. We adopted CrowdFlower’s formula in deriving the final judgment. However, since we did not have the volunteers’ trust scores, we assumed that they are all trustworthy and assigned them a trust score of 1. The resulting formula is equivalent to the rule of the majority. When a task received three different responses, it was classified as neutral. We derived 138 positive, 204 negative and 76 neutral comments from the unpaid crowd’s responses.
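The post-processing step can be sketched as follows. The column names `task_id` and `answer` are assumptions about the exported CSV, not the documented PyBossa export format, and treating every vote without a unique winner as neutral generalizes the three-different-responses rule described above to tasks that may have received more than three responses.

```python
import csv
import io
from collections import Counter, defaultdict


def aggregate_sentiments(csv_text):
    """Derive one sentiment per task from task runs by unweighted majority vote.

    Every volunteer is given a trust score of 1, so each answer counts once.
    If no single answer has more votes than all others, the task is labeled
    "neutral" (an assumption that extends the rule for three different answers).
    """
    votes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        votes[row["task_id"]].append(row["answer"].strip().lower())

    final = {}
    for task_id, answers in votes.items():
        ranked = Counter(answers).most_common()
        has_unique_winner = len(ranked) == 1 or ranked[0][1] > ranked[1][1]
        final[task_id] = ranked[0][0] if has_unique_winner else "neutral"
    return final


if __name__ == "__main__":
    sample = (
        "task_id,answer\n"
        "1,positive\n1,positive\n1,negative\n"
        "2,positive\n2,negative\n2,neutral\n"
    )
    print(aggregate_sentiments(sample))
```

In the sample, task 1 resolves to positive by a two-to-one vote, while task 2 receives three different answers and is therefore classified as neutral.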

Results

In Table 1, we summarize the comparison of the two methods in terms of crowd cost, completion time, and accuracy. Crowd cost is the amount paid to the platform and the crowd workers; completion time refers to the time from when the project was launched to the time when all required responses were received, and accuracy is the degree of similarity of the results compared to a gold standard. We use the manual evaluation of the same comments in [61] as the gold standard to measure the method’s accuracy.
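One simple way to operationalize this accuracy measure is the fraction of comments whose crowd-derived label matches the gold label. The text describes accuracy only as a degree of similarity, so the exact metric below is an assumption, and the labels in the example are made up.

```python
def accuracy(predicted, gold):
    """Fraction of gold-standard items whose predicted label matches.

    Both arguments map a comment identifier to a sentiment label. Items that
    are missing from `predicted` count as mismatches.
    """
    if not gold:
        raise ValueError("gold standard must not be empty")
    matches = sum(1 for key, label in gold.items() if predicted.get(key) == label)
    return matches / len(gold)


if __name__ == "__main__":
    gold = {"c1": "positive", "c2": "negative", "c3": "neutral", "c4": "negative"}
    crowd = {"c1": "positive", "c2": "negative", "c3": "negative", "c4": "negative"}
    print(accuracy(crowd, gold))  # 0.75
```

Applying the same function to the paid and unpaid outputs against the same gold labels gives directly comparable scores.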

Table 1 Comparison of paid and unpaid Crowdsourced sentiment analysis


As presented in Table 1, the paid method completed considerably faster than the unpaid version (2.90 h versus 44.8 h), while the accuracies of the two crowdsourced methods were similar.

Table 2 Comparison of paid and unpaid Crowdsourced data extraction


Case study 2: data extraction

In the Database Laboratory at Keio University Faculty of Science and Technology, Japan, students take turns studying research papers related to their research topics and presenting these papers to the entire laboratory. The presentation files are stored in a digital repository, which is available to all other students. From 2008 to 2015, 341 research papers were presented. However, to date, there is no summary or index of the presentations, which makes searching for a particular paper presentation difficult. In this project, we aim to extract information from the digital repository and create a meaningful index that could be useful to the researchers. Since the project requires data to be extracted from PDF files that do not have a standard format, automatic data extraction is difficult, and manual data extraction is tedious. In this case, crowdsourcing is therefore a suitable option.

Aside from the benefits of the project to the research laboratory, we also want to compare the quality of results from paid and unpaid crowdsourcing for this type of task. To achieve this, we created a gold set of the 70 research papers presented in 2008 and 2009 by manually pasting the APA bibliographic information of each paper to a text file then running a script to get the desired information. We intend to use the gold set as a baseline to evaluate the quality of results from paid and unpaid crowdsourcing and as gold test questions when we crowdsource all the 341 research papers.
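The script used to build the gold set is not described, so the following is only a sketch of how fields might be pulled from an APA-style reference string. The regular expression handles the common "Authors (Year). Title. Source." layout, the citation in the example is a made-up placeholder rather than a paper from the repository, and titles that contain a period followed by a space would need extra handling.

```python
import re

# Authors, then "(Year).", then a title ending in . ? or !, then the source.
APA_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s\((?P<year>\d{4})[a-z]?\)\.\s"
    r"(?P<title>.+?)[.?!]\s"
    r"(?P<source>.+)$"
)


def parse_apa(citation):
    """Return a dict with authors, year, title, and source, or None if no match."""
    match = APA_PATTERN.match(citation.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["source"] = fields["source"].rstrip(".")
    return fields


if __name__ == "__main__":
    placeholder = "Doe, J., & Roe, R. (2008). An example title. Journal of Examples, 1(2), 3-4."
    print(parse_apa(placeholder))
```

The presenter and presentation date are not part of a bibliographic entry, so in such a workflow they would have to be recorded separately.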

We designed a task that provides a worker with a link to the digital archive and asks him or her to extract the title, authors, source, year of publication, presenter, and presentation date of the 70 research paper presentations from 2008 to 2009. It is important to note that one-third of the research papers presented were in Japanese, and the rest were in English.