Data Dilemma

Big Data is used to determine everything from job offers to prison sentences. And it’s a lot less objective than it seems.

By Chelsea Barabas

In the coming years, the way workers relate to technology will play a big part in determining who will have opportunities and who will struggle—but not quite in the way commonly thought. Many of today’s experts have emphasized the importance of preparing workers with the “twenty-first century skills” needed to thrive: knowledge of STEM, entrepreneurship, and creative thinking. Underlying this emphasis on “re-skilling” the workforce is an assumption that workers need to adapt to changing roles and responsibilities as automation and information technology take over critical functions in the workplace.

But this framing reflects too narrow an understanding of the role that technology plays in driving an unequal labor market. Both skilled and unskilled workers are increasingly assessed and controlled not by human supervisors but by opaque algorithms, which help determine everything from one’s prospects in the job market to one’s likelihood of receiving a promotion. Battles over the future of work aren’t going to play out between machine and man. They will be between the workers who generate data and the people who are well positioned to leverage it.

A rapidly growing number of companies are in the business of aggregating and analyzing data to infer a worker’s skills, market demand, and compatibility with the firm. The types of data that these companies use are diverse and vary with the nature of the job. For computer programmers, they might include open repositories of code. For truck drivers, the data used might be less directly related to their work, such as comments made on social media. Third-party companies also use these data to build machine learning models that recommend desirable candidates for recruitment. They then sell those models, via subscription services, to firms looking to recruit new employees more efficiently and make smarter hiring decisions.

Yet, as the prominence of these data-driven recruitment practices grows, there remains very little awareness among workers about how the data they are generating online might impact their future career prospects. We should, therefore, be asking more fundamental questions about how information technology, particularly data and machine learning, will reshape critical relationships of power and control in the workplace. Only then can we conceive of the legal and structural safeguards necessary to ensure widespread prosperity in the future economy.

The Rise of Oppressive Meritocracy

In Average Is Over, economist Tyler Cowen argues that the economic winners of the future will be “infovores,” the 20 percent of the population who know how to harness information technology to their competitive advantage. Meanwhile, the remaining 80 percent of the workforce will make up a new economic underclass, who will endure stagnant or even falling wages.

For Cowen, whether one can “master the machine” depends largely on a person’s ability to learn skills that complement the superpowers of our machine counterparts. There’s a certain class of high-skilled information workers (journalists, financial analysts, academics, medical practitioners, data scientists, etc.) who will have the intelligence and discipline necessary to synthesize information and use it to increase their productivity through work alongside machines. The rest will be competing directly with intelligent software for their jobs. This includes everyone from blue collar workers in factories and transportation to white collar workers in management and accounting.

But that doesn’t necessarily mean that machines are eating all of our jobs. Automation enthusiasts argue that human partnerships with machines will open up opportunities for new, hyper-productive forms of work. Cowen and others often use chess to illustrate the synergistic side of relationships between computers and humans. Chess software has an immense advantage over humans because it can “train” in a few minutes on a far greater number of chess games than any one human could in her entire lifetime. But that doesn’t mean that such technology will render humans obsolete. By 1997, chess software like IBM’s Deep Blue was good enough to beat the world’s best chess player. But when the software was provided to chess experts as a tool rather than as a competitor, those players could perform at a higher level than any human or machine could alone.

This happy story of increased productivity is what Cowen and others envision the future of work to be like for those able to labor in complement to the machines. At the heart of this harmonious man-machine relationship is an emphasis on making decisions based on data analysis, in order to transcend our overreliance on predictably flawed human intuition. This pairing of machine learning with human synthesis and decision-making will not only result in more productive workers, says Cowen. It will also breathe fresh air into America’s favorite idea—meritocracy. The workers of tomorrow will be identified, recruited, and incentivized according to data-driven metrics on their performance and potential, not their race or gender or other messy social proxies for talent.

According to Cowen, we will be able to produce greater insight into workers’ performance as more and more of our workflows become digital. For example, journalists can now get real-time feedback based on metrics like how many times their articles are emailed, tweeted, or shared on Facebook. This enables widely read journalists, such as Nate Silver, to quantify their worth in specific figures, and then use those figures to negotiate for better wages. Performance metrics like this may also make it easier to identify “deadweight” employees who lower the overall productivity of the group. The hope is that, at its best, machine learning can help us overcome sticky cognitive biases by providing us with more objective insight into worker performance.

Cowen’s vision of the future economy is disturbing because it predicts a deeply bifurcated world of haves and have-nots. But, theoretically, it will also be fair. There will be inequality in the workforce of the future, but that inequality will be based on finer and finer grained analyses of our productive capabilities. Therefore, decision-makers should work to ensure that people have the skills they need to be competitive amidst these changes. This characterization aligns nicely with the popular view that we need to “re-skill” the workforce in order to address issues of persistent underemployment and growing inequality in the United States. Developing (measurable) modern-day competencies is the best vehicle for upward mobility in this future world.

However, this view vastly oversimplifies the issue of data-driven decision-making in the workplace, by failing to recognize the shifts in power that necessarily emerge along with our new data-driven decision-making processes. The types of changes that technology is bringing to the workplace have less to do with skill and a lot more to do with control.

Decision-makers should be asking more questions about who is able to use data to access new opportunities. Who controls the gathering and interpretation of our data footprints? How do we think about bias and discrimination in decisions that are heavily influenced by machine learning? What types of questions are we allowed to ask via the data we have access to?

It’s All About Control

Data is driving critical power shifts between the workers who generate it and the platforms and people who have the resources to extract value from it. With that in mind, let’s look at a few key examples.

First, we have a long way to go before we can develop reliable metrics about worker potential and performance without running the risk of replicating existing biases. During my graduate work, I studied an emerging class of companies that specialized in using machine learning to find and recruit employees in the tech industry. These companies were in the business of uncovering hidden insights in data that they scraped together from public websites where a lot of technical people hang out, such as GitHub (a site that enables programmers to work collaboratively on a shared codebase). Through machine learning, they used large amounts of data to identify new patterns and trends, such as the defining characteristics of high-performing employees.

But correlation does not equal causation, and many of the trends identified from such data reflect preexisting circumstances, as opposed to new opportunities. For example, one team of data analysts I interviewed was proud to have discovered that individuals who listed Lord of the Rings as one of their favorite books on Facebook were more likely to be hired as successful CTOs in tech companies. This insight helped determine which candidates were recommended for certain high-ranking jobs. But the type of people who fit this profile of an “ideal CTO” are also likely to reflect the demographics of the folks who currently hold these positions in the tech industry, which means that these metrics tend to simply reveal who is already in power, rather than finding people who are truly best equipped for the job.

Fortunately, important work is being undertaken to examine how biases like this operate. For example, earlier this year, ProPublica conducted an investigation into an algorithm called COMPAS, which has been widely used by courts to aid in the sentencing of those convicted of a crime. Researchers found that COMPAS consistently overestimated African-American defendants’ risk of repeat offenses, even though race was not explicitly entered into the system. Conversely, a higher share of white defendants (47 percent, compared to just 28 percent of African Americans) were ranked as lower risk but then went on to commit further crimes. These disparities could emerge for a variety of reasons. For example, while the score did not explicitly incorporate race as a variable, it did weigh items that correlate with race, such as poverty levels, joblessness, and social marginalization. This means that race might have been “double encoded” in the data in a way that led to biased results. Examples like this demonstrate how much more work needs to be done to establish practical frameworks for identifying and regulating this kind of algorithmic bias.

Sadly, researchers and journalists could face serious legal repercussions for pursuing this type of work. In order to carry out investigations of this kind, researchers often must violate a website’s terms of service, either by collecting data from the site or by manipulating the algorithm in a way that was not intended or authorized by the operators of the platform. Under the Computer Fraud and Abuse Act (CFAA), individuals who use these methods could be found guilty of a federal crime. This has a significant chilling effect on many researchers, who would rather not meddle in such a legal grey space. As algorithmic assessments become increasingly prevalent in the workplace, it will be important for lawmakers to revise these laws so that frameworks and guidelines for fair algorithmic practices can be developed.

But even if we could develop truly reliable predictions about who is best for a given job, that does not necessarily translate into a better overall situation for workers, since it doesn’t improve either their wages or their conditions. As discussed, the people who benefit most from the use of digital data are those with the resources necessary to employ these insights to their advantage. Right now, that privilege primarily lies with companies, not individuals. As such, the data-based insights being deployed are shifting the power dynamics in hiring and recruitment in important ways.

In my research, I found that data analytics are sometimes being used to drive down salary offerings to potential new hires. For example, one data platform would assess candidates along two key metrics: skill level and market demand. These scores could be used to offer lower salaries to candidates who had a high skill level but low market demand because, for example, they did not have a university degree. Such uses were not intended from the outset. Rather, they seem to have emerged gradually, as data analytics companies built their business models around the needs of the companies that could pay a hefty subscription fee. Workers whose data are included in these search engines often do not even know that they are being assessed in this way, as there is no legal obligation for companies to ask for their consent.

And our old paradigms for regulating such abuse and bias in predictive scoring are massively outdated. Probably the best legal precedent we have for monitoring algorithmic sorting methods like this is the Fair Credit Reporting Act, which is designed to protect consumer interests in the face of third-party data collectors like credit reporting agencies. Currently, these regulations do not extend to the growing group of entities that are providing “credit report” type services for job recruitment, because they are not strictly classified as credit reporting agencies.

Moreover, the machine learning processes being developed today are qualitatively different from the paper trails of the past, requiring new ways of thinking about how accuracy and due process can be enforced in the face of such massive amounts of digital data. Much work needs to be done to develop appropriate safeguards against the unfair practices and abuse that could stem from the current information asymmetries that exist between workers and employers.

Another complementary way of dealing with these asymmetrical benefits would be to construct tools and policies that enable the average worker to make use of their own data to access new opportunities and inform important decisions. A practical starting place might be to focus on changing the relationship workers in the “gig economy” have to the data they generate on platforms like Uber.

Currently, drivers work to build up their reputations on Uber’s platform, which can give them access to valuable opportunities, such as the financing of a new vehicle. However, those reputations are not portable to other networks, which means that workers must start from scratch if they want to engage in work with a competitor. This severely diminishes the leverage workers have in negotiating the terms of their employment, because they can’t easily switch to another platform without losing that valuable professional data. Whereas, 20 years ago, a plumber could source most of his business through word of mouth, today that same plumber’s business will live and die by his online reputation on sites like Yelp and TaskRabbit.

Moreover, laws like the Digital Millennium Copyright Act (DMCA) make it challenging for workers to develop software tools, or “bots,” that assist them in gaining a clearer perspective on the job market. It’s technically feasible for workers to develop bots to interact on their behalf through networked service platforms, to inform them of which company is offering the most advantageous deals at any given time. According to technologist and venture capitalist Albert Wenger, bots would enable workers to simultaneously participate in a variety of marketplaces, as well as to play one off against the other. However, specific provisions in the DMCA bar users from circumventing access controls and digital rights management protections, which is often a necessary step in reverse engineering a platform in order to build such bots. In other words, the benefits of these data are asymmetrical: Companies can use our data to their advantage, but we can rarely use those same data to further our own ends.

As we move toward this economy where much of our labor is contingent on professional reputation and dynamic data analysis, we need to think about ways to make sure that information can be leveraged by workers, not just private companies.

The Way Forward

Fortunately, there are some clear, practical next steps we can take in order to lay the foundation for a more equitable economic future for all Americans. We can start by revising problematic laws like those mentioned above, the Computer Fraud and Abuse Act and the Digital Millennium Copyright Act. Both of these pieces of legislation are too far-reaching and prioritize the interests of corporate entities over individual workers and researchers.

Some legislators have crafted bills, such as “Aaron’s Law,” to restrict the scope of the CFAA’s reach. However, none of these pieces of legislation have made it through Congress. This summer, the ACLU filed a lawsuit on behalf of a group of researchers in the hopes of establishing a strong precedent for the right to research discrimination online under the First Amendment. The Electronic Frontier Foundation also recently filed a case to challenge the constitutionality of the DMCA’s anti-circumvention and anti-trafficking provisions. Efforts like these are important first steps toward the development of essential research and tools designed to empower and protect the average worker.

In addition to legal reform, policymakers should strive to develop practical frameworks for regulating a new class of influential data brokers who operate outside the scope of the Fair Credit Reporting Act, the prevailing framework for regulating credit reporting-type agencies. Legal scholar Jack Balkin has proposed the concept of “information fiduciaries,” or information brokers (e.g., Google or Facebook) with a duty to preserve their clients’ interests when handling their personal information. In some sense, information fiduciaries already exist. Lawyers and doctors are obliged to use confidential information in ways that benefit their clients, and not to share any information against their clients’ interests.

We should explore ways of expanding these types of trust obligations to new domains of sensitive digital information, by creating new categories of fiduciary-beneficiary relationships that are organized around the collection and analysis of digital data. Establishing information fiduciaries could be a pivotal first step in defining clearer terms and conditions under which new data intermediaries, such as data recruitment companies, can operate. Such a framework would reorient company practices around worker interests, as well as place explicit limitations on the types of questions and insights information providers can share with third parties. This would go a long way toward addressing worrisome asymmetries in the access and control of data that currently exist between employers and employees in the labor market.

Finally, policymakers can also take a step toward enabling the development of more tools to empower workers in leveraging the data they generate. A natural starting point for this would be for city and state legislators to negotiate with platform service providers, such as Uber and Lyft, so that their workers can access and leverage their professional reputation in the broader marketplace. In addition, regulators should negotiate for workers to be able to interact with their network using their own software tools, such as bots that let drivers compare prices and wages across competitor platforms.

Albert Wenger argues that regulators in a large municipality such as New York City should have no problem effectively bargaining for these kinds of workers’ rights, given the size of their markets. In fact, in cities like Boston and Portland, local leaders have already made significant headway in this direction, by striking data-sharing deals with companies like Uber to help inform infrastructural and traffic management projects in the future. Such negotiations would also open up new opportunities for the broader public to benefit from the data that they are collectively generating.

It’s time we stop focusing so much energy on equipping people with “twenty-first century skills” and take a long, hard look at how the power dynamics between employers and employees are shifting in critical ways with the rise of data-driven decision-making. Our next great challenge will be to identify practical ways to re-distribute the benefits of digital technology to a broader set of the population, through the development of empowering infrastructures and worker-friendly legal frameworks.

Chelsea Barabas is the Head of Social Innovation at the MIT Digital Currency Initiative.
