Big Data, Surveillance, and the Tradeoffs of Internet Regulation

Born and raised in Seoul, Korea, Seungki Kim divides his time between his schoolwork as a junior at British Columbia Collegiate School and pursuing varied interests that include scuba diving, music, and writing. He enjoys learning about the ethical challenges that societies face and plans to study international relations and philosophy in college.

ESSAY TOPIC: Is there an ethical responsibility to regulate the Internet? If so, why and to what extent? If not, why not?

Imagine entering an airport to board an international flight. One goes through identification checks, baggage inspection, passport control, and sometimes even more hurdles just to get to the gate area. Few people, perhaps no one, would welcome these inconveniences with open arms. However, one still subjects oneself to them, knowing that they are necessary steps to get safely to one's destination. In a way, we modern citizens have given tacit consent to such measures in return for the security we need. They are there not to badger us, but to stop the flow of persons and substances that cause us harm. To regulate travel means to ensure public safety.

One may argue that the same checks must be in place in cyberspace to stem the flow of harmful information, the dissemination of abusive media, and the conduct of illicit transactions and communications. But are those short inconveniences the only tradeoff? In the airport security example, there are a number of other, hidden concessions that we make as we move through the security lines. The passenger data collected by airlines include basic biographical information such as name and date of birth and payment information such as credit card number and issuer. These data are passed on to government agencies through APIS, the Advance Passenger Information System. As one passes through security checks, one's face is recorded by surveillance cameras, the contents of one's bags are imaged, and questions are sometimes asked about one's destination and purpose of travel. Once collected, the data have myriad uses, and government agencies have indeed used them without many restrictions, their secrecy guaranteed by national security laws. In other words, regulation and surveillance go hand in hand. Effective identification of risks almost always requires effective information gathering. The situation is like a house search. In order to single out one person of risk, the whole space must be brightly lit, everyone must yield their personal information, and their persons and belongings must be thoroughly inspected. In short, surveillance enables regulation; the latter is impossible without the former.

When it comes to the Internet of today, there is one more thing to consider, namely big data. Together with artificial intelligence and machine learning, big data is one of the buzzwords in the technology industry today, and it is not difficult to find it discussed in today's newspapers, television reports, tech magazines, blogs, and other media. According to Gartner's IT Glossary, big data can be defined as "high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation." What is it about the nature of big data that potentially makes Internet regulation more of a danger than a benefit?

The latter part of the above definition sheds some light on the end purpose of big data: to derive the maximum possible knowledge from gigantic sets of data that are otherwise too big, fuzzy, and irregular to squeeze meaning out of. Big data analytics operates much as a satellite observing the earth does, sifting through enormous amounts of noise-filled visual information to see what previously remained invisible. Indeed, this "enhanced visibility" enables governments, for example, to trace a single missile, detect a furtive movement in enemy territory, or pinpoint a single person of interest. The danger, however, lies in its more devious uses: given the new avenues for previously unobtainable knowledge, governments will try to amass as much data as possible; and since the nature of big data implies that such "enhanced insight" cannot be obtained unless vast amounts of data are available in the first place, a hunt for more data is readily foreseeable. In sum, one major use of big data is to vastly improve the capacity for surveillance, not just against enemies, but also against the very citizens of a democratic society.

The growth of surveillance is not new. With the advent of the digital age, many organizations and institutions, ranging from private firms to police and security agencies, have already heavily mobilized equipment designed for surveillance in virtual spaces. Today, individuals are monitored ubiquitously through the collection, storage, and analysis of data, from which sensitive information about people can be extrapolated. The extent to which this is done without the prior consent of individual users is alarming. With the disappearance of the conventional physical, spatial, and temporal barriers that had previously kept surveillance out of certain inviolable domains, the modern digital environment has effectively eradicated privacy.

In today's world, no more than a casual examination of everyday life is necessary to gauge how thoroughly, pervasively, and relentlessly our data are being collected. Surveillance is not limited to sensitive areas such as airports; it has penetrated our everyday activities and thrust itself into our private lives. In our capitalist economy, the typical day is filled with moving through commercial space and making commercial transactions. Companies, for instance, amass substantial amounts of consumer data, which are then processed and fed to analytics algorithms to maximize profits. Massive consumer databases are built from transaction records, closed-circuit camera recordings, phone inquiries, customer surveys, loyalty programs, and the various identification and access cards used by some businesses.

Since these activities serve business interests, they are well catered to by other businesses specializing in the collection and processing of such data. The British company Cambridge Analytica, which was alleged to have illicitly obtained Facebook user data, is a case in point. This type of surveillance, as a "function" embedded within a wide variety of mundane activities, almost always leads to the classification of individuals and groups into categories. Thus, the development and implementation of surveillance activities invariably involve collecting, classifying, managing, manipulating, and commercializing personal data.

The inclusion of the commercial sphere in the domain of surveillance is significant in that surveillance has embedded itself deeply inside sectors that, in the past, might have been free, though only to a relative extent, from governmental control and authority. Today, it is performed cooperatively by government organizations and private entities: the government, citing its rightful authority to carry out justice, demands data from companies, which often comply with such requests. In other words, by watching over large corporations, the government also gets to watch over much wider swaths of the population. By gathering data produced by vast populations, the government is then able to obtain other data that are unrelated to the original collection goals. This method of data acquisition, comparable to warrantless searches, is not only illicit, but also inimical to democracy.

Another major risk possibly stemming from Internet regulation involves the unregulated dissemination of private information: making copies of private information available for data analysis gives rise to duplicates of that information, the so-called "data doubles." The danger is that these doubles are no longer physically bound to their original human owners; they are bounced around different computing processes and wedded to different data to yield yet more meaningful information. Once created, these shadows lead lives of their own, independent of their precursors. The basic structure of our digital communications is shaped in such a way that it cannot operate without the unwitting duplication and dissemination of data. Such a vicious cycle results in the untrammeled proliferation of parts of one's private information, thoughts, and actions at an unprecedented scale. The slip from availability to exploitation is easy and can happen surreptitiously precisely because of the nature of digital information. Since the sole objective of big data algorithms is to extract meaningful information, that is, to single out "signals" from a sea of "noise," concerns of privacy become moot.

In conclusion, while safety and security are indeed important concerns, regulation must always proceed with regard to its side effects: do we want more issues involving personal data going rogue? Do we trust our governments' ability and willingness to regulate themselves and their handling of information? Yes, there is an ethical responsibility to regulate the Internet, in that the harmful flow of information and communications must be stopped. However, I believe this must be a collaborative process involving all parties: citizens, corporations, and government agencies. As with any exercise of power, responsible, ethical conduct should be enforced through checks and balances. Giving the administrative branch even more ability to peep into our private lives, at this point, strikes me as a very Hobbesian solution, one no less dangerous than keeping the Internet the free arena that it is today.