Twitter's Moral Flaws, with Mark Hansen

July 23, 2018

CREDIT: Matthew Oliphant (CC)

DEVIN STEWART: Hi, I'm Devin Stewart here at Carnegie Council in New York City, and today I am speaking with Mark Hansen. Mark is director of the Brown Institute for Media Innovation. He is also professor of journalism at Columbia University here in New York City.

Mark, great to see you today.

MARK HANSEN: Nice to see you.

DEVIN STEWART: You're known for a couple of things including an installation, a work of art, at The New York Times headquarters in New York City in the foyer. In the lobby area there is a very interesting installation. It's a mixture of data, journalism, and art. I saw it a few weeks ago and was really struck by the impact of it. Can you describe that installation and what it's trying to say?

MARK HANSEN: From a visual perspective you walk into the building on Eighth Avenue between 40th and 41st Street. Ahead of you is a 65-foot-long hallway, about 25 feet wide. On either side is a grid of 280 text displays. Each one is about the size of a large Hershey bar. They're hanging from wires that drop from the very high ceiling. So basically you're faced with this grid. As you walk into the lobby and into the portion of the lobby that is between the two halves of the grid, you start to notice text that's floating by. The text is all choreographed into a series of "scenes." We borrow theatrical language for it. Each scene tells a different aspect of the journalistic outlet and how it was made.

Some of the scenes are all-over scenes: they mine the day's paper for the figures, the number-item pairs, so 200 of this or 500 of that or $50 million or whatever it might be. Some of the scenes, again all over, are looking for just the quotes from the day. Then others are devoted to particular sections. We have one devoted to the wedding section. We have one devoted to the obituaries. We have one devoted to the letters to the editor, and each one has a kind of choreography that is both visual because the text moves across the screen and between the screens on the grid. I should say that the grid is set so that there is about as much space between the individual displays as there is a display.
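As a rough illustration of the "number-item pair" idea described here, the following is a minimal sketch, not the installation's actual code. The pattern, the function name `number_item_pairs`, and the sample sentence are all assumptions made for the example: it treats a dollar amount (with an optional magnitude word) or a plain number followed by one word as a pair.

```python
import re

# Hypothetical pattern, not taken from the installation:
#   1) a dollar amount with an optional magnitude word, e.g. "$50 million"
#   2) a plain number followed by a single word, e.g. "200 buses"
NUMBER_ITEM = re.compile(
    r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:thousand|million|billion))?"
    r"|\b\d[\d,]*(?:\.\d+)?\s[A-Za-z]+"
)


def number_item_pairs(text):
    """Return every number-item pair found in the text, in order."""
    return NUMBER_ITEM.findall(text)


sample = "The city spent $50 million on 200 buses and hired 500 drivers."
pairs = number_item_pairs(sample)
print(pairs)
```

A real newsroom corpus would need more care (dates, percentages, spelled-out numbers), so this only shows the general shape of the mining step.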

Each of the scenes tells you something different about how the paper was made and gives you a slightly different look. In a perfect world the piece goes to sleep at night and starts drawing not from current news, the current text of the paper, but instead from its archive going back to 1851, so it goes into a kind of dream state. The dream state is nice because the old papers are the victim of bad optical character recognition, at least the version of the archive we have, so those mistakes that get made when they get played out on the screen make it feel like it's dreamy and partially there.

The purpose of the installation is simply to try to channel the energy that is going on on the floors directly above, which is where the newsroom is located, and to give you a glimpse of how the paper is made.

DEVIN STEWART: It is a lot of sense of movement but also disembodiment. I don't know if you were trying to express that as well, text disembodied from the context.

MARK HANSEN: Right. That's part of the work that my collaborator, Ben Rubin, and I have done in various pieces: text disassembled and then put into juxtaposition with other text activates it in a way. We did something similar for the lobby of the Public Theater. There's a chandelier over the bar in the main lobby, 37 blades. Each one is an LED screen, one for each of Shakespeare's plays.

Then we do a kind of anaphora mining where we look for patterns of not only actual word usage, so the same words repeated across several plays, but also patterns of parts of speech because some poor humanities graduate student or series of graduate students have created a corpus where every single word in every single piece of Shakespeare has been labeled with a part of speech, and not just the five parts of speech that I can come up with because I'm a statistician, but 150 parts of speech, so it's a kind of intense labeling.

We can bring an interesting activation by repeating the kinds of phrases or even looking at the hyphenates that Shakespeare created. By looking at patterns in grammatical structure or patterns in actual word usage or maybe hyphenates and so on we can start to show his writing style across a lot of the different plays and in so doing create something else.

The phrases that come out are frankly beautiful and set against each other are even more beautiful. To me, the basic lesson is that no matter how hard you try you can't kill the beauty of Shakespeare. It's always going to come out in some way.

DEVIN STEWART: It's intrinsically beautiful.


DEVIN STEWART: The New York Times installation, does it have a name?

MARK HANSEN: It's called Movable Type.

DEVIN STEWART: What has the reception been from that?

MARK HANSEN: Oh, gosh. We put this piece up quite a while ago. I guess the reception can go in phases. The first thing was we hung it up, so 560 screens hanging from the ceiling. Someone's got to program it, so Ben and I were there on a folding table most days. I would stay until 8:00, 9:00, 10:00 at night.

The journalists who were in the building would come up and go: "What are you doing here? What is this thing hanging in our lobby? What does this mean?"

I met a lot of really great people that way. In fact, that's how I got to spend my sabbatical that year at The New York Times R&D lab (research and development) and eventually got to meet a lot of the Times people because of that period, people coming in and just wanting to know what it was that we were doing and could they program as well.

DEVIN STEWART: The journalists had a fairly positive opinion of it? I noticed that you can't ignore it when you're in the lobby there.

MARK HANSEN: We haven't really mentioned this, but it also makes noise. There is an embedded Linux processor on the back of each one of the screens, and it has its own sound card and a speaker together with other devices to make noise. So the thing is kind of chatter-y. It calls a lot of attention to itself, but not so much that it would drive the guards crazy. That's the line to be drawn.

The journalists had varying responses. Some came in and were trying to guess what we were doing. At the end of the letters to the editor scene you end up with a list of names, places, and dates, so the name of the person writing the letter, where they're from, the city they're from, and then the date of the letter.

I had journalists coming up—when the scene was all played out you just have these names—asking: "Who are these names? Who are these people?" I'd sit quietly and see if they could figure it out because there's a whole choreography that goes along before, so it's very clear what you're looking at, but at the very end just for a couple of seconds it holds this and then disappears. They're like: "Well, it can't be Iraq War victims because there's too many women. It can't be this. It can't be that." It was a kind of puzzle to them for a while.


MARK HANSEN: Then some had, because everyone's a critic: "Why doesn't it do this?" or "What about that?" or "Can you try this?" Sometimes those were good ideas.

It definitely was interesting. Programming is hard, period, because you introduce bugs as you go, there are errors, there's whatever, so programming itself can be an intensive process. Programming with other people watching is hard. Programming in the lobby of a busy midtown skyscraper is really hard, and then when that skyscraper is chockablock with journalists who have a lot of questions it's super-hard.

But at this point it runs on its own. Every so often the guard Kenny will call me. He'll say, "Mark, it didn't come up today." We log in and turn it on and make sure it's happy.

DEVIN STEWART: The other project that got a lot of attention, one of your many interesting projects, was an article called "The Follower Factory," which appeared in The New York Times a few months ago. It was a data analytical project on trying to understand the scam of followers on social media, if I'm correct. Can you tell our listeners a bit about what went into that research and what you found?

MARK HANSEN: To make the bridge for a second, my training is in statistics. My doctorate, from ages ago, is from Berkeley. I went directly to Bell Laboratories, and there I had the good fortune to be involved in a program that paired artists and engineers to just let them make things. It was a revival of something called E.A.T., Experiments in Art and Technology that Bell Labs started in the 1960s, and they revived it in the 1990s when Lucent, the parent of Bell Labs, was flying high.

That's where I met my art collaborators. We've done a number of projects around the city and in various other places that draw on data in some way and storytelling.

I have this job at Columbia now to try to mix computation or to mix technology and story in some way. I think they hired me in part because I have this media/art background and the capacity to be very technical in terms of thinking through what data and computation mean, but also a kind of sophisticated background in how those stories can manifest themselves through non-traditional standards, through lobby artwork or through a variety of other means.

Six years ago I took this job at Columbia, and it was a big shift from being a professional statistician to now being the computational person in a journalism school. I teach a computational journalism class in the spring. In the spring of 2017 the class was anchored on what people had come to call "computational propaganda," the ways in which the networks that we rely on for information can be gamed. People can gain outsized voices through various forms of either automated help—bots and so on—or through coordinated activity.

We started to teach a class about—we grounded our computational class in those questions: What does it mean for something to trend? What should it mean for something to "trend," because that word is quite powerful? How do trending algorithms work? Largely we don't know exactly how things are implemented by the platform, so trending algorithms, recommender systems, artificial intelligence (AI), machine learning. We ran the students through their paces.

One of the things we did along the way when we were thinking about trending and what makes something trend was think, Well, how much does it depend on your individual posts getting re-tweeted and so on? So we bought 2,500 followers on Twitter just to see what we would get.

DEVIN STEWART: How did you buy those followers?

MARK HANSEN: You first Google "How do I increase my followers on Twitter?" At the time, one of the first results you got back pointed to a company called Devumi. Part of the reason that company came up first is that its CEO was an ex-search engine optimization person, so he knew exactly how to get this positioned right.

DEVIN STEWART: It still exists, that company, right?

MARK HANSEN: I believe it still exists, yes.

DEVIN STEWART: They have a fake office?

MARK HANSEN: I'm sure it's severely hobbled at this point.

If you look at recommender sites, sites that recommend this particular—okay, so now that I see that I can buy followers because they were advertising increasing your follower rates, "Just give us the money, and it'll go." Now that I know that's a thing, are there other services? There are services that compare the different services, and Devumi kept coming out on top I think in part because this CEO not only understood search engine optimization but also created a series of ranking sites that were all linked together because they shared common Google Analytics tags. It was a kind of web. So we were led to Devumi no matter what.

We bought 2,500 followers, and we didn't quite know what to make of them because they looked amazing. They looked like real people. There was an image, a little bio pic, and then there was a pic in the background, maybe some kid with ski goggles in the circle on the left, and the background is maybe someone schussing down the slopes, and then there is a pointer to a Facebook site where this kid has all his skiing pictures. So you're like: "Wow! Either someone has gone to a great deal of effort to make this bot look real, or this is a real person." I couldn't quite figure out the economics of it because we paid fractions of a penny for each of these people, so it didn't make any sense. Why would they want to re-tweet our poor account?

I should have said, we created an account that had zero followers—there were a couple by accident because a couple of our students followed it, but other than that—and we hadn't really tweeted anything, but immediately we had 2,500 followers.

As we looked a little bit closer we realized that the—and it was an embarrassingly long amount of time for this, for me anyway—login names were just a little funny. If you were to look really closely, what should be a lowercase "i" had been turned into a lowercase "l." What should be an "o" was a zero. What should be a single underscore was a double-underscore. What this company had done or what someone had done was simply copied the whole profile and then changed the name in a way that didn't make it look suspicious because the person's screen name really only appears in two places, in the URL bar in your browser and then also in a small gray font in Twitter. Neither is prominent enough for you to notice the difference.
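The substitutions described here (an "l" standing in for an "i", a zero for an "o", a doubled underscore for a single one) can be checked mechanically. What follows is a minimal sketch under stated assumptions, not the method used in the class or in the Times investigation: the function names and sample screen names are invented, and the swap table covers only the three substitutions mentioned in the conversation.

```python
import re

# Only the look-alike swaps mentioned in the interview; a real check
# would likely need a longer table. This mapping is an assumption.
_SWAPS = str.maketrans({"l": "i", "0": "o"})


def skeleton(screen_name):
    """Reduce a screen name to a form that ignores the known swaps."""
    collapsed = re.sub(r"_+", "_", screen_name.lower())  # "__" -> "_"
    return collapsed.translate(_SWAPS)                   # "l" -> "i", "0" -> "o"


def looks_like_copy(candidate, original):
    """True when two different names collapse to the same skeleton."""
    return (candidate.lower() != original.lower()
            and skeleton(candidate) == skeleton(original))


# Hypothetical screen names, invented for the example.
print(looks_like_copy("skl__kid_0hio", "ski_kid_ohio"))
print(looks_like_copy("snow_kid", "ski_kid_ohio"))
```

Because every "l" is folded into "i", unrelated legitimate names can also collide, so a match is best treated as a candidate for closer inspection (for instance, comparing profile images and bios) rather than as proof of a copied account.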

We thought, Gosh, these are all copies, which means these are all in effect stolen identities, so let's see how many there are and who else bought them.

We took the 2,500 and said: "All right. Who do these 2,500 follow?" It turns out that Hilary Rosen was followed by 2,400 of our 2,500 bots, so we thought, Well, maybe she bought something, too. Then we kept scaling up and down. We got 10,000 or 15,000 such. At that point, we started noticing things about them, like some of them were minors.
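The question "Who do these 2,500 follow?" amounts to counting, for each account, how many of the purchased followers follow it. The sketch below assumes the follow lists have already been collected into a plain dictionary; the data, the function name `shared_followees`, and the `min_share` threshold are all hypothetical, and no Twitter API call is shown.

```python
from collections import Counter


def shared_followees(following, min_share=0.9):
    """List accounts followed by at least min_share of the given bots.

    `following` maps each bot's screen name to the accounts it follows.
    Returns (account, follower_count) pairs, most followed first.
    """
    counts = Counter(
        account
        for followed in following.values()
        for account in set(followed)  # count each bot at most once
    )
    threshold = min_share * len(following)
    return [(acct, n) for acct, n in counts.most_common() if n >= threshold]


# Tiny invented example: four "bots" and the accounts they follow.
following = {
    "bot1": ["account_a", "account_b"],
    "bot2": ["account_a"],
    "bot3": ["account_a", "account_c"],
    "bot4": ["account_a", "account_b"],
}
print(shared_followees(following, min_share=0.75))
```

An account that nearly all of the purchased followers have in common, as in the 2,400-of-2,500 case mentioned here, stands out immediately in this kind of tally.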

DEVIN STEWART: Who is Hilary Rosen?

MARK HANSEN: Hilary Rosen is a CNN political commentator. We found Kathy Ireland in this mix and so on.

We spidered the network and came up with a list of these copies and then started looking at the kinds of things the copies were posting. The original might be a happy couple posting about their honeymoon in Hawaii or something, and then their doppelgänger is posting in five different languages, some of it pornographic, some of it political. It doesn't look consistent with what their stated interests and so on are. One of my students had noticed that there were a number of examples where the identity that was taken was from a minor, and now that minor is tweeting porn.

At that point, we thought: All right. We have enough stuff. Let's package this up. I really pushed on the students to pitch it to the Times: "We should get this out somewhere. Let's get someone behind us and report it out." We pitched it to the deputy investigations editor at the Times. He took it, and away we went.

Nick Confessore was assigned as the reporter. He did an amazing job of following the story of the company itself because it turns out the company itself doesn't actually make any of the accounts, they just buy them from a third party, from BlackHatWorld. They're listed on BlackHatWorld. They're an arbitrage scheme of some kind, and they charge a lot more than the original makers charge.

He was investigating and tried to get very close to the CEO of this company, who was sort of slippery. There were a lot of interesting details in the article about how he had outsourced a lot of his development work. One of the developers who was working in the Philippines took the code for the site and created a fake version of his site. So now there's a fake version of this site that sells fakes. It was beautiful. The layers of it were just glorious. It was glorious to watch Gabe Dance and Nick Confessore do their work. They're really amazing reporters.

In the end—I don't know if you've seen the piece online, the interactive graphics Rich Harris put together, and they're just beautiful. They tell the story very cleanly.

But there was a lot of computation involved. There was a lot of work to bang at Twitter and so on.

The end result, the Twitter purge that happened [recently]—

DEVIN STEWART: Purging fake accounts.

MARK HANSEN: Right. They quoted that story as being a reason for doing it. In fact, the day after the story came out in January, millions of accounts went away. We could see all the accounts that we had pointed to largely disappeared, the doppelgänger accounts, which led me to wonder why they were still there if they could be removed so quickly. We didn't provide a complete list to anyone, so Twitter must have known or used this recipe and very quickly spun something up. If that were the case, why didn't they do it earlier? I don't know.

DEVIN STEWART: Any guesses?

MARK HANSEN: I think the basic problem is—I could speculate but I'd rather not—that a lot of these networks perform computations that we don't ever have access to. We don't know how trending is computed. There are a lot of things that we don't understand.

At the same time, more and more civil discourse is taking place on these networks, and they are propping up a kind of democratic function, and we don't understand anything about it. It feels to me like it's ripe for journalists who know something about computation to get in there and really start to ask those basic questions and to hold these networks to account.

People like Jeff Larson and Julia Angwin have been doing that sort of thing. I felt very grateful that our students could contribute to something like this, that like I said did at least have an impact on a network.

DEVIN STEWART: Do you think social media is doing enough to address fraudulent identities and make influence actually legitimate?

MARK HANSEN: If you asked me this before the purge, I would have said no. I haven't done a close enough look to see if the accounts that we estimated as being fraudulent or not or fake or not are gone completely.

I think the larger issue is the operations behind concepts like trending topics, things that take—I can choose to follow or not follow a series of people, but then this trending piece is supposed to tell you something about what's going on more broadly, what that global conversation is about. If people through coordinated activity, whether that be by mobilizing a series of automated accounts or maybe half-automated accounts or maybe just you and 50 of your friends doing something a thousand times, can get something to trend, it feels that's not exactly—that's an issue. Because we don't know anything about how the trending works we're left with lots of question marks.

I think your question is partially about what to do about automated accounts, but it's also what's the result: How does someone get a voice? How does someone become an influencer, and how is influence wielded, because we really don't understand that properly on these networks I think.

DEVIN STEWART: There is a lot of anxiety about social media's impact on our well-being, on our psychological health, as well as the health of our discourse and democracy. Where do you come down on that debate about social media's overall impact?

MARK HANSEN: I've seen some of this before. One of the very first artworks—not to bring it back to this—was something that appeared at the Whitney. It was called Listening Post, and it looked a lot like the piece in the lobby of The New York Times. It was a curved display. The screens were smaller. We had a synthetic voice at that point that could put 250 voices in the room. It was something Bell Labs created. It sounded beautiful. It was slightly British inflected but not like an Apple voice or something like that, and it was a voice that nobody had before, so it was interesting.

DEVIN STEWART: Apple voice? You mean like Siri?

MARK HANSEN: Yes. Or one of the Mac voices.


MARK HANSEN: Just to try to say that we could take 250 voices and layer them over the top of each other, and they would sound like a crowd, which was amazing.

Then what we were channeling was a lot of Internet Relay Chat (IRC) chat rooms and bulletin boards, like the Yahoo! News comments and those sorts of things. In a way social media is serving some of that function. It's a way for people to communicate one-on-one. We would spider these IRC networks, and you would have a politics room or a NASCAR room or whatever, and we would sample a bit of the text and show it and group it in various ways and play some of the kinds of text juxtapositions and clustering and so on that we do with the Times piece to tell you what's on for the Times. We can tell you what's on for these large IRC networks, what people are talking about.

We tried to create a sense of a global conversation. This was before Twitter branded that as a thing.

It was awful. There were moments that were just—the piece debuted after 9/11, and the amount of vitriol was horrible.

DEVIN STEWART: Because of the cacophony or why? Horrible in what way?

MARK HANSEN: No. Cacophony is, I enjoy that. I'm a statistician. We love noise. Noise is good. Noise is a friend.

DEVIN STEWART: What was the criticism?

MARK HANSEN: Instead, just the kind of hateful commentary that you would get, anti-Arab, there was a lot of talk of terrorism and a lot of the same type of talk that we're hearing now. I suppose then the distinction was that it was happening off in these IRC rooms, which were a little isolated and you had to know about it or whatever, although they were hosted on America Online (AOL) and various other places.

I know this was a long time ago, and it's not fair to bring that up, but there is something about that that feels very similar to me. I remember when we went into the Whitney there was a question about should there be a stanchion outside that warned about the language. I said, "What do you mean by that?"

They said, "Obviously, it's like the four-letter words and all that."

I'm like: "Yeah, I can filter for those. No problem. I won't, but I could. We could easily remove those. The bigger problem is going to be the sentences that have no four-letter words in them, every single word is a legitimate word, but the thought expressed by the accumulation of these words is horrible."




MARK HANSEN: Just bad, really bad. So what do I do with that? I have no idea how to filter for that, and I think that's some of the same things we're coming up against now. There are things that you can't filter for. Natural language processing (NLP) is not up to it.
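The gap described here, between filtering individual words and filtering the thought a sentence expresses, can be seen in a few lines. This is a minimal sketch of a word-list filter only; the blocked words are mild placeholders and the function name is invented for the example.

```python
import re


def contains_blocked_word(text, blocked):
    """True if any word in the text appears on the blocked list."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in blocked for word in words)


BLOCKED = {"darn", "heck"}  # placeholder list, not a real filter

# A sentence with a listed word is caught.
flagged = contains_blocked_word("What the heck is this?", BLOCKED)

# A hostile sentence made only of ordinary words passes untouched.
missed = contains_blocked_word("People like you do not belong here.", BLOCKED)

print(flagged, missed)
```

The second sentence is the kind of case the speaker says he had no way to filter: every word is legitimate, so a word-level check has nothing to act on.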

I think these are things that we've experienced. I don't claim to have a solution except to say something about the openness of a system about how things are, how things are operating can only make these systems function better, can only help them support kind of a democratic purpose.

At the end of the day I'm a hopeful man, and I see Twitter twitching slightly because of all of the bots and all the attention that it has been getting about fake accounts. Facebook can't seem to make a decision that people find reasonable, it seems that every one is just a new can of worms. So Facebook is kind of wobbly, Twitter is kind of wobbly. I don't know if you've used the auto-suggestion on YouTube lately, but it's a mess. So Google is probably a little wobbly.

There's a moment now that is different than before. Before there was no oxygen for anything else. Nothing else could kind of emerge, and I feel like there is a moment now where an alternative could emerge. Something could happen that maybe has got baked within it from the beginning the basic principles that are required for—I will say selfishly—a kind of vibrant, sustainable journalism but also supports basic interaction—


MARK HANSEN: —basic civil interaction. Instead of trying to bolt this on and try to change how things are, can we start with something fresh? There is some air now to start with something fresh and build it up, at least it seems to me.

DEVIN STEWART: Have you been following the recent trend of blocking anyone who is rude on Twitter? That has been very trendy these days, very popular.

MARK HANSEN: It is very popular. You should have my friend Jameel Jaffer from the Knight First Amendment Institute come because they were the ones who brought the suit about the president blocking people, claiming that by doing that he is depriving them of access to a corner of discourse that should be open to the public.

DEVIN STEWART: Sure. It has also become trendy among people who are worried about civility and etiquette or mutual respect to block anyone who goes after you personally.

MARK HANSEN: Sure. I think these are good examples of what a designer has imagined in their head how people would use a particular system, how communication happens, how conversation will happen.

One of the things I teach my students from the beginning is that every piece of digital technology embeds within it a model of the world, a use case, and then argues for that use case. If what we have at our disposal to somehow get control of a situation is blocking things or refusing to listen to or closing ourselves down or whatever, that was someone's idea of the best way, and this is how it should be used. In this moment where something else might emerge maybe there are other more nuanced somethings.

The other issue I've heard about is the blocking mechanism becoming quite cumbersome when lots of automated accounts are on you. It becomes a really difficult process to try to block everybody who is coming at you. It becomes adversarial, and so what do you do to stop that? We haven't really looked much into what that looks like.

DEVIN STEWART: Before you go, if you would just let us know about your concerns about the 2020 census in the United States. You mentioned earlier some concerns about that.

MARK HANSEN: Yes, when we were talking before. Back when I was a graduate student I was fortunate enough to be part of a case that a series of cities brought against the Commerce Department insisting rightly that there was a known differential undercount of minorities living in big cities. What do you do about that? The Commerce Department had come up with an adjustment technique, and some of the people in my graduate program were asked to evaluate that technique. I was part of doing the computations for that group. Since then I've followed the census pretty closely.

For 2020 the census is taking place in a really bad political context. The question around citizenship, immediately people are asking, "Will that suppress participation on the part of certain groups that should be counted?" The census was designed to count everyone living in the United States, not just citizens.

We're starting to see other kinds of misinformation campaigns that are not dissimilar to what we saw around the election, the idea that these networks might be used again to discourage or misinform people about whether or not they should participate in the census.

I've been working on a project to try at a local level to bring local newsrooms together with local demographic, social science, statistical knowledge which maybe comes from a local college, community college, or university, get the newsroom collaborating with them to start to tell stories about the census and its impact locally.

People don't often understand how much depends on the census, that it is the base map for so many things that happen. If you want to get a loan for a small business and you want to estimate the market size, you'll go to the census. School lunch programs are funded based on census counts to get a sense of how many people qualify. Infrastructure money: $600 billion will be given away over the next 10 years based on these counts, and if they are systematically undercounting certain groups then there is a real problem.

What I'm hoping is to get these clusters to tell stories about the importance of the census locally so that people understand what it is the census does and represents because in this political moment there is huge distrust of the government, of the government taking your information, and so on. The project is to get people to understand that.

It is also to locally help people understand if the census is going to do a good job in their community: Is the bureau gearing up? Do they have the resources to adequately count everyone? Because this is a new kind of census, this is an online census, meaning it's online and by telephone first, so there are all these complexities that are going to come with that and new people who are vulnerable to not filling it out. Who are they and how do you make sure they get counted and so on?

I think the bottom line for the project is to try to help people be better informed as they fill in their census form, to really understand what it's about and what it is they're doing as they exercise their constitutional right to be counted.

DEVIN STEWART: Mark, thank you so much for coming today. Mark Hansen is a professor at Columbia University's journalism school. It was really great to speak with you today. Thanks, Mark.

MARK HANSEN: Thank you for having me.
