PARTICIPANT: Okay. Sorry about that. We had to figure out the projector.
Hi, everybody. Sorry about the couple of minute delay there, had some challenges with the projector. But now we’re good. So welcome and welcome to SRCCON, exciting to be one of the first panel or sessions of the day.
So my name is Ryan Murphy, and I work at the Texas Tribune, I’m on the data visuals team, and we’re the team that’s responsible for doing many things. But one of the major tasks that we have is maintaining and kind of iterating on our major databases.
So this includes the salary database, which we’ll talk a little bit about, the prisoner database, and a number of others. So the reason that I initially pitched this talk was—I had come up on I guess it was over six years now. I kind of had the salary database dropped on my lap as an intern when I started interning with the Tribune back in 2010. And one of the things that has been kind of an ongoing kind of internal and external conversation is kind of, like, are we doing this in the best way? And what is the proposition of taking this information, which was certainly public information and broadcasting it to the world? And kind of what are the ramifications of that? And, you know, how to fairly and effectively manage kind of the—you know, all the people that are suddenly finding themselves whenever they look for their names.
I’m, you know, we’ll stress that by no means am I giving this talk because I’m an expert on this. This is very much—this is one part kind of therapy session for me. Because this is something that I’ve struggled with and I’ve had many conversations with both inside organization and out. And, you know, I think that—and, any sitting right here in front has too. And I think this has been a good opportunity to have this conversation. I think we’ve seen variations of this both kind of in our sense where we’re taking public data and putting out there. But you also see it in datasets that aren’t so much on the nose kind of, like, straightforward, we’ve asked for our agency to give us.
Let me see if I can find how to advance the slide, I’ll keep going.
If you didn’t see the schedule, there is an—I probably shouldn’t promise, but I did put the links on some of the stuff we’ll touch on here, if you want to click through the screen and see them, these are just screen shots.
PARTICIPANT: I’ll take them.
PARTICIPANT: Thank you. We’ll also at some point kind of move into groups and get a little bit closer. It probably—the tables may be okay for it already. But the plan is kind of to walk through some of the scenarios that we’ve experienced. Again, of what applies to our situation. But kind of go through and, you know, kind of crowdsource the reaction to some of those. A lot of the e-mails we tend to receive.
So from our—you know, as I was kind of trying to go through and try to find good examples of this, the reason that I pitched this is that the ones we have are the ones that we’ve produced. And for us, like, if you’ve ever heard the story of the Tribune, it usually is the first thing you usually hear someone say is they have that salaries database, and it’s half their traffic.
I would say half of the traffic is not true anymore, but it’s not that far off. But it certainly is kind of one of our I guess you could say claims to fame in things that people usually think when they’re there, you know, talking about. Kind of like that filibuster and those are our things.
We also have the prison database, which is our interface on top of the Texas criminal justice’s dataset of the prisoners that are in their system at this time. They also have a search tool that is not great, but it’s technically there. Which was one of the initial kind of pushes for us to spin off and do our own thing.
I will say that these two predate pretty much everyone that’s at the Tribune right now. So there’s some kind of—when they came to be that we were not there for.
More recent ones include Faces of Death Row, which is a spin-off of the prison database. We went through a lot of painstaking work to get from criminal justice who’s on death row. To look through and constantly updated as the numbers change.
And more recently we did was in our borderline security series, which was looking at the cases of convictions of border guards and others along the border that have essentially kind of taken advantage of their position and then kind of went through and categorized all of the different things that they have done. Behind this photo of this guy here is actually the story behind it as well.
But another kind of recent example of what we’ve taken into the sense that are about individuals and putting them online.
I think ProPublica, it’s easy to find examples of them doing awesome things. But I think they have a really great version of this exactly with Dollars for Docs, which is showing all the doctors and what companies they’re getting money from. And I think the more recent one is Prescriber Checkup, which is a variation of that, which is looking at kind of what doctors are prescribing certain medicines and the money that gets involved with that as well.
And I think this is probably the oldest example of a lot of this, again, not claiming to be an expert on the history of doing this. But you see this along with mug shot galleries. Same kind of idea. TampaBay.com of course is one of the ones that, I won’t say with certainty it led the charge, but it’s one of the prominent examples of that. But it’s not uncommon to find in many, many papers and dedicated sites doing this and all kinds of layers of interesting legal things to that.
And I think that, you know, one thing that kind of, again, part of the therapy session is that one thing I always kind of have internal kind of wrangling with is how do you keep these things journalism and not let them creep into voyeurism? I think—I did not want to name the paper, but you’ve already seen it. But this is on their home page.
Like, here you can pick our games, look at our data guide, or our booking mug shots. And it’s kind of where—how do we find that balance between, you know, yes, the data’s out there, yes, it’s public. There’s not any legal thing that’s stopping you from doing it. But where’s the point do we stop and go okay. What’s the end-all here? And I think for us at the Tribune with many of our things, that’s a conversation we’ve had to step back and have and think about.
And you see other variations of this, there’s more stuff that came out after that headline as well because I grabbed this a while ago. But you also see this in the bulk data dump situation as well. You know, there’s—we’ve not really done a version of it but kind of the here’s this big dataset, we don’t know what’s in it. You know, help us look through it together. Or here’s this big dataset, we don’t know what’s in it, but there’s probably something good in there, let’s go. And the challenges that come with that in terms of, you know, what potentially. You know, people that may be in there that aren’t of interest for that story but just as a by-product of their communications, are now suddenly out there as being a part of it. And I do also want to open it up. Sorry. Just—are there any other examples of this? Again, I’ve no expertise in collecting these, but I would be curious for just a second if anyone had any other good examples of –
PARTICIPANT: I’m doing a database basically people subsidies. Kind of like 2 million, and we treat them the same way and that’s –
PARTICIPANT: That’s a great example. Go ahead.
PARTICIPANT: The most important example here is the gun permits.
PARTICIPANT: Yeah.
PARTICIPANT: Because that I think illustrates for real, the real talk; right? Was it a New York paper that got the list of holders and the legislature responded by making the information private. And that is one of the downsides. There’s ethics, which some people care about. But there’s also, like, are we going to get this data?
PARTICIPANT: Yeah. That’s a very, very good one. And to kind of your thing, that’s your point about the size differential as well. That’s definitely something in the salary database as well. The big sell is look at how much the coach makes but there’s also thousands of other people that that’s not the scale that they’re at but they’re in there because that’s a bulk request. Yes?
PARTICIPANT: So when I think about a lot is voter registration data. And not publishing all of it but publishing some of it.
PARTICIPANT: Yeah. I don’t know if anyone—in Texas it’s—I’m sure it’s not unique to us, but a favorite thing to do is on the registration season, dredge up the voting records of every reporter and that’s always enjoying. And people should vote. But, like, in Texas, it’s interesting because you have to choose. You have to choose a side in the primary. You can’t just—you have to register as Democrat or Republican.
PARTICIPANT: And in some places you have to pay for it.
PARTICIPANT: Yeah, there are instances where it gets used as a weapon or trying to make a point of some sort with that. That’s a very good example too.
PARTICIPANT: California publishes data about people who don’t pay their taxes. So they give you a notice and then after several months if you don’t pay, it’s basically, like, the top hundred or something like that in terms of income. So it’s, like, a shaming technique they do to try to get people to pay eventually.
PARTICIPANT: I’m from Germany, very privacy aware. So all of these things are, like, wow. You’ve got all of that data.
But, for example, what we have and what I find shocking is bankruptcy data. If someone goes bankrupt, it’s public in Germany. Because I don’t know to protect debtors or people they owe money to. So I’ve received a couple of requests, for example, from people who are bankrupt somewhere that should please remove that from Google because, you know, it’s not nice if you search the name, and it says bankrupt, they don’t get any kind of business connections.
PARTICIPANT: We’ll kind of get to that here with some of the scenario kind of talks. But we have a variation of that too with the salary database, which is, like, people will show up in it and they’re, like, oh, I’m trying to get a job. It shows me over here making this and the first thing someone does when I try to get a job is type my name in Google and the first thing is you and is that fair to me? Is that, you know, what value is being provided by potentially making that.
Your example about the California thing also reminded me of—I know—I’m sure this isn’t unique to Texas, but I see it in a lot of Texas cities, which is the access to water usage, which is the fun stories all the time. Look how much water Lance Armstrong uses. But it’s also, like, again, a lot of people in the dataset that are not at his level who live in a nice house in west Austin that show up that potentially are kind of a—that you know tool of kind of, like, being able to look up your neighbor and see where they stand next to you.
PARTICIPANT: Yeah. They’re—the reason I remember that one is because I think there are some pro social—there’s a lot of terrible uses of this information, it’s really bad. But in some cases, like, that one in California, it’s pretty big revenue generator for the state, and it’s important to them. And there are people that should pay in Texas.
PARTICIPANT: There’s also that campaign finance example with that guy who made that Twitter bot for Donald Trump for all the people who donated to his campaign.
PARTICIPANT: I haven’t seen that. That sounds awesome.
PARTICIPANT: It was not as awesome as it could have been.
PARTICIPANT: Did everyone hear that one?
It was the—do you want to say it again?
PARTICIPANT: Oh, sorry. So somebody made a Twitter bot of all the people who donated to Donald Trump’s campaign. So it basically tweets out the person’s name and how much they donated. I think what their job is, what they do for a living, and where they live.
PARTICIPANT: Yeah. Essentially just a copy of what—online. Yeah?
PARTICIPANT: Going back on the water database, I think one of the things as journalists we have to consider is sometimes I think just slapping up a database without any context. But that’s where I kind of have those ethical questions. So you may have somebody that shows up as the top water user, but they also might have the largest property. So is it by square footage? Or maybe you want to use that data—for like in Portland here where I live, or in Austin, there are a lot of LEED-certified buildings. So are they using it before they were LEED certified? There are ways we can put that data up there but not necessarily just throw up a dataset with names and numbers without any context. I think that’s where we need to move closer to that context.
PARTICIPANT: Yeah.
PARTICIPANT: I think one of the things that has to happen especially with data in this day and age is asking, why am I putting up this data? And is there an actual reason that I can speak to the person who might be exposed as to why this was important?
Because too often as journalists we say something is newsworthy and what we mean is that it might pique an interest, but we’re not seeing what it makes. So someone’s water usage is, like, okay. We’re in a drought and this is a problem and this is not being regulated, so a person could actually answer what’s happening. Someone who just got a mug shot for shoplifting because they don’t have enough food. Why is that newsworthy? And there’s that relationship of kind of how often does it come into it? Especially when we just throw up databases, that is really lost. And especially with what happened with WikiLeaks in Turkey. People didn’t know enough Turkish to understand what’s in those e-mails. So there was a complete lack of this is newsworthy but what are they saying? What is the news? So people don’t know and no one has an answer outside of you thought it was good and all information should be free.
PARTICIPANT: Yeah.
PARTICIPANT: To me, there’s the equivalent of, you know, when it comes to reporting our stories, we take great care in what information we put out there. We haven’t done something and put that context around. And I think there has been this, like—let’s just put a database up online because we can. Without that context. And I think that that’s the key because if we add the context around it, newsworthy justification, if we actually vetted it and looked at the data and kind of responsibly put it up, then I think that strengthens our case to the public for why we did what we did. And I think that that is also extremely important because I think often about how you have so many states with evolving public record laws and as we do more and more in the data work, those laws are going to catch up because right now most laws don’t even address some of this work.
So if they keep seeing us put things up irresponsibly or there is legit criticism over why we might do something, that might hurt a lot longer—have longer effects where we might not be able to do some good journalism and get this data because state laws will change as a result.
For me, that’s the fear. If we put this out and state law changes because of it, is it worth it? Is it worth publishing it in this context if it means somewhere it might mean you don’t have access to that data anymore?
PARTICIPANT: Yeah. Just kind of the point the gun permit stuff as well.
PARTICIPANT: Yeah.
PARTICIPANT: Like, you know, the dataset that did it just kind of lose that. Or we see that in Texas, again, that’s, like, there’s been a lot of pushes to not let birthdays of any public employees get out. Which, like, you know, at the face of it, I get, and it’s a by-product of a lot of the just kind of interesting—just the fact that it’s thrown out there. Which a lot of times, that’s a data point and, you know, a security question. Or validating who a person is when you’re trying to get a password back. And whether that’s distributed that way is another conversation. But, you know, that’s the kind of thing that we’re seeing. We see in Texas that, like, you know. On one hand, it helps as a check of, like—in that kind of verification. But one bad actor is going to remove that level of—go ahead.
PARTICIPANT: The date of birth thing just drives me crazy. I’m pro date of birth. I’m a news researcher, I’ve been looking at public records for 20 years. And without a date of birth, I don’t know how to verify people.
But I understand what you’re saying. I think the biggest issue is the identity theft. And I would like us as a society to sort of, like, move to not necessarily okay. Let’s not put any of this information out there but kind of come up with different checks or different ways to just stop creating IDs based on a date of birth and name. Does that make sense?
PARTICIPANT: Yeah.
PARTICIPANT: I mean the old way used to be okay. You went to the library, and you look, and you get someone who died their name. As far as law enforcement or technology or hacking, we also need to—that needs to be the next discussion. It’s not so much—the answer isn’t pulling the information it’s –
PARTICIPANT: Just the bigger intention of our usage of it, but what systems are in place to make that more –
PARTICIPANT: Yeah. I grew up in Massachusetts, and it used to be that your social security number was your driver’s license number. And so they kind of got to that point of huh this probably isn’t the best thing to have, so they changed that and made it a little bit more difficult or the breaking news became aware of it.
Driver’s licenses are a completely different issue, whether or not that should be public record. But I think just having that change is a nice example of ways that we can adapt without restricting completely. If I’m making any sense. I have anemia and half of coffee.
PARTICIPANT: And another thing I want to bring to discussion is maybe something is newsworthy today but maybe not in five years and it still will be online. And what we’ve had a discussion in Europe right now is the right to be forgotten where court of justice ruled that something like that can be removed from search engines.
And now there’s the question of, like, can small, normal people but also lobbyists are moving between licenses, also possible for that. And the other aspect is some people pay companies to do reputation management and then their results will be drowned on page five and basically are no longer visible. And those who cannot afford that or don’t have means, they still stay up and that’s also.
PARTICIPANT: We could build this in technology; right? There’s things like blogging and all of the systems that we could build that things are automatically disposed of; right? We could do that, but we don’t. So I think that’s another point in the discussion that there are technology solutions that we could be applying and pushing for, and I’m coming from the tech space definitely on this side. That we could be doing, and it’s not happening yet for sure.
PARTICIPANT: Yes?
PARTICIPANT: I’ve also noticed there’s not a lot of discussion about what a particular dataset looks like in the context of what’s already online.
So there’s been a couple of studies that came out recently that based on if you can find somebody’s Facebook profile, their LinkedIn, their Twitter, a phone number, you can guess like half of their social security number, and you have a chance to get their passwords, you have the chance to gain their whole social.
PARTICIPANT: Yeah.
PARTICIPANT: So I try to think about, like, what I’m doing in that sort of larger context as well, but it’s very hard, and we don’t have really tools or a way to think about that larger context yet either.
PARTICIPANT: Yes?
PARTICIPANT: I think similar to that, the idea of anonymous datasets where it’s really easy to recover the real identity of people, and I think that’s—those of us here understand a lot better. If somebody doesn’t have a lot of experience in tech or whatever has a dataset. If I type in social security numbers in alphabet, and what could be scarier from an individual privacy perspective is the thing that could be reconstructed with the next dataset that comes out; right? And we have no way of knowing what that’s going to be. The story I read recently was New ride share records, anonymous records that are linked together. Women were at more risk because they used the system less.
So you can figure out what station the person who had just left on a bicycle was going to with, like, 60% probability.
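One rough way to gauge the re-identification risk described here is to count how many records are unique on a handful of indirect fields. The sketch below is illustrative only: the field names (`zip`, `birth_year`, `gender`) and the sample rows are assumptions, not anything from the datasets discussed in the session.

```python
from collections import Counter

def unique_fraction(rows, quasi_identifiers):
    """Share of rows whose combination of quasi-identifier values appears only once.

    A high share suggests the "anonymous" rows could be singled out by
    linking them against another dataset that carries the same fields.
    """
    if not rows:
        return 0.0
    keys = [tuple(row[field] for field in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)

# Hypothetical sample: no names, yet half of the rows are one of a kind.
sample = [
    {"zip": "78701", "birth_year": 1980, "gender": "F"},
    {"zip": "78701", "birth_year": 1980, "gender": "F"},
    {"zip": "78701", "birth_year": 1975, "gender": "M"},
    {"zip": "78702", "birth_year": 1980, "gender": "F"},
]
print(unique_fraction(sample, ["zip", "birth_year", "gender"]))
```

This only measures uniqueness inside one file. As the speaker notes, the harder problem is the next dataset that comes out, which a check like this cannot anticipate.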
PARTICIPANT: Yeah.
PARTICIPANT: Yeah. I mean, like—oh, I was just saying –
PARTICIPANT: I keep making sure I try to peek behind that problem.
PARTICIPANT: There was one case where Google Maps had a URL shortening service, it was an amazing piece of reporting where their URL was too short, like five characters, it was goo.gl/abcde and they’re, like, that’s not that many combinations, we’ll just make every possible URL and see what the maps are. And it was, like, 123 Broadway street directions to the abortion clinic. Huh I wonder if—we basically figured out who went and had an abortion and all we had to do was guess a five letter code and now, like, your address and the fact you went to an abortion clinic is available to anybody. And they were, like, hey, Google, fix this, and Google actually fixed it. They made it too long to be able to guess the code. But there’s probably—if Google made that mistake.
PARTICIPANT: Others.
PARTICIPANT: The most popular mapping service in the world, there’s probably so many these technology things that are in.
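The arithmetic behind the short-URL example can be sketched quickly. This assumes tokens drawn from upper- and lower-case letters plus digits (62 symbols) and a made-up request rate; the real alphabet, the longer token length, and any rate limits are not stated in the session.

```python
import string

# Assumption: 62 possible symbols per position (a-z, A-Z, 0-9).
ALPHABET = string.ascii_letters + string.digits

def keyspace(length, alphabet_size=len(ALPHABET)):
    """Number of distinct tokens of the given length."""
    return alphabet_size ** length

def days_to_enumerate(length, requests_per_second):
    """Days needed to try every token at a steady, hypothetical request rate."""
    return keyspace(length) / requests_per_second / 86_400

print(keyspace(5))                 # 916132832 possible five-character tokens
print(days_to_enumerate(5, 1000))  # roughly 10.6 days at 1,000 requests per second
print(keyspace(12))                # an illustrative longer token, far beyond brute force
```

Under these assumptions a five-character token can be walked exhaustively in under two weeks, which is why lengthening the token was the fix.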
PARTICIPANT: And your point about the anonymizing, it’s kind of almost in the sphere, but I think it was a BuzzFeed story where they did the tennis cheating and anonymized it and someone said we reverse engineered that quickly. We know exactly who you’re pointing out here. That was the same kind of thing where even kind of your point about, like, Google even had this happen, even in our attempts to make things anonymous, there’s still the risk that we missed a step, and we considered it good to go. Again, tennis is a little bit different, sorry, but kind of the same spirit of that. The due diligence is done but even in that case, it’s not enough.
So we kind of touched on some of this. But I think—I’m happy we jumped ahead to some degree. But we—especially in our experience, we get a lot of—we get a lot of perspectives on it, and these are kind of are the—they’re the major ones that come through. And none of these have a positive or negative assigned to them implicitly. People say through business in both good and bad way. But I think to the point that somebody made about kind of the power structure that’s there, you know, that’s something that kind of is very interesting to me because we in the media and in these kinds of situations, are we, you know, for kind of the point about the bankruptcy data and even in our salary database, there’s a lot of different people that show up in those things that are much more, you know, have more knowledge about, like, okay. Here—this would be the steps I take to get out of this and people have no clue. And have no clue that, you know,—who don’t know the media agency, people think we are a state agency and think that we are an extension of that and contact us with those kinds of questions.
So we’ve touched on this to some degree, but I will, you know, briefly open it up real quick if what other, you know, perspectives on these kinds of things do you all have? And it’s okay if we don’t. We’ve jumped around pretty good.
PARTICIPANT: Have you guys thought at all about anonymizing the data so that you can understand the trends in the data without being able to look at the individual records?
PARTICIPANT: We’ve thought about it. That third point is what makes that hard. And the answer to that is the people are above me. So I agree with you. But it’s kind of that. I think especially for our dataset, this is where we have echoes of the mug shot galleries that it—it becomes lucrative to the point where the perspective changes. And that is a challenge. It just seems like—I mean, you know, I don’t even like saying it out loud but it’s the truth.
And so that’s the—again, kind of why I want to have this conversation. Someone? –
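The idea raised in the question, showing trends without exposing individual records, could look something like the sketch below. It is not how the Tribune's salary database works; the record layout and the minimum group size of five are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median

# Assumption: groups smaller than this are withheld so that a single
# person's salary cannot be read straight off the aggregate.
MIN_GROUP_SIZE = 5

def salary_trends(records, min_group_size=MIN_GROUP_SIZE):
    """Summarize (agency, title, salary) tuples by agency and title.

    Returns the count and median salary for each group that is large
    enough; smaller groups are left out entirely.
    """
    groups = defaultdict(list)
    for agency, title, salary in records:
        groups[(agency, title)].append(salary)
    return {
        key: {"count": len(salaries), "median": median(salaries)}
        for key, salaries in groups.items()
        if len(salaries) >= min_group_size
    }

# Hypothetical district: five teachers and one superintendent.
records = [("Example ISD", "Teacher", 50_000 + i * 1_000) for i in range(5)]
records.append(("Example ISD", "Superintendent", 250_000))
print(salary_trends(records))
```

The trade-off is the one discussed in the room: suppressing small groups hides exactly the one-off positions, such as a superintendent, that produced some of the accountability stories mentioned in this session.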
PARTICIPANT: You bring up journalism, but profit is that third piece of the triangle where it gets really muddy. It would be interesting to see how you guys have these discussions. To her point earlier, we have a mug shot, somebody makes a choice or whatever, something happens at one point in their life and now that’s searchable, and you have issues of criminal justice equity and you’re, like—how do you guys in the newsroom make that choice on whether this is journalism really good for profit or all three?
PARTICIPANT: And the good point on the mug shots too is that quite often that’s actually not, like—they’ve not been convicted. They’re there—they just got a picture taken of them at the sheriff’s office, at the police station. Like, that’s kind of—all the examples I found, I have this big disclaimer on the bottom, which I linked the Tampa Bay one, and I’ll put others in there too. And all of them have a big disclaimer in them, by the way.
So that’s to that point of, like, you know, they put it up and people came. And that changed that dialogue entirely.
For us, you know, I think that we’ve and as I said, we’ll have to go through them quickly. But some of the scenarios that we’ve actually, realistic ones we’ve been in. You know, I think it comes with good and bad. You know, we’ve seen, you know, people who want out for all different reasons, and we’ve had contacts who say this dataset helped me equalize pay. People say this dataset made everyone aware that the superintendent was laundering money.
Like, it’s their—there are layers to it that get kind of interesting. And then, like, to the point of—I mean they kind of validation or verifying it. That’s kind of the thing that we also struggle with because there are more people there than we could ever individually go through, you know? So there’s kind of that extra layer to it of, oh, well, we would have probably never known this mid-sized school district in north Texas, you know, had this issue until someone looked—from that district looked at it and goes, hey, he’s making a lot more than his contract actually says. And then that was kind of a story that blossomed out of that.
So, yeah, there’s that—there’s a lot of that gray area that gets hairy.
PARTICIPANT: I also have a quick question to ask, how much more does a person have to be—how much more personal data does a person have to release to counteract or interact with this information? And I think that’s a new question that’s been coming up now. Because now that the right social media centers of so much what happens in your life is tried in the court of public opinion, you have to counteract the narrative. And what do you have to do to do so? So if you’re on the water list, do you—are you using the largest, most water that you now have to release possibly justification for why? You have seen to release maybe a water main broke at your house, and you didn’t have enough to spend it. What do you have to do to make this data neutral again as opposed to serving whatever objective that had.
PARTICIPANT: Yeah. And kind of to that point too, we were talking about this before it started. But, like, not everyone has an extremely common name. That battle to, like, step out to counteract that for some people is so much harder. Because if you –
PARTICIPANT: Yeah. There ies aren’t they?
PARTICIPANT: Yeah. I will never have that issue. But that’s not the case for everyone, you know?
So that kind of power level comes into it, comes into play there.
PARTICIPANT: So as I’ve been—I’m working a lot right now with public data and linking it to create profiles of people. And I think, you know—and I’ve worked on a mug shot database that we put up in Arkansas as well.
So I feel like the thing I’m starting to land on and I’m starting to get comfortable with is the idea of practical obscurity. So it’s make it available, but don’t make it easily available. So, for example, for the jail one that we put up in Arkansas. That was behind our wall and then we had stacks, we only put people in it who were charged with a felony, and then it expired after 60 days off the site. We archived that so that we still had it. We weren’t making it disappear from our own records, but it got removed off the site after 60 days. So the justification there is there’s still public value in seeing that whole picture so that you can see, you know, who’s getting booked? And how regularly—kind of get a picture of your local law enforcement community that you can’t get otherwise.
But you had to want to know that, and you had to jump through hoops to sort of access that; right?
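The publishing rules described for the Arkansas jail database (felony charges only, removed from the site after 60 days, kept in the newsroom's own archive) can be sketched as a simple filter. The `Booking` fields and the `charge_level` labels are hypothetical; only the felony rule and the 60-day window come from the speaker.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Booking:
    name: str
    charge_level: str  # hypothetical labels: "felony" or "misdemeanor"
    booked_on: date

# From the session: records came off the public site after 60 days.
DISPLAY_WINDOW = timedelta(days=60)

def visible_on_site(bookings, today):
    """Return only felony bookings still inside the display window.

    The input list is never modified, so it can serve as the internal
    archive that keeps every record.
    """
    return [
        b for b in bookings
        if b.charge_level == "felony" and today - b.booked_on <= DISPLAY_WINDOW
    ]

archive = [
    Booking("Person A", "felony", date(2016, 7, 1)),
    Booking("Person B", "misdemeanor", date(2016, 7, 1)),
    Booking("Person C", "felony", date(2016, 1, 1)),
]
print([b.name for b in visible_on_site(archive, today=date(2016, 7, 28))])
```

Filtering at display time, instead of deleting rows, matches the point that the records did not disappear from the newsroom's own files.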
PARTICIPANT: Yeah.
PARTICIPANT: With what I’m working on in Pittsburgh, we have a tiered subscription model. So if you want to access certain information about these folks that are in our database, you have to really want to go out of your way. It’s not a perfect solution, you know? It’s—but I feel like it’s—for me, it’s making me more comfortable with making some of these decisions that we’re making.
PARTICIPANT: If you don’t mind me asking, is the—how did the conversation around that start? Like, in terms of, like—I guess I’m asking the frame of, like, what levels of buy in did you have to secure for that?
PARTICIPANT: For what we’re working on now?
PARTICIPANT: Yeah.
PARTICIPANT: Well, we’re the startup founders. So –
PARTICIPANT: That will work.
PARTICIPANT: We’re working on it on our own.
Because we’re linking a lot of public records in these profiles, so we basically want to map anyone who has influence in Pittsburgh and put it in a profile.
There are things like home addresses. Like, we want to map kind of where money is flowing from different districts and areas. How we want to get it. We want the home addresses for our database for people to look up, but we decided that’s one clear line that we’ve created is we’re not releasing the exact public home addresses of the people within our profiles.
Again, there are other ways because we’re linking a lot of data power. There are just other ways.
PARTICIPANT: I’m curious in the room if anybody has ever seen an ethical hard line that promotes counter valence as a neutral playing field. So, for example, if you’re a news outlet, and you decide to report on mug shots or whatever, mug shot data. Obviously you’re a very opinionated source and single source community. But would it even be the job of the journalistic entity to say, like, we’re not going to release this until we have a counter surveillance, like, balanced thing on the scale where we’re letting the community—for every mug shot we release, we’ll release one photo of the police officer in the community, or probably something along those lines. That’s not the best example but something that’s tit for tat or you’re neutrally because you ethically encoded into your dataset some at least balance metric that you’re aspiring to.
I’ve been hearing a lot of people talk about counter valence as a policy for general data management on social networks and stuff like that. Where, like, if you give Facebook your photos and Facebook should allow you every time to know they mine your photos, for example. And the users can fight for those kinds of policies for, like, hey, you can make money off of my thing, but just let me know what you’re deriving. Every time you access my information, I should know about it. And I wonder if there’s potential for, like, innovation there around—plus not take the easy source that the police produce, and we know that they’re paid to do it. Let’s, like, raise the bar a little bit and produce the dataset.
Do people generally take user generated content and put it next to official government produced content? Is that still –
PARTICIPANT: I would be really uncomfortable doing that.
PARTICIPANT: Yeah.
PARTICIPANT: You know? There’s a reason that we get—like the government database is—there’s a probability in terms of how they’re publishing it, and you can kind of analyze it. That’s a lot harder to do with the data.
So I feel there’s still a responsible process that’s made to kind of evaluate the government data whereas in another sense not as much. That’s just my first gut reaction to that.
PARTICIPANT: Yeah.
PARTICIPANT: I think also if I had to guess what we had and tried to pitch that in our newsroom, the legal smoke screen that, oh, it isn’t ours, it’s the government, even though you’re the one collecting it from the user. So I think as far as the sales pitch to the folks running the show, I think it would be harder to get sustained UGC on—depending on subject matter.
PARTICIPANT: Yeah.
PARTICIPANT: It might work for, like, you know, places that the city hasn’t plowed on time or stuff like that.
But when it comes to shootings or criminal allegations against law enforcement, I imagine that would be a harder sell.
PARTICIPANT: The blatant lies that are in there. But it’s a start.
PARTICIPANT: There’s also a big question of what is actual balance? And is just putting a cop who has behaved poorly next to a mug shot actually balance? Because people have very different opinions, and you say the government has this, and we have researched this, is that an actual balance?
PARTICIPANT: But that’s not—there’s no direct answer to that. You just have to find your plan.
PARTICIPANT: The fact there’s no objective answer to it doesn’t mean you shouldn’t necessarily ask the question.
PARTICIPANT: Oh, sure. We have to answer for ourselves.
PARTICIPANT: We have to actually look at some points of what might be the thing we need to do. Because things that have often happened with, like, crime is one of the first things people look for whether or not they have anything to do with what happened when the crime was committed or when the person was attacked. But there’s this kind of thing, well, if I release someone’s juvenile record, that can be an excuse as to why they were murdered. But if we say this person was drunk all the time, and it doesn’t affect anything, we’ve now exposed more data.
And I think that’s the thing—because I do believe data should be exposed, but what I constantly wrestle with is it becomes a contest of data. Like, we’re just more and more and more and more and this data will do this and if this data doesn’t do what I think it does, you’re getting to the point where you’re living your entire life as a performance piece, and it doesn’t work for everyone that way. So now you’re going back to—you’re basically now with lack of, like, privacy having the replication of the same issues beforehand. Except now your identity –
PARTICIPANT: And it goes back to what we were saying earlier in terms of why are we putting that dataset up? What’s the value? What’s the purpose? There has to be some newsworthy justification and I don’t think that’s just because we can.
I don’t think there’s a balancing act, there’s a deeper reason why you’re putting that data up there, and then it goes back to the responsibility of what’s the context again around where we’re doing this? It shouldn’t just purely become data for the—data in and of itself grow a record, you know, it doesn’t have enough context to provide the whole story behind it. And I think that we got sort of caught up in this idea that just by putting out a spreadsheet, we’ve done journalism, and I don’t think we have. There’s a much bigger responsibility around how that data is presented, how it’s accessed. The context that it’s presented in. And then we need to—start thinking about your data is your story. So, yeah, it’s really big, but still how is that packaged?
PARTICIPANT: I just have another follow-up question on that idea of curation and being able to say—like I was thinking about we only show the last 60 days of the data, but then we don’t show the other data that we’re keeping maybe off the Internet or, you know, whatever.
So is there—like what’s the calculation there? Like—I’m asking more about the risk assessment of if you—if you get hacked and all of it gets taken anyway. So I was thinking about the DNC e-mails; right? The value of deleting your stupid e-mails. And, yeah, how does that factor into the conversations about the curation?
PARTICIPANT: So I don’t think it ended up—that’s a long time ago. I suspect that they’re not even keeping that data honest, even though that was the idea.
I think now what they’re looking at now as reporters, it’s about building our library. We want to have access so that we can continue to do our reporting and research. It’s helpful to us. But I think the security aspect is huge. And I don’t know that—that might be another place, but we need to start thinking about what are the protocols? What are our standards? How do we—you know, what is the right strategy for hanging onto all of this data? Because once we are starting to link all of that, we’re doing something much more powerful than the government is itself.
PARTICIPANT: So just to give you, like, a more European perspective: the database from the booking system is mind-boggling to Europeans. That would never happen. I mean what it basically does—it’s reinforcing a racist police system and basically their perspective on the whole thing. Most of these pictures—I don’t know. It’s just crazy that the society allows that. And we would never have that. And it’s also against the press codes in Germany to publish these pictures of—if you take an ethical journalism perspective, it’s not journalism because you’re not providing any context, you’re relaying public data. I don’t understand why it is public. Why in the society that it happened to someone who made a mistake, put their picture up online. I think there’s a legislation that makes it easier to remove these. This discussion, it’s mind-boggling to me.
PARTICIPANT: If I’m searching data, like, in France, and I’m looking at the outer, and I start seeing a color change as you—who’s being, you know, if I want to search, I want to look at the shades of the people who are being arrested, couldn’t I also look at racist policing practices by seeing it in a more accessible way as opposed to getting lost in the data or not?
PARTICIPANT: Why does it need to be a picture, for example? Why does it need to have a name associated? Why can’t just records—I don’t know some attribute of color or –
PARTICIPANT: My follow-up, because it’s easier for journalists and nontechnical people to get to those answers.
PARTICIPANT: Also, from a devil’s advocate perspective, the face is also more compelling than the name written. People are more—the same reason why we get pictures of dead children and run them is because it’s heartbreaking to hear that a 3-year-old dies, but it’s really heartbreaking when it’s a 3-year-old who died and you can see their face. That’s been forever also, but it doesn’t make it less true that human faces are more compelling than the name, and the age and the neighborhood that they’re from don’t have the same emotional resonance, good or bad. You could use it either way, or it can be used either way. But the consequences of it aside, pictures are really interesting.
PARTICIPANT: But a journalistic responsibility—you shouldn’t just go for the shock or the—there’s a bit more to it. I’m not a journalist by training actually. I’m a computer scientist. This—it’s just a different culture, let’s assume that.
PARTICIPANT: Well, some states are not even—like some states, New York d release the mug shots and some jurisdictions don’t give it to you, even if you want to have it. So in some ways, when the information became public, you can get people’s names and home addresses if they own a gun or if they immediately changed the law, people said enough bad stuff about something, when it was bad when it became open, maybe they would change it and understanding it not usable anymore. Some people say they released mug shots with that awful website where they released naked pictures and if you wanted it off, you had to pay them, it was blackmail. If you feel strongly enough about something you would take action, which is an act of journalism inadvertently.
PARTICIPANT: I think what you said, though, is really important is the idea of—and when we get into releasing data looking at how we use data now. Is it actually having the effect we want it to have? Is it doing the thing we want it to do? Because if we’re actually hoping, well, I want to—if I may tell a quick personal story, because this was a real—it was, like, a mind-boggling read. I got into a bit of a Twitter fight with the editor of the New York Daily News about his decision to publish the face of the body of Alton Sterling. And his response was this will make people see. This is going to make people actually feel a thing. It has an emotional resonance. And my response was this has happened for years, and it hasn’t helped. So what is your actual goal of this?
And the data point is important. Someone died. It’s questionable. There’s police brutality. But that image, what did it do? And data I think itself can function like that because we can see these records and all of these things. But if we’re trying to find out what our journalistic inquiry is, I want us to think more responsibly about how we use water and force and violence. We also have historical models that show the addition of data—the addition of these things don’t help. Like, a lot of times I think especially when we’re dealing with technological journalism because data is now easier for us to quantify and compute pretending it hasn’t been there before. So we’ve had historical models for these things before. We’ve had practices, that’s why journalism has practices and we’ve seen what has and hasn’t worked. With things like dead bodies of kids, dead bodies of people, it has that gut-wrench human reaction. But I don’t think it often expresses what people are trying for, which is scope.
So, yes, you get that quick thing, but what you’re thinking about is 20,000 people with an emotional reaction to scope rather than shock. Which data can do now, but it doesn’t match. So could a graphic be better than a picture? Could this—could an infographic be better than everybody’s address? And how do you start thinking about that?
PARTICIPANT: I think that’s the question of the story. Is it about this trend or is it about this one person? And I think a lot of times that’s what data people are there for. Like, I feel like reporters are often trying to report on an individual narrative and trying to find a human, a person or some story they have.
But we’re more trying to find a trend and look at a broader picture, as you said, a scope. Sometimes they’re not always—but I think that’s also the question of context too. If it’s not context, you feel it’s more of the narrative, like, where are we going, sure, we can say this person used a lot of water. Yeah, but it’s not really the story that you’re trying to tell with all the data. That’s the big question. And that’s the point that you have to drive home when you get angry e-mails from people who are, like, I’m mad because my salary’s up here, and I haven’t worked there in two weeks, but you still have me listed there.
PARTICIPANT: But just to push back on the idea if we’re going with water is if the actual goal of the story is saying this municipality is overusing water to a degree, and you have someone who was, like, I had a disaster in my house and that put me over the top rather than someone who just doesn’t give and has been overwatering their lawn. That person has been swept up in the trend is still not the story.
PARTICIPANT: Right. But I think it’s hard to differentiate that when you have the one person who is saying that out of the greater story. But not that their experience isn’t important. But I think that that’s hard to pull them out from the e-mail that you get from people.
PARTICIPANT: Then I think that’s the thing we need to work on. How do you start giving people ways to pull themselves out? But also in pulling themselves out, they generate more content. They generate more connection. They might generate better connections for us, if there are people who say this is how you see me in the trend and this is how not. It’s not necessarily connected to all their personal data, but it is about the back and forth because another thing that’s simple for data stories is that it’s very hard to get in contact with anyone specific. You always have to just generally hit the paper, not a person.
PARTICIPANT: Right.
PARTICIPANT: So we’ve hit 11:30. I don’t know when someone kicks us out of here. But I think—yeah, I wish we had another 30 minutes to go into that because I think that point about, like, what are your steps to exit something like that is something that we definitely see as well in terms of just—I mean that, to me, is always what the power kind of play of it to me. Which is every person that’s e-mailing you saying, hey, I don’t think I should be in there, there are probably 100 others that have no clue that that’s even a path that they could take. And, like, how—you know, how to normalize that and take that into account when there’s something being—when you’re putting that big dataset online and acknowledging that there should be a venue for dispute.
We didn’t get to the scenarios, but I think the conversations that we had instead were easily more interesting than the scenarios I had written up. So totally fine. Thank you, all. I really appreciate you coming and hanging out first session this morning. And, Anne, you have some notes.
PARTICIPANT: Yeah. I took some notes.
PARTICIPANT: I’m going to try to grab links of some of the other examples people mentioned and put them in there as well.
PARTICIPANT: I wrote them down. Or if you want to add them as well. It can be found in the schedule.
PARTICIPANT: All right. Thank you.
[Applause]