ETEC 511 – IP #3: Algorithms

Option I: Content Prioritization

“At a time when state funding for public goods such as universities, schools, libraries, archives, and other important memory institutions is in decline in the US, private corporations are providing products, services and financing on their behalf. With these trade-offs comes an exercising of greater control over the information, which is deeply consequential for those already systematically oppressed…” (Noble, p. 123)

Think and respond to the following questions:

  • Explain in your own words what “content prioritization” (Noble, p. 156) means (give some examples) and how (in lay terms) content prioritization algorithms work. With control over the “largest digital repository in the world” (Noble, p. 187), how have Google’s content prioritization algorithms been “consequential for those already systematically oppressed”? How do they impact your professional life? (give specific examples and briefly discuss)
  • What are some ways PageRank impacts your personal life? (specific examples and briefly discuss) (How) can you impact PageRank? Explain.

Content prioritization is essentially a re-sorting algorithm based on a myriad of factors. In Google Search, it is based on things like location data, prior search history, demographic information about your account, and other personalization. While this might seem like a good and useful thing, it often leads one down a rabbit hole. Google Search, like most sites, wants your attention. The more time you spend with it, the better. More searches mean that they can build a better profile of you and what you want to see. I will come back to that in a moment.
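To make the re-sorting idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the signal names (`near_user`, `seen_before`), the weights and the example URLs are hypothetical, and real search engines do not publish the factors or weights they use.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float    # how well the page matches the query, 0..1
    near_user: bool     # hypothetical location signal
    seen_before: bool   # hypothetical search-history signal


# Hypothetical weights, chosen only to show the effect of personalization.
WEIGHTS = {"relevance": 1.0, "near_user": 0.3, "seen_before": 0.2}


def score(result: Result) -> float:
    """Combine the base relevance with the personalization signals."""
    return (
        WEIGHTS["relevance"] * result.relevance
        + WEIGHTS["near_user"] * result.near_user
        + WEIGHTS["seen_before"] * result.seen_before
    )


def prioritize(results: list[Result]) -> list[Result]:
    """Re-sort the results so the highest combined score comes first."""
    return sorted(results, key=score, reverse=True)


results = [
    Result("a.example", relevance=0.9, near_user=False, seen_before=False),
    Result("b.example", relevance=0.7, near_user=True, seen_before=True),
]
for r in prioritize(results):
    print(r.url, round(score(r), 2))
```

In this toy example the less relevant page ends up first because the personal signals outweigh the relevance gap, which is the rabbit-hole effect in miniature.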

When people started figuring out how to improve their own sites’ search ranking, they started to manipulate link text to game Google’s ranking algorithm. This led to the use of potentially misleading link text (the early internet’s version of being Rick-Rolled) to mislead users, through a process called Google Bombing. As these links were passed around early social media, they also caused Google to rank them higher in priority based on the number of searches being run for the search term. One that might be memorable came during the second Iraq War, when anti-war groups made an effort to link “miserable failure” to the White House’s website. Typically, these Google Bombs were not long-lasting, as you can see from Google Search trends for the phrase “miserable failure”. However, their impact was.
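A rough sketch of why link text can be gamed this way, assuming a toy index that ranks a page for a phrase purely by how many links use that phrase as their link text. The function names and example URLs are hypothetical, and this is not how Google's actual index works; it only illustrates the mechanism.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """links: iterable of (source_page, link_text, target_url) tuples.

    Returns a mapping from lower-cased link text to a count of how many
    links point at each target using that text.
    """
    index = defaultdict(Counter)
    for _source, link_text, target in links:
        index[link_text.lower()][target] += 1
    return index


def rank_for_phrase(index, phrase):
    """Return target URLs for a phrase, most-linked first."""
    counts = index.get(phrase.lower(), Counter())
    return [url for url, _count in counts.most_common()]


# Hypothetical example: two blogs coordinate their link text.
links = [
    ("blog1.example", "miserable failure", "target.example"),
    ("blog2.example", "Miserable Failure", "target.example"),
    ("blog3.example", "miserable failure", "other.example"),
]
index = build_anchor_index(links)
print(rank_for_phrase(index, "miserable failure"))
```

The target page never has to contain the phrase itself. Enough coordinated links carrying the same text are sufficient to push it to the top in this simplified model.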

The manipulation of search ranking was seen as a strategy used by radical right groups (McSwiney in Devries, Bessant & Watts, 2021, p. 25) to access, recruit and radicalize users. The sheer volume of racist propaganda online makes it almost pervasive. If one of the key ranking criteria of Google’s search algorithm is the volume of links, it is no wonder that racist, biased sites get pushed up the rankings. One of Noble’s arguments throughout the book is that while Google builds the algorithm that pushes certain sites to the top of the pile, they are not held responsible for it (Noble, 2018). So it follows that when Google autocompletes a search query with a stereotypical response, that has an impact on the viewer, either reinforcing a negative view or potentially introducing self-doubt, and it feeds the ranking algorithm every time it is clicked.

In my professional work, I am often searching websites for documentation about educational technology products. I usually work on a work account that has little to no demographic information, I never search while logged in, and no location technology can be queried. Essentially, my work account is a bit like a burner account. So typically, no, I do not see any evidence of discrimination. However, documentation does have discrimination built into it: the archetypes used and the images of people say more about the company than many think. From a search perspective, I never use Google autocomplete, I do not use Google as a sole search provider, and I move around (DuckDuckGo and Bing are two suitable competitors), so that there isn’t much to give to any one provider.

Back to the popularity contest that is PageRank, and attention. While I do not use Google exclusively, PageRank’s algorithm strategy is pretty pervasive in search. It is part of what was taught at Udacity’s big “build a search engine” MOOC (I completed this in 2012?) and is what Yandex and Bing use to scrape the web for links, and to count the number of links that point to a site with a set of keywords. It is a common strategy and would likely yield common results, except that you do not have the ranking algorithm component, and ranking and link scraping work hand in hand. The first way that PageRank influences daily life is the reliance on what is popular over what is factual. I have seen this over and over: popular misconceptions, and how tales take over the reality of what happened. Sure, we have context for some of that (in that disadvantaged groups often do not get their stories told at all), but Google’s focus on popularity assumed (when PageRank was developed initially) that people are mostly truthful. Instead, 15 years later, it is less about people and more about how many resources can be deployed to increase site ranking, essentially privileging the wealthy (who are disproportionately white, heterosexual and male) and resourceful.
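For anyone who wants to see the popularity contest in code, here is a minimal sketch of the textbook, simplified PageRank iteration. It is not Google's production ranking. The damping factor of 0.85 is the value commonly quoted for the original algorithm, and the three-page link graph is a hypothetical example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to.

    Returns a dict of page -> rank, where the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, value in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(value, 3))
```

Nothing in the calculation asks whether a page is accurate. The only inputs are who links to whom, which is why anyone with the resources to generate links can move the result.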

Well, one way to circumvent PageRank is to Google Bomb it to uselessness: essentially, fill it with obfuscated information. However, that only punishes you, because it makes it less useful to you. One other way to make PageRank less influential is to pay less attention to it and SIFT the sources. Go straight to the source rather than Google Search everything (the number of times in the last decade I have seen someone search for a website, rather than just type the site address, is more than I can count). Use other search engines. Avoid giving attention to things that are false. Support platforms that do not combine personal data with search results.

References

Devries, M., Bessant, J., & Watts, R. (Eds.). (2021). Rise of the far right: Technologies of recruitment and mobilization. Rowman & Littlefield Publishers.

Noble, S. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.

Google Is Not My Curator

I don’t want anyone else to curate search results. If I want to curate my own experience, I want to do it on my desktop, on my terms, with my data and my experience. Not with their data, with their choices, with their algorithms and frankly their shit. I can curate my own search results, thank you very much. I guess I’ve gone from Google fanboy (basing an entire course around it) to disgruntled search addict. You know you’re a bad company (yes, you Google) when Microsoft and Yahoo look good in comparison. I wonder if DuckDuckGo is any better? It sure as hell couldn’t be any worse.

Here’s a link to a PDF of the FTC document filed by Centre for Digital Democracy.

Image Matching

I’ve been working on videos the past two weeks, spending a lot of time in a bubble examining clips in a sort of detail that is probably excruciating for everyone else. I like this sort of in-depth analysis of what I look at. It also brings up how flawed we are in how we process images online. We generally assign a bunch of words, meanings and descriptors to an image instead of trying to mathematically or logically sort images. Google gets it right by being able to match images, but their similar image algorithm would be much more useful if we could upload an image and say “match that!” It’s further complicated when you factor in the 60 images a second that you get with a video. It would be great to be able to upload a clip and get a bunch of clips and YouTube videos that match not only the actions but the content as well. Want to see Devo’s live performance of Whip It? Just whip up a video of you playing Whip It on Rock Band 3, upload it and get matches not only for the band, but also for others playing Devo on Rock Band.

Making Sense of Old Media in New Ways

I’ve been reading about sensemaking lately, mostly about dealing with information abundance, filter failure and information overload. Most of the articles deal with text and some other form of media, be it video or audio. We have a few tools to help with text-based websites, like Google and Bing, as well as trust-based networks like the ones we build with blogs and Twitter. But we haven’t really dealt with information abundance with videos. Recall back less than a decade ago, prior to Google, when we had AltaVista as the best search engine and ranking was based on a series of on-page items. There was a wealth of deception based on the ability of unscrupulous webmasters to spam keywords unrelated to the content on the page to boost ranking. Google changed that by factoring in links to the webpage, with the text describing each link serving as an annotation of the page being linked to. Which leads us to the number one user-created content on the web…. video.

Specifically, how are we going to make sense of these thousands of hours of video, and assess it for quality? Adaptive Path seem to think that video needs a Flickr-like revolution, an interpretive layer on top of the mass of tools we have to share videos. The problem that I see is the same as the problem with AltaVista: keywords and tags can be gamed, and made irrelevant. So we’re relying on text to describe video. Is there a better way? Maybe some way of visually linking instead of describing a video? Maybe a selection of key frames from the videos to describe the video profile?

Searching and Learning

We have already changed how we get information – instead of reading books or taking a course, a lot of us just use Google or other search engines to grab the information we need. Unfortunately, there’s a contextual issue, where grabbing information off the Internet doesn’t necessarily provide some linear context for that information that may be important. Sure, with books you can use the index, find the pages that the information is on and scan for just what you need, but inevitably you end up reading at least a few paragraphs before and after and getting some context. Searching the Internet is different – we have Google as a situational context provider (even if it is false context, like Google lending its authority to search results). I’ve been thinking about how this ties into education – specifically higher education – and I think the way we informally learn information through Google will trickle up to higher education. In some ways, we already see this with how students use the Internet for research.

I’m not the only one thinking about this either. Futurelab released a poll a couple of days ago asking (primarily K-8 teachers) which search engine they used. I answered the poll even though I’m not in that demographic; I figured the more data the better. I think that this poll indicates that others are beginning to use search engines in their teaching – which further moves the teacher away from being the sage on the stage, and more towards the guide on the side (with thanks to my friend Otto, who had a habit of saying this at least a couple of times a semester and burned the phrase into my head).

Also, I was turned on to the book Search Patterns and the accompanying Search Patterns website – both of which examine the patterns of how people search – which has tremendous implications for how people learn using the Internet.

Howard Rheingold’s Crap Detection 101

This may be one of the most important things that will determine success or failure in the future. Not just to determine who is telling us bullshit, but what their motivations might be in telling us bullshit. Otherwise, we’ll be exposed to sending personal information to Nigeria to help out princes, which is this generation’s prime real estate in Florida and bridge in Brooklyn.

Reflections on My Use of Wikis in the Classroom

Wikipedia has fundamentally and finally altered epistemology itself—our commonly held ideas about knowledge. For the academy at large, the significance of Wikipedia is roughly equivalent to that which the Heisenberg uncertainty principle had in the sciences in the 1920s—stating what is not possible rather than what is. It is no longer possible to plan, tax, and budget for universities as if their model of knowledge creation is the only epistemological path. No matter how improbable it might seem that a Web page that anyone can edit would lead to valuable knowledge, Wikipedia makes clear that there is now another model for knowledge creation. And it also recasts the comments of the diplomatic chancellor in a supremely ironic light: here is the leader of a massive state system for knowledge creation stating that “when every one is responsible no one is responsible,” while he, and certainly everyone in that audience, has probably relied upon a knowledge acquisition path—from Google to Wikipedia—for which everyone is responsible and no one is responsible at once. — Robert E. Cummings, Wiki Writing: Collaborative Learning in the College Classroom (online book) (link to quote)

I’ve written previously about my wiki problems and about assessing the wiki work, but never really assessed the impact of my decision to turn the Searching the Internet course into a guided research course. Now is a good time to do this, as the second iteration of the course is done and I’m handing it off to someone else. For the most part, people embraced the technology once they understood the purpose of using the wiki – which was hard to explain to some people. It was important to understand that user-created content needs some critical consumption before you trust it. It’s constantly amazing that many people don’t think to question broadcast news, newspapers or media in general; getting people to ask those questions was really the main goal that I hoped to get out there. I think in some ways I failed, more on that later.

One major hurdle that I am still not sure how to get around (through?) is student expectations of what should go on in the classroom. Using the wiki for everything was conceptually difficult for those who attended lectures in the face-to-face offering. They wanted to discuss the issues in class – and I didn’t persuade them otherwise. It’s the thing I love about classrooms – the discussions therein. I should’ve made a better attempt at summarizing these in-class discussions in the wiki; that way there would be a digital record of what was discussed, what the decisions were and where to go post-discussion. Of course, having the discussions in class robbed them of a crucial piece of the collaborative work – the discussions on talk pages. That kind of discussion serves two purposes. The first is the revelation that the general public have a democratic say in the content published. The second is that, hopefully, the fact that they’re editing the content means that other non-experts are also editing content, and that means you have to take everything written with a grain of salt (sometimes a pound).

Another classroom expectation that I had trouble with was that a small minority of students were just not comfortable doing their own research. They wanted specific instructions from me as to what to do. I was clear that this course would be unlike other courses they may have taken. I didn’t want the authority of the teacher (and considering the subject matter, let’s face it, people have to get over this authority complex they have – it’s decentralized just like the Internet) and was looking for ways to bust ye olde teacher as authority. I tried telling people that I was not an expert and that my role was as a guide through the material laid out before them. Yes, I wrote it and yes, I researched it. Yes, it could be wrong too. I tried telling people that it wasn’t a course, that they weren’t students, and that they weren’t doing assignments; they were doing exercises. Of course, the exam at the end was real. I tried telling students that I only know this stuff because I looked it up on the Internet. That didn’t work out so well, and I never repeated that one. Nothing will devalue the course more than telling the truth. In the end I didn’t try to break this power structure, and it’s one of the reasons I won’t be teaching after this semester.

I disagree with Cummings’s assertion that everyone and no one is responsible for the content. It’s neither. It’s you who is responsible for assessing the information you consume. I think that’s where I’ve failed: not getting this point through, that every piece of information you consume has a bias, a history and a reason. Nobody publishes a story in the newspaper or on a blog without a reason. Some are transparent, some are difficult to read. While I’ve given the students of Searching the Internet over the last seven years the tools and some experience in using them, I’m not sure anyone stayed with it.

With that said, it wasn’t an all-around failure. I did become a better teacher, more confident in the skills I do have (and able to improve the ones that I’m lacking). The content on the wiki was well crafted, well thought out and showed that when students would engage with the subject, they could become subject matter experts on their own.

What If….

In the spirit of the old Marvel Comics What If… series, I bring you:

What if Bloom’s Taxonomy is right?

Well, first a brief primer in Bloom’s. Bloom’s Taxonomy is a tiered structure, like a staircase, that illustrates increasingly complex or difficult cognitive tasks, particularly in an educational setting. At the bottom is Knowledge (knowing the facts), and the steps scale up to Evaluation (the ability to weigh several arguments, select the best option and defend that selection). In between, each step builds on the previous one. There are criticisms of Bloom’s, including many arguments about how learning is not sequential and about the semantic framing of Bloom’s steps.

What if Bloom had it wrong for learning but right for evolution? On pages 6-18 of the new Pew Report on the Future of the Internet (PDF), people responded to whether or not Google is making us smarter. I think if you apply this transformation to Bloom’s, you see that we’re skipping from Knowledge and Comprehension on Bloom’s scale to something higher, likely Application. The danger, of course, is if we ignore the critical consumption part of the equation. We no longer have to evaluate the information we receive, but the source of the information. If the source is trustworthy, then it is likely that the information is trustworthy. If we are unable to make this evaluation in a split second, then we are destined for Idiocracy.

Who’s Watching The Wikimen, Or Wikipeople

I just found in a random search (for editing Wikipedia) an article by Wired about an effort to see who’s editing the world’s largest encyclopedia. I have some privacy reservations about this sort of third party monitoring, especially if corporations are turning the screws on people writing about their excesses. I guess though, if everyone can do it, everyone should. Of course, corporations are the sort of bodies that have people who can spare the time to do this sort of activity, which could lead to that sort of misuse. Now, I’m sure that’s not happening, because corporations never behave badly. Right?

Where Journalism Can Go From Here

Happy new year!

There’s been a lot of talk about the death of the newspaper over the last year. In fact, the postings and articles range from the dire to the hopeful to the almost dismissive (midway down the page). The main culprit is, of course, “the Internet”. Really, this economic downturn has been a chance for further consolidation of corporate assets. It’s not the Internet that has killed these small papers, it’s the (profit) margins. Here’s an idea of where journalism (and newspapers) can go from here.

First thing, for full disclosure, I’m not educated in journalism, although I use a lot of its tenets in my Searching The Internet Effectively course when speaking about verifying information and trust. Trust is a very fickle friend that only comes after time, and those who trust implicitly are likely to be burned somewhere along the course of time. Hopefully, these experiences come early enough and without any major damage, and the person will gain experience with those situations. As an educator, and a human being, I find truth very important. Journalism should be the attempt to discover truth, although I suspect that journalism (…not truth) currently resides in the realm of entertainment or, at a minimum, distraction.

So with Google working on better search results for you, personally, and a world of apps for the iPhone that focus on geo-location, you’d think local news would be important. Local news is important. So much so, it saved the Birmingham Eccentric from the axe. Yes, the paper was converted to a weekly, but newspapers bringing recent news died in the ’80s with a refocusing on TV news. Certainly the rise of cable news, with CNN Headline News being a 24-hour news channel for the headlines, helped nail the coffin shut for breaking news in newspapers. News from your newspaper should contain stories tailored to the location. Yes, I know that this is taught to journalism students everywhere, but it seems like it is ignored. I know that corporate media recycle their wire stories for several different communities, and I’m sure it’s a fairly commonplace activity. Why?

Newspapers aren’t breaking immediate news anymore, so why focus on what isn’t their strength?

Newspapers should be bringing more in-depth news, the “why” in the stories. Part of the “why” should be the reason an article is appearing in the local paper. In “Made to Stick”, the book by Chip and Dan Heath, they talk about relevance and how it is important to transmit the relevance of information to an audience. One of the examples of relevance to an audience was about how a local paper focused almost exclusively on local news. If this simple idea of making things relevant to people works, why aren’t people using it? The term for the “why” in a story has become a part of slow news. Much like the local food and slow food movements, slow news can bring a better and deeper understanding of ideas, relevant to people in a community (you can get your Jane Jacobs texts out now to define community). By pausing to reflect on an incident, newspapers can provide this in-depth clarification and corrections to the initial news “outbreak” via cable news and online sources that are, ahem, questionable.

You can even have spicy tag-lines, “News you can really trust”, and prove it. From a business sense, people are looking for trust, honesty and other things that are sorely lacking from our public institutions. Perhaps a refocused and brave cadre of journalists can bring that to society. Plus it’ll save paper where they used to be printing corrections (that no one read anyways).