Dealing With D2L’s Data Hub with PowerBI

Where I work, we use D2L Brightspace as our LMS. We also used to use Insights, their data dashboard product, but we didn’t find value in it and turned it off in 2016 or so (just as they were revamping the product, which I think has gone through a couple of iterations since the last time I looked at it). The product in 2016 was not great, and while there was the kernel of something good, it just wasn’t where it needed to be for us to make use of it at a price point where we as an institution would see value. I will note that we also weren’t in a place where, as an institution, we saw a lot of value in institutional data (which has changed in the last few years).

Regardless, as non-Performance+ users, we do have access to D2L’s Data Hub, which has somewhat replaced the ancient Reporting tool (that thing I was using circa 2013) and some of the internal data tools. The Data Hub is great, because it allows you access to as much data as you can handle. Of course, for most of us that’s too much.

Thankfully, we’re a Microsoft institution, so I have access to PowerBI, which is a “business intelligence tool” but for the sort of stuff that I’m looking for, it’s a pretty good start. We started down this road to understand LMS usage better – so the kind of things that LMS administrators would benefit from knowing: daily login numbers, aggregate information about courses (course activation status per semester), average logins per day per person and historical trending information.

The idea is that by building this proof-of-concept for administrators, we can then build something similar for instructors to give them course activity at a glance, but distinct from the LMS. Why not just license Performance+ and manage it through Insights report builder? Well, fiscally, that’s more expensive than the PowerBI Pro licensing – which is the only additional cost in this scenario. Also, Insights report builder is in Domo, and there are frankly fewer experts in Domo than in PowerBI. Actually, the Insights tool is being revamped again… so there’s another reason to wait on that potential way forward. Maybe if the needs change, we can argue for Performance+, but at this time, it’s not in the cards, as we’d need to retrain ourselves, on top of migrating existing workflows and managing different licenses. Performance+ would probably be more efficient, in that once a report is built we wouldn’t have to update it manually, and it’s managed inside the LMS so it’s got that extra layer of security.

Where I’m at currently is that I’ve built a couple of reports and put them on a PowerBI dashboard. Really, relatively simple stuff. Nothing automated. Those PowerBI reports are built by downloading the raw data in CSV format from Brightspace’s Data Hub once a week, extracting those CSVs from the Zip format (thankfully Excel no longer strips leading zeros automatically, so you could migrate/examine the data in Excel as a middle step), placing them in a folder, starting up PowerBI, then updating the data.

The one report that I will share specifics about just counts the number of enrollments and withdrawals each day, but every report we use follows the same process – download the appropriate data set, extract it from the Zip, and place it in the correct folder within the PowerBI file structure on a local computer. PowerBI can then publish the dataset to a shared Microsoft Teams team so that only the appropriate people can see it (you can also manage that in PowerBI for an extra layer of security).
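The extract-and-place step is the easiest part of that routine to script. Here is a minimal Python sketch, assuming the weekly download is an ordinary Zip file containing one or more CSVs; the function name and the folder layout are my own placeholders, not anything Brightspace or PowerBI prescribes.

```python
import shutil
import zipfile
from pathlib import Path


def extract_csvs(zip_path, dest_dir):
    """Extract only the .csv members of a downloaded zip into dest_dir.

    Returns the list of extracted file paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            # Skip directories and anything that is not a CSV.
            if member.is_dir() or not member.filename.lower().endswith(".csv"):
                continue
            # Flatten any folders inside the zip so the report always
            # finds the file at the same path.
            target = dest / Path(member.filename).name
            with archive.open(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(target)
    return extracted
```

Calling something like `extract_csvs("Downloads/export.zip", "PowerBI/data")` (both paths hypothetical) overwrites last week's CSV with this week's, so a refresh in PowerBI picks up the new file without any changes to the report.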

I obviously won’t share all the work we’ve done in PowerBI as, well, some of the data is privileged and should be for LMS admins only. However, I can share the specifics of the enrollments and withdrawals counts report. The data set we need for enrollments and withdrawals is, not shockingly, “Enrollments and Withdrawals” (not the Advanced Data Set). We also need “Role Details”, to allow filtering based on roles. So you want to know how many people are using a specific role in a time period? You can do it. We also need “Organizational Units” because, for us, that’s where the code containing the semester information lives, and if we ever needed to see what’s happening in a particular course, we could. Your organization may vary. If you don’t have that information there, you’ll need to pull some additional information, likely in Organizational Units and Organizational Parents.

In PowerBI you can use the COUNTA function (which, once filtered on the action, works much like COUNTIF in Excel) to create a “measure” for the count of enrolls and a separate one for the unenrolls. Those can be plotted (a good choice is a line chart with two lines on the y-axis). Set up filters to filter by role (translating them through a link between Enrollments and Withdrawals and Role Details).
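If you want to sanity-check the numbers outside PowerBI, the same counting logic is easy to sketch in Python. This is not the PowerBI measure itself, just the equivalent idea. The column names (`Action`, `EnrollmentDate`, `RoleId`) and the `Enroll`/`Unenroll` values are my assumptions about the CSV, so check them against your own export, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def daily_counts(rows, role_id=None):
    """Count Enroll and Unenroll actions per calendar day.

    rows: iterable of dicts with the ASSUMED columns
          "Action", "EnrollmentDate" and "RoleId".
    role_id: optional role to filter on (compared as a string).
    """
    enrolls, unenrolls = Counter(), Counter()
    for row in rows:
        if role_id is not None and row["RoleId"] != str(role_id):
            continue
        # Keep the YYYY-MM-DD part of an ISO-style timestamp.
        day = row["EnrollmentDate"][:10]
        if row["Action"] == "Enroll":
            enrolls[day] += 1
        elif row["Action"] == "Unenroll":
            unenrolls[day] += 1
    return enrolls, unenrolls


# Made-up sample rows, standing in for the real CSV.
sample = io.StringIO(
    "UserId,OrgUnitId,RoleId,Action,EnrollmentDate\n"
    "1,100,110,Enroll,2021-09-01T13:00:00.000Z\n"
    "2,100,110,Enroll,2021-09-01T15:30:00.000Z\n"
    "1,100,110,Unenroll,2021-09-02T09:00:00.000Z\n"
    "3,200,109,Enroll,2021-09-02T10:00:00.000Z\n"
)
enrolls, unenrolls = daily_counts(csv.DictReader(sample))
for day in sorted(set(enrolls) | set(unenrolls)):
    print(day, enrolls[day], unenrolls[day])
```

The two counters play the role of the two measures, and the optional `role_id` argument stands in for the role filter.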

I will note here: drop every last piece of unused data. Not only is it good data security practice, but PowerBI Pro has a 1GB limit on its reporting size, which is a major limitation on some of the data work we’re doing. That’s where we start to actually get into issues, in that not much can be done to avoid it when you’re talking about big data (and the argument for just licensing Performance+ starts to come into play). While I’m a big fan of distributing services and having a diverse ecosystem (with expertise being able to be drawn from multiple sources), I can totally see the appeal of an integrated data experience.
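One way to drop unused data is to trim the CSV before PowerBI ever loads it. The sketch below is one possible approach, not the only one (you can also remove columns inside PowerBI’s own query editor). The function name and the column names in the usage note are hypothetical.

```python
import csv


def trim_columns(src_path, dest_path, keep):
    """Copy a CSV, keeping only the columns named in `keep`."""
    # utf-8-sig tolerates a byte-order mark if the export has one.
    with open(src_path, newline="", encoding="utf-8-sig") as src, \
         open(dest_path, "w", newline="", encoding="utf-8") as dest:
        reader = csv.DictReader(src)
        missing = [c for c in keep if c not in (reader.fieldnames or [])]
        if missing:
            # Fail loudly if the export's layout is not what we expected.
            raise KeyError(f"columns not found in {src_path}: {missing}")
        writer = csv.DictWriter(dest, fieldnames=keep, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

For the enrollment counts you might call `trim_columns("raw.csv", "trimmed.csv", ["RoleId", "Action", "EnrollmentDate"])`, assuming those are the column names in your export. Everything else, including user identifiers, never reaches the report.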

Going forward, you can automate the Zip downloads through Brightspace’s API, which would take care of the tedious process of updating once a week (or once a day if you have Performance+ and get the upgraded dataset creation that’s part of that package). Also, sharing anything currently requires a Pro license for PowerBI, which is a small yearly cost, but there’s some risk as Microsoft changes bundles and pricing, and that may be entirely out of your control (like it is where I work – licensing is handled centrally, by our IT department – and cost recovery is a tangled web to navigate). Pro licenses have data limits, and I’m sure the Enrollments data is sitting at 2GB and growing. So you may hit some data caps (or, in my experience, it just won’t work) if you have heavy usage. That’s a huge drawback, as most of the data in large institutions will surpass the 1GB limit. On the plus side, for one page of visualizations, PowerBI does a good job of remembering your data sanitization process, so as long as the dataset itself doesn’t change, you are good.
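I haven’t built the automated download yet, so treat the following as a rough Python sketch of its shape rather than a working integration. It assumes the API accepts an OAuth 2.0 bearer token, and it deliberately leaves the download URL as a parameter: the actual route for a given data set has to come from D2L’s API documentation, and I’m not going to guess at it here. All function names are my own.

```python
import shutil
import urllib.request


def build_download_request(download_url, access_token):
    """Build an authenticated GET request for one data set zip.

    download_url is a placeholder: look up the real route in D2L's docs.
    """
    return urllib.request.Request(
        download_url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def save_stream(stream, dest_path):
    """Write a file-like response body to disk in chunks."""
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(stream, out)
    return dest_path


def download_dataset(download_url, access_token, dest_path):
    """Fetch one zip and save it; raises on HTTP errors."""
    request = build_download_request(download_url, access_token)
    with urllib.request.urlopen(request) as response:
        return save_stream(response, dest_path)
```

Run on a schedule, `download_dataset` would replace the manual weekly download, leaving only the unzip and the PowerBI refresh. Getting and renewing the access token is its own job and isn’t covered here.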

Now, if you’re looking to do some work with learning analytics and generate the same sort of reports, Microsoft may not be the right ecosystem for you, as there aren’t many (if any) learning analytics systems using Microsoft and their Azure Event Hubs (which is the rough equivalent of AWS Firehose – both are ingestion mechanisms to get data into the cloud). That lack of community will make it ten times harder to get any sort of support, and Microsoft themselves aren’t particularly helpful. Just looking at the documentation, AWS’s is much more readable and understandable without a deep knowledge of all the related technologies.

Why Can’t Students Opt-Out of Data Collection in Higher Ed?

You know, for all the talk from EdTech vendors about being student centred (and let’s face it, LMSs and most of the other products are not student centred) and all the focus on data collection from student activity – why don’t products have an easy opt-out (being student centred and all that) to not allow data to be collected?

What makes matters worse in many ways is that the data collection is hidden from students’ view. For instance, many LMSs track time spent on content items or files uploaded. This tracking is never made explicit to the student unless they go and dig into their own data. And doing that is incredibly difficult, and you don’t get a complete picture of what is being collected. If I were a more conspiratorially minded person, I’d suggest that it was done on purpose to make it hard to understand the amount of total surveillance that is accessible by a “teacher” or “administrator”. I’m not. I honestly believe that the total surveillance of students in the LMS is really about feature creep, where one request turned into more requests for more information. LMSs, on the other hand, want to keep their customers happy, so why not share what information they have with their clients? After all, it’s their data, isn’t it?

Actually, it’s not. It’s not the client’s data. It’s the individual’s data. To argue otherwise is to claim that what someone does is not their own – it reduces agency to a hilariously outdated and illogical idea.

The individual, human, user should be allowed to share or not share this data, with teachers, with institutions or with external companies that host that data in an agreement with an institution that probably was signed without them even knowing it. There’s an element of data as a human right that we should be thinking about. As an administrator I have a policy I have to adhere to, and a personal set of ethics that frankly are more important to me (and more stringent) than the obscurely written-in-legalese policy. With an unscrupulous or outright negligent LMS administrator, all bets would be off. They could do things in the LMS that no one, except another administrator, could track. Even then, the other administrator would have to know enough to be able to look at all the hundreds of different changelogs, scattered across different tools and different courses, and do what is essentially a forensic search, so undoing any damage could take a good long time. Checks and balances (a turn of phrase that appears purposefully, as I think we’ll see what a lack of checks and balances will be like in the US the next few years) could be implemented as part of using the system, but aren’t, and that lack leaves education in precarious situations.

The idea that students should be able to say, “I only want my data shared with my College” or “I only want my data shared with company X for the purposes of improving the system”, rather than having their data locked in the LMS, shouldn’t be hard to implement. So why hasn’t it been done? Or even talked about?

In my opinion, the data that gets harvested (anonymously of course) provides more important information to the company about how people use the system than the optics of having an opt-out button. It allows Blackboard to say how instructors use their system. We could talk about how terrifying this blog post is (instructor use is a proxy for how students use the system because LMSs give power to instructors to construct the student’s experience), or how devoid of solid analysis it is. I’ll deal with the analysis later, so let’s just consider how this is entirely without context. Blackboard hosted sites have been harvested (probably with the consent of the people who signed the contracts, not the instructors who actually create things on the system, or the students who engage in the system) by Blackboard to tell you how you teach. In one of the cited pieces, Blackboard says that they used this data to improve their notifications. If I put this through for ethics review, and said I’m going to look at notifications improvement and then released a post about how people used the whole system, it may very well be in their rights (and I suspect it is) but it is ethically murky. The fact they’ve released it to the public allows the public to ask these questions, but no one really has. Or if they have, I missed it.

The fact that Blackboard can do this (and I’m talking about Blackboard because they did it, but there’s a similar post from Canvas that’s making the rounds about discussion posts with video being more engaging or some such idea) without really clearing this with any client is chilling. It also constrains how we perceive we are able to use the LMS (it sets the standards for how it is used).