Aggregation in MongoDB and Mongoose

For a recent project, I worked with my classmates from Code Fellows to create a web app to display movie box office analytics. This post demonstrates what I learned when I wanted to use aggregated data from MongoDB to display average box office grosses on a chart.

Our query to the server will (or will not) include some key/value pairs for filtering the result and the data returned will show the average income per movie screening over a six month period. The result will also have data for an alternate view that provides an overall summary of the query with averages and totals for admissions and attendance.

Although our source data came from MongoDB, a NoSQL database, we modeled our data similar to how we would with a relational database. Our Screenings collection contained instances of individual movie screenings, recording the day and time of the screening, attendance and admissions totals, and then a reference to an entry in the Movie collection that had details about the movie like title and genre.

Aggregation was introduced in Mongo version 2.2. Aggregate operations form a data processing pipeline. The documentation description reads as follows:

Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.

Mongoose’s implementation is basically a wrapper around the MongoDB methods, forming JavaScript methods. It acts as a pipeline by creating a promise which can in turn be acted on by any number of aggregation operations.

Let’s begin in the Screenings model. The following is a a static method on the Screenings collection Schema. Rather than writing out the entire function, I’ll break it up into chunks with some commentary.

1
screeningSchema.statics.getAggregateData = function aggMatchingCompany(title, genre) { ... }

The following five code blocks are the body of the above function.

I begin the aggregation with the project method to define the fields I want from the Screenings collection. Here I’m including all the fields in the Screenings schema, and even adding a few computed values that I’ll need. The movie field is a reference to a record in another collection. Project returns all of the records in the collection.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
const aggregateResult = this.aggregate([
{
$project:
{
movie: true,
attendanceTotal: true,
admissionsTotal: true,
concessionsTotal: true,
dateTime: { $subtract: ['$dateTime', 1000 * 60 * 60 * 7] },
seats: true,
format: true,
dayOfWeek: { $dayOfWeek: { $subtract: ['$dateTime', 1000 * 60 * 60 * 7] } },
hourOfDay: { $hour: { $subtract: ['$dateTime', 1000 * 60 * 60 * 7] } },
month: { $month: '$dateTime' },
},
},
]);

Next, I need to the title and genre information that is stored on the related entry in the Movies collection. Lookup is a method that effectively performs an outer left join between two collections (in RDBS terms). I’m making the connection between the movie field in my current data to the _id field in the Movies collection and the data is returned as a property on Screenings object titled movie_data.

1
2
3
4
5
6
7
aggregateResult.lookup({
from: 'movies',
localField: 'movie',
foreignField: '_id',
as: 'movie_data',
});
aggregateResult.unwind('$movie_data');

The result of lookup is an array. Unwind is called immediately afterwards to extract the object from the array. With a one-to-one relationship between the instance of a screening to a related Movie entry, unwind is just that simple. If it were a one-to-many relationship, we’d have more data processing options in the unwind.

Next, I’m testing to see if any filter queries were passed into the aggregation pipeline and I’m acting on those using match to constrain the set of all screenings to just the filtered set of screenings.

1
2
3
4
5
6
7
if (title) {
aggregateResult.match({ 'movie_data.title': title });
}
if (genre) {
aggregateResult.match({ 'movie_data.genres': { $in: [genre] } });
}

Grouping will pivot the data into the time sequence I need for a chart. Instead of returning all the raw screening data, this method will use the month field as the basis for grouping the data and returning a sum on admissions and attendance and also an average for those fields. With count, I’m also providing a count of the number of Screening records that were used to form the aggregated data.

1
2
3
4
5
6
7
8
aggregateResult.group({
_id: '$month',
count: { $sum: 1 },
admissions: { $sum: '$admissionsTotal' },
attendance: { $sum: '$attendanceTotal' },
avgAdm: { $avg: '$admissionsTotal' },
avgAtt: { $avg: '$attendanceTotal' },
});

Since this is just a plain old JavaScript promise, we can do any additional work with the data to shape it into the result we want. Here, I’m totaling and averaging the aggregated data to provide an optional grand summary object. And after that, a function to fill in any missing months and then sorting the months in sequential order.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
return aggregateResult.then(data => {
// Add summary totals
const totals = data.reduce((previous, current) => {
previous.admissions += current.admissions;
previous.attendance += current.attendance;
previous.count += current.count;
return previous;
}, { admissions: 0, attendance: 0, count: 0 });
totals.avgAdm = (totals.admissions / totals.count) || 0;
totals.avgAtt = (totals.attendance / totals.count) || 0;
// Polyfill any missing months
for (let i = 1; i < 8; i++) {
const index = data.findIndex((e) => e._id === i);
if (index === -1) {
data.push({
_id: i,
count: 0,
admissions: 0,
attendance: 0,
avgAdm: 0,
avgAtt: 0,
});
}
}
// Sort in ascending order
data.sort((a, b) => a._id > b._id);
return {
sequence: data,
totals,
};
});

The promise will resolve into an object with two properties: sequence and totals. The following is the server side route (using Express) that invokes the aggregation function.

1
2
3
4
5
.get('/aggregate', (req, res, next) => {
const { title, genre } = req.query;
Screening.getAggregateData(title, genre)
.then(data => res.json(data));
})

This is what it looks like from a client side service.

1
2
3
4
5
6
7
aggregate(params) {
return $http
.get(`${apiUrl}/screenings/aggregate`, params)
.then(r => {
return r.data;
});
},

The sequence object in the final data is passed into an instance of ChartJS as a dataset. ChartJS can even overlay datasets to easily compare multiple queries as lines or bar charts.

A demo of the finished project with mock data is online at ahbo.firebaseapp.com.
Here’s the project on GitHub.
MongoDB has some good documentation for its aggregation operations.
And here is the documentation for Mongoose.