The MongoDB aggregation pipeline is a sequence of stages that process and transform data before returning it to the application. Each stage represents a data processing operation, such as filtering or grouping, and is built from a set of operators. Input documents from a collection, optionally joined with documents from other collections, are transformed step by step until the desired output is achieved. The output of each stage becomes the input of the next, so stages can be combined freely. This makes data processing both efficient and flexible.
How does the aggregation pipeline work?
The visual representation below showcases a common MongoDB aggregation pipeline, which consists of three stages:
- $match stage – selects the documents that meet our criteria.
- $group stage – performs the aggregation operation on these selected documents.
- $sort stage – arranges the resulting documents in ascending or descending order.
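The three stages above can be sketched as a single pipeline. This is a minimal illustration for mongosh, not a definitive query: the employees collection, its age, department, and salary fields, and the variable name matchGroupSortPipeline are assumptions chosen for the example.

```javascript
// Hypothetical collection and field names, used only for illustration.
const matchGroupSortPipeline = [
  // $match: keep only the documents that meet the criteria (age >= 30).
  { $match: { age: { $gte: 30 } } },
  // $group: aggregate the selected documents, one result per department.
  { $group: { _id: "$department", average_salary: { $avg: "$salary" } } },
  // $sort: order the results; 1 means ascending, -1 means descending.
  { $sort: { average_salary: -1 } },
];

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.employees.aggregate(matchGroupSortPipeline).toArray());
}
```

Placing $match first is a common choice because it reduces the number of documents the later stages have to process.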
MongoDB aggregate pipeline syntax
An aggregation query has the general form db.collectionName.aggregate(pipeline, options), where:
- collectionName – the name of the collection to run the aggregation on,
- pipeline – an array that contains the aggregation stages,
- options – an optional document of settings for the aggregation.
This is an example of the aggregation pipeline syntax:
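A minimal sketch of this syntax in mongosh follows. The employees collection, the department field, and the names examplePipeline and exampleOptions are assumptions for illustration; allowDiskUse and maxTimeMS are standard options of the aggregate method.

```javascript
// pipeline: an array of stages, executed in array order.
const examplePipeline = [
  { $match: { department: "IT" } },   // keep only IT employees
  { $count: "employee_count" },       // count the remaining documents
];

// options: optional settings for the aggregation.
const exampleOptions = {
  allowDiskUse: true, // let large stages spill to temporary files on disk
  maxTimeMS: 5000,    // abort if the aggregation runs longer than 5 seconds
};

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.employees.aggregate(examplePipeline, exampleOptions).toArray());
}
```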
MongoDB Operations
Let’s consider a sample data set of a company that has the following collections:
1. Departments: This collection contains information about all the departments in the company, such as their name, description, and location.
2. Employees: This collection contains information about all the employees working in the company, such as their name, age, gender, job title, department, and salary.
Let’s assume the data set looks something like this:
Departments Collection –
Employees Collection –
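The sample documents are not reproduced here, so the following is a purely hypothetical data set that matches the fields described above. Every value (names, ages, salaries, locations) and the snake_case field name job_title are invented for illustration.

```javascript
// Hypothetical departments: name, description, location.
const departments = [
  { name: "IT", description: "Infrastructure and internal tools", location: "Building A" },
  { name: "Product", description: "Product planning and design", location: "Building B" },
  { name: "Marketing", description: "Campaigns and branding", location: "Building B" },
  { name: "HR", description: "Hiring and employee relations", location: "Building C" },
];

// Hypothetical employees: name, age, gender, job_title, department, salary.
const employees = [
  { name: "Asha", age: 34, gender: "F", job_title: "Engineer", department: "IT", salary: 85000 },
  { name: "Ben", age: 41, gender: "M", job_title: "Sysadmin", department: "IT", salary: 78000 },
  { name: "Chen", age: 29, gender: "M", job_title: "Designer", department: "Product", salary: 72000 },
  { name: "Dana", age: 37, gender: "F", job_title: "Strategist", department: "Marketing", salary: 69000 },
  { name: "Eli", age: 45, gender: "M", job_title: "Recruiter", department: "HR", salary: 64000 },
];

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  db.departments.insertMany(departments);
  db.employees.insertMany(employees);
}
```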
Below are a few instances of aggregation operations available in MongoDB:
1. Filtering: The $match stage filters documents according to a given condition. For example, if we want to retrieve all the employees whose department is “IT”, “Product”, or “Marketing”, we can use the $match stage as follows:
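A minimal sketch of such a filter in mongosh, assuming an employees collection whose documents carry a department field as described above. The variable name filterPipeline is illustrative.

```javascript
const filterPipeline = [
  // $in matches documents whose department equals any value in the list.
  { $match: { department: { $in: ["IT", "Product", "Marketing"] } } },
];

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.employees.aggregate(filterPipeline).toArray());
}
```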
This query returns all the documents from the employees collection where the department field equals “IT”, “Product”, or “Marketing”.
2. Grouping: The $group stage groups documents based on a particular field and lets you perform various operations on the grouped data. For example, once we have filtered the employees using the condition above, we can use the $group stage to group the remaining employees by department and calculate the average salary of each department.
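One way to express this in mongosh is sketched below, assuming the employees collection has department and salary fields. The output field name average_salary and the variable name groupPipeline are illustrative choices.

```javascript
const groupPipeline = [
  // Keep only employees in the three departments of interest.
  { $match: { department: { $in: ["IT", "Product", "Marketing"] } } },
  // Group by department (the group key goes in _id) and average the salaries.
  { $group: { _id: "$department", average_salary: { $avg: "$salary" } } },
];

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.employees.aggregate(groupPipeline).toArray());
}
```

Each result document has the department name in _id and the computed average in average_salary.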
This query groups the documents from the employees collection by department and calculates the average salary for each department.
3. MongoDB $lookup: $lookup is a stage in MongoDB’s aggregation pipeline that performs a left outer join between two collections. This is useful when you want to combine data from multiple collections into a single result set.
Suppose we want to obtain each employee together with the details of their department, including its name, description, and location. We can use the $lookup stage to join the employees collection with the departments collection.
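A minimal sketch of this join in mongosh, assuming the employees documents store the department name in a department field and the departments documents store it in a name field. The variable name lookupPipeline is illustrative.

```javascript
const lookupPipeline = [
  {
    $lookup: {
      from: "departments",       // collection to join with
      localField: "department",  // field in the employees documents
      foreignField: "name",      // field in the departments documents
      as: "department_details",  // output array field added to each employee
    },
  },
];

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.employees.aggregate(lookupPipeline).toArray());
}
```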
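This performs a join between the departments and employees collections, matching the name field of departments against the department field of employees. The result is a new array field called department_details in each employee document, which contains the details of the matching department. Because this is a left outer join, employees without a matching department are still returned, with an empty department_details array.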
4. Projecting: The $project stage returns only a subset of fields from the documents passing through the pipeline. For example, to obtain only the name and department fields of every employee, we can use the $project stage as follows:
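A minimal sketch of such a projection in mongosh, assuming the employees documents have name and department fields. The variable name projectPipeline is illustrative, and excluding _id is a choice made here so that only the two named fields remain.

```javascript
const projectPipeline = [
  // 1 includes a field; _id is included by default, so 0 excludes it explicitly.
  { $project: { _id: 0, name: 1, department: 1 } },
];

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.employees.aggregate(projectPipeline).toArray());
}
```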
This query returns all the documents from the employees collection with only the name and department fields.
5. Sorting: The $sort stage sorts documents based on a specified field in ascending or descending order. For example, if we want to retrieve all the employees sorted by their name in ascending order, we can use the $sort stage as follows:
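A minimal sketch of this sort in mongosh, assuming the employees documents have a name field. The variable name sortPipeline is illustrative.

```javascript
const sortPipeline = [
  // 1 sorts ascending (A to Z); use -1 for descending.
  { $sort: { name: 1 } },
];

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.employees.aggregate(sortPipeline).toArray());
}
```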
6. Aggregation Operators: MongoDB provides many more stages and operators for use in the pipeline. For example, the $limit stage limits the number of documents passed on by the pipeline, and the $skip stage skips a given number of documents.
Suppose we want to group the employees by department and calculate the average salary for each department, but skip the first result and return only the next two. Because $skip and $limit act on the stream of documents flowing through the pipeline, not on each group separately, we place them after the $group stage.
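One possible pipeline is sketched below for mongosh, assuming department and salary fields on the employees documents. The $sort stage is an addition not mentioned above: without it the order of the grouped results is not guaranteed, so "the first result" would be arbitrary. The names pagedPipeline and average_salary are illustrative.

```javascript
const pagedPipeline = [
  // One result document per department, with its average salary.
  { $group: { _id: "$department", average_salary: { $avg: "$salary" } } },
  // Assumption: sort by department name so skip/limit are deterministic.
  { $sort: { _id: 1 } },
  // Skip the first grouped result.
  { $skip: 1 },
  // Return at most the next two grouped results.
  { $limit: 2 },
];

// `db` is predefined in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.employees.aggregate(pagedPipeline).toArray());
}
```

The order of $skip and $limit matters: placing $limit first would cap the results at two and then skip one of them, leaving a single document.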
By exploring these examples, you will gain a better understanding of how the aggregation framework works.
Conclusion
In summary, MongoDB’s Aggregation Pipeline is a powerful tool for processing data on document collections. It allows developers to filter, transform, and aggregate data, making it ideal for analysis. The pipeline offers numerous operations, such as sorting, grouping, joining, and projecting data from various collections, enabling developers to perform complex analytics. It can process real-time data, allowing developers to analyze and act on it as it arrives. The Aggregation Pipeline is a critical component for applications requiring advanced data processing capabilities, providing developers with flexibility and efficiency.