Day 3: Let’s collect data using Stream API


On day 2, you learned that the Stream API lets you work with collections in a declarative manner. We looked at the collect method, a terminal operation that gathers the elements produced by a stream pipeline into a container such as a List. In other words, collect is a reduction operation that reduces a stream to a collection or another result object. It takes a Collector, which lets us implement functionality such as grouping and partitioning very easily.

To feel the power of Collector, let us look at an example where we have to group tasks by their type. In Java 8, we can write this use case as shown below. Please refer to the day 2 blog, where we introduced the example domain used throughout this series.

private static Map<TaskType, List<Task>> groupTasksByType(List<Task> tasks) {
    return tasks.stream().collect(Collectors.groupingBy(task -> task.getType()));
}

The code shown above uses the groupingBy Collector defined in the Collectors utility class, which provides static factory methods that create collectors for the most common use cases, such as grouping. Here, groupingBy creates a Map whose keys are TaskType values and whose values are lists of all the tasks with that TaskType. As an exercise, try to write the same example in a previous version of Java.
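
For comparison, here is one possible answer to the exercise: a minimal sketch that groups tasks with an explicit loop, as you would before Java 8, next to the groupingBy version. The nested Task and TaskType types are cut-down stand-ins for the day 2 domain, and the class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingBeforeJava8 {

    // Cut-down stand-ins for the day 2 domain classes (assumed shape).
    enum TaskType { READING, CODING, WRITING }

    static class Task {
        private final String title;
        private final TaskType type;

        Task(String title, TaskType type) {
            this.title = title;
            this.type = type;
        }

        String getTitle() { return title; }

        TaskType getType() { return type; }
    }

    // Pre-Java 8 style: build the map and the per-type lists by hand.
    static Map<TaskType, List<Task>> groupWithLoop(List<Task> tasks) {
        Map<TaskType, List<Task>> result = new HashMap<>();
        for (Task task : tasks) {
            List<Task> group = result.get(task.getType());
            if (group == null) {
                group = new ArrayList<>();
                result.put(task.getType(), group);
            }
            group.add(task);
        }
        return result;
    }

    // Java 8 style: the collector does the same bookkeeping for us.
    static Map<TaskType, List<Task>> groupWithCollector(List<Task> tasks) {
        return tasks.stream().collect(Collectors.groupingBy(Task::getType));
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task("Read Java 8 Lambdas book", TaskType.READING),
                new Task("Write a blog on Java 8 Streams", TaskType.WRITING),
                new Task("Read Domain Driven Design book", TaskType.READING));

        // Both approaches produce the same grouping.
        System.out.println(groupWithLoop(tasks).equals(groupWithCollector(tasks))); // true
    }
}
```

The loop version has to handle the "first task of this type" case itself, which is exactly the bookkeeping that groupingBy hides.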

## Collecting data into containers

On day 2, you saw that the collect method on Stream lets you gather the elements of a stream into a List using stream.collect(Collectors.toList()). With a static import of toList, the code looks like this:

private static List<String> titles(List<Task> tasks) {
    return tasks.stream().map(Task::getTitle).collect(toList());
}

### Collecting data into a Set

You can also collect data into a Set using the toSet collector.

private static Set<String> uniqueTitles(List<Task> tasks) {
    return tasks.stream().map(Task::getTitle).collect(toSet());
}

### Collecting data into a Map

You can convert a stream to a Map by using the toMap collector. The toMap collector takes two mapper functions that extract the key and the value for each Map entry. In the code shown below, Task::getTitle is a Function that takes a task and returns its title, which becomes the key. The lambda expression task -> task simply returns its argument, so the task itself becomes the value.

private static Map<String, Task> taskMap(List<Task> tasks) {
    return tasks.stream().collect(toMap(Task::getTitle, task -> task));
}

We can improve the code shown above by using the static identity method of the Function interface, which makes the code cleaner and better conveys the intent to use the identity function, as shown below.

import static java.util.function.Function.identity;

private static Map<String, Task> taskMap(List<Task> tasks) {
    return tasks.stream().collect(toMap(Task::getTitle, identity()));
}

The code to create a Map from the stream will throw an exception when duplicate keys are present. You will get an error like the one shown below.

Exception in thread "main" java.lang.IllegalStateException: Duplicate key Task{title='Read Version Control with Git book', type=READING}
at java.util.stream.Collectors.lambda$throwingMerger$105(Collectors.java:133)

You can handle duplicates by using another variant of the toMap collector that accepts a merge function. The merge function lets the caller specify how to resolve collisions between values associated with the same key. In the code shown below, we simply keep the last value, but you can write a smarter strategy to resolve the collision.

private static Map<String, Task> taskMap_duplicates(List<Task> tasks) {
    return tasks.stream().collect(toMap(Task::getTitle, identity(), (t1, t2) -> t2));
}
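
As an illustration of a less trivial merge function, the sketch below keeps whichever task was created first when two tasks share a title. The nested Task class is a cut-down stand-in for the day 2 domain, and the names ToMapMergeSketch and earliestTaskByTitle are made up for this example.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapMergeSketch {

    // Cut-down stand-in for the day 2 Task class (assumed shape).
    static class Task {
        private final String title;
        private final LocalDate createdOn;

        Task(String title, LocalDate createdOn) {
            this.title = title;
            this.createdOn = createdOn;
        }

        String getTitle() { return title; }

        LocalDate getCreatedOn() { return createdOn; }
    }

    // On a title collision, keep the task with the earlier creation date.
    static Map<String, Task> earliestTaskByTitle(List<Task> tasks) {
        return tasks.stream().collect(Collectors.toMap(
                Task::getTitle,
                Function.identity(),
                (t1, t2) -> t1.getCreatedOn().isAfter(t2.getCreatedOn()) ? t2 : t1));
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task("Read Java 8 Lambdas book", LocalDate.of(2015, 7, 2)),
                new Task("Read Java 8 Lambdas book", LocalDate.of(2015, 7, 1)),
                new Task("Write a blog on Java 8 Streams", LocalDate.of(2015, 7, 4)));

        Map<String, Task> byTitle = earliestTaskByTitle(tasks);
        System.out.println(byTitle.size()); // 2
        System.out.println(byTitle.get("Read Java 8 Lambdas book").getCreatedOn()); // 2015-07-01
    }
}
```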

Similar to the toMap collector, there is also a toConcurrentMap collector that produces a ConcurrentMap instead of a HashMap.
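
A minimal sketch of toConcurrentMap is shown below. It maps each title to its length on a parallel stream; the class and method names are invented for the example, and the titles are assumed to be unique because the two-argument variant throws an IllegalStateException on duplicate keys, just like toMap.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToConcurrentMapSketch {

    // Maps each title to its length. Titles are assumed to be unique here.
    static ConcurrentMap<String, Integer> titleLengths(List<String> titles) {
        return titles.parallelStream()
                .collect(Collectors.toConcurrentMap(Function.identity(), String::length));
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Read Java 8 Lambdas book",
                "Write a blog on Java 8 Streams");

        ConcurrentMap<String, Integer> lengths = titleLengths(titles);
        System.out.println(lengths.get("Read Java 8 Lambdas book")); // 24
    }
}
```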

### Using other collections

Specific collectors such as toList and toSet do not let you specify the underlying List or Set implementation. You can use the toCollection collector when you want to collect the result into another type of collection, as shown below.

private static LinkedHashSet<Task> collectToLinkedHashSet(List<Task> tasks) {
    return tasks.stream().collect(toCollection(LinkedHashSet::new));
}
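
The same idea works for any collection with a no-argument constructor. As a small sketch (the class and method names are made up), the code below collects titles into a TreeSet, which removes duplicates and keeps the titles sorted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ToCollectionSketch {

    // TreeSet::new is the supplier that toCollection uses to create the container.
    static TreeSet<String> sortedUniqueTitles(List<String> titles) {
        return titles.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Write a blog on Java 8 Streams",
                "Read Java 8 Lambdas book",
                "Write a blog on Java 8 Streams");

        // Prints [Read Java 8 Lambdas book, Write a blog on Java 8 Streams]
        System.out.println(sortedUniqueTitles(titles));
    }
}
```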

## Grouping Collectors

One of the most common use cases of Collector is grouping elements. Let’s look at various examples to understand how we can perform grouping.

### Example 1: Grouping tasks by type

Let’s look at the example shown below, where we want to group all the tasks by their TaskType. You can do this very easily with the groupingBy Collector of the Collectors utility class. Using a method reference and a static import makes the code more succinct.

import static java.util.stream.Collectors.groupingBy;

private static Map<TaskType, List<Task>> groupTasksByType(List<Task> tasks) {
    return tasks.stream().collect(groupingBy(Task::getType));
}

It will produce the output shown below.

{CODING=[Task{title='Write a mobile application to store my tasks', type=CODING, createdOn=2015-07-03}], WRITING=[Task{title='Write a blog on Java 8 Streams', type=WRITING, createdOn=2015-07-04}], READING=[Task{title='Read Version Control with Git book', type=READING, createdOn=2015-07-01}, Task{title='Read Java 8 Lambdas book', type=READING, createdOn=2015-07-02}, Task{title='Read Domain Driven Design book', type=READING, createdOn=2015-07-05}]}

### Example 2: Grouping by tags

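private static Map<String, List<Task>> groupingByTag(List<Task> tasks) {
    // Pair every task with each of its tags, then group the pairs by tag
    // and keep only the task from each pair.
    return tasks.stream().
            flatMap(task -> task.getTags().stream().map(tag -> new TaskTag(tag, task))).
            collect(groupingBy(TaskTag::getTag, mapping(TaskTag::getTask, toList())));
}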

private static class TaskTag {
    final String tag;
    final Task task;

    public TaskTag(String tag, Task task) {
        this.tag = tag;
        this.task = task;
    }

    public String getTag() {
        return tag;
    }

    public Task getTask() {
        return task;
    }
}

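### Example 3: Grouping tasks by tag and counting

Here we combine a classifier function with a downstream Collector: groupingBy classifies by tag, and counting counts the elements in each group.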

private static Map<String, Long> tagsAndCount(List<Task> tasks) {
    return tasks.stream().
            flatMap(task -> task.getTags().stream().map(tag -> new TaskTag(tag, task))).
            collect(groupingBy(TaskTag::getTag, counting()));
}
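
To see what the tag-based groupings produce, here is a self-contained sketch with a few sample tasks. The nested Task and TaskTag classes are simplified stand-ins (the pair holds a title rather than the whole task, and tags are kept in a List), and the tag values are invented for the example. Note that counting by tag does not strictly need the pair class: flattening to the tags alone is enough.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

class TagGroupingSketch {

    // Simplified stand-in for the day 2 Task class (assumed shape).
    static class Task {
        private final String title;
        private final List<String> tags;

        Task(String title, String... tags) {
            this.title = title;
            this.tags = Arrays.asList(tags);
        }

        String getTitle() { return title; }

        List<String> getTags() { return tags; }
    }

    // Pairs a tag with the title of a task that carries it.
    static class TaskTag {
        private final String tag;
        private final String title;

        TaskTag(String tag, String title) {
            this.tag = tag;
            this.title = title;
        }

        String getTag() { return tag; }

        String getTitle() { return title; }
    }

    // Groups task titles by tag; a task with two tags appears in two groups.
    static Map<String, List<String>> titlesByTag(List<Task> tasks) {
        return tasks.stream()
                .flatMap(task -> task.getTags().stream().map(tag -> new TaskTag(tag, task.getTitle())))
                .collect(groupingBy(TaskTag::getTag, mapping(TaskTag::getTitle, toList())));
    }

    // Counts how many tasks carry each tag, using only the flattened tags.
    static Map<String, Long> tagsAndCount(List<Task> tasks) {
        return tasks.stream()
                .flatMap(task -> task.getTags().stream())
                .collect(groupingBy(tag -> tag, counting()));
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task("Read Java 8 Lambdas book", "java8", "books"),
                new Task("Write a blog on Java 8 Streams", "java8", "blogging"),
                new Task("Read Domain Driven Design book", "ddd", "books"));

        // TreeMap only makes the printed order predictable.
        System.out.println(new TreeMap<>(tagsAndCount(tasks))); // {blogging=1, books=2, ddd=1, java8=2}
        System.out.println(titlesByTag(tasks).get("books"));
        // [Read Java 8 Lambdas book, Read Domain Driven Design book]
    }
}
```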

### Example 4: Grouping by TaskType and createdOn

private static Map<TaskType, Map<LocalDate, List<Task>>> groupTasksByTypeAndCreationDate(List<Task> tasks) {
    return tasks.stream().collect(groupingBy(Task::getType, groupingBy(Task::getCreatedOn)));
}
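
Because the second argument of groupingBy is itself a Collector, the inner grouping can carry its own downstream collector. The sketch below, which uses cut-down stand-ins for the domain classes and made-up names, counts tasks per type and creation date and shows how to read a value from the nested Map.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

class NestedGroupingSketch {

    // Cut-down stand-ins for the day 2 domain classes (assumed shape).
    enum TaskType { READING, CODING }

    static class Task {
        private final TaskType type;
        private final LocalDate createdOn;

        Task(TaskType type, LocalDate createdOn) {
            this.type = type;
            this.createdOn = createdOn;
        }

        TaskType getType() { return type; }

        LocalDate getCreatedOn() { return createdOn; }
    }

    // Outer grouping by type, inner grouping by date, counting at the leaves.
    static Map<TaskType, Map<LocalDate, Long>> countByTypeAndDate(List<Task> tasks) {
        return tasks.stream().collect(
                groupingBy(Task::getType, groupingBy(Task::getCreatedOn, counting())));
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task(TaskType.READING, LocalDate.of(2015, 7, 1)),
                new Task(TaskType.READING, LocalDate.of(2015, 7, 1)),
                new Task(TaskType.READING, LocalDate.of(2015, 7, 2)),
                new Task(TaskType.CODING, LocalDate.of(2015, 7, 3)));

        Map<TaskType, Map<LocalDate, Long>> counts = countByTypeAndDate(tasks);
        System.out.println(counts.get(TaskType.READING).get(LocalDate.of(2015, 7, 1))); // 2
    }
}
```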

## Partitioning

There are times when you want to partition a dataset into two datasets based on a predicate. For example, we can partition tasks into two groups using a predicate on the due date: tasks due after today are collected under the key true, and all remaining tasks (due today or earlier) under the key false.

private static Map<Boolean, List<Task>> partitionOldAndFutureTasks(List<Task> tasks) {
    return tasks.stream().collect(partitioningBy(task -> task.getDueOn().isAfter(LocalDate.now())));
}
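
The sketch below makes the partitioning easier to test by passing the reference date in as a parameter instead of calling LocalDate.now(), and it also shows partitioningBy with a downstream collector. The nested Task class and all names here are assumptions made for the example.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.partitioningBy;

class PartitioningSketch {

    // Cut-down stand-in for the day 2 Task class (assumed shape).
    static class Task {
        private final String title;
        private final LocalDate dueOn;

        Task(String title, LocalDate dueOn) {
            this.title = title;
            this.dueOn = dueOn;
        }

        String getTitle() { return title; }

        LocalDate getDueOn() { return dueOn; }
    }

    // true -> due strictly after the given date, false -> due on or before it.
    static Map<Boolean, List<Task>> partitionByDueDate(List<Task> tasks, LocalDate today) {
        return tasks.stream().collect(partitioningBy(task -> task.getDueOn().isAfter(today)));
    }

    // Same predicate, but counting the tasks in each partition.
    static Map<Boolean, Long> countByDueDate(List<Task> tasks, LocalDate today) {
        return tasks.stream().collect(
                partitioningBy(task -> task.getDueOn().isAfter(today), counting()));
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2015, 7, 3);
        List<Task> tasks = Arrays.asList(
                new Task("Read Java 8 Lambdas book", LocalDate.of(2015, 7, 1)),
                new Task("Write a blog on Java 8 Streams", LocalDate.of(2015, 7, 3)),
                new Task("Read Domain Driven Design book", LocalDate.of(2015, 7, 5)));

        Map<Boolean, Long> counts = countByDueDate(tasks, today);
        System.out.println(counts.get(true));  // 1
        System.out.println(counts.get(false)); // 2, includes the task due on the reference date
    }
}
```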

## Generating statistics

Another very helpful group of collectors are those that produce statistics. They work on primitive types such as int, long, and double, and can be used to produce statistics like the ones shown below.

IntSummaryStatistics summaryStatistics = tasks.stream().map(Task::getTitle).collect(summarizingInt(String::length));
System.out.println(summaryStatistics.getAverage()); //32.4
System.out.println(summaryStatistics.getCount()); //5
System.out.println(summaryStatistics.getMax()); //44
System.out.println(summaryStatistics.getMin()); //24
System.out.println(summaryStatistics.getSum()); //162

There are also variants for the other primitive types: LongSummaryStatistics and DoubleSummaryStatistics, produced by the summarizingLong and summarizingDouble collectors.

You can also combine one IntSummaryStatistics with another using the combine operation.

firstSummaryStatistics.combine(secondSummaryStatistics);
System.out.println(firstSummaryStatistics);
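
As a small sketch of combine, the code below computes title-length statistics for two separate lists and merges them. The titles are the five sample titles from this series split into two lists, so the combined result matches the numbers shown earlier; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummaryStatisticsSketch {

    static IntSummaryStatistics lengthStats(List<String> titles) {
        return titles.stream().collect(Collectors.summarizingInt(String::length));
    }

    // combine mutates the receiver, folding the other statistics into it.
    static IntSummaryStatistics combinedStats(List<String> first, List<String> second) {
        IntSummaryStatistics stats = lengthStats(first);
        stats.combine(lengthStats(second));
        return stats;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList(
                "Read Version Control with Git book",
                "Read Java 8 Lambdas book");
        List<String> second = Arrays.asList(
                "Write a mobile application to store my tasks",
                "Write a blog on Java 8 Streams",
                "Read Domain Driven Design book");

        IntSummaryStatistics stats = combinedStats(first, second);
        System.out.println(stats.getCount()); // 5
        System.out.println(stats.getSum());   // 162
        System.out.println(stats.getMin());   // 24
        System.out.println(stats.getMax());   // 44
    }
}
```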

## Joining all titles

private static String allTitles(List<Task> tasks) {
    return tasks.stream().map(Task::getTitle).collect(joining(", "));
}
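
The joining collector also has a three-argument form that adds a prefix and a suffix around the joined text. A minimal sketch, with made-up class and method names and plain strings in place of tasks:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoiningSketch {

    // Arguments are the delimiter, the prefix, and the suffix.
    static String bracketedTitles(List<String> titles) {
        return titles.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Read Java 8 Lambdas book",
                "Write a blog on Java 8 Streams");

        // Prints [Read Java 8 Lambdas book, Write a blog on Java 8 Streams]
        System.out.println(bracketedTitles(titles));
    }
}
```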

## Writing a custom Collector

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class MultisetCollector<T> implements Collector<T, Multiset<T>, Multiset<T>> {

    // Creates the empty mutable container.
    @Override
    public Supplier<Multiset<T>> supplier() {
        return HashMultiset::create;
    }

    // Adds one occurrence of an element to the container.
    @Override
    public BiConsumer<Multiset<T>, T> accumulator() {
        return (set, e) -> set.add(e, 1);
    }

    // Merges two partial results, for example from a parallel stream.
    @Override
    public BinaryOperator<Multiset<T>> combiner() {
        return (set1, set2) -> {
            set1.addAll(set2);
            return set1;
        };
    }

    // The container is already the final result.
    @Override
    public Function<Multiset<T>, Multiset<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
    }
}

import com.google.common.collect.Multiset;

import java.util.Arrays;
import java.util.List;

public class MultisetCollectorExample {

    public static void main(String[] args) {
        List<String> names = Arrays.asList("shekhar", "rahul", "shekhar");
        Multiset<String> set = names.stream().collect(new MultisetCollector<>());

        // Iterate over the distinct elements so each name is printed once with its count.
        set.elementSet().forEach(str -> System.out.println(str + ":" + set.count(str)));
    }
}
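
If you only need a simple collector, you do not have to implement the whole interface: the Collector.of factory method builds one from a supplier, an accumulator, and a combiner. The sketch below is a Guava-free approximation of the example above that counts occurrences in a plain Map; it is not the MultisetCollector itself, and the names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;

class OccurrenceCollectorSketch {

    // Builds a collector that counts how often each element occurs.
    static <T> Collector<T, ?, Map<T, Integer>> countingOccurrences() {
        return Collector.<T, Map<T, Integer>>of(
                HashMap::new,                                  // supplier: empty container
                (map, e) -> map.merge(e, 1, Integer::sum),     // accumulator: add one occurrence
                (left, right) -> {                             // combiner: merge partial results
                    right.forEach((k, v) -> left.merge(k, v, Integer::sum));
                    return left;
                });
    }

    static Map<String, Integer> count(List<String> names) {
        return names.stream().collect(countingOccurrences());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("shekhar", "rahul", "shekhar"));
        System.out.println(counts.get("shekhar")); // 2
        System.out.println(counts.get("rahul"));   // 1
    }
}
```

This overload takes no finisher, so the accumulated Map is returned as the result, which plays the same role as the identity finisher in the class-based version.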
