Finding all the indexes of a whole word in a given string using java


Intent

To find all the indexes of a whole word in a given searchable string.

Motivation

Today, I had to write a piece of code in which I had to find all the indexes of a particular keyword in a searchable string. Most of the times, when we have to find index of a keyword in a searchable string we use indexOf method of String class.

For example,


String searchableString = “Don’t be evil. Being evil is bad”;

String keyword = “be”;

So, we can find the index of keyword as


int index = searchableString.indexOf(keyword);

This will give us the index of first occurrence of keyword (“be”). Now suppose, we have to find all the indexes of keyword (“be”), we will have to loop over searchableString and find all the indexes of keyword “be”.

This can be done as follows

	public static void findIndexes(){
		String searchableString = "don’t be evil.being evil is bad";
		String keyword = "be";

		int index = searchableString.indexOf(keyword);
		while (index >=0){
			System.out.println("Index : "+index);
			index = searchableString.indexOf(keyword, index+keyword.length())	;
		}

	}

	public static void main(String[] args) {
		findIndexes();
	}

This program will print :

Index :  6

Index : 14

There is a problem in this code as we should not get “Index : 14” because we are looking for a keyword “be” and the index it has found is from word “being” . This just looks for any occurrence of keyword in the string it does not have to be a whole word. To solve this problem, we will be writing a solution using Regular expressions.

Solution

We have to find out the indexes which correspond to the whole word. For example, we should only get “Index : 6” as that corresponds to the keyword “be” not “Index : 14” as it corresponds to the word “being” .

This can be done as follows :-

WholeWordIndexFinder.java

package org.dailywtf.string;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordIndexFinder {

	private String searchString;

	public WholeWordIndexFinder(String searchString) {
		this.searchString = searchString;
	}

	public List<IndexWrapper> findIndexesForKeyword(String keyword) {
		String regex = "\\b"+keyword+"\\b";
		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(searchString);

		List<IndexWrapper> wrappers = new ArrayList<IndexWrapper>();

		while(matcher.find() == true){
			int end = matcher.end();
			int start = matcher.start();
			IndexWrapper wrapper = new IndexWrapper(start, end);
			wrappers.add(wrapper);
		}
		return wrappers;
	}

	public static void main(String[] args) {
		WholeWordIndexFinder finder = new WholeWordIndexFinder("don’t be evil.being evil is bad");
		List<IndexWrapper> indexes = finder.findIndexesForKeyword("be");
		System.out.println("Indexes found "+indexes.size() +" keyword found at index : "+indexes.get(0).getStart());
	}

}

IndexWrapper.java

package org.dailywtf.string;

public class IndexWrapper {

	private int start;
	private int end;

	public IndexWrapper(int start, int end) {
		this.start = start;
		this.end = end;
	}

	public int getEnd() {
		return end;
	}

	public int getStart() {
		return start;
	}

	@Override
	public int hashCode() {
		final int prime = 31;
		int result = 1;
		result = prime * result + end;
		result = prime * result + start;
		return result;
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		IndexWrapper other = (IndexWrapper) obj;
		if (end != other.end)
			return false;
		if (start != other.start)
			return false;
		return true;
	}

}

Explanation
First of all a regular expression is built using the keyword. Regex \b defines a word boundary. \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b.
More information on regex could be found at this link.  Next, we just iterate over the matcher till matcher can find any match. Whenever a match is found an wrapper object is created which contains the start and end position of the keyword.

Using the above code, we can find all the indexes of a keyword in a searchable string.

27 thoughts on “Finding all the indexes of a whole word in a given string using java

  1. Tym the Enchanter

    Nice little solution, two things though.

    1. The hashCode for index wrapper only returns the value of the evaluation in line 26

    result = prime * result + start;

    If you want to assign 1 to result then the next two lines should start result += and not result =, but then do you need the line

    int result = 1 anyway?

    2. Maybe adding a flag to do case insensitive search, or to allow the addition of any flags as OR’d ints would be useful.

    Tym

    Reply
    1. whyjava Post author

      1. I used eclipse to generate the hashcode and equals method. In my view, the hashcode method is correct because in line 25
      result = prime * result + end;
      it calculates the value of result based on the end value and now this result value is used
      in the line 26
      result = prime * result + start;

      So, result is calculated based on the value of result in line 25.

      2. Yes, we could add a flag for specifying whether we want case sensitive or case insensitive search.

      Thanks
      whyjava

      Reply
  2. Artur Biesiadowski

    If you want to use it often, caching Pattern for given keywords might be thing to consider (unless you are looking for different words each single time).

    If you use it VERY often, write it on your own without using regexp (after profiling that this is really a hotspot in the app). I remember that rewriting some regexp trying to detect date format to hand-crafted logic gave 30 times speedup in synthetic microbenchmark and noticeable speedup in real app.

    Reply
  3. scalaBoy

    In scala:

    val str = "Don't be evil. Being evil is bad"
    val key = "be"
    val regex = ("\\b" + key + "\\b").r
    (for(m <- regex findAllIn str matchData) yield (m.start,m.end)).toList
    

    would get List((6,8))

    Reply
  4. scalaBoy

    In scala, the following codes would get List((6,8))

    val str = “Don’t be evil. Being evil is bad”
    val key = “be”
    val regex = (“\\b” + key + “\\b”).r
    (for(m <- regex findAllIn str matchData) yield (m.start,m.end)).toList

    Reply
  5. Juri

    Thank you for the post

    In case keyword contains some regex reserved characters, the keyword should be escaped:
    String regex = “\\b”+ Pattern.quote(keyword) +”\\b”;

    Reply
  6. Raymon

    Incredible! This blog looks just like my old one!

    It’s on a completely different topic but it has pretty much the same page layout and design. Excellent choice of colors!

    Reply
  7. films

    I really like your blog.. very nice colors & theme.
    Did you make this website yourself or did you hire someone to do it for you?
    Plz respond as I’m looking to create my own blog and would like to know where u got this from. many thanks

    Reply
  8. Pingback: SyncEdit 2 – An (updated) IntelliJ IDEA Plugin | Bill White's Blog

  9. Pingback: SyncEdit 2 – An (updated) IntelliJ IDEA Plugin | Bill White's Blog | HTML5 / Javascript / d3 / iOS / Data Visualizations / Flex

  10. Maybell

    An outstanding share! I’ve just forwarded this onto a coworker who was doing a little research on this. And he actually bought me dinner simply because I stumbled upon it for him… lol. So let me reword this…. Thanks for the meal!! But yeah, thanks for spending some time to talk about this matter here on your site.

    Reply
  11. Mark

    String regex = “\\b”+keyword+”\\b”;
    This doesn’t work if the keyword your searching for has a minus sign on the beginning or the end of it.

    Reply
  12. Kat Uk

    There isn’t any need to maintain DVD collections or rent DVDs frequently to look at movies of your choice. Sit using your palms out of your body facing down towards the floor, your spine straight, you flat on the floor. However, a possible problem with these proxy sites is because have limited capabilities and therefore are sometimes slow in loading.

    Reply
  13. vijay

    Hi i am in need of help i need a java program to find words and lines of the same java program which i typed or created can any help me please

    Reply
  14. dating and texting rules

    Search for at least 10 American singles from these free American dating
    sites who have the same interests as you, then contact them all.
    For as long as persons actually meet in person – then dating online really isn’t any different from
    “real life”. Article Source: More information about online dating sites,click online dating
    sites and get details.

    Reply
  15. reyes

    Thanks , I’ve just been looking for info about this topic for a
    long time and yours is the greatest I’ve came upon so far.
    But, what in regards to the conclusion? Are you positive in regards to
    the source?

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s