Common String Algorithms

By | August 13, 2015

In this post, I would try to implement the following string-related algorithm questions.
I would like to implement these using Java and try to make additional comments which I think that are useful for understanding the implementations of some type of data structures in Java programming language.

Non Repeated Characters in String : Return the unique characters in a given letter.
Anagram Strings : Given two strings, check if they’re anagrams or not. Two strings are anagrams if they are written using the same exact letters, ignoring space, punctuation and capitalization. Each letter should have the same count in both strings. For example, ‘Eleven plus two’ and ‘Twelve plus one’ are meaningful anagrams of each other.
Find Word Positions in Text : Given a text file and a word, find the positions that the word occurs in the file. We’ll be asked to find the positions of many words in the same file.
Remove Duplicate Characters in String : Remove duplicate characters in a given string keeping only the first occurrences. For example, if the input is ‘tree traversal’ the output will be ‘tre avsl’.

In order to understand these types of questions, it requires to store unique key and count of values each, especially basic understanding of Hashmap is essential.

1- Non Repeated Characters in String :

public static ArrayList<String> uniqueLetters(String text) {
    //create a hashmap with key and value pairs, where key is the each letter and
    //value is the count of each letter of a given text
    Map<Character, Integer> map = new HashMap<Character, Integer>();
    //to store unique letters create an arraylist
    ArrayList<String> uniqueChars = new ArrayList<String>();
    //we check each letter in a given text, so in this problem we don't need StringTokenizer
    for (int i = 0; i < text.length(); ++i) {
        if (map.containsKey(text.charAt(i))) {
            int count = map.get(text.charAt(i));
            map.put(text.charAt(i), count + 1);
        } else {
            map.put(text.charAt(i), 1);
        }
    }
    // Loop in a hashSet to get keys and values
    Iterator iterator = map.entrySet().iterator();
    while (iterator.hasNext()) {
        Map.Entry pairs = (Map.Entry) iterator.next();
        System.out.println(pairs.getKey() + " = appeared " + pairs.getValue() + " times");
        if (pairs.getValue().equals(1)) {
            uniqueChars.add(pairs.getKey().toString());
        }
        iterator.remove(); // avoids a ConcurrentModificationException
    }
    return uniqueChars;
}

In this question, we use hashmap and complexity is :
O(n) where n is the length of a given word.

2- Anagram Strings:

public static boolean isAnagram(String text1, String text2) {
    //1st  method is to sort both strings and check one by one
    //sorting adds an complexity of nlogn (e.g. merge sort)
    //this code considers space, punctuation etc.
    //getChars method  may also be used here
    System.out.println("");
    if (text1.length() != text2.length()) {
        System.out.println("inside isAnagram: " + text1.length() + " " + text2.length());
        return false;
    } else {
        String sorted1 = convertToCharArrayFromString(1, text1);// 1 is for sorted return
        String sorted2 = convertToCharArrayFromString(1, text2);
        for (int i = 0; i < sorted1.length(); i++) {
            if (sorted1.charAt(i) != sorted2.charAt(i)) {
                return false;
            }
        }
    }
    return true;
}

public static String convertToCharArrayFromString(int option, String text) {
    char[] content1 = text.toLowerCase().toCharArray();
    if (option == 1) {
        Arrays.sort(content1);
        String sorted1 = new String(content1);
        return sorted1;
    } else {
        String unsorted1 = new String(content1);
        return unsorted1;
    }
}

Two implementations have coded, basic knowledge of Multiset is required to understand the second implementation which reduces the complexity to O(n).

3- Find Word Positions in Text
For this questions, we use a text instead of file to find the given word’s position. We create a map and to store the values, arraylist is used. For a given key, if there are more than one appearance of a value, positions are appended to the arraylist.

public static void findWordPosition(String text, String word) {
    Map<String, ArrayList<Integer>> map = createInvertedIndex(text);
    // for (Map.Entry<String, ArrayList<Integer>> entry : map.entrySet()) {
    ArrayList<Integer> position = map.get(word);
    // System.out.println("Key " + entry.getKey() + entry.getValue());
    for (Integer value : position) {
        System.out.println("Key: " + word + " its positions " + value);
    }
}

public static Map<String, ArrayList<Integer>> createInvertedIndex(String text) {
    //assume that text is given with appropriate
    //letters so no need for regular expressions
    //we can create a map with a key of String (each word) and
    //their positions that is integer array
    Map<String, ArrayList<Integer>> map = new HashMap<String, ArrayList<Integer>>();
    String[] words = text.split(" ");
    for (int i = 0; i < words.length; i++) {
        if (map.containsKey(words[i])) {
            ArrayList<Integer> position = map.get(words[i]);
            position.add(i);
            map.put(words[i], position);
        } else {
            ArrayList<Integer> position = new ArrayList<Integer>();
            position.add(i);
            map.put(words[i], position);
        }
    }
    return map;
}

4- Remove Duplicate Characters in String

public static Set<Character> removeDuplicateChars(String text) {
    // for insertion sort order LinkedHashSet is used
    Set<Character> set = new LinkedHashSet<Character>();
    for (int i = 0; i < text.length(); i++) {
        if (!set.contains(text.charAt(i))) {
           set.add(text.charAt(i));
        }
    }
    return set;
}

Here is the Ideone link of the running code- http://ideone.com/j0lbO1

Enclose codes in [code lang="C"] [/code] tags

This site uses Akismet to reduce spam. Learn how your comment data is processed.