String Algorithms Summaries
Sep 1, 2024
Here are the summaries for the String algorithms in Leetcode's Blind 75. Check out the previous post on Matrixes. A String is a sequence of characters used to represent text, usually to make something human-readable. It's not really a data structure, but can be seen as similar to an array, so the solutions will be similar to those problems.
Longest Substring Without Repeating Characters
Given a string, find the longest subsequence of characters that doesn’t repeat characters.
Solution with O(n) time / O(n) space
We’ll use two pointers to keep track of the longest substring so far (called the “sliding window” algorithm), and we’ll use a hash map to keep track of the last location of each character.
Iterate through the characters in the input one at a time with the right pointer, for each character:
- If the character is already in the hash map, update the left pointer one spot to the right of the last character occurrence (unless the left pointer is already past it).
- Add/update the current character and location to the hash map.
- Keep track of the maximum substring length so far by subtracting the right pointer location from the left pointer location.
Longest Repeating Character Replacement
Given a string, and a positive integer k, find the longest repeating substring possible if you can change k characters to any other character.
Solution with O(n) time / O(1) space
This will be another sliding window solution, but with some complexity added because we can change characters. The main idea to this solution is that we only need to keep track of the character with the most occurrences within the sliding window, because that will create the longest substring.
Iterate through the input string with the right pointer, using a hash map (or integer array) to track the number of occurrences of each character included in the window. Separately, keep track of the max occurrence of any character within the window.
When the window size minus the max occurrence of any character is greater than k, move the left pointer forward and decrement the removed character’s occurrence in the hash map above.
The max window size at any time is the answer.
One thing that might be confusing about this solution, is that the algorithm doesn’t necessarily ensure that each window contains a valid substring, it just ensures that the window size never increases beyond the longest substring so far.
Minimum Window Substring
Given an input string and a search string, find the smallest substring of the input string that contains all characters from the search string.
Solution with O(n) time / O(1) space, where n is the characters in the input string
This is another sliding window two pointer solution.
First, we’ll initialize a character count map counting the occurrences of each character within the search string, which will be used to track which characters have been found in the input string.
Then we’ll initialize a total count of the characters in search string, which will be used to track if the substring is a valid window.
Iterate through the input string with the right pointer, decreasing the count in the character count map, indicating one of those characters are in the substring window. If the character’s value in the map is positive, then decrease the total count as well because this is a character that is still needed.
At this point, if the total count of characters found from the search string is 0, we’ll use an inner loop to move the left pointer as far as we can to create a minimum window. While the count is 0:
- Keep track of the minimum substring found so far.
- If the character at the left pointer was initialized in the map, then we know the substring is not valid anymore, so increase the total count.
- Also remember to increment the character count map, and move the left pointer forward.
The answer is the minimum substring found within the inner loop.
Valid Anagram
Given 2 input strings, determine if they are an anagram of each other (letters rearranged).
Solution with O(n) time / O(1) space, where n is the number of characters in one input, assuming we skip the comparison when the lengths are different
Just as a side note, for problems involving character counting like in many of the string problems, there’s a common optimization by using an integer array of length 26 (for each letter in the english alphabet) to count characters instead of a map. Since the indices are known, the map has slightly more overhead than the array, but I prefer the map for readability.
This problem is basically only character counting, so just iterate over the first string, incrementing the character count in the map, then iterate over the second string, decrementing the character count in the map.
If the character count is 0 for all letters, then it is a valid anagram.
Group Anagrams
Given an array of strings, group the strings by anagram (i.e. every string in a group is an anagram of each other).
Solution with O(nklog(k)) time / O(1) space, where n is the # of strings in the input array, and k is the longest string length
If you’re familiar with functional programming concepts, this problem can be solved nicely by implementing a "group by" and using the anagram as a dynamic key.
In other words, initialize a grouping map of "string" → "a list of strings" to store the groups. Iterate through the input array:
- Determine the key for the current string by sorting it so that every anagram will have the same key.
- If the key is already in the grouping map, append the string to the group, otherwise, create a new group.
The answer is the values of the grouping map
Valid Parentheses
Given a string with only parentheses (rounded, square, and curly), determine if the order of the parentheses is valid. By valid, they mean that parenthesis should always be in pairs (open and close), and the pairs should match by type.
Solution with O(n) time / O(n) space
If you picture an expression with parenthesis, you’ll notice that no matter how many times you “open” a parenthesis pair, the next closing parenthesis should be the last parenthesis opened. This is a LIFO/stack.
Iterate through the characters in the string, whenever an open parenthesis is encountered, push the expected closing parenthesis to a stack. When you encounter a closing parenthesis, if the stack is empty or the top of the stack doesn’t match the current character, then the parenthesis are not valid because they are in the wrong order.
After iterating through the string, if the stack is not empty, then this also means the parentheses are not valid, because some parentheses are not closed. Otherwise, it is valid.
Valid Palindrome
Given a string, determine if it's a palindrome, ignoring punctuation.
Solution with O(n) time / O(1) space
Use two pointers starting at both ends. Iterate towards the middle, skipping punctuation and spaces. If the characters are always equal until the pointers meet in the middle, then it's a valid palindrome.
Longest Palindromic Substring
Given a string, find the longest palindrome within the string
Solution with O(n^2) time / O(1) space
For me, this problem ended up being more complex than it seemed, because there’s no simple “start” and “end” for a palindrome. A palindrome starts from the middle and repeats outward.
Iterate through the string, at each character, try to find the longest palindrome using that as the center. Since the palindrome could also start with the “center” in between letters, check each half step as well.
In order to find the palindrome, use two pointers moving out until the characters at the pointers aren’t equal anymore. Save the location and length of each palindrome if its larger than the previous.
Palindromic Substrings
Given a string, find the number of palindromes within the string, including the recursive palindromes at each level of a longer palindrome.
Solution with O(n^2) time / O(1) space
You can solve this one almost exactly like the previous problem, “Longest Palindromic Substring”, except you’ll need to keep a count for every palindrome found.
Encode and Decode Strings (Leetcode Premium)
Given an array of strings, implement one function to encode the array into a single string, then implement a decode function to reverse the encode operation.
Solution with O(n) time / O(1) space for both encoding and decoding
The general idea with this one is that we have full control over the beginning of the encoded string, so we can insert a marker that helps us figure out where to split the string. If the marker includes the length of the next string, then we don’t even need to modify the input strings.
For encoding, iterate through the array, joining all strings into one. Before appending each string, insert the string length marker, for example:
- The length of the string followed by any identifying character like "#" .
- The length of the string with a fixed width using leading zeros (The width can be as small as 3 because the problem states 200 is the max string length).
There are many options for the marker because once you know the string length, then you can completely skip over to the next marker, without worrying about whether your marker conflicts with an input string.
For decoding, iterate through the encoded string until you’ve found the end of your marker. Parse the length from the marker, and use that determine the start and end of the next string. Put the next string into the result array, then find the next marker after the end of the last result string.