In-Place Tokenization with Pointers (Trim Spaces, Skip Empties)
Title: Tokenize a Mutable C-String In-Place Using Two Pointers
Level: Difficult
Concepts: Pointer traversal, read/write pointers, in-place modification ('\0' insertion), pointer difference, trimming via pointer arithmetic, output list of char*
Scenario
You receive a mutable C-string containing comma-separated values, possibly with extra spaces and consecutive delimiters. You must split the string into tokens in-place (by writing '\0' terminators), trim spaces around each token, skip empty tokens, and return an array of pointers to the tokens. No dynamic allocation; only pointer arithmetic.
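The core idea of in-place splitting can be seen in isolation before any trimming is involved. The sketch below is only an illustration: `split_first` is a hypothetical helper that is not part of the exercise. It cuts a buffer at the first delimiter by overwriting it with '\0'.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical helper (not part of the exercise): cut s at the first
   occurrence of delim by writing '\0' over it. Returns a pointer to the
   text after the delimiter, or NULL if no delimiter was found. */
char *split_first(char *s, char delim)
{
    char *p = s;

    while (*p != '\0' && *p != delim) {
        p++;                 /* walk to the delimiter or the end */
    }
    if (*p == '\0') {
        return NULL;         /* no delimiter: s is left unchanged */
    }
    *p = '\0';               /* terminate the first piece in place */
    return p + 1;            /* the remainder starts right after it */
}
```

The argument has to be a writable array, for example `char buf[] = "key,value";`. Passing a string literal directly would be undefined behavior, because the function writes into the buffer.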
Problem Statement
Implement a function that:
- Takes a writable null-terminated string `s`,
- Splits on a single-character delimiter `delim`,
- Trims leading and trailing spaces (`' '`) from each token,
- Skips empty tokens (after trimming),
- Stores up to `max_tokens` pointers to tokens into `out_tokens[]`,
- Returns the number of tokens stored via `*out_count`, and sets a truncation flag via `*out_truncated` if more tokens existed than capacity.
All work must be done in-place using two-pointer logic; no library calls.
Requirements
- Inputs:
  - `char *s`: mutable, null-terminated input string; will be modified.
  - `char delim`: delimiter character (e.g., `','`).
  - `char **out_tokens`: array to store pointers to token starts.
  - `int max_tokens`: capacity of `out_tokens` (`max_tokens ≥ 0`).
- Outputs:
  - `int *out_count`: number of stored tokens (0..max_tokens).
  - `bool *out_truncated`: `true` if more tokens existed than stored.
- Behavior:
  - Use pointer variables: `char *read`, `*write`, `*token_start`, and `*p`.
  - Replace delimiter(s) with `'\0'` after trimming each token.
  - Trim spaces: advance the token start over leading spaces; for trailing spaces, backtrack from the last character to the last non-space, then write `'\0'`.
  - Skip tokens that become empty after trimming.
- Validation:
  - On any `NULL` pointer or `max_tokens < 0`, return `-1` without modifying outputs.
- Time: O(L) where L is the length of `s`; Space: O(1) extra (besides `out_tokens`).
- Note: Only ASCII space `' '` is treated as whitespace in this problem.
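The trimming rule can be sketched on a single token. The helper below is an assumption made for illustration: `trim_span` and its signature do not appear in the exercise. It trims the half-open range `[start, end)`, writes '\0' at the trimmed end, and reports the length as a pointer difference. Unlike the full algorithm, it does not copy characters toward the front of the buffer.

```c
#include <stddef.h>  /* ptrdiff_t */

/* Hypothetical helper: trim ASCII spaces around the text in [start, end).
   Writes '\0' at the trimmed end, stores the trimmed start in *out_start,
   and returns the trimmed length as a pointer difference.
   Assumed precondition: end points at a writable byte of the same buffer
   (typically the delimiter or the original terminator). */
ptrdiff_t trim_span(char *start, char *end, char **out_start)
{
    char *p = end;

    while (start < end && *start == ' ') {
        start++;             /* advance over leading spaces */
    }
    while (p > start && *(p - 1) == ' ') {
        p--;                 /* back up over trailing spaces */
    }
    *p = '\0';               /* terminate at the last non-space + 1 */
    *out_start = start;
    return p - start;        /* 0 means the token was empty */
}
```

Comparing `p > start` and reading `*(p - 1)` keeps `p` from ever stepping before the start of the range, which a `p >= start` loop starting from `end - 1` could do for an empty token at the very beginning of the buffer.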
Function Details
- Name: `tokenize_in_place`
- Arguments: `char *s`, `char delim`, `char **out_tokens`, `int max_tokens`, `int *out_count`, `bool *out_truncated`
- Return Value: `int`: `0` on success; `-1` on invalid input.
- Description: Walk the string with a read pointer to find token boundaries and a write pointer to place characters. When a delimiter is hit or the string ends, trim trailing spaces by moving a pointer backward to the last non-space, place `'\0'`, and if the token is non-empty, record its start pointer into `out_tokens` (provided there is capacity). Adjacent delimiters, or delimiters separated only by spaces, yield empty tokens, which are skipped rather than stored.
Solution Approach
- Validate inputs.
- Initialize: `read = s; write = s; count = 0; truncated = false;`
- While `true`:
  - Skip leading spaces before the token: advance `read` while `*read == ' '`.
  - Set `token_start = write;`
  - Copy until the delimiter or `'\0'`: while `*read != delim` and `*read != '\0'`, do `*write++ = *read++;`
  - Trim trailing spaces: set `p = write;` while `p > token_start && *(p - 1) == ' '`, `p--;` (this form never moves `p` before the start of the buffer).
  - Remember whether the input is exhausted: `at_end = (*read == '\0');`. Do this before terminating, because when `write == read` the terminator overwrites the delimiter that `read` points at, and a later `*read == '\0'` test would end the loop too early.
  - Terminate the token: set `write = p; *write = '\0';`
  - Record the token if non-empty: if `write > token_start`, then if `count < max_tokens` set `out_tokens[count++] = token_start;` else set `truncated = true;`
  - If `at_end`, break; otherwise (`read` was at the delimiter) advance `read++` and `write++` so the next token starts just past the terminator.
- After the loop: set `*out_count = count; *out_truncated = truncated; return 0;`
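One possible implementation of this loop is sketched below. It is not presented as the reference solution. It introduces a local flag `at_end`, which is not in the exercise's variable list, to record whether the input ended before the terminator is written; this matters because the terminator can land on the delimiter when `write` has caught up with `read`. The trailing-space trim uses `p > token_start` with `*(p - 1)` so that `p` never points before the buffer.

```c
#include <stdbool.h>
#include <stddef.h>

int tokenize_in_place(char *s, char delim, char **out_tokens, int max_tokens,
                      int *out_count, bool *out_truncated)
{
    char *read, *write, *token_start, *p;
    int count = 0;
    bool truncated = false;

    if (s == NULL || out_tokens == NULL || out_count == NULL ||
        out_truncated == NULL || max_tokens < 0) {
        return -1;                       /* outputs are left untouched */
    }

    read = s;
    write = s;
    for (;;) {
        bool at_end;                     /* assumption: extra local flag */

        while (*read == ' ') {
            read++;                      /* skip leading spaces */
        }
        token_start = write;
        while (*read != delim && *read != '\0') {
            *write++ = *read++;          /* compact token toward the front */
        }
        p = write;
        while (p > token_start && *(p - 1) == ' ') {
            p--;                         /* back up over trailing spaces */
        }
        at_end = (*read == '\0');        /* sample before *write may hit *read */
        write = p;
        *write = '\0';                   /* terminate the trimmed token */

        if (write > token_start) {       /* non-empty after trimming */
            if (count < max_tokens) {
                out_tokens[count++] = token_start;
            } else {
                truncated = true;        /* token exists but does not fit */
            }
        }
        if (at_end) {
            break;
        }
        read++;                          /* step past the delimiter */
        write++;                         /* next token starts after the '\0' */
    }

    *out_count = count;
    *out_truncated = truncated;
    return 0;
}
```

A caller would pass a writable array, for example `char buf[] = "a,b,c"; char *tok[2];`, and then read `tok[0]` through `tok[count - 1]` as ordinary C-strings that point into `buf`.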
Tasks to Perform
- Validate: `s`, `out_tokens`, `out_count`, `out_truncated` non-NULL; `max_tokens ≥ 0`.
- Initialize `char *read = s; char *write = s; int count = 0; bool truncated = false;`
- Loop over tokens separated by `delim`:
  - Skip leading spaces: advance `read`.
  - `token_start = write;`
  - Copy chars until `delim` or `'\0'` to `write` (advance both).
  - Trim trailing spaces using a backtracking pointer `p`.
  - Note whether `read` is at `'\0'` before writing the terminator, since the terminator may overwrite the delimiter under `read`.
  - Terminate with `'\0'` and (if non-empty) store the pointer if capacity allows; otherwise set the truncated flag.
  - If the input ended, break; else advance past the delimiter and prepare for the next token (`write++` so the next token starts after the terminator).
- On exit: write outputs and return `0`.
- On invalid input: return `-1` without modifying outputs.
Test Cases
| # | Inputs / Precondition | Expected Output | Notes |
|---|---|---|---|
| 1 | s=" alpha, beta , , gamma ", delim=',', max_tokens=5 | ret=0, out_tokens → {"alpha","beta","gamma"}, out_count=3, truncated=false; s becomes `"alpha\0beta\0\0gamma\0"` | Leading/trailing spaces trimmed; empty token skipped |
| 2 | s="a,b,c", delim=',', max_tokens=2 | ret=0, out_tokens → {"a","b"}, out_count=2, truncated=true | Capacity limit sets truncated flag |
| 3 | s=" , , ", delim=',', max_tokens=4 | ret=0, out_count=0, truncated=false | All tokens empty after trimming |
| 4 | s="one", delim=';', max_tokens=3 | ret=0, out_tokens → {"one"}, out_count=1, truncated=false | No delimiters present |
| 5 | s="" (empty), any delim, max_tokens=3 | ret=0, out_count=0, truncated=false | Empty string |
| 6 | s=NULL or out_tokens=NULL or out_count=NULL | ret=-1 | Invalid pointers |
| 7 | max_tokens=-1 | ret=-1 | Invalid capacity |