# cnotes Internals Guide

A detailed walkthrough of the cnotes codebase, focusing on C90 file handling, memory management, and cross-platform techniques.

## Table of Contents

1. [Project Structure](#project-structure)
2. [Platform Abstraction (platform.h)](#platform-abstraction)
3. [Configuration (config.h)](#configuration)
4. [File I/O Patterns](#file-io-patterns)
5. [Module Walkthrough](#module-walkthrough)
   - [cnadd.c - Writing to Files](#cnadd---writing-to-files)
   - [cndump.c - Reading and Parsing](#cndump---reading-and-parsing)
   - [cnfind.c - Searching](#cnfind---searching)
   - [cncount.c - Aggregation](#cncount---aggregation)
   - [cndel.c - File Rewriting](#cndel---file-rewriting)
6. [Memory Management](#memory-management)
7. [String Handling in C90](#string-handling-in-c90)
8. [Cross-Platform Considerations](#cross-platform-considerations)

---

## Project Structure

```
cnotes/
├── include/
│   ├── config.h      # Application configuration constants
│   └── platform.h    # Platform-specific abstractions
├── src/
│   ├── cnadd.c       # Add new entries
│   ├── cndump.c      # Display entries
│   ├── cnfind.c      # Search entries
│   ├── cncount.c     # Statistics
│   ├── cndel.c       # Archive (delete) entries
│   └── cnhelp.c      # Help system
├── Makefile          # GCC build
├── MAKEFILE.TC       # Turbo C++ 3.0 build
└── BUILD.BAT         # DOS batch build
```

---

## Platform Abstraction

**File: `include/platform.h`**

This header provides a consistent interface across DOS, Windows, and Unix systems.

### Key Concepts

```c
#ifndef PLATFORM_H
#define PLATFORM_H

/* ... header contents ... */

#endif /* PLATFORM_H */
```

The **include guard** prevents multiple inclusion. If `PLATFORM_H` is already defined, the preprocessor skips everything up to the matching `#endif`, which is the entire file.
### Platform Detection

```c
#if defined(__MSDOS__) || defined(__DOS__)
    /* DOS-specific code */
#elif defined(_WIN32)
    /* Windows-specific code */
#else
    /* Unix/Linux/macOS code */
#endif
```

Compilers pre-define macros that identify the target platform:

- `__MSDOS__`, `__DOS__` - DOS compilers (Turbo C, DJGPP)
- `_WIN32` - Windows compilers (MSVC, MinGW)
- Neither - Assumed to be Unix-like

### Platform-Specific Definitions

| Macro | DOS | Windows | Unix |
|-------|-----|---------|------|
| `PATH_SEPARATOR` | `'\\'` | `'\\'` | `'/'` |
| `PATH_SEP_STR` | `"\\"` | `"\\"` | `"/"` |
| `HOME_ENV` | `"CNOTES_HOME"` | `"USERPROFILE"` | `"HOME"` |
| `mkdir_portable(p)` | `mkdir(p)` | `_mkdir(p)` | `mkdir(p, 0755)` |

**Why two path separator forms?**

- `PATH_SEPARATOR` (char) - For character comparisons
- `PATH_SEP_STR` (string) - For string concatenation with `sprintf()`

### The mkdir Problem

Different systems have different `mkdir()` signatures:

```c
/* DOS (dir.h) */
int mkdir(const char *path);

/* Windows (direct.h) */
int _mkdir(const char *path);

/* Unix (sys/stat.h) */
int mkdir(const char *path, mode_t mode);
```

The `mkdir_portable()` macro abstracts this difference.

---

## Configuration

**File: `include/config.h`**

### Compile-Time Defaults

```c
#ifndef CNOTES_FILE
#define CNOTES_FILE "cnotes.csv"
#endif
```

The `#ifndef` pattern allows override at compile time:

```bash
gcc -DCNOTES_FILE=\"myfile.csv\" ...
```

### Memory Constraints

```c
#ifndef MAX_ENTRIES
#ifdef MAX_ENTRIES_DEFAULT
#define MAX_ENTRIES MAX_ENTRIES_DEFAULT
#else
#define MAX_ENTRIES 5000
#endif
#endif
```

DOS has limited memory (~640KB conventional). `MAX_ENTRIES_DEFAULT` is set to 100 for DOS in `platform.h`, but 5000 for modern systems.
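To see why the DOS default is so much smaller, a rough footprint estimate helps. The sketch below assumes a record made of four fixed-size character fields. The `DEMO_*` widths, `DemoEntry`, and `entries_footprint()` are hypothetical names chosen for illustration and are not taken from `config.h`.

```c
#include <stddef.h>

/* Hypothetical field widths, for illustration only.
 * The real values are defined in config.h. */
#define DEMO_DATE_LENGTH      10
#define DEMO_TIME_LENGTH       5
#define DEMO_CATEGORY_LENGTH  10
#define DEMO_TXTMSG_LENGTH   125

/* One record: each field carries one extra byte for the '\0'. */
typedef struct {
    char date[DEMO_DATE_LENGTH + 1];
    char time[DEMO_TIME_LENGTH + 1];
    char category[DEMO_CATEGORY_LENGTH + 1];
    char text[DEMO_TXTMSG_LENGTH + 1];
} DemoEntry;

/* Bytes needed to hold max_entries records in one array.
 * unsigned long is used because int may be only 16 bits. */
unsigned long entries_footprint(unsigned long max_entries)
{
    return max_entries * (unsigned long)sizeof(DemoEntry);
}
```

With these assumed widths a record is roughly 150 bytes, so 100 entries fit comfortably in a single 64KB segment, while 5000 entries would need more than the whole 640KB of conventional memory.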
---

## File I/O Patterns

### Opening Files

C90 provides `fopen()` with mode strings:

| Mode | Meaning |
|------|---------|
| `"r"` | Read (file must exist) |
| `"w"` | Write (creates/truncates) |
| `"a"` | Append (creates if needed) |
| `"r+"` | Read/write (file must exist) |
| `"w+"` | Read/write (creates/truncates) |

**Always check for failure:**

```c
FILE *fp = fopen(path, "r");
if (fp == NULL) {
    fprintf(stderr, "Error: Cannot open '%s'\n", path);
    return 1;
}
```

### Reading Lines

```c
char line[500];

while (fgets(line, sizeof(line), fp) != NULL) {
    /* Process line */
}
```

`fgets()` is safe because it:

1. Takes a maximum length argument
2. Always null-terminates
3. Returns NULL on EOF or error

**Never use `gets()`** - it has no length limit and is a buffer overflow vulnerability.

### Writing Data

```c
/* Formatted output */
fprintf(fp, "%s,%s,%s,\"%s\"\n", date, time, category, message);

/* Or build string first, then write */
sprintf(buffer, "%s,%s\n", field1, field2);
fputs(buffer, fp);
```

### Closing Files

```c
fclose(fp);
```

**Always close files** to:

1. Flush buffered data to disk
2. Release system resources
3. Allow other programs to access the file

---

## Module Walkthrough

### cnadd - Writing to Files

**Purpose:** Append a new timestamped entry to the notes file.
#### Getting the Current Time

```c
#include <stdio.h>
#include <time.h>

time_t now;
struct tm *local;

time(&now);                  /* Get seconds since epoch */
local = localtime(&now);     /* Convert to local time struct */

sprintf(date_str, "%04d-%02d-%02d",
        local->tm_year + 1900,   /* Years since 1900 */
        local->tm_mon + 1,       /* Months are 0-11 */
        local->tm_mday);

sprintf(time_str, "%02d:%02d",
        local->tm_hour,
        local->tm_min);
```

The `struct tm` fields:

- `tm_year` - Years since 1900 (so 2026 = 126)
- `tm_mon` - Month (0-11, so January = 0)
- `tm_mday` - Day of month (1-31)
- `tm_hour`, `tm_min`, `tm_sec` - Time components

#### Building the File Path

```c
int get_cnotes_path(char *buffer, size_t bufsize, const char *filename)
{
    const char *home = getenv(HOME_ENV);

    if (home == NULL) {
        fprintf(stderr, "Error: %s not set\n", HOME_ENV);
        return 0;
    }

    /* Check buffer size before writing: two separators plus '\0' = 3 */
    if (strlen(home) + strlen(CNOTES_DIR) + strlen(filename) + 3 > bufsize) {
        fprintf(stderr, "Error: Path too long\n");
        return 0;
    }

    sprintf(buffer, "%s" PATH_SEP_STR "%s" PATH_SEP_STR "%s",
            home, CNOTES_DIR, filename);
    return 1;
}
```

**Key points:**

1. `getenv()` returns NULL if the variable isn't set
2. Always check buffer size before `sprintf()`
3. `PATH_SEP_STR` is a string literal, so the compiler concatenates it with the adjacent format strings

#### Creating Directories

```c
void ensure_directory_exists(const char *filepath)
{
    char dir[512];
    char *last_sep;

    /* Refuse paths that would overflow the local buffer */
    if (strlen(filepath) >= sizeof(dir)) {
        return;
    }

    strcpy(dir, filepath);
    last_sep = strrchr(dir, PATH_SEPARATOR);

    if (last_sep != NULL) {
        *last_sep = '\0';    /* Truncate at last separator */
        mkdir_portable(dir);
    }
}
```

`strrchr()` finds the **last** occurrence of a character. By truncating there, we get the directory portion of the path.
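The truncation step can also be written without modifying a copy of the whole path. The following is a minimal sketch of that idea, not code from cnotes: `directory_part()` and its parameters are hypothetical, and the separator is passed in explicitly so the example does not depend on `platform.h`.

```c
#include <stddef.h>
#include <string.h>

/* Copy the directory portion of filepath (everything before the
 * last separator) into out. Returns 1 on success, 0 if there is
 * no separator or if out is too small. */
int directory_part(const char *filepath, char sep, char *out, size_t outsize)
{
    const char *last_sep = strrchr(filepath, sep);
    size_t len;

    if (last_sep == NULL) {
        return 0;                 /* No directory component */
    }

    len = (size_t)(last_sep - filepath);
    if (len + 1 > outsize) {
        return 0;                 /* Need len bytes plus '\0' */
    }

    memcpy(out, filepath, len);
    out[len] = '\0';
    return 1;
}
```

Given `"/home/user/.cnotes/cnotes.csv"` and `'/'`, the function writes `"/home/user/.cnotes"` into `out`. A bare filename such as `"cnotes.csv"` has no separator, so the function reports failure instead of producing an empty directory name.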
#### Appending to File

```c
FILE *fp = fopen(path, "a");   /* "a" = append mode */
if (fp == NULL) {
    fprintf(stderr, "Error: Cannot open file\n");
    return 1;
}

fprintf(fp, "%s,%s,%-*s,\"%s\"\n",
        date_str,
        time_str,
        CATEGORY_LENGTH, category,   /* Left-justified, padded */
        message);

fclose(fp);
```

The format `%-*s`:

- `-` = left-justify
- `*` = width comes from next argument
- `s` = string

So `%-*s` with the arguments `CATEGORY_LENGTH, category` prints `category` left-justified in a field of `CATEGORY_LENGTH` characters.

---

### cndump - Reading and Parsing

**Purpose:** Read all entries and display in a formatted table.

#### The Entry Structure

```c
typedef struct {
    char date[DATE_LENGTH + 1];         /* +1 for null terminator */
    char time[TIME_LENGTH + 1];
    char category[CATEGORY_LENGTH + 1];
    char text[TXTMSG_LENGTH + 1];
} Entry;
```

**Why +1?** C strings are null-terminated. A 10-character date needs 11 bytes: 10 for characters + 1 for `'\0'`.

#### Dynamic Memory Allocation

```c
Entry *entries = (Entry *)malloc(MAX_ENTRIES * sizeof(Entry));
if (entries == NULL) {
    fprintf(stderr, "Error: Cannot allocate memory\n");
    return 1;
}

/* ... use entries ... */

free(entries);
```

**Why malloc instead of stack array?**

```c
Entry entries[MAX_ENTRIES];   /* BAD on DOS - stack overflow! */
```

DOS has ~64KB stack limit. With `MAX_ENTRIES=5000` and `Entry` being ~150 bytes, that's 750KB - stack overflow! `malloc()` uses the heap, which has more space.

#### Parsing Fixed-Width Fields

```c
static const char *parse_fixed_field(const char *ptr, char *dest,
                                     int length, char delimiter)
{
    if ((int)strlen(ptr) < length)
        return NULL;              /* Not enough data */

    strncpy(dest, ptr, length);
    dest[length] = '\0';          /* Ensure null-terminated */
    ptr += length;                /* Advance pointer */

    if (*ptr != delimiter)
        return NULL;              /* Expected delimiter not found */

    return ptr + 1;               /* Return pointer past delimiter */
}
```

This function:

1. Copies exactly `length` characters to `dest`
2. Null-terminates the result
3. Verifies the expected delimiter follows
4. Returns a pointer to continue parsing, or NULL on error

**Usage pattern (state machine):**

```c
const char *ptr = line;

ptr = parse_fixed_field(ptr, entry->date, 10, ',');
if (!ptr) return 0;   /* Parse error */

ptr = parse_fixed_field(ptr, entry->time, 5, ',');
if (!ptr) return 0;

/* ... continue ... */
```

#### Parsing Variable-Width Fields

```c
static const char *parse_variable_field(const char *ptr, char *dest,
                                        int max_length, char delimiter)
{
    int i = 0;

    while (*ptr != '\0' && *ptr != delimiter) {
        if (i < max_length) {
            dest[i++] = *ptr;
        }
        /* Continue even if truncating, to find delimiter */
        ptr++;
    }
    dest[i] = '\0';

    if (*ptr != delimiter)
        return NULL;

    return ptr + 1;
}
```

This handles fields of unknown length up to a maximum, with graceful truncation. `dest` must have room for `max_length + 1` bytes.

#### Sorting with qsort()

```c
#include <stdlib.h>   /* qsort() */
#include <string.h>   /* strcmp() */

/* Comparison function signature required by qsort */
static int compare_by_date(const void *a, const void *b)
{
    const Entry *entry_a = (const Entry *)a;
    const Entry *entry_b = (const Entry *)b;

    int cmp = strcmp(entry_a->date, entry_b->date);
    if (cmp != 0)
        return cmp;

    return strcmp(entry_a->time, entry_b->time);
}

/* Usage */
qsort(entries, entry_count, sizeof(Entry), compare_by_date);
```

`qsort()` parameters:

1. Array pointer
2. Number of elements
3. Size of each element
4. Comparison function pointer

The comparison function must return:

- Negative if a < b
- Zero if a == b
- Positive if a > b

**Why `const void *`?** C90's `qsort()` is generic - it works with any data type. You cast to your actual type inside the function.

---

### cnfind - Searching

**Purpose:** Find entries matching search criteria.
#### Case-Insensitive Search

```c
#include <string.h>   /* strlen(), size_t */

/* Convert character to lowercase */
int to_lower(int c)
{
    if (c >= 'A' && c <= 'Z') {
        return c + ('a' - 'A');
    }
    return c;
}

/* Case-insensitive substring search */
char *strcasestr_portable(const char *haystack, const char *needle)
{
    size_t needle_len;

    if (*needle == '\0')
        return (char *)haystack;

    needle_len = strlen(needle);

    while (*haystack != '\0') {
        /* Check if needle matches at current position */
        size_t i;
        int match = 1;

        for (i = 0; i < needle_len && haystack[i] != '\0'; i++) {
            if (to_lower(haystack[i]) != to_lower(needle[i])) {
                match = 0;
                break;
            }
        }

        if (match && i == needle_len)
            return (char *)haystack;

        haystack++;
    }

    return NULL;
}
```

**Why implement our own?** `strcasestr()` is not part of C90 - it's a non-standard extension found in GNU and BSD C libraries.

#### Multiple Filter Criteria

```c
int matches = 1;   /* Assume match until proven otherwise */

/* Filter by category */
if (filter_category[0] != '\0') {
    if (strcasecmp_portable(entry->category, filter_category) != 0) {
        matches = 0;
    }
}

/* Filter by date */
if (matches && filter_date[0] != '\0') {
    if (strcmp(entry->date, filter_date) != 0) {
        matches = 0;
    }
}

/* Filter by text pattern */
if (matches && pattern[0] != '\0') {
    if (strcasestr_portable(entry->text, pattern) == NULL) {
        matches = 0;
    }
}

if (matches) {
    /* Entry passes all filters */
}
```

This "whittle down" approach applies filters incrementally: an empty filter is skipped, and once one filter rejects the entry the remaining ones are not evaluated.

---

### cncount - Aggregation

**Purpose:** Count entries, optionally grouped by category or date.
#### Tracking Unique Values

```c
typedef struct {
    char key[32];
    int count;
} CountEntry;

CountEntry counts[MAX_CATEGORIES];
int num_categories = 0;

void increment_count(const char *key)
{
    int i;

    /* Look for existing key */
    for (i = 0; i < num_categories; i++) {
        if (strcmp(counts[i].key, key) == 0) {
            counts[i].count++;
            return;
        }
    }

    /* Add new key */
    if (num_categories < MAX_CATEGORIES) {
        strncpy(counts[num_categories].key, key, 31);
        counts[num_categories].key[31] = '\0';
        counts[num_categories].count = 1;
        num_categories++;
    }
}
```

This is a simple associative array. For small datasets, linear search is fine. Larger datasets would benefit from a hash table.

---

### cndel - File Rewriting

**Purpose:** Remove entries by moving them to an archive file.

#### The Challenge

Standard C has no way to remove lines from the middle of a file. Instead:

1. Read all entries into memory
2. Write non-deleted entries to a temporary file
3. Append deleted entries to archive
4. Replace original with temporary

#### Safe File Replacement

```c
Entry *entries;
FILE *temp;
FILE *archive;
int count;
int i;
int deleted_count = 0;

/* Read all entries (heap, not stack: the array is too big for DOS) */
entries = (Entry *)malloc(MAX_ENTRIES * sizeof(Entry));
if (entries == NULL) return 1;
count = read_all_entries(entries, source_path);

/* Open files */
temp = fopen(temp_path, "w");
archive = fopen(archive_path, "a");
if (temp == NULL || archive == NULL) {
    if (temp != NULL) fclose(temp);
    if (archive != NULL) fclose(archive);
    free(entries);
    return 1;
}

/* Write entries to appropriate files */
for (i = 0; i < count; i++) {
    if (should_delete(&entries[i])) {
        write_entry(archive, &entries[i]);
        deleted_count++;
    } else {
        write_entry(temp, &entries[i]);
    }
}

fclose(temp);
fclose(archive);
free(entries);

/* Replace original with temp */
remove(source_path);
rename(temp_path, source_path);
```

The original is removed first because C90 leaves the behavior of `rename()` implementation-defined when the target already exists, and on DOS and Windows it fails.

**Why archive instead of delete?** The immutable-log philosophy means data is never truly lost - it's just moved to a different file.

#### Confirmation Prompts

```c
char response[10];

printf("Delete %d entries? (y/n): ", count);
fflush(stdout);   /* Ensure prompt appears before input */

if (fgets(response, sizeof(response), stdin) != NULL) {
    if (response[0] == 'y' || response[0] == 'Y') {
        /* Proceed with deletion */
    }
}
```

`fflush(stdout)` ensures the prompt is displayed before waiting for input. Without it, buffered I/O might delay the prompt.

---

## Memory Management

### The Golden Rules

1. **Check malloc() return value**

   ```c
   ptr = malloc(size);
   if (ptr == NULL) {
       /* Handle error */
   }
   ```

2. **Free what you allocate**

   ```c
   Entry *entries = malloc(...);
   /* ... use entries ... */
   free(entries);   /* Always free before return */
   ```

3. **Don't use after free**

   ```c
   free(entries);
   entries = NULL;   /* Prevent accidental use */
   ```

4. **Match allocations to deallocations**

   Every `malloc()` needs exactly one `free()`.

### Stack vs Heap

| Stack | Heap |
|-------|------|
| Automatic allocation | Manual allocation |
| Fixed size (~64KB DOS, ~1-8MB modern) | Limited by system memory |
| Fast allocation | Slower allocation |
| Automatic cleanup | Must call `free()` |

```c
void function(void)
{
    char buffer[100];           /* Stack - automatic */
    char *data = malloc(100);   /* Heap - manual */

    /* buffer freed automatically when function returns */
    free(data);                 /* Must free explicitly */
}
```

---

## String Handling in C90

### String Basics

C strings are arrays of `char` terminated by `'\0'` (null character).

```c
char str[10] = "Hello";
/* Memory: ['H','e','l','l','o','\0','\0','\0','\0','\0'] */
/* Index:    0   1   2   3   4   5    6    7    8    9    */
/* A string-literal initializer zero-fills the unused elements. */
```

### Safe String Functions

| Unsafe | Safe | Notes |
|--------|------|-------|
| `gets()` | `fgets()` | Always use fgets |
| `strcpy()` | `strncpy()` | Specify max length |
| `sprintf()` | `snprintf()`* | *Not in C90 |

**strncpy() gotcha:**

```c
char dest[10];
strncpy(dest, source, 9);
dest[9] = '\0';   /* strncpy may not null-terminate! */
```

If `source` is 9 characters or longer, `strncpy()` won't add a null terminator. Always add it manually.
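The copy-then-terminate pattern is easy to forget, so it can be wrapped in one helper. This is a minimal sketch, not a function from the cnotes sources: `copy_bounded()` is a hypothetical name, and the truncation flag it returns is a design choice made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dest, which holds destsize bytes, and always
 * null-terminate. Returns 1 if the whole string fit, 0 if it was
 * truncated or if destsize is 0 (in which case dest is untouched). */
int copy_bounded(char *dest, size_t destsize, const char *src)
{
    if (destsize == 0) {
        return 0;
    }

    strncpy(dest, src, destsize - 1);   /* Leave room for '\0' */
    dest[destsize - 1] = '\0';          /* strncpy may not do this */

    return strlen(src) < destsize;
}
```

Calling `copy_bounded(dest, sizeof(dest), source)` with a 10-byte `dest` keeps at most 9 characters of `source`. The return value lets the caller decide whether silent truncation is acceptable or should be reported as an error.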
### String Length vs Buffer Size

```c
char buffer[100];           /* Buffer SIZE is 100 */
strcpy(buffer, "Hello");    /* String LENGTH is 5 (not counting '\0') */
/* strlen(buffer) returns 5 */
```

Always allocate `strlen(str) + 1` bytes for a copy.

---

## Cross-Platform Considerations

### Line Endings

| System | Line Ending |
|--------|-------------|
| Unix/Linux/macOS | `\n` (LF) |
| Windows | `\r\n` (CRLF) |
| Classic Mac | `\r` (CR) |

When reading with `fgets()`, the line ending is included. You may need to strip it:

```c
/* C90 requires declarations before statements in a block */
char *newline = strchr(line, '\n');
char *cr = strchr(line, '\r');

if (newline) *newline = '\0';
if (cr) *cr = '\0';
```

### Path Separators

Handled by `PATH_SEPARATOR` and `PATH_SEP_STR` macros in `platform.h`.

### Environment Variables

| System | Home Directory |
|--------|----------------|
| Unix | `HOME` |
| Windows | `USERPROFILE` |
| DOS | None standard |

The `HOME_ENV` macro abstracts this. On DOS, where no standard variable exists, it names the application-specific `CNOTES_HOME`.

### Integer Sizes

C90 only guarantees minimums:

- `char`: at least 8 bits
- `short`: at least 16 bits
- `int`: at least 16 bits
- `long`: at least 32 bits

For portable code, don't assume `int` is 32 bits (it's 16 bits under 16-bit DOS compilers such as Turbo C).

---

## Summary

The cnotes codebase demonstrates several important C90 patterns:

1. **File I/O**: Opening, reading line-by-line, writing formatted data, closing
2. **Parsing**: State-machine approach with pointer advancement
3. **Memory**: malloc/free for large data, stack for small buffers
4. **Strings**: Careful length tracking, null termination
5. **Portability**: Preprocessor conditionals for platform differences
6. **Error Handling**: Check every return value

These patterns form the foundation of robust C programming and are still relevant in modern systems programming.