cnotes/INTERNALS.md
Gregory Gauthier 17cddcb7d9
add gitea runner workflow, and C90 docs for reference
2026-01-30 15:56:36 +00:00


# cnotes Internals Guide
A detailed walkthrough of the cnotes codebase, focusing on C90 file handling, memory management, and cross-platform techniques.
## Table of Contents
1. [Project Structure](#project-structure)
2. [Platform Abstraction (platform.h)](#platform-abstraction)
3. [Configuration (config.h)](#configuration)
4. [File I/O Patterns](#file-io-patterns)
5. [Module Walkthrough](#module-walkthrough)
   - [cnadd.c - Writing to Files](#cnadd---writing-to-files)
   - [cndump.c - Reading and Parsing](#cndump---reading-and-parsing)
   - [cnfind.c - Searching](#cnfind---searching)
   - [cncount.c - Aggregation](#cncount---aggregation)
   - [cndel.c - File Rewriting](#cndel---file-rewriting)
6. [Memory Management](#memory-management)
7. [String Handling in C90](#string-handling-in-c90)
8. [Cross-Platform Considerations](#cross-platform-considerations)
---
## Project Structure
```
cnotes/
├── include/
│   ├── config.h      # Application configuration constants
│   └── platform.h    # Platform-specific abstractions
├── src/
│   ├── cnadd.c       # Add new entries
│   ├── cndump.c      # Display entries
│   ├── cnfind.c      # Search entries
│   ├── cncount.c     # Statistics
│   ├── cndel.c       # Archive (delete) entries
│   └── cnhelp.c      # Help system
├── Makefile          # GCC build
├── MAKEFILE.TC       # Turbo C++ 3.0 build
└── BUILD.BAT         # DOS batch build
```
---
## Platform Abstraction
**File: `include/platform.h`**
This header provides a consistent interface across DOS, Windows, and Unix systems.
### Key Concepts
```c
#ifndef PLATFORM_H
#define PLATFORM_H
```
The **include guard** prevents multiple inclusion. If `PLATFORM_H` is already defined, the preprocessor skips the entire file.
### Platform Detection
```c
#if defined(__MSDOS__) || defined(__DOS__)
/* DOS-specific code */
#elif defined(_WIN32)
/* Windows-specific code */
#else
/* Unix/Linux/macOS code */
#endif
```
Compilers pre-define macros that identify the target platform:
- `__MSDOS__`, `__DOS__` - DOS compilers (`__MSDOS__` from Turbo C and DJGPP, `__DOS__` from Open Watcom)
- `_WIN32` - Windows compilers (MSVC, MinGW)
- Neither - Assumed to be Unix-like
### Platform-Specific Definitions
| Macro | DOS | Windows | Unix |
|-------|-----|---------|------|
| `PATH_SEPARATOR` | `'\\'` | `'\\'` | `'/'` |
| `PATH_SEP_STR` | `"\\"` | `"\\"` | `"/"` |
| `HOME_ENV` | `"CNOTES_HOME"` | `"USERPROFILE"` | `"HOME"` |
| `mkdir_portable(p)` | `mkdir(p)` | `_mkdir(p)` | `mkdir(p, 0755)` |
**Why two path separator forms?**
- `PATH_SEPARATOR` (char) - For character comparisons
- `PATH_SEP_STR` (string) - For string concatenation with `sprintf()`
### The mkdir Problem
Different systems have different `mkdir()` signatures:
```c
/* DOS (dir.h) */
int mkdir(const char *path);
/* Windows (direct.h) */
int _mkdir(const char *path);
/* Unix (sys/stat.h) */
int mkdir(const char *path, mode_t mode);
```
The `mkdir_portable()` macro abstracts this difference.
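As a rough sketch, the definitions in the table above could be laid out as follows. The macro names and values come from the table; the exact arrangement inside `platform.h` is an assumption.

```c
/* Sketch only: the real platform.h may order or group these differently. */
#if defined(__MSDOS__) || defined(__DOS__)
#include <dir.h>                     /* DOS: int mkdir(const char *path) */
#define PATH_SEPARATOR '\\'
#define PATH_SEP_STR   "\\"
#define HOME_ENV       "CNOTES_HOME"
#define mkdir_portable(p) mkdir(p)
#elif defined(_WIN32)
#include <direct.h>                  /* Windows: int _mkdir(const char *path) */
#define PATH_SEPARATOR '\\'
#define PATH_SEP_STR   "\\"
#define HOME_ENV       "USERPROFILE"
#define mkdir_portable(p) _mkdir(p)
#else
#include <sys/types.h>
#include <sys/stat.h>                /* Unix: int mkdir(const char *, mode_t) */
#define PATH_SEPARATOR '/'
#define PATH_SEP_STR   "/"
#define HOME_ENV       "HOME"
#define mkdir_portable(p) mkdir((p), 0755)
#endif
```

Callers then write `mkdir_portable(dir)` everywhere and never mention the platform-specific function directly.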
---
## Configuration
**File: `include/config.h`**
### Compile-Time Defaults
```c
#ifndef CNOTES_FILE
#define CNOTES_FILE "cnotes.csv"
#endif
```
The `#ifndef` pattern allows override at compile time:
```bash
gcc -DCNOTES_FILE=\"myfile.csv\" ...
```
### Memory Constraints
```c
#ifndef MAX_ENTRIES
#ifdef MAX_ENTRIES_DEFAULT
#define MAX_ENTRIES MAX_ENTRIES_DEFAULT
#else
#define MAX_ENTRIES 5000
#endif
#endif
```
DOS has limited memory (about 640KB of conventional memory), so `platform.h` sets `MAX_ENTRIES_DEFAULT` to 100 for DOS builds. On modern systems `MAX_ENTRIES` ends up as 5000 unless it is overridden at compile time.
---
## File I/O Patterns
### Opening Files
C90 provides `fopen()` with mode strings:
| Mode | Meaning |
|------|---------|
| `"r"` | Read (file must exist) |
| `"w"` | Write (creates/truncates) |
| `"a"` | Append (creates if needed) |
| `"r+"` | Read/write (file must exist) |
| `"w+"` | Read/write (creates/truncates) |
**Always check for failure:**
```c
FILE *fp = fopen(path, "r");
if (fp == NULL) {
    fprintf(stderr, "Error: Cannot open '%s'\n", path);
    return 1;
}
```
### Reading Lines
```c
char line[500];
while (fgets(line, sizeof(line), fp) != NULL) {
    /* Process line */
}
```
`fgets()` is safe because it:
1. Takes a maximum length argument
2. Null-terminates the buffer whenever it returns non-NULL
3. Returns NULL on error, or on EOF when no characters were read
**Never use `gets()`** - it has no length limit and is a buffer overflow vulnerability.
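One thing `fgets()` does not do is report when a line was longer than the buffer. The sketch below shows one way to detect that case; `read_line()` and its return codes are hypothetical and not part of cnotes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from cnotes): reads one line into buf and
   strips the trailing newline.
   Returns 1 for a complete line, 2 if the line was truncated (the rest
   of it is discarded), 0 on EOF or error. */
static int read_line(FILE *fp, char *buf, size_t bufsize)
{
    size_t len;
    int c;

    if (fgets(buf, (int)bufsize, fp) == NULL)
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: the text either ended here or was too long */
    c = getc(fp);
    if (c == EOF || c == '\n')
        return 1;

    /* Too long: skip the remainder so the next call starts on a new line */
    while (c != '\n' && c != EOF)
        c = getc(fp);
    return 2;
}
```

Discarding the tail keeps later reads aligned with line boundaries, which matters when each line is parsed as one record.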
### Writing Data
```c
/* Formatted output */
fprintf(fp, "%s,%s,%s,\"%s\"\n", date, time, category, message);
/* Or build string first, then write */
sprintf(buffer, "%s,%s\n", field1, field2);
fputs(buffer, fp);
```
### Closing Files
```c
fclose(fp);
```
**Always close files** to:
1. Flush buffered data to disk
2. Release system resources
3. Allow other programs to access the file
---
## Module Walkthrough
### cnadd - Writing to Files
**Purpose:** Append a new timestamped entry to the notes file.
#### Getting the Current Time
```c
#include <time.h>

time_t now;
struct tm *local;

time(&now); /* Get seconds since epoch */
local = localtime(&now); /* Convert to local time struct */
sprintf(date_str, "%04d-%02d-%02d",
        local->tm_year + 1900, /* Years since 1900 */
        local->tm_mon + 1, /* Months are 0-11 */
        local->tm_mday);
sprintf(time_str, "%02d:%02d",
        local->tm_hour,
        local->tm_min);
```
The `struct tm` fields:
- `tm_year` - Years since 1900 (so 2026 = 126)
- `tm_mon` - Month (0-11, so January = 0)
- `tm_mday` - Day of month (1-31)
- `tm_hour`, `tm_min`, `tm_sec` - Time components
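C90 also provides `strftime()`, which applies the `+ 1900` and `+ 1` adjustments itself and takes a buffer size. The following is an alternative sketch, not what cnadd does; `format_timestamp()` is a hypothetical name.

```c
#include <time.h>

/* Hypothetical helper: formats "YYYY-MM-DD" and "HH:MM" from a
   broken-down time. Returns 1 on success, 0 if a buffer is too small. */
static int format_timestamp(const struct tm *t,
                            char *date_str, size_t date_size,
                            char *time_str, size_t time_size)
{
    /* strftime() returns 0 when the result does not fit */
    if (strftime(date_str, date_size, "%Y-%m-%d", t) == 0)
        return 0;
    if (strftime(time_str, time_size, "%H:%M", t) == 0)
        return 0;
    return 1;
}
```

With `tm_year = 126`, `tm_mon = 0` and `tm_mday = 30`, the date buffer receives `2026-01-30`.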
#### Building the File Path
```c
int get_cnotes_path(char *buffer, size_t bufsize, const char *filename) {
    const char *home = getenv(HOME_ENV);

    if (home == NULL) {
        fprintf(stderr, "Error: %s not set\n", HOME_ENV);
        return 0;
    }
    /* Check buffer size before writing (+3 = two separators + '\0') */
    if (strlen(home) + strlen(CNOTES_DIR) + strlen(filename) + 3 > bufsize) {
        fprintf(stderr, "Error: Path too long\n");
        return 0;
    }
    sprintf(buffer, "%s" PATH_SEP_STR "%s" PATH_SEP_STR "%s",
            home, CNOTES_DIR, filename);
    return 1;
}
```
**Key points:**
1. `getenv()` returns NULL if variable isn't set
2. Always check buffer size before `sprintf()`
3. `PATH_SEP_STR` is a string, so it concatenates directly
#### Creating Directories
```c
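void ensure_directory_exists(const char *filepath) {
    char dir[512];
    char *last_sep;

    /* Refuse paths that would overflow dir[] */
    if (strlen(filepath) >= sizeof(dir))
        return;
    strcpy(dir, filepath);
    last_sep = strrchr(dir, PATH_SEPARATOR);
    if (last_sep != NULL) {
        *last_sep = '\0'; /* Truncate at last separator */
        mkdir_portable(dir); /* Result ignored: directory may already exist */
    }
}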
```
`strrchr()` finds the **last** occurrence of a character. By truncating there, we get the directory portion of the path.
#### Appending to File
```c
FILE *fp = fopen(path, "a"); /* "a" = append mode */
if (fp == NULL) {
    fprintf(stderr, "Error: Cannot open file\n");
    return 1;
}
fprintf(fp, "%s,%s,%-*s,\"%s\"\n",
        date_str,
        time_str,
        CATEGORY_LENGTH, category, /* Left-justified, padded */
        message);
fclose(fp);
```
The format `%-*s`:
- `-` = left-justify
- `*` = width comes from next argument
- `s` = string
So with the arguments `CATEGORY_LENGTH, category`, `%-*s` prints `category` left-justified and space-padded to `CATEGORY_LENGTH` characters. The width is a minimum: a longer category is not truncated.
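To see the padding in isolation, here is a small sketch. `format_category()` is a hypothetical helper, and the added `.*` precision (which caps the number of characters written) is an extension of the format shown above, not something cnadd is known to use.

```c
#include <stdio.h>

/* Hypothetical helper: writes category into dest as exactly `width`
   characters. dest must hold at least width + 1 bytes. */
static void format_category(char *dest, int width, const char *category)
{
    /* "-"  left-justify
       "*"  minimum field width, taken from the first int argument
       ".*" precision, taken from the second int argument; for %s it
            limits how many characters are copied */
    sprintf(dest, "%-*.*s", width, width, category);
}
```

With a width of 8, `"work"` becomes `"work    "` and `"verylongcategory"` becomes `"verylong"`, so the field keeps a fixed size either way.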
---
### cndump - Reading and Parsing
**Purpose:** Read all entries and display in a formatted table.
#### The Entry Structure
```c
typedef struct {
    char date[DATE_LENGTH + 1]; /* +1 for null terminator */
    char time[TIME_LENGTH + 1];
    char category[CATEGORY_LENGTH + 1];
    char text[TXTMSG_LENGTH + 1];
} Entry;
```
**Why +1?** C strings are null-terminated. A 10-character date needs 11 bytes: 10 for characters + 1 for `'\0'`.
#### Dynamic Memory Allocation
```c
Entry *entries = (Entry *)malloc(MAX_ENTRIES * sizeof(Entry));
if (entries == NULL) {
    fprintf(stderr, "Error: Cannot allocate memory\n");
    return 1;
}
/* ... use entries ... */
free(entries);
```
**Why malloc instead of stack array?**
```c
Entry entries[MAX_ENTRIES]; /* BAD on DOS - stack overflow! */
```
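A 16-bit DOS program's stack is limited to about 64KB, and is often configured much smaller. With `MAX_ENTRIES=5000` and `Entry` being ~150 bytes, the array would need about 750KB, which overflows the stack. Even the DOS default of 100 entries takes about 15KB. `malloc()` uses the heap, which has more space.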
#### Parsing Fixed-Width Fields
```c
static const char *parse_fixed_field(const char *ptr, char *dest,
                                     int length, char delimiter) {
    if ((int)strlen(ptr) < length)
        return NULL; /* Not enough data */
    strncpy(dest, ptr, length);
    dest[length] = '\0'; /* Ensure null-terminated */
    ptr += length; /* Advance pointer */
    if (*ptr != delimiter)
        return NULL; /* Expected delimiter not found */
    return ptr + 1; /* Return pointer past delimiter */
}
```
This function:
1. Copies exactly `length` characters to `dest`
2. Null-terminates the result
3. Verifies the expected delimiter follows
4. Returns a pointer to continue parsing, or NULL on error
**Usage pattern (state machine):**
```c
const char *ptr = line;
ptr = parse_fixed_field(ptr, entry->date, 10, ',');
if (!ptr) return 0; /* Parse error */
ptr = parse_fixed_field(ptr, entry->time, 5, ',');
if (!ptr) return 0;
/* ... continue ... */
```
#### Parsing Variable-Width Fields
```c
static const char *parse_variable_field(const char *ptr, char *dest,
                                        int max_length, char delimiter) {
    int i = 0;

    while (*ptr != '\0' && *ptr != delimiter) {
        if (i < max_length) {
            dest[i++] = *ptr;
        }
        /* Continue even if truncating, to find delimiter */
        ptr++;
    }
    dest[i] = '\0';
    if (*ptr != delimiter)
        return NULL;
    return ptr + 1;
}
```
This handles fields of unknown length up to a maximum, with graceful truncation.
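Putting the two parsers together, a whole line in the format written by cnadd could be parsed roughly as below. This is a sketch under assumptions: the `*_LENGTH` values are guesses (the real ones live in `config.h`), and `parse_line()` is a hypothetical driver name.

```c
#include <string.h>

/* Assumed values for illustration; config.h defines the real ones */
#define DATE_LENGTH     10
#define TIME_LENGTH     5
#define CATEGORY_LENGTH 10
#define TXTMSG_LENGTH   125

typedef struct {
    char date[DATE_LENGTH + 1];
    char time[TIME_LENGTH + 1];
    char category[CATEGORY_LENGTH + 1];
    char text[TXTMSG_LENGTH + 1];
} Entry;

static const char *parse_fixed_field(const char *ptr, char *dest,
                                     int length, char delimiter) {
    if ((int)strlen(ptr) < length)
        return NULL;
    strncpy(dest, ptr, (size_t)length);
    dest[length] = '\0';
    ptr += length;
    if (*ptr != delimiter)
        return NULL;
    return ptr + 1;
}

static const char *parse_variable_field(const char *ptr, char *dest,
                                        int max_length, char delimiter) {
    int i = 0;

    while (*ptr != '\0' && *ptr != delimiter) {
        if (i < max_length)
            dest[i++] = *ptr;
        ptr++;
    }
    dest[i] = '\0';
    if (*ptr != delimiter)
        return NULL;
    return ptr + 1;
}

/* Hypothetical driver: returns 1 on success, 0 on a malformed line.
   Expects: date,time,category,"text" */
static int parse_line(const char *line, Entry *entry) {
    const char *ptr = line;

    ptr = parse_fixed_field(ptr, entry->date, DATE_LENGTH, ',');
    if (ptr == NULL) return 0;
    ptr = parse_fixed_field(ptr, entry->time, TIME_LENGTH, ',');
    if (ptr == NULL) return 0;
    ptr = parse_fixed_field(ptr, entry->category, CATEGORY_LENGTH, ',');
    if (ptr == NULL) return 0;
    if (*ptr != '"') return 0; /* Opening quote of the message */
    ptr = parse_variable_field(ptr + 1, entry->text, TXTMSG_LENGTH, '"');
    return ptr != NULL;
}
```

Each call either advances `ptr` past one field or returns NULL, so the first malformed field stops the parse. Note that the category keeps its padding spaces; a message containing a `"` would end the text field early in this sketch.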
#### Sorting with qsort()
```c
#include <stdlib.h> /* qsort */
#include <string.h> /* strcmp */

/* Comparison function signature required by qsort */
static int compare_by_date(const void *a, const void *b) {
    const Entry *entry_a = (const Entry *)a;
    const Entry *entry_b = (const Entry *)b;
    int cmp = strcmp(entry_a->date, entry_b->date);

    if (cmp != 0) return cmp;
    return strcmp(entry_a->time, entry_b->time);
}

/* Usage */
qsort(entries, entry_count, sizeof(Entry), compare_by_date);
```
`qsort()` parameters:
1. Array pointer
2. Number of elements
3. Size of each element
4. Comparison function pointer
The comparison function must return:
- Negative if a < b
- Zero if a == b
- Positive if a > b
**Why `const void *`?** C90's `qsort()` is generic - it works with any data type. You cast to your actual type inside the function.
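Comparing dates with `strcmp()` works because `YYYY-MM-DD` and `HH:MM` are fixed-width and zero-padded, so alphabetical order equals chronological order. The sketch below demonstrates this with a reduced, hypothetical `Stamp` struct standing in for `Entry`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced record: just the two sort keys */
typedef struct {
    char date[11]; /* "YYYY-MM-DD" */
    char time[6];  /* "HH:MM" */
} Stamp;

static int compare_stamps(const void *a, const void *b)
{
    const Stamp *sa = (const Stamp *)a;
    const Stamp *sb = (const Stamp *)b;
    int cmp = strcmp(sa->date, sb->date);

    if (cmp != 0)
        return cmp;
    return strcmp(sa->time, sb->time); /* Same day: order by time */
}

static void sort_stamps(Stamp *stamps, size_t count)
{
    qsort(stamps, count, sizeof(Stamp), compare_stamps);
}
```

A format such as `DD/MM/YYYY` would not sort correctly this way, which is one practical reason to store dates year-first.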
---
### cnfind - Searching
**Purpose:** Find entries matching search criteria.
#### Case-Insensitive Search
```c
#include <string.h> /* strlen */

/* Convert character to lowercase (assumes ASCII letters) */
int to_lower(int c) {
    if (c >= 'A' && c <= 'Z') {
        return c + ('a' - 'A');
    }
    return c;
}

/* Case-insensitive substring search */
char *strcasestr_portable(const char *haystack, const char *needle) {
    size_t needle_len;

    if (*needle == '\0')
        return (char *)haystack;
    needle_len = strlen(needle);
    while (*haystack != '\0') {
        /* Check if needle matches at current position */
        size_t i;
        int match = 1;

        for (i = 0; i < needle_len && haystack[i] != '\0'; i++) {
            if (to_lower(haystack[i]) != to_lower(needle[i])) {
                match = 0;
                break;
            }
        }
        /* i == needle_len rules out a haystack that ended early */
        if (match && i == needle_len)
            return (char *)haystack;
        haystack++;
    }
    return NULL;
}
```
**Why implement our own?** `strcasestr()` is not part of C90 - it's a POSIX/GNU extension.
#### Multiple Filter Criteria
```c
int matches = 1; /* Assume match until proven otherwise */

/* Filter by category */
if (filter_category[0] != '\0') {
    if (strcasecmp_portable(entry->category, filter_category) != 0) {
        matches = 0;
    }
}
/* Filter by date */
if (matches && filter_date[0] != '\0') {
    if (strcmp(entry->date, filter_date) != 0) {
        matches = 0;
    }
}
/* Filter by text pattern */
if (matches && pattern[0] != '\0') {
    if (strcasestr_portable(entry->text, pattern) == NULL) {
        matches = 0;
    }
}
if (matches) {
    /* Entry passes all filters */
}
```
This "whittle down" approach applies filters incrementally.
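The category filter calls `strcasecmp_portable()`, which is not shown in this guide. Like `strcasestr()`, `strcasecmp()` is a POSIX function rather than C90, so a portable stand-in is needed. A minimal sketch of what such a function might look like (the actual cnotes implementation may differ):

```c
#include <ctype.h>

/* Sketch of a case-insensitive strcmp(). Returns <0, 0 or >0 like
   strcmp(), comparing characters after lowercasing them. */
static int strcasecmp_portable(const char *a, const char *b)
{
    int ca, cb;

    do {
        /* Cast to unsigned char: tolower() is undefined for negative
           values other than EOF */
        ca = tolower((unsigned char)*a++);
        cb = tolower((unsigned char)*b++);
    } while (ca == cb && ca != '\0');

    return ca - cb;
}
```

Because cnadd writes the category padded to a fixed width, an exact comparison like this only matches if the stored value is trimmed or the filter value is padded the same way; how cnfind handles that is not covered here.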
---
### cncount - Aggregation
**Purpose:** Count entries, optionally grouped by category or date.
#### Tracking Unique Values
```c
typedef struct {
    char key[32];
    int count;
} CountEntry;

CountEntry counts[MAX_CATEGORIES];
int num_categories = 0;

void increment_count(const char *key) {
    int i;

    /* Look for existing key */
    for (i = 0; i < num_categories; i++) {
        if (strcmp(counts[i].key, key) == 0) {
            counts[i].count++;
            return;
        }
    }
    /* Add new key */
    if (num_categories < MAX_CATEGORIES) {
        strncpy(counts[num_categories].key, key, 31);
        counts[num_categories].key[31] = '\0';
        counts[num_categories].count = 1;
        num_categories++;
    }
}
```
This is a simple associative array. For small datasets, linear search is fine. Larger datasets would benefit from a hash table.
---
### cndel - File Rewriting
**Purpose:** Remove entries by moving them to an archive file.
#### The Challenge
You cannot delete lines from the middle of a file in C. Instead:
1. Read all entries into memory
2. Write non-deleted entries to a temporary file
3. Append deleted entries to archive
4. Replace original with temporary
#### Safe File Replacement
```c
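/* Read all entries (heap, not stack: see the stack-size note under cndump) */
Entry *entries = (Entry *)malloc(MAX_ENTRIES * sizeof(Entry));
int count, i, deleted_count = 0;
FILE *temp, *archive;

if (entries == NULL) {
    fprintf(stderr, "Error: Cannot allocate memory\n");
    return 1;
}
count = read_all_entries(entries, source_path);

/* Open files, checking both before writing anything */
temp = fopen(temp_path, "w");
archive = fopen(archive_path, "a");
if (temp == NULL || archive == NULL) {
    fprintf(stderr, "Error: Cannot open output files\n");
    if (temp != NULL) fclose(temp);
    if (archive != NULL) fclose(archive);
    free(entries);
    return 1;
}
/* Write entries to appropriate files */
for (i = 0; i < count; i++) {
    if (should_delete(&entries[i])) {
        write_entry(archive, &entries[i]);
        deleted_count++;
    } else {
        write_entry(temp, &entries[i]);
    }
}
fclose(temp);
fclose(archive);
free(entries);
/* Replace original with temp. remove() comes first because rename()
   onto an existing file is implementation-defined in C90. */
if (remove(source_path) != 0 || rename(temp_path, source_path) != 0) {
    fprintf(stderr, "Error: Cannot replace '%s'\n", source_path);
    return 1;
}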
```
**Why archive instead of delete?** The immutable-log philosophy means data is never truly lost - it's just moved to a different file.
#### Confirmation Prompts
```c
char response[10];

printf("Delete %d entries? (y/n): ", count);
fflush(stdout); /* Ensure prompt appears before input */
if (fgets(response, sizeof(response), stdin) != NULL) {
    if (response[0] == 'y' || response[0] == 'Y') {
        /* Proceed with deletion */
    }
}
```
`fflush(stdout)` ensures the prompt is displayed before waiting for input. Without it, buffered I/O might delay the prompt.
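The prompt logic can be wrapped in a small function so that every destructive command asks the same way. `confirm()` below is a hypothetical helper, not a cnotes function; it takes the input stream as a parameter so it can be exercised without a keyboard.

```c
#include <stdio.h>

/* Hypothetical helper: prints the prompt and reads an answer from `in`.
   Returns 1 only for an answer starting with 'y' or 'Y'. */
static int confirm(FILE *in, const char *prompt)
{
    char response[10];

    printf("%s (y/n): ", prompt);
    fflush(stdout); /* Show the prompt before blocking on input */

    if (fgets(response, sizeof(response), in) == NULL)
        return 0; /* EOF or read error counts as "no" */

    return response[0] == 'y' || response[0] == 'Y';
}
```

Treating EOF and errors as "no" means a closed or redirected input can never trigger a deletion by accident.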
---
## Memory Management
### The Golden Rules
1. **Check malloc() return value**
```c
ptr = malloc(size);
if (ptr == NULL) {
    /* Handle error */
}
```
2. **Free what you allocate**
```c
Entry *entries = malloc(...);
/* ... use entries ... */
free(entries); /* Always free before return */
```
3. **Don't use after free**
```c
free(entries);
entries = NULL; /* Prevent accidental use */
```
4. **Match allocations to deallocations**
Every `malloc()` needs exactly one `free()`.
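When a function acquires several resources, a single cleanup label is one common way to keep these rules on every exit path. The sketch below is illustrative only; `count_lines()` and `LINE_BUF_SIZE` are hypothetical and not taken from cnotes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_BUF_SIZE 256 /* Assumed buffer size for this sketch */

/* Hypothetical helper: counts newline-terminated lines in a file.
   Returns 0 on success (count stored in *out), 1 on failure. */
static int count_lines(const char *path, long *out)
{
    int status = 1; /* Assume failure until everything succeeds */
    long count = 0;
    char *line = NULL;
    FILE *fp = NULL;

    line = (char *)malloc(LINE_BUF_SIZE);
    if (line == NULL)
        goto cleanup;

    fp = fopen(path, "r");
    if (fp == NULL)
        goto cleanup;

    while (fgets(line, LINE_BUF_SIZE, fp) != NULL) {
        if (strchr(line, '\n') != NULL)
            count++;
    }
    if (ferror(fp))
        goto cleanup;

    *out = count;
    status = 0;

cleanup:
    /* Runs on every path, so nothing leaks on early exit */
    if (fp != NULL)
        fclose(fp);
    free(line); /* free(NULL) is a no-op */
    return status;
}
```

Initializing every resource pointer to NULL is what makes the shared cleanup safe: it can tell which resources were actually acquired.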
### Stack vs Heap
| Stack | Heap |
|-------|------|
| Automatic allocation | Manual allocation |
| Fixed size (at most ~64KB on DOS, typically 1-8MB by default on modern systems) | Limited by system memory |
| Fast allocation | Slower allocation |
| Automatic cleanup | Must call `free()` |
```c
void function(void) {
    char buffer[100]; /* Stack - automatic */
    char *data = malloc(100); /* Heap - manual */
    /* buffer freed automatically when function returns */
    free(data); /* Must free explicitly */
}
```
---
## String Handling in C90
### String Basics
C strings are arrays of `char` terminated by `'\0'` (null character).
```c
char str[10] = "Hello";
/* Memory: ['H','e','l','l','o','\0','\0','\0','\0','\0'] */
/* Index:    0   1   2   3   4   5    6    7    8    9    */
/* Elements after the string are zero-filled by the initializer */
```
### Safe String Functions
| Unsafe | Safe | Notes |
|--------|------|-------|
| `gets()` | `fgets()` | Always use fgets |
| `strcpy()` | `strncpy()` | Specify max length; terminate manually (see below) |
| `sprintf()` | `snprintf()`* | *C99, not available in C90; check lengths before calling `sprintf()` |
**strncpy() gotcha:**
```c
char dest[10];
strncpy(dest, source, 9);
dest[9] = '\0'; /* strncpy may not null-terminate! */
```
If `source` is 9 characters or longer, `strncpy()` fills all 9 bytes and adds no null terminator. Always add it manually.
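The copy-then-terminate pair is easy to forget, so it is often wrapped in one function. `copy_string()` below is a hypothetical helper illustrating the idea; it is not part of cnotes.

```c
#include <string.h>

/* Hypothetical helper: copies src into dest (capacity dest_size bytes),
   always null-terminating. Returns 1 if src fit, 0 if it was truncated
   or dest_size is 0. */
static int copy_string(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return 0;

    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0'; /* strncpy may not have terminated */

    return strlen(src) < dest_size;
}
```

The return value lets callers decide whether truncation is acceptable (a display label) or an error (a file path).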
### String Length vs Buffer Size
```c
char buffer[100]; /* Buffer SIZE is 100 */
strcpy(buffer, "Hello");
/* String LENGTH is 5 (not counting '\0') */
/* strlen(buffer) returns 5 */
```
Always allocate `strlen(str) + 1` bytes for a copy.
---
## Cross-Platform Considerations
### Line Endings
| System | Line Ending |
|--------|-------------|
| Unix/Linux/macOS | `\n` (LF) |
| Windows | `\r\n` (CRLF) |
| Classic Mac | `\r` (CR) |
When reading with `fgets()`, the line ending is included. You may need to strip it:
```c
/* C90 requires all declarations before the first statement in a block */
char *newline = strchr(line, '\n');
char *cr = strchr(line, '\r');
if (newline) *newline = '\0';
if (cr) *cr = '\0';
```
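A more compact alternative uses `strcspn()`, which is part of C90 and returns the length of the leading run of characters that are neither `\r` nor `\n`. `strip_line_ending()` is a hypothetical name for this sketch.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\r' or '\n'.
   If neither is present, strcspn() returns strlen(line) and the
   assignment just rewrites the existing terminator. */
static void strip_line_ending(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

This handles LF, CRLF and lines with no ending at all in a single statement, with no NULL checks needed.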
### Path Separators
Handled by `PATH_SEPARATOR` and `PATH_SEP_STR` macros in `platform.h`.
### Environment Variables
| System | Home Directory |
|--------|----------------|
| Unix | `HOME` |
| Windows | `USERPROFILE` |
| DOS | None standard (cnotes uses `CNOTES_HOME`) |
The `HOME_ENV` macro abstracts this.
### Integer Sizes
C90 only guarantees minimums:
- `char`: at least 8 bits
- `short`: at least 16 bits
- `int`: at least 16 bits
- `long`: at least 32 bits
For portable code, don't assume `int` is 32 bits: 16-bit DOS compilers such as Turbo C use a 16-bit `int`, so any value that can exceed 32,767 needs a `long`.
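The same caution applies to `size_t`: where it is only 16 bits wide, a product such as `5000 * 150` silently wraps around before `malloc()` ever sees it. One defensive sketch is to check the multiplication first; `alloc_array()` is a hypothetical helper, not something cnotes is known to contain.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates count * elem_size bytes, or returns
   NULL if either argument is 0 or the product would not fit in size_t. */
static void *alloc_array(size_t count, size_t elem_size)
{
    if (count == 0 || elem_size == 0)
        return NULL;

    /* (size_t)-1 is the largest size_t value on any platform */
    if (count > (size_t)-1 / elem_size)
        return NULL; /* count * elem_size would wrap around */

    return malloc(count * elem_size);
}
```

Dividing the maximum by `elem_size` avoids performing the overflowing multiplication in order to detect it.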
---
## Summary
The cnotes codebase demonstrates several important C90 patterns:
1. **File I/O**: Opening, reading line-by-line, writing formatted data, closing
2. **Parsing**: State-machine approach with pointer advancement
3. **Memory**: malloc/free for large data, stack for small buffers
4. **Strings**: Careful length tracking, null termination
5. **Portability**: Preprocessor conditionals for platform differences
6. **Error Handling**: Check every return value
These patterns form the foundation of robust C programming and are still relevant in modern systems programming.