cnotes/INTERNALS.md
Gregory Gauthier 17cddcb7d9
add gitea runner workflow, and C90 docs for reference
2026-01-30 15:56:36 +00:00


cnotes Internals Guide

A detailed walkthrough of the cnotes codebase, focusing on C90 file handling, memory management, and cross-platform techniques.

Table of Contents

  1. Project Structure
  2. Platform Abstraction (platform.h)
  3. Configuration (config.h)
  4. File I/O Patterns
  5. Module Walkthrough
  6. Memory Management
  7. String Handling in C90
  8. Cross-Platform Considerations

Project Structure

cnotes/
├── include/
│   ├── config.h      # Application configuration constants
│   └── platform.h    # Platform-specific abstractions
├── src/
│   ├── cnadd.c       # Add new entries
│   ├── cndump.c      # Display entries
│   ├── cnfind.c      # Search entries
│   ├── cncount.c     # Statistics
│   ├── cndel.c       # Archive (delete) entries
│   └── cnhelp.c      # Help system
├── Makefile          # GCC build
├── MAKEFILE.TC       # Turbo C++ 3.0 build
└── BUILD.BAT         # DOS batch build

Platform Abstraction

File: include/platform.h

This header provides a consistent interface across DOS, Windows, and Unix systems.

Key Concepts

#ifndef PLATFORM_H
#define PLATFORM_H

The include guard prevents multiple inclusion. If PLATFORM_H is already defined, the preprocessor skips everything up to the matching #endif at the end of the file.

Platform Detection

#if defined(__MSDOS__) || defined(__DOS__)
    /* DOS-specific code */
#elif defined(_WIN32)
    /* Windows-specific code */
#else
    /* Unix/Linux/macOS code */
#endif

Compilers pre-define macros that identify the target platform:

  • __MSDOS__, __DOS__ - DOS compilers (Turbo C, DJGPP)
  • _WIN32 - Windows compilers (MSVC, MinGW)
  • Neither - Assumed to be Unix-like

Platform-Specific Definitions

Macro               DOS             Windows         Unix
PATH_SEPARATOR      '\\'            '\\'            '/'
PATH_SEP_STR        "\\"            "\\"            "/"
HOME_ENV            "CNOTES_HOME"   "USERPROFILE"   "HOME"
mkdir_portable(p)   mkdir(p)        _mkdir(p)       mkdir(p, 0755)

Why two path separator forms?

  • PATH_SEPARATOR (char) - For character comparisons
  • PATH_SEP_STR (string) - For string concatenation with sprintf()

The mkdir Problem

Different systems have different mkdir() signatures:

/* DOS (dir.h) */
int mkdir(const char *path);

/* Windows (direct.h) */
int _mkdir(const char *path);

/* Unix (sys/stat.h) */
int mkdir(const char *path, mode_t mode);

The mkdir_portable() macro abstracts this difference.
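The actual contents of platform.h aren't reproduced in this guide. The following is a minimal sketch of how the macros in the table could be defined, assuming the headers named above (dir.h on DOS, direct.h on Windows, sys/stat.h on Unix). It targets Turbo C's one-argument mkdir() on DOS and is not the cnotes source.

```c
#ifndef PLATFORM_H
#define PLATFORM_H

#if defined(__MSDOS__) || defined(__DOS__)
    #include <dir.h>                  /* Turbo C: int mkdir(const char *) */
    #define PATH_SEPARATOR '\\'
    #define PATH_SEP_STR   "\\"
    #define HOME_ENV       "CNOTES_HOME"
    #define mkdir_portable(p) mkdir(p)
    #define MAX_ENTRIES_DEFAULT 100   /* Small limit for 640KB machines */
#elif defined(_WIN32)
    #include <direct.h>               /* int _mkdir(const char *) */
    #define PATH_SEPARATOR '\\'
    #define PATH_SEP_STR   "\\"
    #define HOME_ENV       "USERPROFILE"
    #define mkdir_portable(p) _mkdir(p)
#else
    #include <sys/types.h>
    #include <sys/stat.h>             /* int mkdir(const char *, mode_t) */
    #define PATH_SEPARATOR '/'
    #define PATH_SEP_STR   "/"
    #define HOME_ENV       "HOME"
    #define mkdir_portable(p) mkdir((p), 0755)  /* rwxr-xr-x */
#endif

#endif /* PLATFORM_H */
```

Callers write mkdir_portable(dir) everywhere, and the preprocessor picks the right call for the target.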


Configuration

File: include/config.h

Compile-Time Defaults

#ifndef CNOTES_FILE
#define CNOTES_FILE "cnotes.csv"
#endif

The #ifndef pattern allows override at compile time:

gcc -DCNOTES_FILE=\"myfile.csv\" ...

Memory Constraints

#ifndef MAX_ENTRIES
    #ifdef MAX_ENTRIES_DEFAULT
        #define MAX_ENTRIES MAX_ENTRIES_DEFAULT
    #else
        #define MAX_ENTRIES 5000
    #endif
#endif

DOS has limited memory (~640KB of conventional memory). On DOS, platform.h sets MAX_ENTRIES_DEFAULT to 100. On modern systems, MAX_ENTRIES resolves to 5000.
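The precedence can be checked in a single file. This sketch puts a stand-in for the DOS branch of platform.h and the config.h logic in one translation unit. The two reporting functions are hypothetical and exist only to expose the compiled-in values. A -DMAX_ENTRIES=... on the command line wins first. The platform default comes next, and the 5000 fallback applies last.

```c
#include <string.h>

/* Stand-in for platform.h: only DOS supplies a default */
#if defined(__MSDOS__) || defined(__DOS__)
    #define MAX_ENTRIES_DEFAULT 100
#endif

/* Same logic as config.h */
#ifndef CNOTES_FILE
#define CNOTES_FILE "cnotes.csv"
#endif

#ifndef MAX_ENTRIES
    #ifdef MAX_ENTRIES_DEFAULT
        #define MAX_ENTRIES MAX_ENTRIES_DEFAULT
    #else
        #define MAX_ENTRIES 5000
    #endif
#endif

/* Report the values this build was compiled with */
int configured_max_entries(void)
{
    return MAX_ENTRIES;
}

int uses_default_file(void)
{
    return strcmp(CNOTES_FILE, "cnotes.csv") == 0;
}
```

Built with plain gcc on Linux, configured_max_entries() returns 5000. Adding -DMAX_ENTRIES=200 would make it return 200 without touching either header.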


File I/O Patterns

Opening Files

C90 provides fopen() with mode strings:

Mode   Meaning
"r"    Read (file must exist)
"w"    Write (creates/truncates)
"a"    Append (creates if needed)
"r+"   Read/write (file must exist)
"w+"   Read/write (creates/truncates)

Always check for failure:

FILE *fp = fopen(path, "r");
if (fp == NULL) {
    fprintf(stderr, "Error: Cannot open '%s'\n", path);
    return 1;
}

Reading Lines

char line[500];
while (fgets(line, sizeof(line), fp) != NULL) {
    /* Process line */
}

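fgets() is safe because it:

  1. Takes a maximum length argument and reads at most one character less than that
  2. Always null-terminates what it reads
  3. Returns NULL at end of file (when nothing was read) or on a read error

The newline is kept in the buffer if it fits. A line longer than the buffer is split, and the remainder comes back on the next call.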

Never use gets() - it has no length limit and is a buffer overflow vulnerability.
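Because fgets() stops at the buffer size, an overlong line arrives in pieces. The following is a minimal sketch of one way to detect and skip such lines. read_line() is a hypothetical helper, not part of cnotes. It treats a line as complete when its newline, or the end of the file, is reached within the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf. Returns 1 for a complete line, 0 at end of
   file (or on error), and -1 if the line did not fit; the rest of an
   overlong line is read and discarded. */
int read_line(FILE *fp, char *buf, int size)
{
    int c;

    if (fgets(buf, size, fp) == NULL)
        return 0;                     /* EOF or read error */

    if (strchr(buf, '\n') != NULL)
        return 1;                     /* Newline fit: whole line read */

    c = getc(fp);                     /* Look at what follows */
    if (c == EOF || c == '\n')
        return 1;                     /* Last line, or only the '\n' didn't fit */

    while (c != '\n' && c != EOF)     /* Too long: skip the remainder */
        c = getc(fp);
    return -1;
}
```

A caller can log or skip lines that return -1 instead of parsing half an entry.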

Writing Data

/* Formatted output */
fprintf(fp, "%s,%s,%s,\"%s\"\n", date, time, category, message);

/* Or build string first, then write */
sprintf(buffer, "%s,%s\n", field1, field2);
fputs(buffer, fp);

Closing Files

fclose(fp);

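Always close files to:

  1. Flush buffered data to the file
  2. Release system resources
  3. Allow other programs to access the file

When writing, check the return value. fclose() returns EOF if flushing the remaining buffered data fails.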
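Close is where a deferred write error, such as a full disk, can finally surface. This is a sketch using a hypothetical append_line() helper that is not part of cnotes. It reports failure from fputs(), fputc(), or fclose().

```c
#include <stdio.h>

/* Appends text plus a newline to path. Returns 1 on success, 0 on
   failure. fclose() is checked because buffered data is flushed there. */
int append_line(const char *path, const char *text)
{
    FILE *fp = fopen(path, "a");
    int ok = 1;

    if (fp == NULL)
        return 0;

    if (fputs(text, fp) == EOF || fputc('\n', fp) == EOF)
        ok = 0;

    if (fclose(fp) == EOF)            /* Flush failures are reported here */
        ok = 0;

    return ok;
}
```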

Module Walkthrough

cnadd - Writing to Files

Purpose: Append a new timestamped entry to the notes file.

Getting the Current Time

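#include <stdio.h>
#include <time.h>

time_t now;
struct tm *local;
char date_str[11];               /* "YYYY-MM-DD" + '\0' */
char time_str[6];                /* "HH:MM" + '\0' */

time(&now);                      /* Current calendar time */
local = localtime(&now);         /* Convert to local time struct */
if (local == NULL) {             /* localtime() can fail */
    fprintf(stderr, "Error: Cannot determine local time\n");
    return 1;
}

sprintf(date_str, "%04d-%02d-%02d",
        local->tm_year + 1900,  /* Years since 1900 */
        local->tm_mon + 1,      /* Months are 0-11 */
        local->tm_mday);

sprintf(time_str, "%02d:%02d",
        local->tm_hour,
        local->tm_min);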

The struct tm fields:

  • tm_year - Years since 1900 (so 2026 = 126)
  • tm_mon - Month (0-11, so January = 0)
  • tm_mday - Day of month (1-31)
  • tm_hour, tm_min, tm_sec - Time components
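Separating the formatting from the clock makes the field arithmetic easy to verify with a fixed date. The following is a sketch with hypothetical format_timestamp() and format_now() helpers. It assumes buffers sized for "YYYY-MM-DD" (11 bytes) and "HH:MM" (6 bytes).

```c
#include <stdio.h>
#include <time.h>

/* Formats tm as "YYYY-MM-DD" and "HH:MM".
   date_str needs 11 bytes and time_str 6 bytes, including '\0'. */
void format_timestamp(const struct tm *tm, char *date_str, char *time_str)
{
    sprintf(date_str, "%04d-%02d-%02d",
            tm->tm_year + 1900,       /* Years since 1900 */
            tm->tm_mon + 1,           /* Months are 0-11 */
            tm->tm_mday);
    sprintf(time_str, "%02d:%02d", tm->tm_hour, tm->tm_min);
}

/* Formats the current local time; returns 0 if it is unavailable */
int format_now(char *date_str, char *time_str)
{
    time_t now = time(NULL);
    struct tm *local;

    if (now == (time_t)-1)
        return 0;
    local = localtime(&now);
    if (local == NULL)
        return 0;
    format_timestamp(local, date_str, time_str);
    return 1;
}
```

With tm_year = 126, tm_mon = 0, and tm_mday = 30, the date comes out as 2026-01-30. The %02d conversions zero-pad 9:05 to "09:05".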

Building the File Path

int get_cnotes_path(char *buffer, size_t bufsize, const char *filename) {
    const char *home = getenv(HOME_ENV);

    if (home == NULL) {
        fprintf(stderr, "Error: %s not set\n", HOME_ENV);
        return 0;
    }

    /* Check buffer size before writing */
    if (strlen(home) + strlen(CNOTES_DIR) + strlen(filename) + 3 > bufsize) {
        fprintf(stderr, "Error: Path too long\n");
        return 0;
    }

    sprintf(buffer, "%s" PATH_SEP_STR "%s" PATH_SEP_STR "%s",
            home, CNOTES_DIR, filename);
    return 1;
}

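Key points:

  1. getenv() returns NULL if the variable isn't set
  2. Always check the buffer size before sprintf(). The + 3 covers the two separators and the terminating '\0'
  3. PATH_SEP_STR is a string literal, so it concatenates with the adjacent format-string literals at compile time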
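The boundary of that size check is easy to get off by one. This sketch isolates it in a hypothetical build_path() that takes the three parts as arguments instead of calling getenv(). It assumes the Unix separator through a local SEP_STR macro, where cnotes uses PATH_SEP_STR.

```c
#include <stdio.h>
#include <string.h>

#define SEP_STR "/"   /* Assumption: Unix separator for this sketch */

/* Writes "home/dir/file" into buffer. Returns 0 if it would not fit.
   The + 3 covers two separators and the terminating '\0'. */
int build_path(char *buffer, size_t bufsize,
               const char *home, const char *dir, const char *file)
{
    if (strlen(home) + strlen(dir) + strlen(file) + 3 > bufsize)
        return 0;

    sprintf(buffer, "%s" SEP_STR "%s" SEP_STR "%s", home, dir, file);
    return 1;
}
```

For "/h", ".cn", and "a.csv", the result "/h/.cn/a.csv" is 12 characters. It fits a 13-byte buffer exactly and is rejected for a 12-byte one.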

Creating Directories

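void ensure_directory_exists(const char *filepath) {
    char dir[512];
    char *last_sep;

    if (strlen(filepath) >= sizeof(dir))
        return;  /* Too long for dir: don't overflow it */

    strcpy(dir, filepath);
    last_sep = strrchr(dir, PATH_SEPARATOR);

    if (last_sep != NULL) {
        *last_sep = '\0';  /* Truncate at last separator */
        mkdir_portable(dir);  /* Result ignored: fails harmlessly if dir exists */
    }
}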

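strrchr() finds the last occurrence of a character. Truncating the copy there leaves the directory portion of the path. This creates only the immediate parent directory, not any missing directories above it. Because mkdir's return value is ignored, a directory that already exists is not treated as an error.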
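The truncation step can be checked separately from the mkdir() side effect. parent_dir() below is a hypothetical, side-effect-free variant that is not in cnotes. The separator is a parameter, so the same code handles both path styles.

```c
#include <string.h>

/* Copies the directory part of path (everything before the last sep)
   into out. Returns 1 on success, 0 if path has no separator or is too
   long for out. */
int parent_dir(const char *path, char sep, char *out, size_t outsize)
{
    char *last_sep;

    if (strlen(path) >= outsize)
        return 0;                     /* Would overflow out */

    strcpy(out, path);
    last_sep = strrchr(out, sep);
    if (last_sep == NULL)
        return 0;                     /* Bare filename: no directory part */

    *last_sep = '\0';                 /* Truncate at the last separator */
    return 1;
}
```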

Appending to File

FILE *fp = fopen(path, "a");  /* "a" = append mode */
if (fp == NULL) {
    fprintf(stderr, "Error: Cannot open file\n");
    return 1;
}

fprintf(fp, "%s,%s,%-*s,\"%s\"\n",
        date_str,
        time_str,
        CATEGORY_LENGTH, category,  /* Left-justified, padded */
        message);

fclose(fp);

The format %-*s:

  • - = left-justify
  • * = width comes from next argument
  • s = string

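So %-*s, CATEGORY_LENGTH, category prints category left-justified in a field at least CATEGORY_LENGTH characters wide. The width is a minimum, not a maximum, so a longer category is printed in full. The * argument must be an int.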
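To guarantee an exact field width, a precision can be added. %-*.*s takes both the width and the maximum number of characters from int arguments. The following is a sketch with a hypothetical format_category() helper. Whether cnotes itself truncates long categories isn't shown in this guide.

```c
#include <stdio.h>

/* Writes category into out as exactly width characters: padded with
   spaces if short, cut off if long. out needs width + 1 bytes.
   Returns the number of characters written. */
int format_category(char *out, int width, const char *category)
{
    return sprintf(out, "%-*.*s", width, width, category);
}
```

With a width of 8, "work" becomes "work" plus four spaces and "personal-notes" becomes "personal". Plain %-*s would print all 14 characters of "personal-notes".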


cndump - Reading and Parsing

Purpose: Read all entries and display in a formatted table.

The Entry Structure

typedef struct {
    char date[DATE_LENGTH + 1];      /* +1 for null terminator */
    char time[TIME_LENGTH + 1];
    char category[CATEGORY_LENGTH + 1];
    char text[TXTMSG_LENGTH + 1];
} Entry;

Why +1? C strings are null-terminated. A 10-character date needs 11 bytes: 10 for characters + 1 for '\0'.

Dynamic Memory Allocation

Entry *entries = (Entry *)malloc(MAX_ENTRIES * sizeof(Entry));
if (entries == NULL) {
    fprintf(stderr, "Error: Cannot allocate memory\n");
    return 1;
}
/* ... use entries ... */
free(entries);

Why malloc instead of stack array?

Entry entries[MAX_ENTRIES];  /* BAD on DOS - stack overflow! */

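A DOS stack lives in a single segment of at most 64KB, and compilers usually reserve much less than that by default. With Entry at roughly 150 bytes, 5000 entries come to about 750KB (5000 * 150 = 750,000 bytes), far more than any DOS stack. Even the DOS limit of 100 entries (about 15KB) is risky on a small default stack. malloc() allocates from the heap, which is limited by available memory rather than by the stack size. On 16-bit DOS compilers size_t is 16 bits, so a single malloc() request must also stay under 64KB, which 100 entries does.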
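Turning that arithmetic into code: with the field lengths assumed below (hypothetical values chosen to match the ~150-byte estimate; the real ones live in config.h), each Entry is 149 bytes, so 5000 entries need 745,000 bytes. That product doesn't fit in a 16-bit size_t. A defensive allocator can check for overflow before calling malloc(). alloc_entries() is a hypothetical helper, not part of cnotes.

```c
#include <stdlib.h>

/* Hypothetical field sizes for this sketch */
#define DATE_LENGTH      10
#define TIME_LENGTH       5
#define CATEGORY_LENGTH  10
#define TXTMSG_LENGTH   120

typedef struct {
    char date[DATE_LENGTH + 1];
    char time[TIME_LENGTH + 1];
    char category[CATEGORY_LENGTH + 1];
    char text[TXTMSG_LENGTH + 1];
} Entry;

/* Allocates room for n entries. Refuses n == 0 (malloc(0) is
   implementation-defined) and any n whose byte count would not fit
   in size_t. */
Entry *alloc_entries(size_t n)
{
    if (n == 0 || n > (size_t)-1 / sizeof(Entry))
        return NULL;
    return (Entry *)malloc(n * sizeof(Entry));
}
```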

Parsing Fixed-Width Fields

static const char *parse_fixed_field(const char *ptr, char *dest,
                                      int length, char delimiter) {
    if ((int)strlen(ptr) < length)
        return NULL;  /* Not enough data */

    strncpy(dest, ptr, length);
    dest[length] = '\0';  /* Ensure null-terminated */

    ptr += length;  /* Advance pointer */

    if (*ptr != delimiter)
        return NULL;  /* Expected delimiter not found */

    return ptr + 1;  /* Return pointer past delimiter */
}

This function:

  1. Copies exactly length characters to dest
  2. Null-terminates the result
  3. Verifies the expected delimiter follows
  4. Returns a pointer to continue parsing, or NULL on error

Usage pattern (state machine):

const char *ptr = line;
ptr = parse_fixed_field(ptr, entry->date, 10, ',');
if (!ptr) return 0;  /* Parse error */
ptr = parse_fixed_field(ptr, entry->time, 5, ',');
if (!ptr) return 0;
/* ... continue ... */

Parsing Variable-Width Fields

static const char *parse_variable_field(const char *ptr, char *dest,
                                         int max_length, char delimiter) {
    int i = 0;

    while (*ptr != '\0' && *ptr != delimiter) {
        if (i < max_length) {
            dest[i++] = *ptr;
        }
        /* Continue even if truncating, to find delimiter */
        ptr++;
    }

    dest[i] = '\0';

    if (*ptr != delimiter)
        return NULL;

    return ptr + 1;
}

This handles fields of unknown length up to a maximum, with graceful truncation. Characters past max_length are skipped rather than copied, so dest must hold max_length + 1 bytes.
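The two parsers can be combined into a full line parser. This sketch targets the format cnadd writes: date, time, padded category, then a quoted text field. The field lengths, the Entry layout, and parse_entry() are assumptions for illustration. How cndump actually handles the quoted text isn't shown above. This version reads up to the closing quote, so commas inside the text are fine but embedded quotes are not supported. A trailing newline needs no stripping because parsing stops at the closing quote.

```c
#include <string.h>

#define DATE_LENGTH      10   /* Hypothetical sizes for this sketch */
#define TIME_LENGTH       5
#define CATEGORY_LENGTH   8
#define TXTMSG_LENGTH    60

typedef struct {
    char date[DATE_LENGTH + 1];
    char time[TIME_LENGTH + 1];
    char category[CATEGORY_LENGTH + 1];
    char text[TXTMSG_LENGTH + 1];
} Entry;

static const char *parse_fixed_field(const char *ptr, char *dest,
                                     int length, char delimiter)
{
    if ((int)strlen(ptr) < length)
        return NULL;
    strncpy(dest, ptr, length);
    dest[length] = '\0';
    ptr += length;
    return (*ptr == delimiter) ? ptr + 1 : NULL;
}

static const char *parse_variable_field(const char *ptr, char *dest,
                                        int max_length, char delimiter)
{
    int i = 0;

    while (*ptr != '\0' && *ptr != delimiter) {
        if (i < max_length)
            dest[i++] = *ptr;
        ptr++;
    }
    dest[i] = '\0';
    return (*ptr == delimiter) ? ptr + 1 : NULL;
}

/* Parses: date,time,category,"text". Returns 1 on success, 0 on error. */
int parse_entry(const char *line, Entry *entry)
{
    const char *ptr = line;

    ptr = parse_fixed_field(ptr, entry->date, DATE_LENGTH, ',');
    if (!ptr) return 0;
    ptr = parse_fixed_field(ptr, entry->time, TIME_LENGTH, ',');
    if (!ptr) return 0;
    ptr = parse_fixed_field(ptr, entry->category, CATEGORY_LENGTH, ',');
    if (!ptr || *ptr != '"') return 0;      /* Text must open with a quote */
    ptr = parse_variable_field(ptr + 1, entry->text, TXTMSG_LENGTH, '"');
    return ptr != NULL;                     /* Closing quote required */
}
```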

Sorting with qsort()

#include <stdlib.h>

/* Comparison function signature required by qsort */
static int compare_by_date(const void *a, const void *b) {
    const Entry *entry_a = (const Entry *)a;
    const Entry *entry_b = (const Entry *)b;

    int cmp = strcmp(entry_a->date, entry_b->date);
    if (cmp != 0) return cmp;

    return strcmp(entry_a->time, entry_b->time);
}

/* Usage */
qsort(entries, entry_count, sizeof(Entry), compare_by_date);

qsort() parameters:

  1. Array pointer
  2. Number of elements
  3. Size of each element
  4. Comparison function pointer

The comparison function must return:

  • Negative if a < b
  • Zero if a == b
  • Positive if a > b

Why const void *? C90's qsort() is generic - it works with any data type. You cast to your actual type inside the function.
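The comparator works because cnadd writes zero-padded YYYY-MM-DD and HH:MM fields, which sort chronologically as plain text. Here is a small runnable check. It uses a hypothetical Stamp struct that keeps only the two compared fields. Reversing the order only requires swapping the arguments, so no return value has to be negated.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char date[11];   /* Hypothetical: only the fields being compared */
    char time[6];
} Stamp;

/* Oldest first: compare dates, then times */
static int compare_by_date(const void *a, const void *b)
{
    const Stamp *sa = (const Stamp *)a;
    const Stamp *sb = (const Stamp *)b;
    int cmp = strcmp(sa->date, sb->date);

    if (cmp != 0)
        return cmp;
    return strcmp(sa->time, sb->time);
}

/* Newest first: the same comparison with the arguments swapped */
static int compare_newest_first(const void *a, const void *b)
{
    return compare_by_date(b, a);
}
```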


cnfind - Searching

Purpose: Find entries matching search criteria.

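#include <string.h>  /* strlen() */

/* Convert an ASCII letter to lowercase. Unlike tolower() from <ctype.h>,
   this is safe to call with a negative char value. */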
int to_lower(int c) {
    if (c >= 'A' && c <= 'Z') {
        return c + ('a' - 'A');
    }
    return c;
}

/* Case-insensitive substring search */
char *strcasestr_portable(const char *haystack, const char *needle) {
    size_t needle_len;

    if (*needle == '\0')
        return (char *)haystack;

    needle_len = strlen(needle);

    while (*haystack != '\0') {
        /* Check if needle matches at current position */
        size_t i;
        int match = 1;

        for (i = 0; i < needle_len && haystack[i] != '\0'; i++) {
            if (to_lower(haystack[i]) != to_lower(needle[i])) {
                match = 0;
                break;
            }
        }

        if (match && i == needle_len)
            return (char *)haystack;

        haystack++;
    }

    return NULL;
}

Why implement our own? strcasestr() is not part of C90 - it's a POSIX/GNU extension.
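The filter code below also calls strcasecmp_portable(), which this guide doesn't show. The following is a sketch of how it might look. Like strcasestr_portable(), it folds only ASCII letters. The implementation is an assumption, not the cnotes source.

```c
/* ASCII-only lowercase, like to_lower() above */
static int ascii_lower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;
}

/* Case-insensitive strcmp(): returns negative, zero or positive.
   Characters are compared as unsigned char, as strcmp() does. */
int strcasecmp_portable(const char *s1, const char *s2)
{
    int c1, c2;

    do {
        c1 = ascii_lower((unsigned char)*s1++);
        c2 = ascii_lower((unsigned char)*s2++);
    } while (c1 == c2 && c1 != '\0');

    return c1 - c2;
}
```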

Multiple Filter Criteria

int matches = 1;  /* Assume match until proven otherwise */

/* Filter by category */
if (filter_category[0] != '\0') {
    if (strcasecmp_portable(entry->category, filter_category) != 0) {
        matches = 0;
    }
}

/* Filter by date */
if (matches && filter_date[0] != '\0') {
    if (strcmp(entry->date, filter_date) != 0) {
        matches = 0;
    }
}

/* Filter by text pattern */
if (matches && pattern[0] != '\0') {
    if (strcasestr_portable(entry->text, pattern) == NULL) {
        matches = 0;
    }
}

if (matches) {
    /* Entry passes all filters */
}

This "whittle down" approach applies filters incrementally.


cncount - Aggregation

Purpose: Count entries, optionally grouped by category or date.

Tracking Unique Values

typedef struct {
    char key[32];
    int count;
} CountEntry;

CountEntry counts[MAX_CATEGORIES];
int num_categories = 0;

void increment_count(const char *key) {
    int i;

    /* Look for existing key */
    for (i = 0; i < num_categories; i++) {
        if (strcmp(counts[i].key, key) == 0) {
            counts[i].count++;
            return;
        }
    }

    /* Add new key */
    if (num_categories < MAX_CATEGORIES) {
        strncpy(counts[num_categories].key, key, 31);
        counts[num_categories].key[31] = '\0';
        counts[num_categories].count = 1;
        num_categories++;
    }
}

This is a simple associative array. For small datasets, linear search is fine. Once MAX_CATEGORIES distinct keys are stored, any further new keys are silently ignored. Larger datasets would benefit from a hash table.
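For the larger-dataset case, one possible shape is an open-addressing hash table with linear probing. This is a hypothetical sketch, not cnotes code; TABLE_SIZE, hash_increment(), and hash_count() are illustrative names. The table must be larger than the number of distinct keys, and a full table is reported rather than grown.

```c
#include <string.h>

#define TABLE_SIZE 64     /* Hypothetical; must exceed the distinct keys */
#define KEY_SIZE   32

typedef struct {
    char key[KEY_SIZE];
    int count;            /* 0 marks an empty slot */
} Slot;

static Slot table[TABLE_SIZE];

/* djb2-style string hash */
static unsigned long hash_key(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Truncate keys the same way for storing and lookup */
static void make_key(char *k, const char *key)
{
    strncpy(k, key, KEY_SIZE - 1);
    k[KEY_SIZE - 1] = '\0';
}

/* Slot holding k, or the empty slot where it belongs; -1 if full */
static long find_slot(const char *k)
{
    unsigned long start = hash_key(k) % TABLE_SIZE;
    unsigned long i = start;

    do {
        if (table[i].count == 0 || strcmp(table[i].key, k) == 0)
            return (long)i;
        i = (i + 1) % TABLE_SIZE;     /* Probe the next slot */
    } while (i != start);
    return -1;
}

/* Returns 1 on success, 0 if the table is full */
int hash_increment(const char *key)
{
    char k[KEY_SIZE];
    long i;

    make_key(k, key);
    i = find_slot(k);
    if (i < 0)
        return 0;
    if (table[i].count == 0)
        strcpy(table[i].key, k);      /* Claim the empty slot */
    table[i].count++;
    return 1;
}

int hash_count(const char *key)
{
    char k[KEY_SIZE];
    long i;

    make_key(k, key);
    i = find_slot(k);
    return (i < 0) ? 0 : table[i].count;
}
```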


cndel - File Rewriting

Purpose: Remove entries by moving them to an archive file.

The Challenge

You cannot delete lines from the middle of a file in C. Instead:

  1. Read all entries into memory
  2. Write non-deleted entries to a temporary file
  3. Append deleted entries to archive
  4. Replace original with temporary

Safe File Replacement

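Entry *entries;
FILE *temp;
FILE *archive;
int count, i, deleted_count = 0;

/* Read all entries (on the heap: a stack array overflows on DOS) */
entries = (Entry *)malloc(MAX_ENTRIES * sizeof(Entry));
if (entries == NULL) return 1;
count = read_all_entries(entries, source_path);

/* Open both files and check them before writing anything */
temp = fopen(temp_path, "w");
archive = fopen(archive_path, "a");
if (temp == NULL || archive == NULL) {
    if (temp) fclose(temp);
    if (archive) fclose(archive);
    free(entries);
    return 1;
}

/* Write entries to appropriate files */
for (i = 0; i < count; i++) {
    if (should_delete(&entries[i])) {
        write_entry(archive, &entries[i]);
        deleted_count++;
    } else {
        write_entry(temp, &entries[i]);
    }
}

fclose(temp);
fclose(archive);
free(entries);

/* Replace original with temp. In C90, rename() onto an existing file
   is implementation-defined (it fails on DOS/Windows), so remove first. */
if (remove(source_path) != 0 || rename(temp_path, source_path) != 0) {
    fprintf(stderr, "Error: Cannot replace '%s'\n", source_path);
    return 1;
}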

Why archive instead of delete? The immutable-log philosophy means data is never truly lost - it's just moved to a different file.
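The same split-and-replace sequence can be tried on plain text lines. This sketch assumes a hypothetical archive_lines() that matches a substring with strstr() instead of cnotes' own should_delete() criteria. As a design choice, it streams line by line instead of loading every entry first, which also avoids the MAX_ENTRIES limit. Lines are assumed to fit in 512 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Moves every line of src containing pattern to the end of arch and
   keeps the rest, via tmp. Returns the number of lines archived, or -1. */
long archive_lines(const char *src, const char *tmp, const char *arch,
                   const char *pattern)
{
    char line[512];
    long moved = 0;
    int failed;
    FILE *in = fopen(src, "r");
    FILE *out = fopen(tmp, "w");
    FILE *archive = fopen(arch, "a");

    if (in == NULL || out == NULL || archive == NULL) {
        if (in) fclose(in);
        if (out) fclose(out);
        if (archive) fclose(archive);
        return -1;
    }

    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, pattern) != NULL) {
            fputs(line, archive);
            moved++;
        } else {
            fputs(line, out);
        }
    }

    failed = ferror(in) != 0;
    fclose(in);
    if (fclose(out) == EOF) failed = 1;
    if (fclose(archive) == EOF) failed = 1;
    if (failed)
        return -1;                    /* src is still intact */

    /* rename() onto an existing file is implementation-defined in C90 */
    if (remove(src) != 0 || rename(tmp, src) != 0)
        return -1;
    return moved;
}
```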

Confirmation Prompts

char response[10];

printf("Delete %d entries? (y/n): ", count);
fflush(stdout);  /* Ensure prompt appears before input */

if (fgets(response, sizeof(response), stdin) != NULL) {
    if (response[0] == 'y' || response[0] == 'Y') {
        /* Proceed with deletion */
    }
}

fflush(stdout) ensures the prompt is displayed before waiting for input. Without it, buffered I/O might delay the prompt.


Memory Management

The Golden Rules

  1. Check malloc() return value

    ptr = malloc(size);
    if (ptr == NULL) {
        /* Handle error */
    }
    
  2. Free what you allocate

    Entry *entries = malloc(...);
    /* ... use entries ... */
    free(entries);  /* Always free before return */
    
  3. Don't use after free

    free(entries);
    entries = NULL;  /* Prevent accidental use */
    
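  4. Match allocations to deallocations. Every malloc() needs exactly one free() on every path out of the function, including error returns. Calling free(NULL) is safe and does nothing.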

Stack vs Heap

Stack                                      Heap
Automatic allocation                       Manual allocation
Fixed size (64KB max DOS, ~1-8MB modern)   Limited by system memory
Fast allocation                            Slower allocation
Automatic cleanup                          Must call free()

void function(void) {
    char buffer[100];      /* Stack - automatic */
    char *data = malloc(100);  /* Heap - manual */

    /* buffer freed automatically when function returns */
    free(data);  /* Must free explicitly */
}

String Handling in C90

String Basics

C strings are arrays of char terminated by '\0' (null character).

char str[10] = "Hello";
/* Memory: ['H','e','l','l','o','\0','\0','\0','\0','\0'] */
/* Index:    0   1   2   3   4    5    6    7    8    9   */
/* Elements 6-9 are zero-filled because the array is initialized */

Safe String Functions

Unsafe      Safe          Notes
gets()      fgets()       Always use fgets()
strcpy()    strncpy()     Specify max length (may not null-terminate)
sprintf()   snprintf()*   *Not in C90 (added in C99); check lengths before sprintf()

strncpy() gotcha:

char dest[10];
strncpy(dest, source, 9);
dest[9] = '\0';  /* strncpy may not null-terminate! */

If source is 9 or more characters long, strncpy() copies 9 characters and adds no null terminator. Always add it manually. When source is shorter, strncpy() pads the rest of the count with '\0'.
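Wrapping the copy-then-terminate idiom in a helper removes the chance of forgetting the last step and reports truncation. copy_string() is hypothetical, since C90 has no strlcpy(). Unlike strncpy(), it does not zero-fill the rest of the buffer.

```c
#include <string.h>

/* Copies src into dest (size bytes), always null-terminating when
   size > 0. Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_string(char *dest, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size == 0)
        return 0;
    if (len < size) {
        memcpy(dest, src, len + 1);   /* Fits, including '\0' */
        return 1;
    }
    memcpy(dest, src, size - 1);      /* Truncate and terminate */
    dest[size - 1] = '\0';
    return 0;
}
```

With a 10-byte buffer, "123456789" fits exactly. "1234567890" is cut to "123456789" and reported as truncated.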

String Length vs Buffer Size

char buffer[100];  /* Buffer SIZE is 100 */
strcpy(buffer, "Hello");
/* String LENGTH is 5 (not counting '\0') */
/* strlen(buffer) returns 5 */

Always allocate strlen(str) + 1 bytes for a copy.
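strdup() is POSIX rather than C90, so a portable copy needs its own helper. The following is a sketch with a hypothetical dup_string() that applies the strlen(str) + 1 rule.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if malloc() fails.
   The caller must free() the result. */
char *dup_string(const char *s)
{
    size_t size = strlen(s) + 1;      /* +1 for the '\0' */
    char *copy = (char *)malloc(size);

    if (copy != NULL)
        memcpy(copy, s, size);        /* Copies the terminator too */
    return copy;
}
```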


Cross-Platform Considerations

Line Endings

System              Line Ending
Unix/Linux/macOS    \n (LF)
Windows             \r\n (CRLF)
Classic Mac         \r (CR)

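When reading with fgets(), the line ending is included. In text mode the C library translates the platform's native ending to '\n'. A file created on another system can still carry a stray '\r' (a CRLF file read on Unix, for example), so strip both:

char *end;

end = strchr(line, '\n');   /* Strip LF */
if (end) *end = '\0';

end = strchr(line, '\r');   /* Strip CR left over from CRLF files */
if (end) *end = '\0';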
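strcspn() is part of C90. It returns the length of the initial segment containing none of the given characters, so both endings can be removed in one step. The following is a sketch with a hypothetical strip_line_ending() helper.

```c
#include <string.h>

/* Cuts line at the first '\r' or '\n', handling LF, CRLF and CR */
void strip_line_ending(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```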

Path Separators

Handled by PATH_SEPARATOR and PATH_SEP_STR macros in platform.h.

Environment Variables

System    Home Directory
Unix      HOME
Windows   USERPROFILE
DOS       None standard (cnotes uses CNOTES_HOME)

The HOME_ENV macro abstracts this.

Integer Sizes

C90 only guarantees minimums:

  • char: at least 8 bits
  • short: at least 16 bits
  • int: at least 16 bits
  • long: at least 32 bits

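For portable code, don't assume int is 32 bits. It is 16 bits under 16-bit DOS compilers such as Turbo C, while DJGPP, a 32-bit DOS compiler, uses a 32-bit int. Use long when a value may exceed 32,767.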
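Size arithmetic is where a 16-bit int causes trouble. The 750KB figure from the Entry array example (5000 * 150 = 750,000) exceeds INT_MAX (32,767) on a 16-bit compiler. Signed overflow is undefined behavior. Converting one operand to long first keeps the calculation in range. The following is a sketch with hypothetical total_bytes() and fits_in_int() helpers; <limits.h> reports the actual ranges.

```c
#include <limits.h>

/* count * size computed in long, which C90 guarantees can hold at
   least 2147483647; plain int multiplication could overflow */
long total_bytes(int count, int size)
{
    return (long)count * size;
}

/* 1 if n can be stored in an int on this compiler */
int fits_in_int(long n)
{
    return n >= INT_MIN && n <= INT_MAX;
}
```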


Summary

The cnotes codebase demonstrates several important C90 patterns:

  1. File I/O: Opening, reading line-by-line, writing formatted data, closing
  2. Parsing: State-machine approach with pointer advancement
  3. Memory: malloc/free for large data, stack for small buffers
  4. Strings: Careful length tracking, null termination
  5. Portability: Preprocessor conditionals for platform differences
  6. Error Handling: Check every return value

These patterns form the foundation of robust C programming and are still relevant in modern systems programming.