(** * Cochabamba Quechua Plurals

Below is a morphological fragment, written in Lexical Proof
Morphology, of an overabundant plural pattern found in Cochabamba
Quechua.

In some dialects of Cochabamba Quechua the borrowed Spanish plural -s
occurs on all nominal stems that end in a vowel. There is also a
native plural -kuna, which occurs on nominal stems generally. These
plurals may also co-occur in either order on the stem. So for a vowel
stem like warmi, 'woman', the forms warmi-s, warmi-kuna, warmi-s-kuna,
and warmi-kuna-s are all predicted to occur, though not necessarily
with equal likelihood. A stem that ends in a consonant, like sipas,
'young woman', is predicted to have the plural forms sipaskuna and
sipaskunas for some speakers. There is also a possible semantic
constraint on the distribution of the Spanish plural, where the -kuna
plural is more appropriate for contrastive subjects. Of course, there
are also other dialectal patterns and speaker variation. The fragment
below captures one observable pattern. *)

Require Import Coq.Lists.List.
Require Import Coq.Strings.String.
Import ListNotations.

Open Scope type_scope.
Open Scope string_scope.

(** * Initial axioms of the fragment

Below I introduce the basic lexemes and categories of the
morphological fragment.

** Lexemes

First I define some basic lexemes. Lexemes have no inherent
properties. They serve a role similar to foreign keys in a relational
database: a lexeme is a key for defining and retrieving a collection
of information. For instance, ( LEXEME_1, x ) and ( LEXEME_1, y ) put
both x and y in the same collection, where a collection consists of
the second elements of all tuples whose first element is the same
lexeme.

The lexeme WARMI corresponds to a nominal with a stem ending in a
vowel. The lexeme SIPAS corresponds to a nominal with a stem ending in
a consonant. *)

Inductive lexeme : Set :=
| WARMI : lexeme (* woman *)
| SIPAS : lexeme. (* young woman *)
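
(** The relational-key reading of lexemes can be made concrete with a
small sketch. The helpers [lexeme_eqb] and [collect] are hypothetical
names introduced only for this illustration and are not part of the
fragment. Under that assumption, [collect] gathers the second elements
of all pairs that share a given lexeme. *)

Definition lexeme_eqb (a b : lexeme) : bool :=
  match a, b with
  | WARMI, WARMI => true
  | SIPAS, SIPAS => true
  | _, _ => false
  end.

Definition collect {A : Type} (key : lexeme)
           (rows : list (lexeme * A)) : list A :=
  map snd (filter (fun row => lexeme_eqb key (fst row)) rows).

(* Both WARMI rows end up in one collection; the SIPAS row does not. *)
Example collect_warmi :
  collect WARMI [ (WARMI, "warmi") ; (SIPAS, "sipas") ; (WARMI, "warmis") ]
  = [ "warmi" ; "warmis" ].
Proof. reflexivity. Qed.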

(** ** Morphological categories

The type matom is for atoms used in m-category names, or mcats,
which I compose as lists. Mcats are morphological categories, which
categorize observable analogical patterns in morphological form. For
instance, though warmis, warmikuna, warmiskuna and warmikunas
correspond to two syntactic categories, one for the basic plural and
one for a contrastive subject, there are, at minimum, four
morphological categories, one for each form. In practice, there are
more categories, both to capture the "morphotactics", or the allowable
patterns of affix co-occurrence within words of the system, and to
allow for generalizations when mapping to syntactic categories.

In fact, what one "sees" in this system are not the categories
themselves, which are conceptually atomic, but a means of referring to
them with names. Think of the name John Smith as a category name and
Smith as analogous to an matom. The last name Smith can belong to an
entire family, or to many people in many families, yet each person
with Smith in their name is an individual and could, potentially, be
named something else. Likewise, having Smith as a last name, one of
the most frequent last names in the English language, does not entail
family membership. Despite this, the practice of naming children
using their parents' last names does simplify the naming of
children. If it were a rule that every child must be named by
composing the last name of a particular parent with some individuated
name, the rules of naming would be greatly simplified, even though
nothing is entailed about the properties of an individual by having a
particular last name. The category naming scheme used here, where
names are lists of matoms, allows for flexibility and simplification
in naming categories using rules but, at least at this time, no rule
applies because a name has a particular sub-element or sub-list,
i.e. nothing is entailed about the category just because a
sub-element of its name is one thing or another. This is a
fundamental difference between this scheme and a feature system, for
instance. *)

Inductive matom : Set :=
| nbase : matom
| mplural : matom
| s : matom
| kuna : matom
| kbase : matom
| sbase : matom
| ta : matom.

(** As stated above, mcat is a list of matoms. *)

Definition mcat := list matom.
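
(** A small sketch of what composing names "like lists" amounts to. The
bracket notation is ordinary list notation, so a name is built with
cons, and names that share matoms are still distinct names. The
example names below are hypothetical and only illustrate the point. *)

Example mcat_name_is_list :
  ([ s ; nbase ] : mcat) = cons s (cons nbase nil).
Proof. reflexivity. Qed.

(* Same matoms, different order: two different category names. *)
Example mcat_names_distinct :
  ([ kuna ; s ; nbase ] : mcat) <> [ s ; kuna ; nbase ].
Proof. discriminate. Qed.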

(** ** Mcats for the fragment *)

(* [ nbase ] is a basic noun stem.  [ kbase ] is a stem that can take
   a -kuna affix.  [ sbase ] is a stem that can take a -s affix.
   [ kuna ] is any form that contains -kuna.  [ mplural ] is any form
   with a plural affix.  [ s ; nbase ] is a basic stem with only an -s
   affix.  [ kuna ; nbase ] is a basic stem with only a -kuna affix.
   The other two mcats, [ kuna ; s ; nbase ] and [ s ; kuna ; nbase ],
   contain both suffixes in differing order.

The kuna forms are hypothesized to have a different distribution than
the -s only form. They are (possibly) more compatible with focus
constructions in subject position.

A takeaway from the above is that nearly every form type has its own
category, at least in this analysis of this phenomenon. Labeling form
types is not the only role of these categories. Some categories, such
as [ sbase ] and [ kbase ] are only directly relevant to
morphotactics. Other categories, such as [ kuna ] are more relevant to
the interface with the syntactic paradigm. *)

(** ** Morphological forms

An mform, or morphological form, is a string. *)

Definition mform := string.

(** Below are some string functions to handle suffixation. *)

Definition suffix_s (stem:mform) : mform :=
  stem ++ "s".

Definition suffix_kuna (stem:mform) : mform :=
  stem ++ "kuna".

Definition suffix_ta (stem:mform) : mform :=
  stem ++ "ta".
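
(** As a quick check of the suffixation functions, the examples below
compute a few of the plural forms discussed in the introduction. The
example names are hypothetical; each equation holds by computation. *)

Example suffix_s_warmi : suffix_s "warmi" = "warmis".
Proof. reflexivity. Qed.

Example suffix_kuna_then_s :
  suffix_s (suffix_kuna "warmi") = "warmikunas".
Proof. reflexivity. Qed.

Example suffix_s_then_kuna :
  suffix_kuna (suffix_s "warmi") = "warmiskuna".
Proof. reflexivity. Qed.

(* The functions are plain string operations and do no phonological
   checking, so a consonant-final stem is suffixed just the same. *)
Example suffix_s_unchecked : suffix_s "sipas" = "sipass".
Proof. reflexivity. Qed.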

(** ** The structure of morphological paradigm entries

An mtrip is a data structure used to define Fentries, which are
"form entries", also known as morphological paradigm entries. There
could potentially be many non-Fentries that have the type of an
mtrip. Nothing constrains this type to contain only reasonable
entities. One of the purposes of the theory is to specify what
morphological forms are valid in a particular language. This is
similar to the formal language notion of defining the characteristic
function of a subset of all string combinations that correspond to a
grammar.

Here the '*' corresponds to ×, for defining the types of pairs. *)

Definition mtrip := mcat * mform * lexeme.

(** The check below demonstrates why simply saying that we're working
with mtrips is not good enough. Non-grammatical triples also
type-check. We need to be able to specify which triples are legal. *)

Check ( [ kuna ] , "foot" , WARMI ).
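
(** To see how the components of such a triple are reached, here is a
small sketch using a hypothetical name, [bad_trip], for the triple
checked above. Because a triple is a pair of a pair and a lexeme, the
projections nest. *)

Definition bad_trip : mtrip := ( [ kuna ] , "foot" , WARMI ).

Example bad_trip_mcat : fst (fst bad_trip) = [ kuna ].
Proof. reflexivity. Qed.

Example bad_trip_mform : snd (fst bad_trip) = "foot".
Proof. reflexivity. Qed.

Example bad_trip_lexeme : snd bad_trip = WARMI.
Proof. reflexivity. Qed.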

(** Here I declare Fentry, which is a predicate that is provable when
an mtrip is a valid lexical entity. *)

Axiom Fentry : mtrip -> Prop.

(** Here I define the mtrip for the "warmi" stem. I am not handling
the phonological dimension that "sipas" requires, where -s cannot be
suffixed to a stem that ends in a consonant because I haven't built a
phonological abstraction into this example, yet. I could specify the
needed categorical distinction with a predicate. Such a predicate
would be true for some specified set of mforms and false for others
but it is more interesting to do it correctly and pattern match on the
string. *)

Definition warmi_nbase := ([ nbase ], "warmi", WARMI) : mtrip.

(** This requires an explicit axiom to be treated like an
Fentry. Other Fentries will be derived from it. *)

Axiom f_warmi_nbase : Fentry warmi_nbase.

(** * Morpho-Lexical relations

lem, for "less than or equal to morphological category", is the
order over morphological categories, or mcats. The axioms below state
the basic properties of a partial order: the relation is reflexive,
meaning that x ≤ x; antisymmetric, meaning that if x ≤ y and y ≤ x
then x = y; and transitive, meaning that if x ≤ y and y ≤ z then
x ≤ z. *)

Axiom lem : mcat -> mcat -> Prop.
Axiom lem_refl : forall x : mcat, lem x x.
Axiom lem_antisym : forall x y : mcat, lem x y -> lem y x -> x = y.
Axiom lem_trans : forall x y z : mcat, lem x y -> lem y z -> lem x z.

(** The order is not natural. There may be natural orders on the
underlying list type based on length but this is not how I use
mcats. As mentioned, though their structure is complex, they are
*names* of atomic categories.

Due to the fact that the order is not natural, each relation must be
explicitly specified, except those that are predictable from the
properties of the order. Below I specify those needed for this
fragment. *)

Axiom nbase_lem_kbase : lem [ nbase ] [ kbase ].
Axiom s_nbase_lem_kbase : lem [ s ; nbase ] [ kbase ].
Axiom nbase_lem_sbase : lem [ nbase ] [ sbase ].
Axiom kuna_nbase_lem_sbase : lem [ kuna ; nbase ] [ sbase ].
Axiom kuna_nbase_lem_kuna : lem [ kuna ; nbase ] [ kuna ].
Axiom kuna_s_nbase : lem [ kuna ; s ; nbase ] [ kuna ].
Axiom s_kuna_nbase : lem [ s ; kuna ; nbase ] [ kuna ].
Axiom kuna_lem_mplural : lem [ kuna ] [ mplural ].
Axiom s_nbase_lem_mplural : lem [ s ; nbase ] [ mplural ].
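
(** Relations that are predictable from the properties of the order
need not be stated as axioms. As a sketch, the lemma below (its name
is hypothetical) derives that [ kuna ; nbase ] is below [ mplural ] by
transitivity through [ kuna ]. *)

Lemma kuna_nbase_lem_mplural : lem [ kuna ; nbase ] [ mplural ].
Proof.
  apply (lem_trans [ kuna ; nbase ] [ kuna ] [ mplural ]).
  - exact kuna_nbase_lem_kuna.
  - exact kuna_lem_mplural.
Qed.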

(** ** Morphotactic rules

The word "morphotactic" may suggest an imperative, form-building
interpretation. Though rules in this system tend to take simpler
forms as input and give more complex forms as output, the opposite is
possible. Defining rules in a form-building direction is done for
practical reasons. Longer, more complex forms are often less
frequent. If one wishes to base the assumed forms of a fragment on
something that is independently observable, and to be consistent
about which forms are assumed as axioms across lexemes, frequently
used forms are preferable, or perhaps something like the principal
parts of a paradigm. The intended interpretation of these rules is
that they declare that if Fentry x exists, then Fentry y exists, but
x does not underlie y in any sense. Nor is x necessarily contained as
a subpart of y.

Below one can see the first of the morphotactic rules, which I call
form-form mappings. It specifies that anything within the category
[ sbase ] can have its form suffixed with -s. Accessors such as
(fst (fst mt)) are needed because the mtrip triple is actually a pair
of a pair and a lexeme: the structure is w=((x,y),z), so
(fst (fst w)) is x, (snd (fst w)) is y and (snd w) is z. In English,
the rule says that for any mtrip that is an Fentry, if its mcat is
less than or equal to [ sbase ], then there is an Fentry whose
category is the given mcat with the matom s added to the front, whose
mform has an affixed -s, and whose lexeme is unchanged. *)

Axiom add_s : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ sbase ]) ->
                  Fentry (cons s (fst (fst mt)),
                          suffix_s (snd (fst mt)),
                          (snd mt)).
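
(** As a sketch of how the rule is used, the lemma below (with a
hypothetical name) derives the form warmis from the axiom
[f_warmi_nbase], using the fact that [ nbase ] is below [ sbase ]. The
projections and the suffixation reduce by computation, so the
instantiated rule has exactly the stated type. Note that no stated
axiom places [ s ; nbase ] below [ sbase ], so this derivation cannot
simply be repeated to add a second -s. *)

Lemma f_warmi_s : Fentry ([ s ; nbase ], "warmis", WARMI).
Proof.
  exact (add_s warmi_nbase f_warmi_nbase nbase_lem_sbase).
Qed.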

(** The rule below is very similar except that it is for the -kuna
containing forms. *)

Axiom add_kuna : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ kbase ]) ->
                  Fentry (cons kuna (fst (fst mt)),
                          suffix_kuna (snd (fst mt)),
                          (snd mt)).
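
(** With both suffixation rules in place, the remaining plural forms
of warmi can be derived. The lemmas below are a sketch with
hypothetical names; each one chains the rules and the stated order
relations, and the resulting category names record the order in which
the matoms were added. *)

Lemma f_warmi_kuna : Fentry ([ kuna ; nbase ], "warmikuna", WARMI).
Proof.
  exact (add_kuna warmi_nbase f_warmi_nbase nbase_lem_kbase).
Qed.

(* -s first, then -kuna: warmi-s-kuna. *)
Lemma f_warmi_s_kuna : Fentry ([ kuna ; s ; nbase ], "warmiskuna", WARMI).
Proof.
  exact (add_kuna ([ s ; nbase ], "warmis", WARMI)
                  (add_s warmi_nbase f_warmi_nbase nbase_lem_sbase)
                  s_nbase_lem_kbase).
Qed.

(* -kuna first, then -s: warmi-kuna-s. *)
Lemma f_warmi_kuna_s : Fentry ([ s ; kuna ; nbase ], "warmikunas", WARMI).
Proof.
  exact (add_s ([ kuna ; nbase ], "warmikuna", WARMI)
               (add_kuna warmi_nbase f_warmi_nbase nbase_lem_kbase)
               kuna_nbase_lem_sbase).
Qed.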

(** The rule below derives accusative-marked plurals: any form whose
mcat is less than or equal to [ mplural ] takes -ta and is assigned
the category [ s ; ta ]. *)

Axiom add_plta : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ mplural ]) ->
                  Fentry ([ s ; ta ],
                          suffix_ta (snd (fst mt)),
                          (snd mt)).
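
(** A sketch of the rule in use, with hypothetical lemma names. The
first derivation starts from the -s plural. The second starts from the
-kuna plural and needs transitivity to place [ kuna ; nbase ] below
[ mplural ]. Both results carry the name [ s ; ta ], which illustrates
that a category name entails nothing about the suffixes in the
form. *)

Lemma f_warmi_s_ta : Fentry ([ s ; ta ], "warmista", WARMI).
Proof.
  exact (add_plta ([ s ; nbase ], "warmis", WARMI)
                  (add_s warmi_nbase f_warmi_nbase nbase_lem_sbase)
                  s_nbase_lem_mplural).
Qed.

Lemma f_warmi_kuna_ta : Fentry ([ s ; ta ], "warmikunata", WARMI).
Proof.
  exact (add_plta ([ kuna ; nbase ], "warmikuna", WARMI)
                  (add_kuna warmi_nbase f_warmi_nbase nbase_lem_kbase)
                  (lem_trans [ kuna ; nbase ] [ kuna ] [ mplural ]
                             kuna_nbase_lem_kuna kuna_lem_mplural)).
Qed.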

(** The rule below derives accusative-marked singulars: any form whose
mcat is less than or equal to [ nbase ] takes -ta and is assigned the
category [ ta ]. *)

Axiom add_ta : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ nbase ]) ->
                  Fentry ([ ta ],
                          suffix_ta (snd (fst mt)),
                          (snd mt)).
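
(** As a sketch, the accusative singular of warmi follows from the
base entry and reflexivity of the order. The lemma name is
hypothetical. *)

Lemma f_warmi_ta : Fentry ([ ta ], "warmita", WARMI).
Proof.
  exact (add_ta warmi_nbase f_warmi_nbase (lem_refl [ nbase ])).
Qed.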

(** * Mock-up of the interface

Below I define some types for syntactic categories but I do not supply
full entries. This is because for the purposes of this simple
exposition it doesn't make sense to embed the actual types and terms
needed for Linear Categorial Grammar, which is the syntactic theory
that I assume.

In order to hold somewhat to the spirit of the complete theory, I
declare Sentry, which is the predicate for valid lexical entries. *)

Inductive syncat : Set :=
| nom : syncat
| nom_pl : syncat
| nom_foc_pl : syncat
| acc : syncat
| acc_pl : syncat.

Axiom Sentry : syncat -> Prop.

(** ** Interface rules

I call these form-sign mappings because the lexical entries are called
signs in LCG.

The first maps uninflected forms to the unmarked nominative. The
second, assuming that the -kuna forms are not always focal, which is
consistent with corpora, maps all mplurals to nom_pl. The third maps
the kuna forms to nom_foc_pl. The last two map -ta marked forms to
acc and acc_pl, respectively. *)

Axiom nbase_to_nom : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ nbase ]) -> Sentry nom.

Axiom mplural_to_nom_pl : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ mplural ]) -> Sentry nom_pl.

Axiom kuna_to_foc : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ kuna ]) -> Sentry nom_foc_pl.

Axiom ta_to_acc : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ ta ]) -> Sentry acc.

Axiom s_ta_to_acc_pl : forall
    (mt : mtrip), (Fentry mt) ->
                  (lem (fst (fst mt)) [ s ; ta ]) -> Sentry acc_pl.
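
(** A sketch of the interface rules in use, with hypothetical lemma
names. Each lemma supplies an Fentry, derived from the base entry
where needed, together with the order relation that the rule
requires. Since Sentry here takes only a syntactic category, the
lemmas record that the category is licensed, not which form or lexeme
licensed it. *)

Lemma sentry_nom_from_warmi : Sentry nom.
Proof.
  exact (nbase_to_nom warmi_nbase f_warmi_nbase (lem_refl [ nbase ])).
Qed.

Lemma sentry_nom_pl_from_warmis : Sentry nom_pl.
Proof.
  exact (mplural_to_nom_pl ([ s ; nbase ], "warmis", WARMI)
                           (add_s warmi_nbase f_warmi_nbase nbase_lem_sbase)
                           s_nbase_lem_mplural).
Qed.

Lemma sentry_nom_foc_pl_from_warmikuna : Sentry nom_foc_pl.
Proof.
  exact (kuna_to_foc ([ kuna ; nbase ], "warmikuna", WARMI)
                     (add_kuna warmi_nbase f_warmi_nbase nbase_lem_kbase)
                     kuna_nbase_lem_kuna).
Qed.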