Ontologies I am digressing a little in this post. One of the things I want to get out of this exercise is to learn more about Ontologies and Ontology editors, and on the principle that you can never learn something unless you build something with it (aka bone knowledge), so this is gathering my thoughts to get started on creating an Ontology for package building. Perhaps this has been done before, and better, but I'll probably learn more trying to create my own. Also, I am playing around with code, an odd melange of my package building porcelain, and gitpkg, and other ideas bruited on IRC, and I don't want to blog about something that would be embarrassing in the long run if some of the concepts I have milling around turn out to not meet the challenge of first contact with reality. I want to create a ontology related to packaging software. It should be general enough to cater to the needs any packaging effort in a distribution agnostic and version control agnostic manner. It should enable us to talk about packaging schemes and mechanisms, compare different methods, and perhaps to work towards a common interchange mechanism good enough for people to share the efforts spent in packaging software. The ontology should be able to describe common practices in packaging, concepts of upstream sources, versioning, commits, package versions, and other meta-data related to packages. I am doing this ontology primarily for myself, but I hope this might be useful for other folks involved in packaging software. So, here follow a set of concepts related to packaging software: software is a general term used to describe a collection of computer programs, procedures and documentation that perform some tasks on a computer system. software is what we are trying to package software has names software has associated with it source code source code is any collection of statements or declarations written in some human-readable computer programming language. source code is usually held in one or more text files (blobs). A large collection of source code files may be organized into a directory tree, in which case it may also be known as a source tree. The source code may be converted into an executable format by a compiler, or executed on the fly from the human readable form with the aid of an interpreter. executable format is the form software must be in in order to be run. Running means to cause a computer "to perform indicated tasks according to encoded instructions." software source code has one or more lines of development. Some Common specific lines of development for the software to be packaged are: upstream line of development feature branch is a line of development related to a new feature under development. Often the goal is to merge the feature branches into the upstream line of development usually, all feature branches are merged into the integration branch, and the package is created from the integration branch. integration branch is the line of development of software that is to be packaged some software lines of development have releases releases have release dates some releases have release versions source code may be stored in a version control repository, and maintain history. Trees are a collection of blobs and other trees (directories and sub-directories). A tree object describes the state of a directory hierarchy at a particular given time. Blobs are simply chunks of binary data - they are the contents of files. a tree can be converted into an archive and back In git, directories are represented by tree object. They refer to blobs that have the contents of files (file name, access mode, etc is all stored in the tree), and to other trees for sub-directories. Commits (or "changesets") mark points in the history of a line of development, and references to parent commits. A commit refers to a tree that represents the state of the files at the time of the commit. A working directory is a directory that corresponds, but might not be identical, to a commit in the version control repository Commits from the version control system can be checked out into the working directory uncommitted changes are changes in the working directory that make it different from the corresponding commit. Some call the working directory to be in a "dirty" state. uncommited changes be checked in into the version control system, creating a new commit The working directory may contain a ignore file ignore file contains the names of files in the working directory that should be "ignored" by the version control system. In git, a commit may also contains references to parent commits. If there is more than one parent commit, then the commit is a merge If there are no parent commits, it is an initial commit references, or heads, or branches, are movable references to a commit. On a fresh commit, the head or branch reference is moved to the new commit. lines of development are usually stored as a branch in the version control repository. a patch is a file that contains difference listings between two trees. A patch file can be used to transform (patch) one tree into another (tree). A quilt series is a method of representing an integration branch as a collection of a series of patches. These patches can be applied in sequence to the upstream branch to produce the integration branch. A tag is a named reference to a specific commit, and is not normally moved to point to a different commit. A package is an archive format of software created to be installed by a package management system or a self-sufficient installer, derived by transforming a tree associated with an integration branch. packages have package names package names are related to upstream software names packages have package versions package versions may have an upstream version component a distribution or packaging specific component package versions are related to upstream software versions helper packages provide libraries and other support facilities to help compile an integration branch ultimately yielding a package Author: Manoj Srivastava manoj.srivastava@stdc.com