The file structure of a software project
Abstract: You know, software engineering is a discipline. Here's what I know about structuring a software project.
Topics: software engineering
© Copyright Daniel Krajzewicz, 02.04.2020 23:50, cc by-nc-nd
I have written a number of applications over time. I usually try to set them up in the most professional way; I am a software engineer, after all. Below you will find some notes on which components belong to a complete software project, followed by a proposal for a folder set-up to structure your own project.
A software project repository should contain the following items:
- the source code, usually abbreviated as src;
- documentation (docs); if something is not documented, it does not exist;
- tools that help to build/compile the code;
- tests, as a software's only good if it's tested;
- an additional data directory;
- external libraries (libs) or other information that has not been generated by you;
- maybe your software needs additional tools?
- a final destination / deployment folder that usually contains the final binaries.
These folders should be almost sufficient for structuring a project. In the following, I describe the purpose of each of those entries.
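To make the proposal concrete, here is a minimal sketch in Python that creates such a folder set-up. The function name create_skeleton is made up for illustration, and the folder names are assumptions derived from the abbreviations used in this text ("build" for the build tools, "bin" for the final destination); rename them as you see fit.

```python
from pathlib import Path

# Assumed folder names, following the abbreviations used in the text.
PROJECT_FOLDERS = [
    "src",
    "docs",
    "build",
    "tests",
    "data/examples",
    "libs",
    "tools",
    "bin",
]


def create_skeleton(root):
    """Create the proposed folders below `root` and return the created paths."""
    root_path = Path(root)
    created = []
    for name in PROJECT_FOLDERS:
        folder = root_path / name
        # exist_ok=True makes the function safe to re-run on an existing project
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_skeleton("my_project") once is enough; running it again on an existing project leaves the folders and their content untouched.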
Of course, the source code, or “src” for short, is what it's all about. It's what we are aiming for: to write an application that does things for us. One could argue that only code is needed. This would be the case if every one of us had unlimited time, and if learning, not production, were what our culture is built upon. That's not the case. So yes, code is the most important thing, but not the only one.
There are some qualities the code should fulfil:
- Compile: The code in a repository should always compile. It may have bugs, but it has to compile. If you really have to store an in-between version that cannot be compiled (yet), use a branch or a fork.
- Documentation: The code itself should be documented in two ways: with Doxygen or any other method for describing the tasks and parameters of the functions / methods / modules, and with in-line comments that outline what the code is currently doing.
- Modules: If you have more than ten classes, you should probably start thinking about modules. Each module should have a specific purpose, e.g. reading data, writing data, computation. The modules are usually stored in individual subfolders of the src folder, with the code containing the application's/applications' main function(s) remaining at the top level. Please note that the module structure may be hierarchical, including sub-modules.
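As a small illustration of the documentation point, here is a sketch of a function that carries both kinds of documentation. It uses a Python docstring instead of Doxygen, which is just one possible method; the function mean_speed and its parameters are invented for this example.

```python
def mean_speed(distances_m, durations_s):
    """Compute the mean speed over several trips.

    Args:
        distances_m: travelled distances in metres, one per trip.
        durations_s: trip durations in seconds, in the same order.

    Returns:
        The mean speed in metres per second.

    Raises:
        ValueError: if the lists differ in length or the total
            duration is not positive.
    """
    if len(distances_m) != len(durations_s):
        raise ValueError("need one duration per distance")
    # sum first, divide once: long trips weigh more than short ones
    total_distance = sum(distances_m)
    total_duration = sum(durations_s)
    if total_duration <= 0:
        raise ValueError("total duration must be positive")
    return total_distance / total_duration
```

The docstring tells a caller what the function does and expects; the in-line comment explains a decision that is not obvious from the code alone.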
Everything that describes what the hell you've done belongs to the documentation, the “docs”. You should think in terms of:
- user documentation, because it's not you for whom you are writing the software and a potential user should be allowed to know what your software is about, why, and maybe also how;
- developer documentation — not for the others, for you!
- ads, web pages, and other public relations, as you have to find and motivate users before they read the documentation.
Many programming languages need some kind of compilation system for building an executable application from source code and external libraries. A good practice is to use a “build” system for this purpose and to keep it in its own folder.
You should have “tests”. Tests are great, tests are fun. Maybe the tests will be the only ones that know whether your application really works.
You should supply some example “data”, located, consequently, in data/examples. Sometimes you will also have an application that needs a certain data set every time it runs. So, you may need a data folder.
In case your application uses modules that you have developed yourself, you should include them directly via SVN externals, Git submodules, or whatever. Note that both these modules and your application should support the same build settings for a seamless build.
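What a file in the tests folder could look like is sketched below with Python's built-in unittest module. The function clamp is a made-up stand-in for your own code, which a real test would import from src; run_tests is a small helper added for this sketch only.

```python
import unittest


def clamp(value, lower, upper):
    """Hypothetical function under test: limit value to [lower, upper]."""
    return max(lower, min(value, upper))


class ClampTest(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_below_range_is_raised_to_lower(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_value_above_range_is_cut_to_upper(self):
        self.assertEqual(clamp(42, 0, 10), 10)


def run_tests():
    """Run all ClampTest cases and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the test files stored in the tests folder and named test_*.py, `python -m unittest discover -s tests` finds and runs all of them without any helper.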
Yet, you will often need some external libraries as well. In this case, you should offer direct access to them. The most convenient solution for a user is to have them as a direct download, in one batch. For development purposes, you may have them available in your project tree; “libs” is a good place. When using a revision system, you should add the original binaries, as well as the information about where they come from. The build system should be set up so that, after extracting the libraries into the libs subfolder, you can link against them directly.
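One possible way to keep the information about where a library comes from is a small note file stored next to it. The following is only a sketch: the function record_origin, the ".origin.txt" suffix, and the idea of adding a checksum are my assumptions, not an established convention.

```python
import hashlib
from pathlib import Path


def record_origin(library_file, origin):
    """Write '<library_file>.origin.txt' holding the source and a checksum."""
    library_path = Path(library_file)
    # the checksum lets anyone verify that the stored binary is the original one
    digest = hashlib.sha256(library_path.read_bytes()).hexdigest()
    note_path = library_path.with_name(library_path.name + ".origin.txt")
    note_path.write_text(
        f"origin: {origin}\nsha256: {digest}\n", encoding="utf-8"
    )
    return note_path
```

Committing the note together with the binary keeps both in the same revision, so the origin cannot get lost when the library is updated.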
missing/todo: installed libraries under Linux.
Especially when working on rather unusual types of applications, it is often convenient for the author to support some functionality, e.g. the import or preparation of specific files, via additional scripts. The “tools” folder is a good place for them. Please note that the tools should be covered by the user documentation.
As said, it makes sense for several reasons to have a separate folder for the built binaries. The “bin” folder is the proper place. Yet, you may think of setting up a “deployment” folder instead, where, besides the binaries, you put everything that is to be distributed when releasing the software.
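Filling such a deployment folder can be automated. Below is a minimal sketch, assuming that the binaries, the documentation, and the example data are what you want to ship; the function assemble_deployment and the FOLDERS_TO_SHIP tuple are invented for illustration.

```python
import shutil
from pathlib import Path

# Assumption: these are the folders worth shipping; adapt to your project.
FOLDERS_TO_SHIP = ("bin", "docs", "data/examples")


def assemble_deployment(project_root, deploy_dir="deployment"):
    """Copy the shippable folders of a project into its deployment folder."""
    root = Path(project_root)
    target = root / deploy_dir
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in FOLDERS_TO_SHIP:
        source = root / name
        if not source.is_dir():
            # skip folders a project does not have instead of failing
            continue
        # dirs_exist_ok (Python 3.8+) lets a re-run overwrite an earlier copy
        shutil.copytree(source, target / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

The returned list tells you which folders actually ended up in the deployment, which is handy as a last check before releasing.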
Well, a good practice is a) to commit only improvements to the code, not the experiments done along the way, b) to use a revision system for keeping a clean, compilable, executable and tested code base, c) to use branches for dealing with different aspects or experimenting with the code. Doing so, you should not end up sitting on out-of-date files.
If you nonetheless have something you want to keep, but it cannot be included or is even outdated, you may (MAY! SHOULD NOT!) put it into an “attic” folder.
So that was a small guideline on how to set up a software project structure. Of course, you may deviate from this, but it has proved to be valid across a large set of projects I've written during the last 20 or so years.
May the code run!