From 9c0850daf77374988990bee4d2b419e902e66480 Mon Sep 17 00:00:00 2001
From: Manuel Plank
Date: Thu, 27 Jul 2023 11:30:00 +0200
Subject: [PATCH 1/7] add llama2.c-android to readme

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 7b6251e..12b4be5 100644
--- a/README.md
+++ b/README.md
@@ -189,6 +189,7 @@ If your candidate PRs have elements of these it doesn't mean they won't get merg
 
 - [llama2.rs](https://github.com/gaxler/llama2.rs) by @gaxler: a Rust port of this project
 - [go-llama2](https://github.com/tmc/go-llama2) by @tmc: a Go port of this project
+- [llama2.c-android](https://github.com/Manuel030/llama2.c-android) by @Manuel030: adds Android binaries of this project
 
 ## unsorted todos
 
From abfcdf141ea5014262d00ede7d85021f0c5713cb Mon Sep 17 00:00:00 2001
From: Mathias Arens
Date: Thu, 27 Jul 2023 13:05:32 +0200
Subject: [PATCH 2/7] Improve readme: clarify dependencies and other things to install

---
 README.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 7b6251e..3274da0 100644
--- a/README.md
+++ b/README.md
@@ -38,7 +38,8 @@ There is also an even better 110M param model available, see [models](#models).
 
 ## Meta's Llama 2 models
 
-As the neural net architecture is identical, we can also inference the Llama 2 models released by Meta. Sadly there is a bit of friction here due to licensing (I can't directly upload the checkpoints, I think). So Step 1, get the Llama 2 checkpoints by following the [Meta instructions](https://github.com/facebookresearch/llama). Once we have those checkpoints, we have to convert them into the llama2.c format. For this we use the `export_meta_llama_bin.py` file, e.g. for 7B model:
+As the neural net architecture is identical, we can also inference the Llama 2 models released by Meta. Sadly there is a bit of friction here due to licensing (I can't directly upload the checkpoints, I think). So Step 1, get the Llama 2 checkpoints by following the [Meta instructions](https://github.com/facebookresearch/llama). Once we have those checkpoints, we have to convert them into the llama2.c format.
+For this we need to install the Python dependencies (`pip install -r requirements.txt`) and then use the `export_meta_llama_bin.py` file, e.g. for 7B model:
 
 ```bash
 python export_meta_llama_bin.py path/to/llama/model/7B llama2_7b.bin
@@ -50,7 +51,7 @@ The export will take ~10 minutes or so and generate a 26GB file (the weights of
 ./run llama2_7b.bin
 ```
 
-This ran at about 4 tokens/s compiled with OpenMP on 96 threads on my CPU Linux box in the cloud. (On my MacBook Air M1, currently it's closer to 30 seconds per token if you just build with `make runfast`.) Example output:
+This ran at about 4 tokens/s compiled with [OpenMP](#openmp) on 96 threads on my CPU Linux box in the cloud. (On my MacBook Air M1, currently it's closer to 30 seconds per token if you just build with `make runfast`.) Example output:
 
 > The purpose of this document is to highlight the state-of-the-art of CoO generation technologies, both recent developments and those in commercial use. The focus is on the technologies with the highest merit to become the dominating processes of the future and therefore to be technologies of interest to S&T ... R&D. As such, CoO generation technologies developed in Russia, Japan and Europe are described in some depth. The document starts with an introduction to cobalt oxides as complex products and a short view on cobalt as an essential material. The document continues with the discussion of the available CoO generation processes with respect to energy and capital consumption as well as to environmental damage.
 
@@ -141,7 +142,9 @@ gcc -Ofast -o run run.c -lm
 
 You can also experiment with replacing `gcc` with `clang`.
 
-**OpenMP** Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention. You can compile e.g. like so:
+### OpenMP
+Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention, allowing the work in the loops to be split up over multiple processors.
+You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on Ubuntu). Then you can compile e.g. like so:
 
 ```bash
 clang -Ofast -fopenmp -march=native run.c -lm -o run

From bddde3398a6e7a3bc388b2c07f4f4ce45603b265 Mon Sep 17 00:00:00 2001
From: Aydyn Tairov
Date: Thu, 27 Jul 2023 14:38:28 +0100
Subject: [PATCH 3/7] add Makefile option to support builds on amazon linux & centos

---
 Makefile | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index c906b08..41f6ef9 100644
--- a/Makefile
+++ b/Makefile
@@ -36,6 +36,15 @@ runomp: run.c
 win64:
 	x86_64-w64-mingw32-gcc-win32 -Ofast -D_WIN32 -o run.exe -I. run.c win.c
 
+# compiles with gnu11 standard flags for amazon linux, coreos, etc. compatibility
+.PHONY: rungnu
+rungnu:
+	$(CC) -Ofast -std=gnu11 -o run run.c -lm
+
+.PHONY: runompgnu
+runompgnu:
+	$(CC) -Ofast -fopenmp -std=gnu11 run.c -lm -o run
+
 .PHONY: clean
 clean:
 	rm -f run

From 2566ddf74498125ef47a6f15421bbaa5ac264288 Mon Sep 17 00:00:00 2001
From: Aydyn Tairov
Date: Thu, 27 Jul 2023 15:00:27 +0100
Subject: [PATCH 4/7] add README section for centos 7 & amazon linux make target

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 7b6251e..365acbf 100644
--- a/README.md
+++ b/README.md
@@ -159,6 +159,8 @@ Depending on your system resources you may want to tweak these hyperparameters.
 
 On **Windows**, use `build_msvc.bat` in a Visual Studio Command Prompt to build with msvc, or you can use `make win64` to use mingw compiler toolchain from linux or windows to build the windows target. MSVC build will automatically use openmp and max threads appropriate for your CPU unless you set `OMP_NUM_THREADS` env.
 
+On **CentOS 7** and **Amazon Linux 2018**, use the `rungnu` Makefile target: `make rungnu`, or `make runompgnu` to build with OpenMP.
+
 ## ack
 
 I trained the llama2.c storyteller models on a 4X A100 40GB box graciously provided by the excellent [Lambda labs](https://lambdalabs.com/service/gpu-cloud), thank you.

From acf1e18e8ffa7cd841c4abbc9f19d2799fcb33e1 Mon Sep 17 00:00:00 2001
From: Aydyn Tairov
Date: Thu, 27 Jul 2023 15:38:45 +0100
Subject: [PATCH 5/7] remove second ifdefs for windows timing by introducing ported version of clock_gettime

---
 run.c | 6 ------
 win.c | 8 ++++++++
 win.h | 11 ++++++++---
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/run.c b/run.c
index 80b0b00..eb344e2 100644
--- a/run.c
+++ b/run.c
@@ -365,15 +365,9 @@ int argmax(float* v, int n) {
 // ----------------------------------------------------------------------------
 
 long time_in_ms() {
-#if defined _WIN32
-    // windows specific way to get time
-    return GetTickCount();
-#else
-    // linux specific way to get time
     struct timespec time;
     clock_gettime(CLOCK_REALTIME, &time);
     return time.tv_sec * 1000 + time.tv_nsec / 1000000;
-#endif
 }
 
 int main(int argc, char *argv[]) {
diff --git a/win.c b/win.c
index 7f87265..cba8c0c 100644
--- a/win.c
+++ b/win.c
@@ -176,3 +176,11 @@ int munlock(const void *addr, size_t len)
 
     return -1;
 }
+
+// Portable clock_gettime function for Windows
+int clock_gettime(int clk_id, struct timespec *tp) {
+    DWORD ticks = GetTickCount();
+    tp->tv_sec = ticks / 1000;
+    tp->tv_nsec = (ticks % 1000) * 1000000;
+    return 0;
+}
diff --git a/win.h b/win.h
index 1a66a83..11345d8 100644
--- a/win.h
+++ b/win.h
@@ -3,6 +3,7 @@
 
 #define WIN32_LEAN_AND_MEAN // Exclude rarely-used stuff from Windows headers
 #include <windows.h>
+#include <time.h>
 
 // Below code is originally from mman-win32
 
@@ -12,9 +13,9 @@
  * mman-win32
  */
 
-#ifndef _WIN32_WINNT // Allow use of features specific to Windows XP or later.
-#define _WIN32_WINNT 0x0501 // Change this to the appropriate value to target other versions of Windows.
-#endif
+#ifndef _WIN32_WINNT // Allow use of features specific to Windows XP or later.
+#define _WIN32_WINNT 0x0501 // Change this to the appropriate value to target other versions of Windows.
+#endif
 
 /* All the headers include this file. */
 #ifndef _MSC_VER
@@ -47,12 +48,16 @@ extern "C" {
 #define MS_SYNC 2
 #define MS_INVALIDATE 4
 
+/* Flags for portable clock_gettime call. */
+#define CLOCK_REALTIME 0
+
 void* mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off);
 int munmap(void *addr, size_t len);
 int mprotect(void *addr, size_t len, int prot);
 int msync(void *addr, size_t len, int flags);
 int mlock(const void *addr, size_t len);
 int munlock(const void *addr, size_t len);
+int clock_gettime(int clk_id, struct timespec *tp);
 
 #ifdef __cplusplus
 };

From 343572675f502656798e1af7ff43f3c1b1594fc7 Mon Sep 17 00:00:00 2001
From: Aydyn Tairov
Date: Thu, 27 Jul 2023 16:30:22 +0100
Subject: [PATCH 6/7] minor whitespaces cleanup

---
 win.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/win.h b/win.h
index 11345d8..12c7f39 100644
--- a/win.h
+++ b/win.h
@@ -13,9 +13,9 @@
  * mman-win32
  */
 
-#ifndef _WIN32_WINNT // Allow use of features specific to Windows XP or later.
-#define _WIN32_WINNT 0x0501 // Change this to the appropriate value to target other versions of Windows.
-#endif
+#ifndef _WIN32_WINNT // Allow use of features specific to Windows XP or later.
+#define _WIN32_WINNT 0x0501 // Change this to the appropriate value to target other versions of Windows.
+#endif
 
 /* All the headers include this file. */
 #ifndef _MSC_VER

From b4b9ef5c6cae7658f7dcfff3d99428022101d8fe Mon Sep 17 00:00:00 2001
From: Aydyn Tairov
Date: Tue, 25 Jul 2023 22:20:07 +0100
Subject: [PATCH 7/7] add github actions workflow to validate builds on changes in *.c, *.h files

---
 .github/workflows/build.yml | 96 +++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 .github/workflows/build.yml

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
new file mode 100644
index 0000000..97fd677
--- /dev/null
+++ b/.github/workflows/build.yml
@@ -0,0 +1,96 @@
+name: Continuous Integration
+
+on:
+  push:
+    branches:
+      - master
+    paths: ['.github/workflows/**', '**/Makefile', '**/*.c', '**/*.h']
+  pull_request:
+    types: [opened, synchronize, reopened]
+    paths: ['**/Makefile', '**/*.c', '**/*.h']
+
+env:
+  BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
+
+jobs:
+  # check basic builds to avoid breaking changes
+  ubuntu-focal-make:
+    runs-on: ubuntu-20.04
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v3
+
+      - name: Dependencies
+        id: depends
+        run: |
+          sudo apt-get update
+          sudo apt-get install build-essential -y
+
+      - name: Build
+        id: make_build
+        run: |
+          make
+
+      - name: Build runfast
+        id: make_build_runfast
+        run: |
+          make runfast
+
+  macOS-latest-make:
+    runs-on: macos-latest
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v3
+
+      - name: Dependencies
+        id: depends
+        continue-on-error: true
+        run: |
+          brew update
+
+      - name: Build
+        id: make_build
+        run: |
+          make
+
+      - name: Build runfast
+        id: make_build_runfast
+        run: |
+          make runfast
+
+      - name: Build clang
+        id: make_build_clang
+        run: |
+          make run CC=clang
+
+  windows-latest-make:
+    runs-on: windows-latest
+
+    strategy:
+      matrix:
+        arch:
+          - amd64
+          - amd64_x86
+          - amd64_arm64
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v3
+
+      - name: Setup MSBuild
+        uses: microsoft/setup-msbuild@v1
+
+      - name: Setup MSVC ${{ matrix.arch }}
+        uses: ilammy/msvc-dev-cmd@v1
+        with:
+          arch: ${{ matrix.arch }}
+
+      - name: Build ${{ matrix.arch }}
+        id: build_msvc
+        run: |
+          .\build_msvc.bat