size of repo and generated files/translated_images from Co-op Translator #1048

@bmerkle

I recently had to get a clean fork of the project again and was a bit puzzled by how much it has grown. I did a bit of analysis and want to share the following observations:

I am curious to understand why so many translated_images are generated (see the folder).
IMO, these are generated files that should not necessarily be put under version control but should instead go into the build output (maybe we should think about a publishing pipeline that then includes all the translations).

The same question applies to the translations. There are a lot of files; they do not cause size problems (unlike, IMO, translated_images), but do we have to store generated files in the repo? I always tell my students not to store them, because they can be reproduced via the toolchain and are mainly output (like obj, dll, or lib files, or clang-generated tables).

So here are some numbers:

  • the main repo has 1 GB of data (without translated images)
  • the "translated_images" folder has 3 GB! It cannot even be displayed on the GitHub web page anymore (you get an error message because there are more than 6000 files, and it no longer shows any git information)
  • the "translated" folder has about 4000 files, while the rest of the project has 1800 files for the content.
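For anyone who wants to reproduce numbers like these without WizTree, here is a minimal Python sketch that counts files and sums sizes per top-level entry of a local checkout. The function names `summarize_top_level` and `format_report` are made up for this illustration, and the result will include `.git`, virtual environments, and anything else present in the working directory.

```python
import os
from pathlib import Path


def summarize_top_level(root):
    """Return {top-level entry name: (file_count, total_bytes)} for a checkout."""
    summary = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_file():
            summary[entry.name] = (1, entry.stat().st_size)
            continue
        count, size = 0, 0
        # os.walk does not descend into symlinked subdirectories by default.
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                path = Path(dirpath) / name
                if path.is_symlink():
                    continue  # do not count link targets
                count += 1
                size += path.stat().st_size
        summary[entry.name] = (count, size)
    return summary


def format_report(summary):
    """Render the summary as text lines, largest entries first."""
    rows = sorted(summary.items(), key=lambda item: item[1][1], reverse=True)
    return [
        f"{name}: {count} files, {size / 1024**2:.1f} MiB"
        for name, (count, size) in rows
    ]
```

Calling `print("\n".join(format_report(summarize_top_level("."))))` from the repository root lists the largest folders first, which makes it easy to see where the generated output sits relative to the actual content.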

I really like Co-op Translator as it saves a lot of work; however, I think we should not store its output in git.
Instead, we could push it into a container or some binary that we store in Artifactory or the GitHub registry.
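As a rough sketch of the "some binary" idea, the generated folders could be packed into a single archive that a pipeline then uploads wherever we decide. This uses only the Python standard library; the function name `bundle_generated` is hypothetical, the folder names are passed in as parameters rather than assumed, and the upload step itself is deliberately left out because it depends on the chosen registry.

```python
import tarfile
from pathlib import Path


def bundle_generated(repo_root, folders, archive_path):
    """Pack the given generated folders into one gzip-compressed tar archive.

    Returns the list of folders that were actually added.
    """
    repo_root = Path(repo_root)
    added = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for folder in folders:
            source = repo_root / folder
            if not source.is_dir():
                continue  # skip folders missing from this checkout
            # arcname keeps paths in the archive relative to the repo root.
            archive.add(source, arcname=folder)
            added.append(folder)
    return added
```

For example, `bundle_generated(".", ["translated_images"], "generated.tar.gz")` would produce one file to publish, and the folders themselves could then be listed in `.gitignore`.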

What do you think about this?

Below are some screenshots from WizTree which show that "translated_images" and "translation" contribute significantly to the size. My downloaded LLMs for chapter 19-slnm, the Python venv, and the git repo store itself rank above those numbers, but these two output folders immediately follow.

Image

Here is a picture of the GitHub UI, which has problems displaying the folder.
https://github.com/microsoft/generative-ai-for-beginners/tree/main/translated_images

Image
