Chapter 13 ■ Version Control

You can explore the web page and the features implemented there later (it is a good idea to know what it supports), but for now you can just use the repository here as a remote global repository. To clone it, you need the address in the field next to the button that says SSH. In my test repository, it is git@github.com:mailund/test.git. This is an address you can use to clone the repository using the SSH protocol.

git clone git@github.com:mailund/test.git

SSH is a protocol that you will have access to on many machines, but it requires you to set up a public/private key pair. See the documentation at https://help.github.com/articles/generating-ssh-keys/ to learn how to set up an SSH key at GitHub. It is mostly automated by now, and you should be able to set it up just by making a push and answering yes to the question you get there.
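If you prefer to create the key pair by hand, the following is a minimal sketch of how that might look. The scratch directory and the e-mail label are assumptions made for illustration; in everyday use the key would normally live under ~/.ssh, and the empty passphrase is only there to keep the example non-interactive.

```shell
set -eu

# Assumption: a scratch directory is used so no existing key is overwritten.
keydir=$(mktemp -d)
keyfile="$keydir/id_ed25519"

# Generate an Ed25519 key pair. -C attaches a label (a placeholder address here)
# and -N "" sets an empty passphrase.
ssh-keygen -q -t ed25519 -C "you@example.com" -f "$keyfile" -N ""

# The public half is what you register with your GitHub account.
cat "$keyfile.pub"

# Once the key is registered, this checks the connection (needs network access):
# ssh -i "$keyfile" -T git@github.com
```

The private key (the file without the .pub suffix) never leaves your machine; only the public key is uploaded.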

SSH is not the easiest protocol to work with, though, and if your machine has access to HTTPS, the protocol your web browser uses for secure communication, that is often simpler. You will almost certainly have HTTPS on your own machine, but depending on how firewalls are set up, you might not have access to it on computer clusters and the like, and then you need to use the SSH protocol. To use HTTPS instead of SSH, click the SSH drop-down and choose HTTPS. This gives you a slightly different address, which you can use to clone. In my repository, I get https://github.com/mailund/test.git.

git clone https://github.com/mailund/test.git

If nothing goes wrong, you should be able to use the cloned repository just like the repositories you looked at previously, when you made your own bare/global repository.
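If you have already cloned with one protocol and later find you need the other, you should not have to clone again. The sketch below, which uses a freshly initialized scratch repository rather than a real clone so that it runs without network access, registers the SSH address as the remote named origin and then switches it to the HTTPS address.

```shell
set -eu

# Scratch repository standing in for an existing clone.
work=$(mktemp -d)
git init -q "$work/test"
cd "$work/test"

# Register the SSH address from the text as the remote named origin.
git remote add origin git@github.com:mailund/test.git

# Point the same remote at the HTTPS address instead.
git remote set-url origin https://github.com/mailund/test.git

# Show which address origin uses now.
url=$(git remote get-url origin)
echo "$url"
```

Only the address stored for the remote changes; the commits and branches in the local repository are untouched.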

You can also check out the repository and make an RStudio project at the same time by choosing File ➤ New Project in RStudio and selecting Version Control (the third option) in the dialog box that pops up. In the next window, choose Git and then use the HTTPS address as the Repository URL.

Moving an Existing Repository to GitHub

If you have already used git locally in a project and want to move it to GitHub, there is a little more you must do, at least if you want to move your repository including all the history stored in it and not just the current version of the source code.

First, you need to make a bare version of your repository. This is, as you saw a while ago, just a version of the repository without the associated working copy of the source code.

If your repository is called repo, you can make a bare version of it, called repo.git, by cloning it:

git clone --bare repo repo.git

To move this to GitHub, create an empty repository there and get the HTTPS address of it. Then go into the bare repository we just made and run the following command:

cd repo.git
git push --mirror <https address at github>

Now step back out of the bare repository, delete it (it was only needed to move the code to GitHub), and clone the version from GitHub. You then have a version from there that you can work on.

cd ..
rm -rf repo.git
git clone <https address at github>
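To see that the history really does survive the move, you can rehearse the whole procedure locally. This is only a sketch: the bare repository github-standin.git is a hypothetical local stand-in for the empty repository you would create on GitHub, and the file name and user details are made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A small local repository with two commits of history.
git init -q repo
git -C repo config user.name "Example User"
git -C repo config user.email "user@example.com"
echo "first" > repo/notes.txt
git -C repo add notes.txt
git -C repo commit -q -m "First commit"
echo "second" >> repo/notes.txt
git -C repo commit -q -am "Second commit"

# Hypothetical local stand-in for the empty repository on GitHub.
git init -q --bare github-standin.git

# The steps from the text: bare clone, mirror push, clean up, fresh clone.
git clone -q --bare repo repo.git
git -C repo.git push -q --mirror "$work/github-standin.git"
rm -rf repo.git
git clone -q "$work/github-standin.git" fresh

# Count the commits that arrived in the fresh clone.
commits=$(git -C fresh rev-list --count --all)
echo "commits in fresh clone: $commits"
```

Both commits show up in the fresh clone, which is the point of pushing with --mirror rather than copying only the current files.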

Installing Packages from GitHub

A very nice extra benefit you get from having your R packages on GitHub—in addition to having version control—is that other people can install your package directly from there. The requirements for putting packages on CRAN are much stricter than for putting R packages on GitHub, and you are not allowed to upload new versions to CRAN very often, so for development versions of your R package, GitHub is an excellent alternative.

To install a package from GitHub, you need to have the devtools package installed, as follows:

install.packages("devtools")

After that, you can install a package named packagename written by the GitHub user username with this command:

devtools::install_github("username/packagename")

Collaborating on GitHub

The repositories you make on GitHub are by default only editable by yourself. Anyone can clone them to get the source code, but only you can push changes to the repository. This is, of course, useful for preventing random people from messing with your code, but it also prevents collaboration.

One way to collaborate with others is to give them write permissions to the repository. On the repository home page, you must select the Settings entry in the toolbar and then pick Collaborators in the menu on the left. After that, you get to a page where you can add collaborators identified by their user account on GitHub. Collaborators can push changes to the repository just as you can yourself. To avoid too much confusion, when different collaborators are updating the code, it is useful to have some discipline in how changes are merged into the master (and/or the develop) branch. One approach that is recommended and supported by GitHub is to make changes in separate branches and then use so-called pull requests to discuss changes before they are merged into the main branches.

Pull Requests

The workflow for making pull requests is to implement your new features, bug fixes, or other changes on branches separate from develop or master. Then, instead of merging them directly, you create what is called a pull request. You can start a pull request by switching to the branch on the repository home page and clicking the big green New Pull Request button, or if you just made changes, you should also see a green Compare & Pull Request button that lets you start a pull request.
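On the command line, the part of this workflow that happens before you click any buttons might look like the sketch below. The bare repository shared.git is a hypothetical local stand-in for the repository on GitHub, and the branch name my-feature and the file are invented for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical local stand-in for the repository on GitHub.
git init -q --bare shared.git
git clone -q shared.git project
cd project
git config user.name "Example User"
git config user.email "user@example.com"

# A first commit on the main line of development.
echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git push -q -u origin HEAD

# Do the new work on its own branch instead of the main one.
git checkout -q -b my-feature
echo "feature" >> file.txt
git commit -q -am "Add feature"

# Publish the branch so it can be picked as the Compare branch of a pull request.
git push -q -u origin my-feature

# List the branches the remote knows about.
remote_heads=$(git ls-remote --heads origin)
echo "$remote_heads"
```

Once the branch exists on the remote, the pull request itself is created on the web page as described above.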

Clicking the button takes you to a page where you can name the pull request and write a description of what the changes in the code are doing. You also decide which branch you want to merge the pull request into. Above the title you give the pull request, you can select two branches: the one you want to merge into (Base) and the branch you have your new changes on (Compare). For Base, you should pick the branch you branched out of when you made the new branch. After that, you can create the pull request.

This simply creates a web interface for having a discussion about the changes you made. It is possible to see the changes on the web page and comment on them and make comments to the branch in general. At the same time, anyone can check out the branch and make their own modifications. As long as the pull request is open, the discussion is going, and people can improve on the branch.

When you are done, you can merge the pull request (using the big green Merge Pull Request button on the web page that contains the discussion about the pull request).
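With GitHub's default merge option, the button records a merge commit on the base branch. A roughly comparable local operation, sketched here in a scratch repository with made-up branch and file names, is a merge with --no-ff, which forces a merge commit even when a fast-forward would be possible.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git config user.name "Example User"
git config user.email "user@example.com"

# Remember the name of the base branch (master or main, depending on the setup).
base=$(git symbolic-ref --short HEAD)

echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# The work that the pull request would contain.
git checkout -q -b my-feature
echo "feature" >> file.txt
git commit -q -am "Add feature"

# Merge the feature branch back into the base branch with a merge commit.
git checkout -q "$base"
git merge -q --no-ff -m "Merge branch 'my-feature'" my-feature

# Print the merge commit followed by its two parents.
parents=$(git rev-list --parents -n 1 HEAD)
echo "$parents"
```

The merge commit keeps a visible record in the history of where the branch was brought in.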

Forking Repositories Instead of Cloning

Making changes on separate branches and then making pull requests to merge in the changes still requires write access to the repository. This is excellent for collaborating with a few friends, but not ideal for getting fixes from random strangers, or for making fixes to packages written by other people, who won't necessarily want to give you full write access to their software.

Not to worry, it is still possible to collaborate with people on GitHub without having write access to each other's repositories. The way that pull requests work, the branches to be merged do not actually need to be part of the same repository. You can merge branches from anywhere if you want to.

If you want to make changes to a repository that you do not have write access to, you can clone it and make changes to the repository you get as the clone, but you cannot push those changes back to the repository you cloned it from. And other users on GitHub can't see the local changes you made (they are on your personal computer, not on the GitHub server). What you want is a repository on GitHub that is a clone of the repository you want to modify and that is a bare repository so you can push changes into it. You then want to clone that repository to your own computer. Changes you make on your own computer can be pushed to the bare repository you have on GitHub, because it is a bare repository and because you have write access to it, and other users on GitHub can see the repository you have there.

Making such a repository on GitHub is called forking the repository. Technically, forking isn’t different from cloning—except that you’re making a bare repository—and the terminology is taken from open source software where forking a project means making your own version and developing it independently of previous versions.

Anyway, whenever you go to a repository home page on GitHub, you should see the Fork button at the top right, to the right of the name and branch of the repository you are looking at. Clicking the Fork button will make a copy of the repository that you have write access to. You can fork any repository in another user's account. You cannot fork your own repositories, although I'm not sure why you are not allowed to; in most cases, you don't want to do that anyway.

Once you have made the copy, you can clone it to your computer and make changes to it, as you can with any other repository. The only way this repository differs from a repository you made yourself is that when you make pull requests, GitHub knows that you forked it off another repository. So when you make a pull request, you can choose not only the Base and Compare branches, but also the base fork and the head fork; the former is the repository you want to merge changes into, and the latter is the repository where you made your changes. If someone forks your project and you make a pull request in the original repository, you won't see the base fork and head fork choices by default, but clicking on the Compare Across Forks link when you make pull requests will enable them there as well.
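The relationship between the original repository, your fork, and your local clone can be rehearsed locally as sketched below. Everything here is a stand-in: original.git plays the repository you cannot write to, fork.git plays the copy the Fork button would make, and the remote name upstream is a common convention rather than something GitHub requires.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for someone else's repository on GitHub, seeded with one commit.
git init -q seed
git -C seed config user.name "Original Author"
git -C seed config user.email "author@example.com"
echo "original" > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed original.git

# Stand-in for clicking Fork: a bare copy that you are allowed to push to.
git clone -q --bare original.git fork.git

# Clone your fork and keep a second remote pointing at the original.
git clone -q fork.git mywork
cd mywork
git config user.name "Example User"
git config user.email "user@example.com"
git remote add upstream "$work/original.git"
git fetch -q upstream

# Make the change on a branch and push it to the fork (origin).
git checkout -q -b fix-typo
echo "fixed" >> file.txt
git commit -q -am "Fix typo"
git push -q -u origin fix-typo

# The branch exists in the fork but not in the original repository.
in_fork=$(git ls-remote --heads origin fix-typo)
in_original=$(git ls-remote --heads upstream fix-typo)
echo "fork: ${in_fork:-none}"
echo "original: ${in_original:-none}"
```

The pull request is what bridges the last gap: it asks the owner of the original repository to merge the branch that so far lives only in your fork.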

If you make a pull request with your changes to someone else’s repository, the procedure is exactly the same as when you make a pull request to your own projects, except that you cannot merge the pull request after the discussion about the changes. Only someone with permission to write to the repository can do that.

The same goes if someone else wants to make changes to your code. They can start a pull request with their changes to your code but only you can decide to merge the changes into the repository (or not) following the pull discussion.

This is a very flexible way of collaborating—even with strangers—on source code development and one of the great strengths of git and GitHub.

Exercises

Take any of the packages you wrote earlier and create a repository on GitHub to host it. Push your code there.
