Hacktoberfest Retrospective (OSD600)

Over the past month, I have taken part in an event called Hacktoberfest, taking some first steps into the collaborative world of open source programming. The goal was to make 5 pull requests on github.com over the month of October, earning a T-shirt and credit toward my Open Source Development course. A list of Pull Requests and their associated Issues is provided below.

Issue 1 | PR 1 | Blog Post 1
Issue 2 | PR 2 | Blog Post 2
Issue 3 | PR 3 | Blog Post 3
Issue 4 | PR 4 | Blog Post 4

Unfortunately, I did not finish all 5 Pull Requests. Nevertheless, I think the entire process was still quite positive. There were a lot of things I learned along the way:

1. Open Source isn’t scary. (You can contribute too!)

It can be pretty intimidating to get started in the open-source world. For someone who doesn’t have much experience building anything real or useful, it can feel like there is always more you need to learn before you can work on such projects. But it turns out you can always find projects that need help, and whether your contributions are large or small, they will be appreciated. If you make a mistake, things can be reversed, and people will help you understand what went wrong. Furthermore, people will hold off on solving certain problems so that a beginner might have the chance to use them as a learning opportunity. This makes everything more welcoming, and gives programmers of all skill levels the freedom to find their own way to contribute.

2. Get your life in order

It goes without saying that if I couldn’t finish all the required PRs, then something went wrong. It wasn’t a time-management problem in the sense that I forgot the work or overlooked it, but in the end it is the results that matter. If I had been stricter with my time for even one day, I think I could have avoided this. Even if you are feeling tired and unproductive, working at 30% capacity can be better than not working at all.

3. Don’t expect things to be as quick or as simple as they seem

Whether it is due to a typo or a deep misunderstanding of how the code functions, there will be times when a fix you think requires 30 seconds of work turns out to require much more. Complications can appear at every turn. The code change might be just one line, but installing the packages needed to run the build locally can take forever (if you get stuck behind a wall of errors). Making a small change might break things it shouldn’t, and some things might be intentionally written a certain way for a certain reason. You might make what seems like an obvious upgrade only to find out that it needs to be adjusted. You might end up making 8 commits instead of 1, even though you thought it was so easy. Which leads me to…

4. The PR isn’t the end of the story

Although the goal of this event was to make Pull Requests, you may still need to make adjustments before a PR is merged into the project. Although I knew this going in, it was surprising how even a minor change could lead to a long back-and-forth or a significant rewrite of code. This turned into a kind of debt going forward, as I needed to continue finding new issues despite having Pull Requests that still lay waiting to be fixed. There are still some that I intend to go back to, but it may take some time.

Having learned from this experience, there are a few things I would change if I had to do this all over again.

The first is finding a better way to search for the types of issues I wanted. It was quite difficult to search when half of the results were repositories for “easy hacktoberfest PRs”, or for code challenges. Although code challenges are not without value, a lot of repositories using the hacktoberfest label didn’t seem to be in the spirit of an open-source software celebration, and many seemed to be created to break the event’s rules. My only solution to this would be to craft better searches, which I began to do towards the end of the month.

The other thing I would do is maximize the amount of time I could dedicate to it. Although I didn’t need an incredible amount of extra time to work on it, I think that with fewer courses and better management of sleep and energy levels, I would have had a more positive experience. My courses this semester are all interesting, and all of them have an incredible amount of depth. Beyond just completing my work, I would have loved to play around with things in several of them, including my work in open source. With deeper involvement in these projects, and less changing of gears as I switched my focus back and forth, I could have grown more.

Conclusion

I really enjoyed taking part in Hacktoberfest, and I will likely continue to find projects to work on in my spare time. I do, however, expect to be very selective when picking my projects. Working on various types of projects and getting more experience with git have made me feel like more of a programmer, and this first push has made it easier to continue later on.

Hacktoberfest Part 4 (OSD600)

Hacktoberfest continues! Finding issues to fix on github.com projects has been surprisingly challenging. The startup time for getting involved with a project depends on a lot of things. Large projects, whether in number of files or number of people, and even small ones in a language you don’t understand, take some research before you can get involved. That isn’t to say it is impossible, but I have found it very easy to dismiss projects on these grounds unless there was something bigger attracting me to a project and making it seem worth the burden.

This time my issue was writing documentation for Android Jetpack, explaining how to use EmojiCompat. EmojiCompat allows Android apps to display emoji, including ones that would normally not render because the font package shipped with that version of the operating system predates them. A video was provided with a rundown of how the library is used, and I based the documentation on that.

While not technically challenging, I did find it difficult to describe what was happening in the comments that the code examples were required to carry. Perhaps I stuck too much to writing code and not enough prose.

Hacktoberfest Part 3 (OSD600)

Continuing with finding issues for Hacktoberfest, I came across an issue on a repository for a project that allows users to manipulate images in a way that resembles the works of the artist Kensuke Koike. In creating the project, the author had decided to limit the size of user-loaded images via cropping. Since most users are unlikely to want their images cropped, it was requested that this be replaced with image scaling.

I installed the project files and the necessary packages; however, the project didn’t build properly. So I ended up manually installing parcel-bundler globally on my computer, which allowed me to build the project and test it out. Finding a large enough photo, I confirmed that cropping was in fact an issue and that it hadn’t quietly been fixed already. Creating a new git branch, I got to work.

My expectation was that I would have to find a new library, or write an entirely new method to get this done. However, it seemed that blueimp-load-image, the library already being used, had this functionality built in.

The original code looked something like this:

document.getElementById('file-input').onchange = function (e) {
  loadImage(
      e.target.files[0],
      imageSetup,
      {maxWidth: 2000} // Options
  );
};

After looking through the scaling methods in the library that loadImage comes from, I found that the issue could be fixed with very few changes:

document.getElementById('file-input').onchange = function (e) {
  loadImage(
      e.target.files[0],
      imageSetup
  ).scale(2000);
};

Getting ready to post my pull request, I paused for a moment. Scaling down to a width of 2000 was the request, but is that what this was doing? If the image was smaller, would it be scaled up? That would probably be a waste of time, since the limit was imposed to prevent slowdown in the first place.

So I replaced it with this:

document.getElementById('file-input').onchange = function (e) {
  loadImage(
    e.target.files[0],
    imageSetup
  ).scale({maxWidth: 2000});
};

Checking again that everything worked properly, I tested with a few more images and was confident that my solution was satisfactory.

Hacktoberfest Part 2 (OSD600)

Continuing my Hacktoberfest journey, I kept searching for new projects to contribute to. I did find a handful of promising prospects, such as mtgatracker, but I decided to hold off on starting on it until I could take a better look at how it works.

In the meantime, I found a small project that aimed to remake various board games, starting with Tic Tac Toe. The issue I found was still relatively simple, but not to the point of being trivial. The task was to find places in the code where hard-coded symbols and magic numbers could be replaced with named constants, making the code easier to maintain (and, in my opinion, more readable). I wasn’t sure how hard I should be trying to do this, though. Were there cases where it is better to leave things as-is? Is there such a thing as going overboard?

I began with the replacement suggested by the person who posted the issue. The character ‘.’ was used to represent empty squares that don’t yet hold an X or an O. I started by pulling it out and defining a constant at the top of the file. I then realized that the character was used in other files as well, so I had to move my constant higher up in the header file hierarchy. I also tried to come up with a good name, so that reading the code would flow well. That way it went from something like this:

c != '.'

to

c != EMPTY_SQUARE

I think it’s easier to figure out what the code here is trying to do this way, though it may be more difficult to see exactly how it is done (by comparing a character). With that done, I ended up replacing a few things with existing constants.

After that, I replaced a few more symbols, added a use of an existing constant, and changed the game states from -1, 0, 1 to human-readable names, as sketched below.
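Here is a minimal sketch of the kind of change involved; the constant names are my own, not the project’s actual identifiers:

/* Sketch only: the names below are my own inventions, not the
   project's actual identifiers. Each magic value is defined once,
   in a header visible to every file that needs it. */
#define EMPTY_SQUARE '.'   /* a square not yet holding an X or an O */

enum game_result {
    RESULT_X_WINS = -1,
    RESULT_DRAW   =  0,
    RESULT_O_WINS =  1
};

/* A check like (c != '.') then reads as: */
int square_is_occupied(char c)
{
    return c != EMPTY_SQUARE;
}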

There was also a part where I thought I had found a place to speed up the code (involving the ordering of conditions in an if statement). After some time, I realized that there would be no savings after all: all conditions were equally likely, so there was no exiting early by rearranging them.

Both before and after working on this issue, I tried the Tic Tac Toe game for myself. At first I thought it was broken, since it segfaulted whenever I put in my commands (separated by commas). It wasn’t until I realized that they needed to be separated by a newline instead of a comma that I got it to work properly. This could be either a failure to communicate the expected format to the user (shown as (row, column) in the instructions) or a failure to accept a larger variety of user inputs. Either way, I may end up raising the issue myself, but only after confirming that it is one.

I submitted a Pull Request and sent it out to be judged. I still have to go back and make some changes to the pull request I made for part 1, but I may end up finishing that up later this week.

In-line assembly Lab (SPO600 Lab 6)

In this lab, we inspect an open source project that uses in-line assembly language in C files. The open source project I have selected is ardour.

This project contains 43 instances of assembly code used in-line (in .c, .h, and .cpp files). In one such file, behavior is defined for x86, ia64, s390, i386, and i586 systems. The file uses assembly to get a count of CPU cycles. This is done differently for each architecture, and the file decides which version to use based on if/else logic. On platforms not covered, such as sparc, arm, and m68k, it gives a warning that the project does not support the platform yet.

Example of in-line assembly for x86_64
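The screenshot itself isn’t reproduced here, but the well-known x86_64 pattern for reading the cycle counter looks roughly like the following. This is a sketch of the classic rdtsc idiom, not a copy of ardour’s actual code:

#include <stdint.h>

/* Read the CPU's time-stamp counter. The rdtsc instruction leaves the
   low 32 bits of the counter in EAX and the high 32 bits in EDX; the
   "=a" and "=d" output constraints capture those registers, and the
   two halves are recombined into a single 64-bit cycle count. */
static inline uint64_t get_cycles(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}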

Since the information being accessed here relates to the CPU itself, I think it makes sense to use assembly (which depends on the architecture of the CPU). However, I did find one source saying that this particular use is out of date: nowadays there are compiler built-ins that do the same job. If that is true, then perhaps the project is simply old enough that assembly was the useful choice at the time.

Comparing Open Source Projects (SPO600 Lab 1)

This lab is about comparing two open source projects with different licenses, and comparing the process of contributing to each. I will be comparing the Bootstrap web framework (under the MIT license) and Mozilla Firefox (under the Mozilla Public License), by analyzing one software “patch” from each.

Bootstrap

The issue and patch. From start to finish, 4 people were involved in this patch. It began with someone reporting a bug, and other people verifying that it existed. The person who posted the bug then proposed a fix, and others commented on it and raised any issues. One other person changed the code with their own commit. The issue was posted on May 23 and was resolved on July 15, a total of 53 days.

Mozilla Firefox

This is the bug fix I chose to analyze. From the posting of the bug to its resolution took 7 days in total and involved 3 people: the person working on the fix and two people reviewing it.

According to Mozilla, the official process for contributing is:

  1. Build the project
  2. Find a bug
  3. Fix the bug
  4. Get code reviewed
  5. Respond to the review
  6. Automated testing

In this case, the process does seem to match these steps. The review itself involved a reviewer pointing out errors and requesting certain changes. A second reviewer then clarified that some of the issues the first reviewer had raised were not problems with the code being submitted, but a version problem on the reviewer’s side. The process seems relatively quick, with enough collaboration to catch issues well.

Hacktoberfest Part 1 (OSD600)

For the month of October, I will take part in an event called Hacktoberfest for assignment credit in an open source development course I am taking at Seneca College. The event is held to support open source software and to get people contributing to the global programming community. Over the course of the month, participants who make at least 5 pull requests to any public repositories on github receive a free Hacktoberfest T-shirt.

Given a great amount of freedom, it was difficult for me to choose what project to begin working on. Not only do I want to pick a project that I think is worth contributing to, but it also needs to be something where I can find an issue that is small enough for me to jump into relatively quickly. I’m not against dabbling in languages I don’t know very well, but I would prefer to do it with good reason.

After some searching, and a lot of being picky, I decided to fix the bug I found in filer for our 0.1 release. I already had some familiarity with the method that had the bug, and I was happy to continue in the same programming language for a bit longer.

Starting with where I left off, I had written two tests for appendFile, which is designed to function in a way that matches appendFile in fs. One test passed when passing { encoding: 'utf8', flag: 'a' } as the options parameter, but the other failed when passing only { flag: 'a' }. So I carefully looked through appendFile in implementation.js. What I found is that the options parameter is first checked by a function called validate_file_options, but the only validation it does is on the type. If options is null, it sets the options to default values. If it is a function, it assumes the options were left empty (and sets default values). If it is a string, it assumes it is the encoding, and sets a default value for the flag.

But if it is an object, it leaves it alone and passes it along. This is why the tests I wrote turned out the way they did: if an object is passed, but that object does not have an encoding specified, then the later checks on the encoding fail, since there is no such attribute.

I resolved this by altering the check to account for that case, and the code ended up passing all tests, including the new ones. After a lot of fiddling with git, I submitted my first Hacktoberfest pull request.

After this I will go back to searching for a new project to work on. I hope to look up various projects that I have used and see if there are any issues appropriate for me.

Software Profiling Investigation (SPO600)

Today’s investigation is into the world of software profiling, with the open source profiling tool gprof. Gprof is a tool for unix systems that analyzes the performance of another program. It does this by way of additional code inserted when that program is compiled, which lets it provide a great level of detail in its analysis.

I chose to run gprof with one of the programs I made for Lab 5 of SPO600. The program scales fake “sound samples” (random integers in this case). I prepared the program to run extra samples, for a longer run time and better profiling; with 1,000,000,000 samples, it takes about 53 seconds to run. I compiled with the -g -pg -O3 options on the gcc compiler. Once we run the program, we find an extra file in the directory called gmon.out.

Running gprof on our executable gave me this:

[Screenshot: gprof output with an empty flat profile]

Unfortunately, there isn’t any content in the profile. Some searching led me to recompile the program differently to avoid a bug in gcc. Now one new line shows up:

[Screenshot: gprof flat profile with a single row for main]

The bulk of the output is an explanation of what each statistic means.

The name is main, the function being profiled on this line. It seems that you can look at each function individually and see its impact. Since there is only one function defined in this program, it takes up 100% of the time. And since no functions are called inside main, self seconds and cumulative seconds are the same. The other columns (involving the function being called) are left empty, perhaps because main is a special function and doesn’t get profiled in this way.
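To see those other columns filled in, the work would have to be split across functions. Here is a toy example of my own (not the lab program) that would give gprof separate rows to report:

/* Toy program, mine rather than the lab's, in which gprof can
   attribute time to a helper as well as to main. Compile with the
   -pg flag to add the profiling hooks, e.g. gcc -pg -O0 toy.c.
   At higher optimization levels the compiler may inline the helper
   and hide it from the profile. */
#include <stdint.h>
#include <stdlib.h>

#define N 100000000L

static int16_t scale_sample(int16_t s)
{
    return (int16_t)(s * 0.75f);   /* should get its own profile row */
}

int main(void)
{
    int16_t *samples = malloc(N * sizeof *samples);
    if (!samples)
        return 1;
    for (long i = 0; i < N; i++)
        samples[i] = scale_sample((int16_t)rand());
    free(samples);
    return 0;
}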

Algorithm Selection (SPO600 Lab 5)

In this lab we take a short C program that scales a set of random int16_t values representing the amplitude of sound waves for audio encoding, multiplying each by 0.75 (lowering the volume). This type of operation occurs often when working with audio, since the computer must do these calculations in the background.

We begin with the naive algorithm: taking each value and multiplying it by 0.75 using floating-point arithmetic. Performing this over 10,000,000 samples on the aarch64 server supplied by our school gave our entire program a runtime of 514ms. Subtracting the setup and teardown, as well as unrelated code (which took 496ms to run), we get 18ms of runtime for the operations we care about.
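For reference, a minimal sketch of the naive version (my reconstruction, not the exact lab code):

#include <stdint.h>
#include <stddef.h>

/* Naive scaling: one floating-point multiply per sample. */
void scale_naive(int16_t *samples, size_t n)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = (int16_t)(samples[i] * 0.75f);
}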

Let’s look at two alternative algorithms to see if we can speed things up:

Alternative Algorithm 1: Pre-scaled array

Since we are performing the same operation many times, and since we are likely to get repeated inputs and outputs, perhaps it would save time to do each possible calculation once (about 65,000 calculations, one for each possible value of an int16_t). This way, instead of doing a new calculation, we can just look up the output value that we already know to be correct.

Unfortunately, this did not result in a speedup. It actually slowed things down, to a total of 42ms (vs. the original 18). I was expecting it to be faster, and I think it would have been, except for the problem of our int16_t values being signed. The adding and subtracting needed to handle the negative portion (since you cannot use a negative index into an array) may have slowed down my implementation. This method also requires quite a bit of memory overhead (an array storing around 65,000 16-bit integers). After 10 million samples seemed slower, I also tested with 1 billion samples and found that it was still slower; the payoff didn’t become better in the long game either.
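A sketch of the idea, including the index offset that the signedness problem forces (again my reconstruction, not the lab code):

#include <stdint.h>
#include <stddef.h>

/* One pre-computed result for each possible int16_t value. Array
   indices cannot be negative, so every lookup offsets the sample by
   32768; that extra addition per sample is the cost discussed above. */
static int16_t scaled[65536];

void build_table(void)
{
    for (int32_t v = -32768; v <= 32767; v++)
        scaled[v + 32768] = (int16_t)(v * 0.75f);
}

void scale_lookup(int16_t *samples, size_t n)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = scaled[samples[i] + 32768];
}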

Alternative Algorithm 2: Fixed-point arithmetic

Instead of using floating-point arithmetic to multiply by 0.75, we can use regular integer multiplication and bit-shifting to get the same result. Here, my algorithm multiplies each input sample by 6, then shifts the result to the right by 3 bits (dropping the low 3 bits, thus dividing by 2 three times). The end result is an integer value that is 6/8, or 3/4, of the original.
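Sketched out, with the same caveat that this is my reconstruction rather than the lab file itself:

#include <stdint.h>
#include <stddef.h>

/* Fixed-point scaling: (s * 6) >> 3 computes s * 6/8 = s * 0.75 using
   integer operations only. The multiply happens in int, so int16_t
   inputs cannot overflow; right-shifting a negative value is an
   arithmetic shift on common platforms (strictly speaking it is
   implementation-defined in C). */
void scale_fixed(int16_t *samples, size_t n)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = (int16_t)((samples[i] * 6) >> 3);
}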

Now, this algorithm gives us a significant speedup (at least around the specific operations being measured) and is the best of the three. At 13ms compared to the naive algorithm’s 18ms, we are roughly 28% faster, with none of the memory overhead of the pre-calculated array.

Are they fast enough to process CD data? At 44,100 samples per second and 2 channels, a CD needs only 88,200 samples processed per second, so I think each algorithm is sufficient. If an algorithm can process 10 million samples in under 20 milliseconds, it should easily be able to keep up with a CD. I will note, however, that we should use the best algorithm regardless, since it won’t be the only program running on the machine.

Further Optimizations

As stated before, I think treating the values as unsigned integers may prove to speed things up, especially in the pre-calculated array algorithm. The extra addition and subtraction is repetitive, useless work that could be avoided.

Getting Familiar with Open Source (OSD600 0.1 Release)

Introduction

As part of the Topics in Open Source Development course I am taking, we must make contributions to real open source projects. We are free to choose what kind of projects we would like to work on, and the type of contributions we would like to make. The first release, however, involves contributing to a node package called filer, as a chance to get our feet wet with git, github, and open source development in general. This is the journey I took in my small contribution to filer.

A rough outline of the process I took:

  1. Filing an Issue
  2. Setting up a Local Repository
  3. Running Existing Tests Locally
  4. Working on the Issue
  5. Making a Pull Request
  6. Responding to feedback
  7. Getting involved in the Community
  8. Reflecting on the Process

Some Background

Filer is a node package providing a file system that mimics fs, the file system built into nodejs. However, fs only functions in a command-line environment and does not work when run in a browser. Filer is intended to run in both.

Filing an Issue

The first thing to do was to pick an issue to file: namely, something worth adding to or changing in the project. I opted to write some unit tests for fs.appendFile(), since my Lab 1 post involved analyzing how the method works. After surveying the existing tests, and the related issues already filed, I noticed that none of the tests involved manually setting file system flags such as ‘r’ (read), ‘w’ (write), or ‘a’ (append). Perhaps this is because there isn’t really any reason to ever use a flag other than the default. Nevertheless, it is worth checking that the actual implementation matches the documentation.

I filed Issue #463, requested permission to work on the change, and got to work.

Setting up a Local Repository

I’ve had a github account for over a year, and yet my only repositories were the Hello World repository and a newly created repository for my Capstone Project. I had always been meaning to get more comfortable using both git and github, but without a project to work on, it always got postponed. Fortunately, I now have a good reason to use and master them.

I learned that I should fork a repository first, then clone my own fork. After fiddling with git commands a bit, I had a local version of filer. After stumbling a bit more, I realized that even getting everything up and running is a bit of a mini-challenge in itself. I had to make sure all the packages were installed, since they don’t come automatically (npm install), and learn how to run the tests (npm run test).

Running Existing Tests Locally

This led to the first big hurdle. Because of the difference in line endings between unix and windows machines, I kept getting the following error on every single test being run (1000+):

Expected linebreaks to be 'LF' but found 'CRLF'

However, I had set up git properly for line endings (as far as I knew), and done the same with Visual Studio Code as well. After a lot of troubleshooting (including reading links posted in the Slack by people with similar problems), I got… nowhere. Eventually I gave up for the night. A restart of my computer seemed to solve the problem; my guess is that one of the many things I did while troubleshooting only kicked in after the restart. I’m still not happy with my understanding of how line endings work, but at least I am a little less scared of them now. At some point I installed everything on my desktop as well and was met with the same problems, but in the end I was able to get everything running on my school laptop as intended.

Then came more issues. The test command slowly worked through the tests and stopped at around test 80 (out of over 1000). After some more fiddling, it got up to around 380, but still stopped after that. Taking a cue from the “CHROME HEADLESS” label I kept seeing when the tests ran, I installed Google Chrome. That seemed to help.

Overall, getting everything to work was surprisingly difficult. Furthermore, considering my goal was to create a simple test, the fact that getting the existing package to run took so long was somewhat frustrating. Hopefully as I gain more experience with git, and whatever environment I am using, this step will take less time in the future.

Working on the Issue

Finally able to begin, I took a look at the other tests available for appendFile() and found a test that was mostly similar to what I needed: it passed { encoding: 'utf8' } as the options parameter instead of { flag: 'a' } (for append). But the output should have been the same, since the 'utf8' encoding and the 'a' flag are both default parameters for appendFile. There was very little to change. And yet…

The test failed. How? The theory behind my test was very simple, very straightforward, and very hard to get wrong. I searched high and low to make sense of the error, which pointed at a mysterious copy_t method as the cause of the break. Not much luck on that front. I spent a lot of time trying to find what was wrong with my test.

In the end, I don’t think it was wrong. When I took a look through the implementation of appendFile(), I found that the validation of options worked differently for the flag than for the encoding. If no flag was passed, it was taken care of by the following line:

flag = options.flag || 'a';

That meant that with a partially formed options object, you didn’t need to include the flag attribute. But the same wasn’t true for the encoding attribute. This was proven when { flag: 'a' } caused the operation to fail, while { encoding: 'utf8', flag: 'a' } did not.

I decided to submit two tests instead of one, to demonstrate this difference (and prove that the bug exists and was what I claimed it to be).

First Pull Request

I took my findings, gathered my tests, and submitted Pull Request #494, unsure whether a test that failed could be accepted into the project. Although I was confident that I had actually found a bug in the implementation of appendFile(), the situation was new to me. Would I have to fix the bug before the test got accepted? Would I continue working on filer for the second release, or change to something else? Would anyone end up finding a flaw in my argument? Only time will tell.

Peer Feedback

So far, none of the comments on my pull request have invalidated my work or called for any revision. I believe this may take more time.

Getting Involved in the Community

I reviewed the pull requests of two other students. The first was a straightforward test with no errors; the only suggestions I made were with regard to whitespace and grammar.

The second was a bit more complicated. The test did not pass, though I was not 100% sure why. The intended goal of what was being tested was unclear, though I did find something to improve nonetheless.

In both cases, the pull requests did not overlap with other issues or existing files.

Conclusion

I enjoyed getting involved with open source projects. I’ve waited a long time to get involved with real projects, and I’m glad to have done it. It is a good way to learn, and the community is friendly. I’m excited to continue with it in this course, and hopefully beyond.

As far as the work itself, I found that it is hard to predict how long something will take, and where the trouble will come from. Hopefully by planning ahead and managing my schedule I can stay ahead of the problems.